Wiktionary, Improved Profiles

Here’s a quick overview of some recent site updates. First, Wiktionary has been added as a definition source, giving Wordnik much better coverage of slang and pop-culture terms, among other things. We’ll be periodically re-importing Wiktionary data, so if you’d like to add material to Wordnik, editing Wiktionary is now one way to improve an awesome public resource and help Wordnik at the same time.

Profiles have also been updated. You can show your location, add a web address, and add links to other social services you’re on. Profiles also now optionally display your recent lookups.

Lastly, lists and other pages belonging to specific users now give a synopsis of that person’s contributions, something I personally missed and am psyched to have back. It’s fun to see who the real obsessives are, to see when someone has passed a milestone, and to see who’s just getting rolling, so that we can welcome them into the fold.

Wordnik, Now With More Thesaurus

We’ve added some new features to the ‘related words’ page, reorganized it, and given it a promotion: Wordnik now sports a thesaurus.

By far the coolest of its new powers is the ability to compare two words on the same page, showing definitions or examples for each side-by-side. It’s like a comparison shopping site, but for words.

To use it, click the ‘Compare’ button in the right-hand column of any thesaurus page. Check the boxes next to the two words you’d like to see next to each other, et voilà: as soon as you’ve made your second selection, an overlay shows their definitions side by side. You can also see side-by-side examples, and tweet the comparison. It’s a comparithesaurus.

The tweet option brings up another featurette: Comparisons, despite being page-within-a-page, Pale Fire-style affairs, have URLs of their own, like this. So they can be tweeted, or emailed, or IMed, or whatever. We’ll be adding more built-in share options soon.

This is, like all of Wordnik, an ongoing effort. The underlying data is being continually improved, and the features will be added to and refined. If you have suggestions or criticism, please let us know in the comments or through feedback@wordnik.com.

25,000 Lists!

Congratulations, Wordniks and Wordies! This week we passed 25,000 lists*!

What is a list? A list is just a collection of words that anyone with a Wordnik account can create. The words may be related or not, real or not, common or proper, single words or phrases. It’s really up to you.

At Wordnik, we like lists so much that we share one every day on Twitter — our List of the Day. With more than 25,000, we have a lot to choose from!

There are lists that are weather-related, color-related, or that make us hungry. There are lists that play with words, describe words, and have fun with sound. There are lists that bring back memories, celebrate a holiday or an author (like Shakespeare). There are lists that are sporty, sleepy, scary, or spicy; vegetable or animal; hot or cold; naughty, naughtier, and naughtiest.

The Community page shows you what other Wordniks are doing with their lists, as well as recent activity, such as words that have been recently listed, the latest lists, and the most-commented-on lists and words. You can also find recently viewed words, the latest comments, recent pronunciations, and recent favorites.

Now go and make your own list. We know you want to.

*Special thanks to mollusque for bringing this awesome stat to our attention.

The Orthoepist: Would you like some cumin with your bruschetta?

If my experience as an orthoepist has taught me anything, it’s that most people who get paid to talk on television or the radio know diddly-squat about pronunciation. But because these people are professional broadcasters or entertainers, the rest of us tend to assume, to our pronunciatory peril, that they do. This is how beastly mispronunciations are often spread: from the slipshod media to the unsuspecting masses.

Yet, once in a great while, someone who makes a living smiling in front of a camera or spraying saliva into a microphone does know something about how words ought to be pronounced, and like those “ironic points of light” that “flash out wherever the Just exchange their messages,” we’re surprised and enlightened by a dazzling moment of on-air orthoepy.

For example, last July on the Late Show with David Letterman, Katie Couric gently but firmly corrected Dave’s pronunciation of preternatural. Although Dave insisted, with the blustery, overbearing assurance of the philodox, that the first syllable was pronounced pret as in preterit, Katie held her ground and (despite mangling the spelling of the word) showed Dave and the world that the proper way to say it is with a long “e” as in pretext: pree-tur-NACH-ur-ul. This is the only dictionary-sanctioned pronunciation, a fact that a much-deflated Dave had to admit. (It has not yet been confirmed whether he ate any of the crow that Katie’s handlers offered him during the commercial break.)

More recently, Martha Stewart, during one of her regular appearances on NBC’s Today Show, made orthoepic broadcasting history by informing the world that the Italian appetizer bruschetta is correctly pronounced broo-SKET-uh, not broo-SHET-uh, as Meredith Vieira confessed she had always mispronounced it. Of course, if the imperious Martha had told Meredith that “pie” was properly pronounced “pee,” no doubt Meredith and the rest of the world would have believed her.

In this case, however, Martha was not only redoubtable but also right, because the consonant blend sch should sound like sk, as in school, the musical term scherzo (SKAIRT-soh), and maraschino, which in the cultivated speech of the cognoscenti is pronounced ma-ruh-SKEE-noh, not ma-ruh-SHEE-noh. The many speakers who, like Meredith Vieira, have always thought bruschetta was pronounced with an sh sound in the middle may have been misled by false analogy with the toothsome Italian ham, prosciutto, which even poor Meredith knows is pronounced proh-SHOO-toh.

But even the formidable Martha Stewart gets it wrong sometimes, proving yet again that an authority on any given subject is not, by extension, also an authority on how to pronounce words related to that subject. I can’t tell you how many times I’ve heard doctors, lawyers, college professors, research scientists, and other specialists of all stripes mispronounce words pertaining to their specialty — sometimes in the capacity of an on-air expert. And the fearsome Martha, homemaker extraordinaire, is no exception, for, I regret to report, she mispronounces cumin as KYOO-min.

This trendy variant, with what is sometimes called a y-glide for the u (as in cubic or humor), and another popular variant, KOO-min, without the y-glide, are speculative pronunciations based on how the word is spelled. But as a peek into the Oxford English Dictionary reveals, cumin is but one of many spellings for this venerable word, which dates back to the 9th century. This long line includes cummin (still recorded in some modern dictionaries), commin, comin(e), comeyn, cummyn, and comyn, all of which pointedly do not suggest a KYOO- pronunciation with a y-glide. And indeed, as a little historical research also quickly reveals, the traditional pronunciation of cumin until the late 20th century was KUM-in (as in “Hold your horses, I’m comin’”). Not surprisingly, KUM-in is the only pronunciation listed in the Oxford English Dictionary and the first listed in the American Heritage Dictionary, which you can hear at Wordnik.com.

Dictionary editors, or lexicographers, are honor-bound to list any pronunciation in widespread use at a given time, so most current dictionaries now recognize KYOO-min and KOO-min. But the Orthoepist, mindful that his job is to give judicious advice on what is correct, is duty-bound to reject what is fashionable for what is traditional and cultivated. Therefore, this is my ruling on cumin: Ignore misguided Martha and other foolish foodniks who say K(Y)OO-min and say it as it has been said for hundreds of years: KUM-in.

[Charles Harrington Elster, Wordnik’s Orthoepist (and token prescriptivist!), blogs about pronunciations. His tenth book, The Accidents of Style: Good Advice on How Not to Write Badly, has just been published by St. Martin’s Press.]

B is for Billion

Only 10 years ago, having a structured database with 100 million records in it was quite a feat. Today Wordnik passed the 9 billion record mark with the open-source MongoDB from 10gen. But a record in an object store is quite different from a row in a circa-1999 relational database.

Object-oriented programming concepts flew right by the RDBMS long ago. Inner joins, left/right outer joins, unions, etc., have served us well, but how much of our data can we model in a tabular fashion? Have you ever tried doing anything complicated in Excel with just ONE sheet?

MongoDB removes an enormous amount of friction from the development process. Records shouldn’t be limited to things like the standard “user” table, with first_name, last_name, email, etc. They should be able to hold more meaningful and conceptually deep data, like “the usage frequency of a word across all time” or “the graph of all relationships to a word”, concepts that are difficult to express in tabular data. By using a document-oriented database, we at Wordnik don’t need to nag a DBA to add a field or column (well, we’re a startup, so more like nag the guy sitting next to you). If we can model it in software, MongoDB can store it, simple as that. And if MongoDB can store it, we can not only get it back (very important) but *find* it with very rich and flexible queries. Object-relational mapping (ORM) has been around about as long as OOP, but let’s face it: there is no ORM solution that (a) is flexible for the developer and (b) works in harmony with the storage system (i.e. performance doesn’t suck). MongoDB does both, easily, and it’s very, very fast.
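To make the contrast with a flat “user” row concrete, here is a minimal sketch in plain Python of what a document-style record for a word might look like. The field names and every value in it are made up for illustration; this is not Wordnik’s actual schema or data, and no database is involved, just nested dictionaries and lists of the kind a document store accepts.

```python
# A hypothetical word document. Field names and values are illustrative
# assumptions, not Wordnik's real schema or real frequency data.
word_doc = {
    "word": "example",
    # Usage frequency over time: a nested mapping of year -> count.
    "frequency_by_year": {"2007": 3, "2008": 11, "2009": 27},
    # A small relationship graph: a list of typed edges to other words.
    "relationships": [
        {"type": "synonym", "word": "instance"},
        {"type": "synonym", "word": "sample"},
        {"type": "hypernym", "word": "thing"},
    ],
}


def related(doc, rel_type):
    """Return the words linked to doc by the given relationship type."""
    return [edge["word"] for edge in doc["relationships"] if edge["type"] == rel_type]


def total_frequency(doc):
    """Sum the per-year usage counts stored inside the document."""
    return sum(doc["frequency_by_year"].values())


if __name__ == "__main__":
    print(related(word_doc, "synonym"))
    print(total_frequency(word_doc))
```

In a tabular model the yearly counts and the relationship edges would each need their own table plus joins back to the word; here they travel with the record. With a running MongoDB server and a driver such as pymongo, a dictionary shaped like this could be stored as-is, and nested fields can be queried with dot notation (for instance a filter on "relationships.type"), though that part is left out of the sketch.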

So we hit 9 billion records, which is of course very exciting. Traffic to our public API keeps growing: MongoDB served 100M queries in the last week and didn’t break a sweat. And what’s most exciting is the number of features this helps us develop very rapidly, which we will be sneaking out over the next few weeks.

Ruby and Python Libraries, Rails and Django Demo Apps

We’re excited to announce two new client libraries for the Wordnik API: an official Ruby gem and a Python package.

The Ruby gem is available on GitHub and RubyGems, and the Python package is on GitHub and PyPI.

To illustrate the use of these libraries, we’ve also put together “Hello Dictionary” apps in both Rails and Django. Both apps live in the wordnik/api-examples repo on GitHub (here are direct links to the Rails app and the Django app). The README for each project shows some example usage, and for the Rails and Django apps, the README is a tutorial that takes you from scratch to a fully functional dictionary app in about 15 minutes.

As always, let us know if you have questions or find bugs, and let us know what you build! Code contributions are greatly appreciated, and we’d like to sincerely thank Martin Marcher and Vince Spicer for their contributions to the Python library, and Jason Adams for inspiring the Ruby work. We’re currently working on a full-fledged PHP library, and plan Java and Objective-C libraries down the line (in the meantime, basic examples are available in all those languages). If you have suggestions or requests for support in other languages, please let us know.

The Orthoepist: Introduction

[Note: Although Wordnik is a descriptive project, we do feel it’s appropriate to give some prescriptivist guidance in the area of pronunciations. So we’re happy to introduce Charles Harrington Elster as Wordnik’s pronunciation editor.]

Greetings, denizens of Wordnik Universe. My name is Charles Harrington Elster, and I am Wordnik’s new orthoepist. I come in peace — and to speak my piece.

An orthoepist, in case you’re wondering, is a pronunciation expert, specifically someone who studies correct pronunciation (Greek orthos, right, correct + epos, word) and who issues opinions about how words are properly or improperly spoken. As Wordnik’s orthoepist, my responsibilities will include recording correct pronunciations for difficult words and names and for so-called problematic words, where there is doubt or dispute about what is acceptable — or, in the lingo of linguists, “standard.” In some cases I will provide a comment, comparable to a usage note in a print dictionary, to give you more information that will help you decide how best to say a particular word. And each month I will contribute a post to this blog addressing various matters of orthoepy and usage.

But I would be remiss in my duties if I did not tell you, immediately and ex cathedra, how to pronounce orthoepist and orthoepy. The tricky question is where to put the main stress in these words, and, wouldn’t you know, even the orthoepists have never been able to agree on that.

Many phonological cognoscenti, especially in the United States, stress orthoepist and orthoepy on the second syllable: or-THOH-uh-pist, or-THOH-uh-pee. But authorities have also long countenanced first-syllable stress: OR-thoh-uh-pist, OR-thoh-uh-pee. And the great Oxford English Dictionary, which reflects British preference, lists OR-thoh-EP-ist and the peculiar OR-thoh-EE-pee first followed by several variants. Perhaps because I like the idea of emphasizing the notion of correctness in these words (ortho-), or perhaps because I was born a contrarian, I prefer OR-thoh-uh-pist and OR-thoh-uh-pee. Now it’s your turn to choose, and I trust you will choose wisely.

For choosing wisely is what this business of orthoepy is all about. As Abraham and Betty Lass observed in their Dictionary of Pronunciation (1976), “You can . . . make a million, have friends, influence people, be admired for your good sense, be loved for your good heart, send your children to the best colleges, become President of the United States even if your pronunciation is not what it should be. But you will still be judged by the words you mispronounce. And you may not be judged kindly.” (Think Dubya and his infamous nucular for nuclear.)

As Wordnik’s orthoepist — your orthoepist, really, because I’ll be working for and accessible to you — it will be my job to make sure that you are not judged unkindly for your pronunciation. What are my credentials for this job? As the author of The Big Book of Beastly Mispronunciations, the pronunciation editor of the seventh and eighth editions of Black’s Law Dictionary, and a longtime radio commentator on language, I bring more than twenty-five years of orthoepic (OR-thoh-EP-ik) experience to the table.

My definition of standard does not include controversial, stigmatized, or eccentric pronunciations, and I will not sanction anything questionable, as some lexicographers regrettably do. And because I frown equally on ostentation and carelessness, I will counsel you to avoid both affected and slovenly speech. In short, you can rest assured that any pronunciation I record or recommend here will be cultivated, not merely in vogue or in widespread use. So as you wander the Wordnik Universe, when you see chelster beneath a recorded pronunciation, that’s me giving you an unimpeachable way of saying a word.

I welcome your comments, your questions, and especially your suggestions on words to record. I will do my best to respond to all communications that are composed with a civil tongue. You can reach me at Orthoepist@wordnik.com.

And now, let us “engage the instrument of the language,” as the poet and etymologist John Ciardi once put it, and have fun playing it.