Announcing: “Secret Word Wednesday”

We’re trying something new here on Wednesday at Wordnik — “Secret Word Wednesday.” It’ll work like this:

  • We’ll choose a secret word.
  • We won’t tell you what it is. (That’s why it’s SECRET.)
  • We’ll tweet clues to the word (follow us at @wordnik).
  • When you think you know the SECRET WORD, tweet it! Use @wordnik so we can find your tweet.
  • If your guess is correct, we’ll re-tweet it, and ask you to email us so we can send you Wordnik stickers and other cool stuff!

And of course it wouldn’t be a SECRET WORD if it didn’t have a special extra … when you think you’ve found the secret word of the day, check the audio pronunciations for confirmation.

HA!

Photo by, and licensed (CC BY-NC-ND 2.0) from, heather.

P.S. The secret word today is NOT “Door.”

Cricket!

Photo by, and licensed (CC BY-NC-SA 2.0) from, kkseema.

At Wordnik, we love all the words (that’s what the heart in the logo means, after all) but there are some words that are especial favorites (that’s what the favorite function is for, after all) … and quite a few of those words are cricketing terms. Only a couple of Wordniks can honestly claim to be true cricket fans (Krishna and Kumanan) but none of us can resist the lure of words like googly (“A googly is a ball delivered by a bowler that looks as if it ought to break from left to right across the bat of a right-handed batsman.”) and Dilscoop (“… [the] stroke ‘Dilscoop’ [invented by Tilakaratne Dilshan] which involves going down on one knee and scooping the ball over his head in area behind the wicketkeeper.”) Not to mention the best phrase in all of sports: silly mid-off. (Which is the same as the silly mid-on, just on the other side of the pitch. Make sense? No? Well, it doesn’t matter.)

Cricket words are so compelling, in fact, that Wordnik has three different lists devoted to cricket! They are: Sportie: Cricket, i don’t like cricket — i love it, and Cricket! That last link is to an open list — feel free to add your own favorite cricket terms to it!

Is there some other topic that you think has better words than cricket? You can always sign in to Wordnik and create your own list of great words to share …

Are your words smart enough?

Today, at the O’Reilly Tools of Change conference, we’ll be announcing an initiative to create a new standard for getting and publishing information about words.

We’re calling it “smartwords”, and it will be an open standard — meaning anyone can publish data sets or develop applications using it. Smartwords will be context-aware and real-time … but also lightweight, easy-to-use, and versatile. We’re developing this standard with help from our first smartwords partners, including The New York Times, Forbes, The Huffington Post, O’Reilly, Vook, Scribd, ibis reader, and the Internet Archive.

With smartwords, you’ll be able to access not just traditional “dictionary-style” information, but also metadata, such as how frequently a word is used, where words are used, and who uses particular words. You’ll also be able to publish information about words — if you create a word, you can put a flag in the ground and claim it for your own — and smartwords will enable cool social features, like sharing and tagging.

What would a world with smarter words look like?

— You’re reading a new popular-science bestseller and your reader shows you quick definitions of the most difficult words, set right in the text … based on knowing what books you’ve already read and what words you’ve already seen!

— You’re a consumer and you have a few sources you trust for information (like, say, the New York Times). When you’re reading something from a different source, you can set your ereader to highlight what you’re reading to link you to good definitions (or similar content) in your trusted sources. (Instant fact-check!)

— You’re reading a great new novel and you see a great quote you’d like to pass along — you highlight it and share it on Facebook or Twitter.

The question is: if every word became a smart word, what would you ask it and what would it tell you?

We’ll be releasing version 1 of the smartwords standard in Summer 2010. With this new standard, we should be able to do fantastic things with smartwords, and we want to hear from you about the kinds of information you would like to access and the kinds of applications you would like to build. Visit us at smartwords.wordnik.com to learn more!

(There’s more information in this nice writeup about smartwords from the Wall Street Journal’s Digits blog.)

What has technology done for words lately?

There are two significant computing advancements which are enabling Wordnik to deliver more words and more information about them to you: eventual consistency and document-oriented storage.

Eventual consistency is a parallel computing concept that was first presented in the context of fault tolerance but applies just as well to engines like Wordnik. Why is eventual consistency important to us? Because we do a lot of counting. Since we add about 150 million words a day to the corpus, getting an accurate count of the current size is not only impossible but pointless. At that rate, the corpus grows by roughly 1,700 words every second on average.

In a traditional, transactional database, a counting-type operation will typically do one of two things: either it will lock the relevant database objects so that it can guarantee accuracy *right now* or it will perform a number of isolation operations so that your count *was* accurate at a given point in time. Sometimes it’s important to have an exact number, like when you’re checking your account balance at an ATM. But at Wordnik, we’d rather give you a rough estimate and keep the data flowing in as fast as possible. More data is almost always better, and it’s our goal to have as much as we can. With eventual consistency, we count words wherever they arrive, without coordinating, and add the partial counts together whenever there’s a lull. The count’s always in the ballpark, and we never have to stop.
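To make the counting idea concrete, here is a minimal sketch in Python. It is not Wordnik’s actual code: the `ShardCounter` and `ApproximateTotal` names are made up for illustration, and a real system would run the shards on separate machines. Each shard counts on its own without locking anything, reads return the last merged total (which may be stale), and a merge step folds the pending counts in when there is time.

```python
class ShardCounter:
    """Hypothetical per-worker counter; it never coordinates with other shards."""

    def __init__(self):
        self.pending = 0  # words counted since the last merge

    def add(self, n):
        self.pending += n


class ApproximateTotal:
    """Hypothetical aggregate that is allowed to lag behind the shards."""

    def __init__(self, shards):
        self.shards = shards
        self.merged = 0  # last known total; may be out of date

    def read(self):
        # Cheap read: return the last merged total without touching any shard.
        return self.merged

    def merge(self):
        # Run during a lull: fold every shard's pending count into the total.
        for shard in self.shards:
            self.merged += shard.pending
            shard.pending = 0
        return self.merged


shards = [ShardCounter() for _ in range(3)]
total = ApproximateTotal(shards)

shards[0].add(500)
shards[1].add(700)
stale = total.read()        # nothing merged yet, so the read lags behind

total.merge()
shards[2].add(300)          # arrives after the merge
after_merge = total.read()  # reflects the first two shards only

final = total.merge()       # once ingestion pauses, the total catches up
```

The trade-off is visible in `stale` and `after_merge`: a read can be behind by whatever has arrived since the last merge, but adding words never waits on a lock.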

The next big computing advance that’s helping Wordnik is document-oriented storage. Hierarchy is part of most data structures, but storing hierarchical data in a flattened, tabular manner makes creative search and retrieval very difficult.

Take a dictionary entry. An entry’s hierarchy isn’t overly complex, but it does have a number of relationships: between the entry and its definitions, parts of speech, pronunciations, citations, and so on. Most software engineers have modeled hierarchical relationships in relational databases using primary and foreign keys, normalized tables, etc. But doesn’t it make more sense to look at a dictionary entry as a “document” rather than a set of related tables? It’s simpler, and often faster, to find data with syntax like “dictionary.definitions.partOfSpeech='noun'” instead of with a series of complex (and often expensive) joins across dozens of tables.
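Here is a rough sketch, in plain Python rather than any real database API, of what treating an entry as a document looks like. The sample entries, the field names other than `definitions` and `partOfSpeech`, and the `values_at` and `find` helpers are all assumptions made up for illustration. The point is only that a dotted path can walk straight down the nested structure, with no joins.

```python
def values_at(node, path):
    """Yield every value reached by following the keys in path, fanning out across lists."""
    if isinstance(node, list):
        for item in node:
            yield from values_at(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from values_at(node[path[0]], path[1:])


def find(entries, dotted_path, wanted):
    """Return the entries in which any value at dotted_path equals wanted."""
    keys = dotted_path.split(".")
    return [entry for entry in entries if wanted in values_at(entry, keys)]


# Illustrative sample data: each entry is one self-contained document.
entries = [
    {
        "word": "googly",
        "definitions": [
            {"partOfSpeech": "noun", "text": "a deceptive ball delivered by a bowler"},
        ],
        "pronunciations": [],
    },
    {
        "word": "bowl",
        "definitions": [
            {"partOfSpeech": "verb", "text": "to deliver the ball to the batsman"},
        ],
        "pronunciations": [],
    },
]

nouns = find(entries, "definitions.partOfSpeech", "noun")
noun_words = [entry["word"] for entry in nouns]
```

In a relational layout the same question would typically mean joining an entries table to a definitions table; here the definitions simply travel with the entry.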

Luckily for us, the fine folks at 10gen have created MongoDB, an open-source, document-oriented database that solves these and many other technical challenges. Working with their system has been delightful and it has opened many doors for Wordnik, speeding up the development of new features!