Hi! We’re Wordnik! Nice to meet you!

Hi! Welcome to Wordnik!

Tenzing the kitten.

Wordnik is a new way to learn about words. Our goal — lofty as it sounds! — is to show you some information about every word in English. (We don’t have every word yet, but we’re working on it.)

At Wordnik, we believe:

  • the best way to learn how to use a word is to see how other people use it. So we’ll try to show you as many example sentences as we can find for each word.
  • information about how a word works is as important as what a word means—so we’ll show you how often you could expect to see a word, notes about where you might find a word, and how a word’s behavior has changed over time.
  • your feedback is important! So there are lots of ways for you to give us yours—you can add tags, suggest related words, point out new words for us, and leave us notes at any word!
  • sites about words should always be fun and never boring.

We’re still in closed beta, but we’ll be open to new users in just a little while — see you soon! Until then, please enjoy the adorable kitten at the top of this post.

Pick Me! *sits up straight, waves hand wildly*

It’s been a while since I bragged about a Wordie media mention*, but I liked the tone of this one: yesterday Wordie was chosen as the Yahoo! Pick of the Day. The kicker says it all: “¡Que viva Wordie! [Long live Wordie!] Romp through the ‘recent words’ section, acquaint yourself with the top 100 citers, linger among the most recent themed attractions, and then declare yourself a wordie. Go forth, friends, and flourish linguistically.”

Right on, Yahoo!**. Right on.

* To balance the scales, here’s a gem from the archives. Read this list bottom to top.
** Question: if Yahoo! is the last word in a sentence, do you still use closing punctuation? It makes sense that you would, but man, it’s chugly.

Natural Language Search

Natural language processing (designing computer systems that understand human language) has proven a tough nut to crack. Yesterday a TechCrunch UK post covered the beta launch of True Knowledge, a UK startup offering a natural language search engine. It joins competitors like Powerset, which has yet to launch but is apparently tackling the same problem. It’ll be interesting to see whether these go anywhere or become the next Ask Jeeves.

True Knowledge has a good demo video explaining their technology. The most interesting part of it is hearing a British voice pronounce the word “beta,” which comes out sounding like beet-ah. Is that how it’s actually pronounced over there, or is that just a quirk of the guy talking? Team Wordie, UK division, please report.

Paginated Word Lists

Finally, word lists have been broken into pages, to make it easier to go through long lists (and to prevent long lists from crashing browsers; Wordie now passes the stpeter test).

I cranked this out, so it’s pretty basic, and probably buggy. Right now each page is 100 words long; eventually I’ll make that configurable, and otherwise fancy it up. Let me know if you see any problems.
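Splitting a list into fixed-size pages mostly comes down to slicing by offset. Here is a minimal sketch in Python, assuming the whole list is already in memory as a sequence of strings; `PAGE_SIZE`, `page_count`, and `get_page` are hypothetical names for illustration, not Wordie’s actual code:

```python
from typing import List, Sequence

# Hypothetical constant mirroring the current fixed page length of 100 words.
PAGE_SIZE = 100


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Pages needed for `total` items; an empty list still gets one (empty) page."""
    return max(1, -(-total // page_size))  # ceiling division without floats


def get_page(words: Sequence[str], page: int, page_size: int = PAGE_SIZE) -> List[str]:
    """Return the words on 1-based `page`, raising ValueError if it is out of range."""
    pages = page_count(len(words), page_size)
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    start = (page - 1) * page_size  # offset of the first word on this page
    return list(words[start:start + page_size])
```

With 250 words, `page_count(250)` is 3 and the third page holds the last 50. Taking `page_size` as a parameter is one way the size could later become configurable; a database-backed site would more likely push the same offset arithmetic into its query than slice in memory.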

Wordie becomes xenophobic, gets over it

So as a few of you noticed, I screwed up the database transfer from the old hosting company to the new, and rendered all non-English characters unreadable. I just re-did the transfer of the old data, merged it with the stuff that’s been added since the transfer, and, knock on wood, all the words with Chinese and German and Greek and Hebrew and Arabic (I think) characters, along with ones in a bunch of languages I didn’t recognize, should be working again. I’m sorry if that startled anyone else. It scared the crap out of me, frankly, when I thought for a moment that we’d permanently lost all that good stuff.
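The transfer glitch isn’t described in detail here. One common way non-English text gets mangled while plain English survives is an encoding mismatch, such as UTF-8 bytes being read back as Latin-1. A minimal Python sketch of that assumed failure mode, with hypothetical helper names `garble` and `repair`:

```python
def garble(text: str) -> str:
    """Simulate the assumed mistake: UTF-8 bytes decoded as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo exactly that mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


for word in ["cat", "über", "Straße", "词"]:
    broken = garble(word)
    # ASCII words come through unchanged; everything else turns into gibberish.
    print(word, "->", broken, "->", repair(broken))
```

ASCII words pass through untouched because their UTF-8 and Latin-1 bytes are identical, which would explain why only the non-English characters broke. The reverse trick works only if nothing else touched the garbled text. If characters were replaced or re-saved in yet another encoding, re-running the transfer from the original data is the safer fix.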

This did provide an interesting data point. Of the 90,031 unique words in Wordie when I did the transfer, 1,973, or around 2.2%, contained non-ASCII (Unicode) characters. A few of those are accented English words or words entered with ligatures, but the great majority are in other languages. Hopefully that number will grow, and hopefully someday Wordie can cater fully to other languages, with localized and translated versions.
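How those 1,973 words were counted isn’t spelled out. Here is a minimal sketch, assuming a word counts if it has at least one character outside 7-bit ASCII; `has_non_ascii` and `non_ascii_share` are hypothetical names:

```python
from typing import Iterable, Tuple


def has_non_ascii(word: str) -> bool:
    """True if any character in the word falls outside 7-bit ASCII."""
    return not word.isascii()  # str.isascii() needs Python 3.7+


def non_ascii_share(words: Iterable[str]) -> Tuple[int, int, float]:
    """Return (non-ASCII count, unique word count, percentage) over unique words."""
    unique = set(words)
    flagged = sum(1 for w in unique if has_non_ascii(w))
    total = len(unique)
    pct = 100 * flagged / total if total else 0.0
    return flagged, total, pct


# The ratio quoted above, rounded to one decimal place.
print(round(100 * 1973 / 90031, 1))
```

By this definition, accented words like “café” and ligature spellings like “ﬁne” count alongside words in other scripts, which matches the mix described above. The quoted figure holds up: 100 × 1,973 / 90,031 ≈ 2.19, or around 2.2%.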

Tomorrow I’ll fix the comments and profile info that contain international characters, but for tonight I’m going to quit while I’m ahead. Let me know if you come across any words or lists that are still munged.

Update, 9/11/07: The Unicode characters in the comments display correctly now, too.