WotD Perfect Tweet Challenge Roundup

Every week, we pose a challenge: using any word of the day from the week, create a perfect tweet, otherwise known as a twoosh. If we like it, your tweet will appear on our blog.

Here are our favorites from last week:

Thanks to everyone for playing! You’ll have another chance this week to perfect your word of the day tweets. To get the word of the day, follow us on Twitter, like us on Facebook, or subscribe via email.

Discovering words on the iPad

Have an iPad? Like words? Then you might like the project our friends at IDEA.org are putting together.

The educational nonprofit organization is using the Wordnik API to build an iPad app for browsing and discovering words and the connections between them. In their words:

The app will create word maps that blossom with related words, branching out to synonyms and definitions. Users will fly through floating constellations of words and discover relationships between them.

IDEA.org is raising funds on Kickstarter, so go over and take a look!

This Week’s Language Blog Roundup

Welcome to this week’s Language Blog Roundup, in which we bring you the highlights from our favorite language blogs and the latest in word news and culture.

In case you didn’t know, a little show called Mad Men is having its season premiere this Sunday. While the show may get most fashion and design details right for the period, what about the language? “Anachronism machine” Ben Schmidt takes a look.

In the Boston Globe, Ben Zimmer discussed Dr. Fill and the rise of crossword puzzle-solving robots, while Erin McKean considered the QWERTY effect and why we like some words more than others. Johnson weighed in on linguistic manners and the rivalry between two slang-masters.

At Language Log, Geoff Nunberg also had some words about the slang dictionaries in question, while Mark Liberman cast and shed some doubt and light, and Victor Mair interpreted a dubious Chinese tattoo. Meanwhile, BBC News profiled Zhou Youguang, the man who helped invent pinyin, “a writing system that turns Chinese characters into words using letters from the Roman alphabet.”

At Macmillan Dictionary Blog, Orin Hargraves explained speech acts and Stan Carey posted about nonsense terms (balderdash!). On his own site, Carey had some great suggestions for new language sites, including Oz Words. Check out their posts on budgie smugglers, “a colloquial term for a pair of men’s swimming briefs” (and an excellent name for a rock band); stormstick, or umbrella; and Johnniedom, “the social world of fashionable young men.”

At Lingua Franca, Ben Yagoda discussed double-standard logical fallacies, as well as glottalization, Mockney, and why some 20-something women from New Jersey sound like Jamie Oliver. Allan Metcalf told the story behind OK, which first appeared in print on this day in 1839.

Across the pond, Lynneguist parsed the difference between get a break and catch a break. Fritinancy deliberated on two big companies with bad name changes (why, Kraft?) and picked as words of the week minaudière, “a small, hard-sided, often bejeweled evening bag meant to be carried in the hand,” and akrasia, “a lack of command over oneself; a weakness of will.”

In her week in words, Erin McKean noticed stiction, “surface friction that tends to keep mechanisms from beginning to move”; cheechako, “Alaskan slang for ‘newcomer’”; shengnu, “used to describe an unmarried woman ever so precariously teetering near the age of 30,” literally, “leftover woman”; and quenelle, a kind of dumpling. Meanwhile, Word Spy spotted cisgender, “identifying with one’s physical gender.”

The Virtual Linguist drank in some builder’s tea, the origin of the word gossip, the word cabbage meaning “stuff made out of over-ordered material in a factory,” and the history behind scruple, which once meant “a small unit of weight, as used by apothecaries.” Sesquiotica opined on pell-mell, euphuism, mojo, and irregardless, and Dialect Blog offered up some ‘going to’ contractions and lax vowels for English learners.

NPR dished on that other four-letter word, slut, and the bad girls of history and their not-so-good nicknames. Some weird restaurant names had us scratching our heads (not sure we’d eat at a place called Virus), while these regional sandwich names got our stomachs growling.

We learned about the benefits of bilingualism, the science of the birth and death of words, and the controversial claims one linguist is making about Universal Grammar. We loved this letter from screenwriter Robert Pirosh (“I like fat buttery words, such as ooze, turpitude, glutinous, toady”) and this piece from Jhumpa Lahiri (“The best sentences orient us, like stars in the sky, like landmarks on a trail”). We wanted to know what books these guys were fighting about.

Finally, our favorite site of the week is Good Show Sir, “Only the worst Sci-fi/Fantasy book covers.”

That’s it for now! We’ll see you next week (though we won’t call you “rock god”), OK?

Prison Terms

Alcatraz prison cells

The pokey. The slammer. The clink. How many different ways are there to say prison, and where do these words come from? We decided to find out.

Our latest obsession isn’t completely arbitrary. Forty-nine years ago today, Alcatraz closed as a federal penitentiary. Also called the Rock, the island was named for the bird that roosted there, the pelican, which in Spanish is, you guessed it, alcatraz.

From the other side of the country came the expression up the river, which, according to the Online Etymology Dictionary, originally referred to the Hudson River and Sing Sing, a maximum security prison in Ossining, New York. Sing Sing was the original name of Ossining, and was derived from the name of the Native American tribe, the Sint Sinck, who sold the area to one Frederick Philipse.

Some prisons were so famous (or in some cases, infamous) that their names became common words. The stir in stir-crazy (“distraught or restless from long confinement in or as if in prison”) is a slang term for prison, and comes from start, an old slang name for Newgate, “a former prison in London notorious for its unsanitary conditions and burnt down in riots in 1780.” Meanwhile, Newgate itself became a verb meaning “to imprison.”

Another synonym for prison, bocardo, originally referred to Bocardo Prison in Oxford, England. Bastille, which comes from an Old French word meaning “fortress, tower, fortified building,” was “built in Paris in the 14th century and used as a prison in the 17th and 18th centuries,” and now refers to “a jail or prison (especially one that is run in a tyrannical manner).”

But how does one get to prison? You could take a paddy wagon. The term may come from Paddy, originally “the pet form of the common Irish proper name Patrick (Ir. Padraig),” which became a disparaging term for someone of Irish descent. The paddy wagon was so called perhaps “because many police officers were Irish” at the time (around 1930). It is also known as a meat wagon, cattle car, or Black Maria.

According to World Wide Words, Black Maria is American in origin, though its exact etymology is unclear. The name may come from a Boston story “about Maria Lee, a large black woman who kept a boarding house in the 1820s with such severity that she became more feared than the police, who called on her to help them catch and restrain criminals,” or from the name of a “famous black racehorse of the period, also named Black Maria.”

Before heading to the big house, prisoners may first be held in a sponging-house, “a victualing-house or tavern where persons arrested for debt were kept by a bailiff for twenty-four hours before being lodged in prison, in order that their friends might have an opportunity of settling the debt.” The sponging-house was so-named “from the extortionate charges made upon prisoners for their accommodation therein,” with sponge meaning “to drain; harass by extortion; squeeze.”

A bridewell was “a house of correction for the confinement of vagrants and disorderly persons,” and became a name “generally given to a prison in connection with a police-station, for the temporary detention of those who have been arrested by the police.” According to the Virtual Linguist, the term bridewell “comes from an old area of London near modern-day Fleet Street, where there was a well dedicated to St Bride,” a patron saint of Ireland, and “is still used by some police forces in the UK, usually as the name of a police station, or of a custody suite.”

A spinning house was “a house of correction, so-called because women of loose character were obliged to spin or to beat hemp as punishment.” Spinster, which originally referred to any person, man or woman, whose occupation was spinning, also meant “a woman of an evil life or character: so called from being forced to spin in the house of correction.” The word now commonly means “a woman who has remained single beyond the conventional age for marrying.”

A rogue-house is a house for rogues; a lobspound is a pound for lobs or louts, and was “often applied to the juvenile prison made for a child between the feet of a grown-up person.” Another prison term, hoosegow, was coined in 1911 in the western U.S., probably as a mispronunciation of the Mexican Spanish juzgao, “tribunal, court.” An older term, calaboose (1792), is from the Louisiana French calabouse, which comes from the Spanish calabozo, “dungeon.”

A panopticon was a prison proposed by Jeremy Bentham, “so arranged that the inspector can see each of the prisoners at all times without being seen by them.” On a smaller scale is the Judas, or judas-hole, “a small opening in the door or wall of a cell to enable the guards to watch the prisoners.”

Whatever you call the joint, be sure to keep your nose clean and stay out.

With Software, Small is the new Big


For almost 2 years, Wordnik had been running its infrastructure on really fast, dedicated servers, with lots of storage, RAM, and very impressive CPU and I/O. Seeing our API firing on all 8 cores using 60GB of RAM in a single process was a beautiful thing, and end users got to enjoy the same benefits: our huge corpus of MongoDB data could be hit anywhere, anytime, and fast. Using just one of these boxes, we could push thousands of API calls a second, hitting MySQL, MongoDB, memcached, Lucene, etc.

But we turned all that off almost 2 weeks ago and quietly moved to the humble virtual servers of Amazon’s AWS infrastructure. We now have our choice of relatively low-performance virtualized servers with significantly less horsepower. In the operations world of “CPU, memory, I/O” we can now “choose one of the above”. What gives? Is this all about saving money? And how will this affect Wordnik’s infrastructure?

Before I dig into the details, let me summarize a couple of facts. First, our system runs at almost exactly the same speed in EC2 as it did on those beefy servers (I’ll get to the reason why). Next, there’s no less data than before; in fact, we haven’t slowed down the growth of our corpus of text. Finally, yep, we saved money. But that wasn’t the goal (nice side effect, though)!

So first off, the reason for the move is really quite simple. We needed multiple datacenters to run our system with higher availability. We also needed the ability to burst into a larger deployment for launches and heavy traffic periods. We needed to be able to perform incremental cluster upgrades and roll out features faster. Sounds like an advertisement for a cloud services company, right?

Well, when it comes down to it, your software either works or it doesn’t, and nothing puts it to a better test than switching to infrastructure that’s intrinsically less powerful on a unit basis. And the process of switching to the cloud is certainly not free unless you’re running a “hello world” application. If you wrote software to take advantage of monster physical servers, it will almost certainly fail to run efficiently in the cloud.

When we started the process of migrating back to EC2 (yes, we started here and left it—you can read more here) it was clear that a number of things had to change or we’d have to run the largest, most expensive EC2 instances which — believe it or not — actually cost more money to run than those big servers that we were moving away from. The biggest driver was our database. We have been using MongoDB for 2 years now, but in that time, a chunk of data has remained in MySQL in something on the order of 100GB of transactional tables. Simply restoring one of these tables and rebuilding indexes takes days on a virtual server (I’m sure someone out there could make it faster). We knew that this had to change.

So this data was completely migrated to MongoDB. That was no small feat, as it was simply the oldest, most crusty Java relic in our software stack (we switched to Scala almost 18 months ago). One of our team members, Gregg Carrier, tirelessly worked this code into MongoDB and silently migrated the data about a week before our datacenter move. Without doing this, we couldn’t have made the datacenter move.

Also on the data front, we had a number of monster MongoDB databases, which had been operational ever since we first transitioned our corpus to MongoDB! Back to the previous point, these would have been a challenge to run with good performance on EC2 instances, given their weak I/O. In our tests, disk-seek performance in the cloud was as low as 1/10th of what we had on our dedicated servers.

To address this, we made a significant architectural shift. We have split our application stack into something called Micro Services, a term that I first heard from the folks at Netflix. The idea is that you can scale your software, deployment, and team better by having smaller, more focused units of software. Take the library (jar) analogy and push it to the nth degree: if you consider your “distributable” software artifact to be a server, you can better manage its reliability, testability, and deployability, and you get an environment where the performance of any one portion of the stack can be understood and isolated from the rest of the system. Now the question of “whose pager should ring” when there’s an outage is easily answered! The owner of the service, of course.

This translates to the data tier as well. We have low-cost servers, and they work extremely well when they stay relatively small. Make them too big and things can go sour, quickly. So in the data tier, each service gets its own data cluster. This keeps services extremely focused, compact, and fast; there’s almost no fear that some other consumer of a shared data tier is going to perform some ridiculously slow operation that craters runtime performance. Have you ever seen what happens when a BI tool is pointed at the runtime database? This is no different.
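To make the one-service, one-cluster rule concrete, here is a minimal Python sketch of a service catalog. The service, team, and cluster names are hypothetical, and this is not Wordnik’s actual tooling; it only shows how recording an owner and a dedicated data cluster per service answers “whose pager should ring” and flags any cluster that two services have started to share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    owner: str         # team whose pager rings when this service has an outage
    data_cluster: str  # replica set dedicated to this service alone


# Hypothetical catalog for illustration; these are not Wordnik's real services.
SERVICES = [
    Service("word-definitions", "dictionary-team", "rs-definitions"),
    Service("word-examples", "corpus-team", "rs-examples"),
    Service("user-lists", "accounts-team", "rs-user-lists"),
]


def pager_for(service_name, services=SERVICES):
    """Return the owner to page when the named service is down."""
    for svc in services:
        if svc.name == service_name:
            return svc.owner
    raise KeyError(f"unknown service: {service_name}")


def shared_clusters(services=SERVICES):
    """Map each data cluster used by more than one service to those services."""
    usage = {}
    for svc in services:
        usage.setdefault(svc.data_cluster, []).append(svc.name)
    return {cluster: names for cluster, names in usage.items() if len(names) > 1}
```

Running `shared_clusters()` as a deploy-time check and failing on a non-empty result is one way to keep a shared data tier from creeping back in.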

MongoDB has replica sets, and making these smaller clusters work is trivial, maybe 1/10th the effort of setting up a MySQL cluster. You add a server, it syncs, and it becomes available to services. When you don’t need it, you remove it from the set and shut it down. It just doesn’t get much easier than this.
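Under the hood, adding or removing a replica-set member is a change to the set’s configuration document: the member list changes and the config `version` is incremented. The mongo shell helpers `rs.add()` and `rs.remove()` handle this for you. The pure-Python sketch below only builds the new document, which could then be applied with the `replSetReconfig` admin command (for example, pymongo’s `client.admin.command("replSetReconfig", new_config)`). The set name and host names are hypothetical.

```python
import copy


def add_member(config, host, **options):
    """Return a new replica-set config with `host` appended and version bumped."""
    new = copy.deepcopy(config)  # never mutate the config we were given
    next_id = max((m["_id"] for m in new["members"]), default=-1) + 1
    member = {"_id": next_id, "host": host}
    member.update(options)  # e.g. priority=0, hidden=True
    new["members"].append(member)
    new["version"] += 1
    return new


def remove_member(config, host):
    """Return a new replica-set config without `host`, version bumped."""
    new = copy.deepcopy(config)
    remaining = [m for m in new["members"] if m["host"] != host]
    if len(remaining) == len(new["members"]):
        raise ValueError(f"{host} is not a member of {new['_id']}")
    new["members"] = remaining
    new["version"] += 1
    return new


# Hypothetical two-node set for illustration.
config = {
    "_id": "rs-definitions",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db-a:27017"},
        {"_id": 1, "host": "db-b:27017"},
    ],
}
```

Building a fresh document rather than editing in place keeps the previous config around, which makes it easy to diff or roll back a change before sending it to the cluster.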

We got the performance of these smaller clusters quite respectable by running them on ephemeral (volatile) storage. To keep our precious data safe, we run one non-participating server on an EBS-backed volume in a different zone. Whoa — you might ask — what happens if your data gets too big for that ephemeral storage? Easy! If the data gets too big, it’s time to shard/partition. This is a self-imposed design constraint to keep the data tier both performant and manageable. If it gets too big, other problems will arise, cost being one of them.
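The two ideas in this paragraph can be sketched in Python, with some assumptions. The post doesn’t say how the “non-participating” EBS-backed server is configured; one common MongoDB approach is a hidden member with priority 0, which replicates data but never becomes primary and is invisible to client applications. The shard-count helper encodes the “too big, time to shard” rule as simple arithmetic, with a hypothetical `max_fill` headroom fraction; the sizes you plug in are your own, not Wordnik’s.

```python
import math


def backup_member(member_id, host):
    """Replica-set member entry for a hidden, never-primary backup node."""
    # Hidden members must have priority 0: they replicate the data but are
    # never elected primary and are invisible to client applications.
    return {"_id": member_id, "host": host, "priority": 0, "hidden": True}


def shards_needed(data_gb, ephemeral_gb, max_fill=0.75):
    """Number of shards that keeps each shard under `max_fill` of one node's
    ephemeral disk. A result above 1 means it's time to shard/partition."""
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    usable_gb = ephemeral_gb * max_fill
    return max(1, math.ceil(data_gb / usable_gb))
```

For example, 1,300 GB of data against 800 GB of ephemeral disk at 75% fill gives `shards_needed(1300, 800)` of 3, since each shard can hold about 600 GB.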

Finally, if you have all these services floating around, how do you communicate? Isn’t this a communication and versioning nightmare?

Yes! It definitely can be. To solve the communication issues, we developed Swagger. This defines a simple mechanism for how you communicate with our APIs—both internally and externally. If your service is going to be used in the cluster, it needs to expose this spec. Then all consumers have a contract as to how to communicate. You can see more about Swagger here.
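As an illustration of the contract idea, the Python sketch below checks a consumer’s call against a spec that a service publishes. The spec is a hypothetical example loosely modeled on the shape of a Swagger 1.2 API declaration (`apis`, `operations`, `parameters`); it is not the full Swagger format, and the checker is not part of any Swagger tooling.

```python
# Hypothetical spec a word service might expose; trimmed to the fields used here.
WORD_SERVICE_SPEC = {
    "swaggerVersion": "1.2",
    "resourcePath": "/word",
    "apis": [
        {
            "path": "/word.json/{word}/definitions",
            "operations": [
                {
                    "method": "GET",
                    "nickname": "getDefinitions",
                    "parameters": [
                        {"name": "word", "paramType": "path", "required": True},
                        {"name": "limit", "paramType": "query", "required": False},
                    ],
                }
            ],
        }
    ],
}


def find_operation(spec, method, path):
    """Return the operation declared for (method, path), or None."""
    for api in spec.get("apis", []):
        if api["path"] != path:
            continue
        for op in api.get("operations", []):
            if op["method"].upper() == method.upper():
                return op
    return None


def check_call(spec, method, path, params):
    """Validate a call against the contract; return the operation nickname."""
    op = find_operation(spec, method, path)
    if op is None:
        raise LookupError(f"{method} {path} is not in the service contract")
    missing = sorted(
        p["name"]
        for p in op.get("parameters", [])
        if p.get("required") and p["name"] not in params
    )
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return op["nickname"]
```

Because the check runs against the published spec rather than the service’s code, any consumer in the cluster can validate its calls without knowing how the service is implemented.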

For the versioning challenge, we developed an internal software package named Caprica. This is a provisioning and configuration management tool which provides distributed configuration management based on version and cluster compatibility. That means that a “production cluster” in us-west-1c will have different services to talk to than a “development cluster” in the same zone. I’ll cover more about Caprica in another post; it’s a really exciting way to think about infrastructure.
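Caprica is internal and its design isn’t described beyond this paragraph, so the following is only a hypothetical Python sketch of the lookup it implies: given a service, a required version, a cluster, and a zone, return the hosts a consumer may talk to. The compatibility rule (same major version, equal or newer minor version) and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    service: str
    version: tuple  # (major, minor)
    cluster: str    # e.g. "production" or "development"
    zone: str       # e.g. "us-west-1c"
    host: str


# Hypothetical registry: the same zone hosts both a production and a
# development cluster, each running different versions.
REGISTRY = [
    Deployment("definitions", (1, 4), "production", "us-west-1c", "defs-prod-1"),
    Deployment("definitions", (2, 0), "development", "us-west-1c", "defs-dev-1"),
    Deployment("examples", (1, 2), "production", "us-west-1c", "ex-prod-1"),
]


def compatible(available, required):
    """Assumed rule: same major version, and at least the required minor."""
    return available[0] == required[0] and available[1] >= required[1]


def resolve(service, required_version, cluster, zone, registry=REGISTRY):
    """Hosts of `service` in this cluster and zone that satisfy the version."""
    return sorted(
        d.host
        for d in registry
        if d.service == service
        and d.cluster == cluster
        and d.zone == zone
        and compatible(d.version, required_version)
    )
```

With this registry, a production consumer in us-west-1c asking for `definitions` 1.0 gets `["defs-prod-1"]`, while a development consumer in the same zone gets nothing until it asks for version 2.x.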

Hopefully this has been entertaining! Find out more about swagger-izing your infrastructure in this short deck.

Welcome Beatrice, Rami, and Tiger!

The Wordnik office is hopping lately — we’d like to welcome three new Wordniks: Beatrice, Rami, and Tiger!

Beatrice Bernard

Beatrice has been instrumental in building and optimizing the office and process infrastructure for numerous start-up companies across Silicon Valley. Most recently she was at Criteo, an advertising re-targeting company, where she successfully managed all aspects of several company moves as the team grew from 0 to 80 US employees (450 worldwide). While working for the French-based company she was able to hone her language skills to interact with corporate HQ as well as better understand jokes told by the French expats and interns.

Beatrice has held similar positions at Habeas, Flowpoint/Efficient Networks, and a few others, in all cases handling whatever needed to be done to facilitate the growth and expansion of those companies. Born and raised in Switzerland (although she does not ski or yodel), she brings to each job a passion, commitment, and attention to detail that have been key to her career success over the years. In her spare time, she tries to stay healthy: she likes to hike, do yoga, and read, and is slightly obsessed with Sudoku (but her real passion is adventure travel).

Rami Habal

Rami comes to Wordnik from cloud security leader Proofpoint, where he was an early employee and instrumental in growing the business to an IPO filing, holding various product and marketing roles. Prior to Proofpoint, Rami held positions at Mohr Davidow Ventures, Cisco, Hughes Electronics and several startups. Rami has also cofounded 2 non-profits, started 3 businesses and serves as an advisor to early stage startups in Silicon Valley. In addition to an MBA from MIT and an MPA from Harvard, Rami has a BS in Electrical Engineering from the University of Virginia. Rami is passionate about form+function, the post-PC mobile hypernet, discovering new things, Moleskine notebooks and jazz. Follow him on Twitter at @rhabal.

Tiger Lan

Tiger Lan is a seasoned technology veteran with expertise in large-scale web software development and operations. His mantra is “Ship it!” Tiger has successfully built up strong and talented engineering teams at Reputation.com as CTO and VPE, and at Plaxo as Head of Development, and his focus has always been on developing entrepreneurial developers in an engineering-driven company culture. Tiger holds a BS in Computer Science from Tsinghua University in Beijing, China, and an MS in Computer Science from Michigan State University.

Remember, you can always find out more about the folks at Wordnik by checking out our team page.