Stop SOPA and PIPA

Wordnik is participating in the day of protest against SOPA, the Stop Online Piracy Act (now temporarily shelved), and PIPA, the Protect IP Act (still active).

What’s SOPA? Here are some example sentences that we think make it clear that this bill is a bad idea:

SOPA – the Stop Online Piracy Act – and a sister bill, PIPA – the Protect IP Act – seek to minimize the dissemination of copyrighted material online by targeting sites that promote and enable the sharing of copyright-protected material, like The Pirate Bay. While this goal may be laudable, entrepreneurs, legal scholars and free speech activists are worried about the consequences of these bills for the architecture of the Internet. Ethan Zuckerman: MIT Media Lab opposes SOPA, PIPA

[T]he bills represent an unprecedented, legally sanctioned assault on the Internet’s critical technical infrastructure. Based upon nothing more than an application by a federal prosecutor alleging that a foreign website is “dedicated to infringing activities,” Protect IP authorizes courts to order all U.S. Internet service providers, domain name registries, domain name registrars, and operators of domain name servers—a category that includes hundreds of thousands of small and medium-sized businesses, colleges, universities, nonprofit organizations, and the like—to take steps to prevent the offending site’s domain name from translating to the correct Internet protocol address. These orders can be issued even when the domains in question are located outside of the United States and registered in top-level domains (e.g., .fr, .de, or .jp) whose operators are themselves located outside the United States; indeed, some of the bills’ remedial provisions are directed solely at such domains. Stanford Law Review: Don’t Break the Internet

At a minimum, this means that [under SOPA] any service that hosts user generated content is going to be under enormous pressure to actively monitor and filter that content. That’s a huge burden, and worse for services that are just getting started – the YouTubes of tomorrow that are generating jobs today. EFF: “SOPA: Hollywood Finally Gets A Chance to Break the Internet”

Now, enter SOPA. § 103 of SOPA allows private parties to require payment processors and advertising services to cut ties with websites that are allegedly “dedicated to the theft of U.S. property.” Note: this is all done outside of the court system, so no judge actually reviews any of these claims before they’re enforced by the payment and ad networks. Public Knowledge: SOPA and Section 1201: A Frightening Combination

The latest move in a decades-long battle with piracy and copyright infringement is a bill called the PROTECT-IP Act that would essentially allow the U.S. government to block access to sites they deemed inappropriate. The bill would criminalize posting all sorts of standard web content — music playing in the background of videos, footage of people dancing, kids playing video games, and posting video of people playing cover songs. A move that would not only stifle free speech and creative expression, but potentially endanger hundreds of user-generated media sites like Vimeo, Tumblr, SoundCloud and more. The Creators Project: Artists Band Together To Fight Censorship And Oppose The PROTECT-IP Act

Laws like SOPA make us sclerotic as a country, where we have all these extra burdens that provide little benefit. In general it makes America less competitive. If SOPA goes through, it could very well force certain innovative companies to go offshore. There are incumbent industries that will always protest every new technology; but any forward-looking country needs to protect its emerging industries. GigaOm: Tim O’Reilly: Why I’m fighting SOPA

So you don’t run a website … how might SOPA and PIPA affect you?

The harm that does to ordinary, non-infringing users is best described via a hypothetical user: Abe. Abe has never even so much as breathed on a company’s copyright but he does many of the things typical of Internet users today. He stores the photos of his children, now three and six years old, online at PickUpShelf* so that he doesn’t have to worry about maintaining backups. He is a teacher and keeps copies of his classes accessible for his students via another service called SunStream that makes streaming audio and video easy. He engages frequently in conversation in several online communities and has developed a hard-won reputation and following on a discussion host called SpeakFree. And, of course, he has a blog called “Abe’s Truths” that is hosted on a site called NewLeaflet. He has never infringed on any copyright and each of the entities charged with enforcing SOPA know that he hasn’t.

And yet, none of that matters. Under SOPA, every single one of the services that Abe uses can be obliterated from his view without him having any remedy. Abe may wake up one morning and not be able to access any of his photos of his children. Neither he, nor his students, would be able to access any of his lectures. His trove of smart online discussions would likewise evaporate and he wouldn’t even be able to complain about it on his blog. And, in every case, he has absolutely no power to try to regain access. That may sound far-fetched but under SOPA, all that needs to happen for this scenario to come true is for the Attorney General to decide that some part of PickUpShelf, SunStream, SpeakFree and NewLeaflet would be copyright infringement in the US. If a court agrees, and with no guarantee of an adversarial proceeding that seems very likely, the entire site is “disappeared” from the US internet. Bricoleur: Overbroad Censorship & Users

You can track this legislation and read the full text here.

At Wordnik, we’re against piracy, but we think that SOPA and PIPA create more problems than they solve. So we’re happy to stand alongside such giants of the Internet as Wikipedia, the Internet Archive, O’Reilly Media, WordPress, Reddit, Boing Boing, and I Can Has Cheezburger, and add our voices to the chorus of those protesting this ill-thought-out and Internet-wrecking legislation.

Want an easy way to make your opinion heard in Congress? You can send emails via FightForTheFuture.org and AmericanCensorship.org. (AmericanCensorship.org also has HTML code for you to use to add a black “Stop Censorship” banner to your own blog or site.)

If you’re in San Francisco, you can join an in-person protest Wednesday from noon to 2 p.m.; details here. (Ditto New York and Seattle.)

And if you have an Android device, here’s a link to an app that will help you boycott SOPA-supporting companies and organizations.

PS Our word of the day today, spiflicate, is also in protest of SOPA and PIPA. SOPA and PIPA are set to spiflicate (‘stifle, suffocate, kill’) the Internet; but before that happens we hope to spiflicate (‘beat, confound, dismay’) them!

Wordnik Now Makes SmartMoney Smarter (Wordnik Means Business)

Wordnik means business — we’re happy to announce today that Wordnik is powering SmartMoney.com’s new financial terms glossary!


The New SmartMoney Glossary

With more than 4,000 words and phrases, SmartMoney’s new glossary is the place to go to make sense of the words that matter in your financial life. Keeping track of your finances is difficult enough without the added hurdle of wading through financial jargon. Wordnik helps demystify opaque terms such as rescission, dilution, and butterfly spread, making it easier for you to make meaningful choices about how you live your financial life. In addition to traditional definitions and explanatory notes, the new SmartMoney glossary includes helpful example sentences that show the terms in real-world contexts, drawn from up-to-date articles across The Wall Street Journal Digital Network.

flight to quality at SmartMoney.com

Alongside the stand-alone glossary, selected articles in The Wall Street Journal Digital Network will also carry a useful footer line highlighting important terms you may want to look up.


To provide the example sentences, Wordnik has analyzed thousands of The Wall Street Journal Digital Network articles (from SmartMoney, The Wall Street Journal, and MarketWatch) to show the most explanatory and illuminating content for the most important words and concepts, leading readers to current trending articles as well as rich archival information. Taken together, these enhancements will not only allow SmartMoney readers to understand the traditional meanings of important financial terms, but will also let them interact with news content in ways that provide fresh discovery of words, phrases, concepts, and entire articles.

Welcome Gregg!

We’re happy to announce the addition of Gregg Carrier to Wordnik!

Gregg Carrier

Gregg joins us as a Senior Server Engineer from DreamWorks Animation, where he worked on core service infrastructure for their next generation of animation tools. Gregg has also taught community college CS classes, beertended in the Anderson Valley, worked at a winery, served as a park ranger at Shenandoah National Park, and been a ski instructor!

In his non-server-engineering time, Gregg homebrews (and has for 18 years!), loves scuba diving, hiking, and camping (and is waiting for his two little boys to get big enough to do those things, too). He also plays the ukulele and spins glow poi. Gregg (the extra ‘g’ is for ‘great’) can be reached at gregg@wordnik.com.

Welcome JeanFrancois!

We are very happy to welcome both JeanFrancois Arcand and Atmosphere to Wordnik!


JeanFrancois already improving the Atmosphere at Wordnik

JeanFrancois is a contributor to the extremely popular Apache Tomcat web server and created the GlassFish web container. He authored Project Grizzly, a framework for creating NIO and HTTP applications, as well as the GlassFish v3 micro-kernel. JeanFrancois led the GlassFish Application Server project, including its migration to open source. Grizzly was one of the first production Ajax Push/Comet frameworks, the technology that brought “chat” to the web.

At Ning and Sonatype he went on to author the Asynchronous HTTP Client (AHC), a Java client library for asynchronous remote event processing. He conceived of Atmosphere, a framework for real-time communication through HTTP streaming, Comet, and WebSockets, and has led it ever since. He was also an active member of the NIO.2 and Servlet 3.0 JSR expert groups.

JeanFrancois will work on Wordnik’s software architecture and algorithms, and Wordnik will become the sponsor of the open-source Atmosphere framework. We welcome him and his framework to the team!

Wordnik’s Word Graph Helps TaskRabbit Help You

TaskRabbit Is First to Use Word Graph API to Boost Real-Time Transactions Through Deeper Understanding of Words

SAN MATEO, CA–(Marketwire – Jul 11, 2011) – Wordnik, maker of the web’s first word navigation system, today announced its new Word Graph API for online content and commerce partners. Developed using Wordnik’s Word Graph, the world’s largest and most comprehensive graph of words and their meaning, the API provides the first-ever automated context discovery capability to help partners offer increased value to their users beyond standard word look-up. The Word Graph API builds into a partner’s content the ability to recognize the relationship between disparate words, thereby creating more accurate and deeper discovery.

The Word Graph API is currently being used by TaskRabbit, an online and mobile marketplace that brings people in a community together to get things done. A two-way marketplace, TaskRabbit connects ‘TaskPosters,’ people who need extra help, with ‘TaskRabbits,’ a network of background-checked and pre-approved individuals who have the skills and time available to complete Tasks. The API enhances three components of the TaskRabbit service: recommending which tasks should be targeted to TaskRabbits; matching relevant tasks so a user creating a new posting can instantly see related tasks (which can help determine the most appropriate description and pricing); and auto-tagging content, which was previously done manually.

“We’ve always respected Wordnik for its outstanding presentation of words and their meanings,” said Leah Busque, founder and CEO of TaskRabbit. “When we learned that Wordnik was making this technology available to companies, we were on board immediately. User-generated content is key to how TaskRabbit works and accurate results are critical for our business. The Word Graph API enables us to enhance our service and provide a great customer experience through relevant results.”

To create the Word Graph, Wordnik developed a number of proprietary techniques to both discover the meaning of new terms and analyze how they are used, which ultimately captures nearly unlimited relationships between words. By using a graph structure, Wordnik is able to identify, in real time, relationships that were not possible to find until now, such as finding similarly used words, finding word usage and trends within text content, and performing rapid clustering of terms. These capabilities are now being leveraged to provide value to content and commerce partners, with TaskRabbit being the first.
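The release doesn't say how these graph operations are implemented. Purely as an illustration, here is a toy Python sketch of what “finding similarly used words” and “clustering of terms” can look like on a small weighted word graph. Everything in it is an assumption: the words, the edge weights, the hypothetical helper names (`build_graph`, `similar_words`, `cluster_terms`, `auto_tag`), and the techniques (ranking neighbours by edge weight, and clustering as connected components over edges above a threshold). It is not Wordnik's Word Graph API or its proprietary algorithms.

```python
from collections import defaultdict

# Hypothetical toy word graph: undirected edges between terms, weighted by how
# strongly the terms are assumed to be used in similar contexts (0.0 to 1.0).
# Words and weights are invented for illustration; they are not Wordnik data.
EDGES = [
    ("babysitting", "child care", 0.9),
    ("child care", "day care", 0.8),
    ("babysitting", "mother's helper", 0.7),
    ("moving", "hauling", 0.85),
    ("hauling", "junk removal", 0.6),
    ("babysitting", "moving", 0.1),
]


def build_graph(edges):
    """Build an adjacency map {word: {neighbour: weight}} from weighted edges."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def similar_words(graph, word, k=3):
    """Return up to k neighbours of `word`, strongest edge first."""
    neighbours = graph.get(word, {})  # .get avoids inserting unknown words
    return sorted(neighbours, key=neighbours.get, reverse=True)[:k]


def cluster_terms(graph, threshold=0.5):
    """Cluster terms as connected components, ignoring edges below threshold."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            word = stack.pop()
            component.add(word)
            for neighbour, weight in graph[word].items():
                if weight >= threshold and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


def auto_tag(text, clusters):
    """Tag text with each cluster that has a term appearing in it.

    Uses naive substring matching and labels a cluster by its alphabetically
    first term; both are simplifications for this sketch.
    """
    lowered = text.lower()
    return [min(c) for c in clusters if any(term in lowered for term in c)]


GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(similar_words(GRAPH, "babysitting"))
    clusters = cluster_terms(GRAPH)
    print(clusters)
    print(auto_tag("Looking for a mother's helper on weekends", clusters))
```

With these invented weights, babysitting, child care, day care, and mother's helper fall into one cluster and the moving-related terms into another, which loosely mirrors the TaskRabbit auto-tagging use case. A real system would need much richer similarity signals than hand-set weights, and proper tokenization rather than substring matching.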

“We are excited to partner with TaskRabbit, as it signals a new realm of opportunities for Wordnik, which we look forward to cultivating,” said Joe Hyrkin, CEO of Wordnik. “There’s a universe of online content publishers and commerce partners that can benefit almost immediately from our Word Graph’s unique capabilities. We make it easy for partners to leverage our context discovery expertise to add value to their sites, engage new users and solve complex site problems related to their text content.”

About Wordnik

Wordnik is the first word navigation system that helps consumers unlock the value of words and phrases to discover what information is most meaningful and matters to them. Unlike search engines that provide an overabundance of information or online dictionaries that are static or limited to general information, Wordnik helps consumers zero in and fully understand words and content in context. Wordnik’s team includes experts in search engine architecture, social networking, computational linguistics and lexicography. For more information, visit http://www.wordnik.com or follow us on Twitter and Facebook. To find out more about Wordnik’s APIs, visit http://developer.wordnik.com. Wordnik investors include Roger McNamee, Steve Anderson of Baseline, Mohr Davidow Ventures, Floodgate, Radar Partners, SV Angel, and Lucas Venture Group.

About TaskRabbit

TaskRabbit, the nation’s first service networking platform, is pioneering the way people get things done. TaskRabbit leverages the latest technology and the social networking movement to bring neighbors together to live smarter and more efficiently. A two-way marketplace, TaskRabbit connects ‘TaskPosters,’ people who need help, with ‘TaskRabbits,’ a network of pre-approved and background-checked individuals, who have the time and skills needed to complete the job. Based in San Francisco, Calif., TaskRabbit is backed by notable investors, including Baseline Ventures, First Round Capital, FLOODGATE Fund, Collaborative Fund, and Shasta Ventures. The company has been featured in the Wall Street Journal as well as on NBC’s The Today Show. For more information, visit http://www.taskrabbit.com.

“New Word Graph API Takes Wordnik From Fun and Funky Apps to Some Serious Business Services”

from the post at Semantic Web:

You may know Wordnik from subscribing to its Word of the Day service (by the way, today that word is eloign). Or perhaps you know it from some of the apps that have used its API – such as Freebase WordNet Explorer, or one of the many mobile ones that let users access direct features of the system through their smart phones.

Now comes something new on the API front: Word Graph is the latest result of some three years of algorithm development around analyzing the digital text that Wordnik has collected from partners, to understand the relationship between words in order to derive meaning. Word Graph matches content based on digital text from partners who need to understand more of what their content says and is, and to help them and their services make decisions based on that understanding.

In that respect, it’s taking Wordnik’s API services closer to helping accomplish business requirements, rather than driving neat B-to-C apps (from crossword puzzles to jumble games to pronunciation voice services), which is where its APIs have mostly been employed so far.

The first partner to use the API is TaskRabbit, an online service that matches task creators (e.g. someone who needs child care) with task runners (e.g. babysitters). Before the API was integrated into its business logic tier, the service’s key to-dos were all accomplished manually, says Tony Tam, co-founder and VP of engineering at Wordnik. Submitted tasks, for instance, had to be categorized by hand, but now the system has been trained on TaskRabbit content to treat terms from its domains appropriately: for example, to understand that babysitting and child care are roughly the same thing, and to automatically categorize together the tasks submitted with the various terms. It now knows to show task runners who perform those services the tasks that used either term; in fact, it can find the task runners who’ve done a certain type of job (whether it’s called babysitting, child care, mother’s helper, day care, and so on) multiple times, and tell them about new tasks in the same vein. Similarly, for task posters, the API is used to match relevant tasks, so that they can quickly see how others with categorically similar requirements have posted their tasks and what fees they’re offering, and possibly revise their own job postings to be better and more competitive matches.

“The goal is to match these task posters with the task runners as efficiently as possible based on the content of that task,” says Tam. But he sees potential for products in many other verticals to benefit from recommending or matching content based on digital text, online publishing among them, of course. What makes Wordnik different from other semantically oriented attempts to do the same, Tam says, is that “our whole existence is built on the concept of marrying lexicography with computational linguistics.” The company has captured billions of words of English over its lifetime to feed its Word Graph and has developed analytical algorithms so that it can make very strong recommendations and matches without a large training set. “So the Graph itself is one of our strongest tools in our toolbox,” he says. “We are taking a very different approach as far as how we can apply user behavior and content from the digital text on top of each other. We may be looking at similar problems but our approach is radically different.”

One of the important capabilities of the Word Graph is accounting for how dynamic language is, even when meaning seems undercut by cacography (yes, it is too a word) or something else. Tam estimates that roughly 200 words are created in the online digital set every day, perhaps unintentionally because of a misspelling, perhaps thanks to a new Twitter hashtag, or maybe in response to something taking place in society: the branding of Charlie Sheen as a ‘shenius,’ for instance, Tam offers. “That went into our Word Graph and kicked into our algorithms,” he says. “So when you analyze text with current events and words, if you don’t know that relationship, the ability to do real processing is severely hampered. That is really core about what Wordnik is doing and is essential in building out a graph of words so we can make those associations. It’s not fair for me to say I can match your content only as long as it’s perfect. By taking text shorthand, misspellings, Twitter hashtags and so on, and translating those into something that can be understood, now we can do real analysis on text.”

In the near future, Wordnik also plans to open-source its infrastructure to help solve real-world problems for API developers. Much of this will reflect the scaling expertise Wordnik has built up from having to process documents that are millions of words long, efficiently and in real time. Tam’s own background is in that area, including expertise in federated data query technologies, and Wordnik aims to make the scale big enough that it can be used at run time for tens of millions of nodes and millions of edges. He notes that Wordnik runs one of the larger known instances of MongoDB and uses the Scala programming language, which runs on the Java Virtual Machine, and that it also leverages cloud computing for locality requirements.

All-Star Words

In honor of tomorrow’s All-Star Game, we’re happy to announce a new word-of-the-day list from Paul Dickson, the author of The Dickson Baseball Dictionary, Third Edition (and more than fifty other books)!

Why are baseball words so hugely entertaining (even if you’re not a hard-core baseball fan)? Paul puts it best:

Baseball is a metaphoric circus.

The game has a particular infatuation with what one critic of sportswriters termed “the incorrect use of correct words.” There are hundreds of examples, but the point can be made by simply listing a selection of synonyms for the hard-hit ball or line drive. It is variously known as an aspirin, a BB, a bolt, a clothesline, a frozen rope, a pea, a rocket, and a seed. A player’s throwing arm seems to be called everything but an arm: gun, hose, rifle, soupbone, whip, and wing, to name just a few. The arm is not the only renamed body part. From top to bottom, players have lamps (eyes), a pipe (neck or throat), hooks (hands), wheels (legs), and tires (feet).
So many allusions are made to food and dining, including pitches that seem to fall off the table, that a fairly well-balanced diet suggests itself in terms like can of corn, cup of coffee, fish cakes, banana stalk, mustard, pretzel, rhubarb, green pea, juice, meat hand, grapefruit league, and tater. Among the many terms for the ball itself are apple, cantaloupe, egg, lemon, orange, pea, potato, and tomato. Implements? There is the plate (also known as the platter, pan, and dish) and, of course, the forkball. Dessert? The red abrasion from a slide into base is a strawberry and the fan’s time-honored sound of disapproval is a raspberry.
The game proudly displays its rustic roots and there is a tone to the language of the game that is remarkably pastoral. If any imagery dominates, it is that of rural America. Even under a dome, it is a game of fields and fences, where ducks [sit] on the pond and pitchers sit in the catbird seat. New players come out of the farm system and a farm hand who pitches may get to work in the bullpen.

You can sign up to receive Paul’s baseball words here; and if you have a Nook Color, we’ll be featuring baseball words this whole week via the Wordnik Nook Word of the Day app! [Want a hard copy of the Dictionary? You can find one at any of these fine booksellers.]

We hope you enjoy exploring this rich slice of English … and if you’ll be in Pasadena, CA this Sunday, the 17th of July, you can also see Paul Dickson at the Baseball Reliquary, where he will be presented with the 2011 Tony Salin Memorial Award for his commitment to the preservation of baseball history.