October 11, 2012 by tonytam

Swagger codegen 2.0 Released!

Swagger codegen is now at 2.0! What does this mean for you?

Swagger codegen is a way to take advantage of the API declarations provided by Swagger. Yes, you get a beautiful user interface! But having a machine-readable description of your API allows for some amazing things, and Swagger codegen provides the structure to take this to the next level.

The code generation is NOT limited to just client libraries. You can see how an interface-driven development process can drive your server code as well, through some very popular server frameworks (JavaScript with Node.js, Ruby with Sinatra, Scala with Scalatra). But these are just examples: you can easily adapt codegen to your own framework. Even more, you can use the mustache template structure to write static documentation, supporting classes, test harnesses, etc. We think you’ll find it quite valuable.

Yes, client library generation is hugely valuable. In the 2.0 release we’ve added support for Ruby, Python 3, and Objective-C clients. Again, the templates are YOURS to modify to suit your coding style.
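Because codegen is template driven, the same API description can be rendered into client code, documentation, or anything else. The sketch below illustrates the idea under stated assumptions: a tiny mustache-style `{{name}}` substitution and a hypothetical `Operation` model. It is not swagger-codegen's actual API or template format.

```scala
// A hypothetical, minimal description of one API operation.
final case class Operation(nickname: String, httpMethod: String, path: String, responseClass: String)

// Replace each {{key}} placeholder with its value; unknown placeholders are left untouched.
def render(template: String, values: Map[String, String]): String =
  values.foldLeft(template) { case (acc, (key, value)) =>
    acc.replace(s"{{$key}}", value)
  }

// Expose the operation's fields to the templates.
def fields(op: Operation): Map[String, String] = Map(
  "nickname" -> op.nickname,
  "httpMethod" -> op.httpMethod,
  "path" -> op.path,
  "responseClass" -> op.responseClass
)

// Two hypothetical templates for the same operation: a client stub and a line of docs.
val scalaClientTemplate =
  "def {{nickname}}(): {{responseClass}} = call(\"{{httpMethod}}\", \"{{path}}\")"
val docsTemplate = "{{httpMethod}} {{path}} returns {{responseClass}}"

val op = Operation("findPetsByStatus", "GET", "/pet.json/findByStatus", "List[Pet]")
val clientCode = render(scalaClientTemplate, fields(op))
val docLine = render(docsTemplate, fields(op))
```

Changing the operation and re-rendering regenerates both outputs; swapping in a different template changes the style without touching the API description.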

Finally, and most importantly, swagger-ui, swagger-core, and swagger-codegen are open source. We have open-sourced this framework to give back to the OSS community, which has helped Wordnik so substantially. Plus, it’s WAY too cool to keep in-house only.

You can find swagger-codegen in Maven Central and on GitHub.

Thanks, and as always, please feel free to post questions on our support group or on IRC at #swagger.

The Wordnik Engineering Team
http://developer.wordnik.com

April 26, 2012 by jfarcand - 4 Comments

Introducing SwaggerSocket: A REST over WebSocket Protocol

Today we are proud to announce the first release of SwaggerSocket, a REST over WebSocket Protocol.

Why?

REST is a style of software architecture which is almost always delivered over HTTP. It powers most of the World Wide Web and has enabled the rapid construction of APIs–it has allowed developers to easily integrate many complex services with their applications. It has been truly transformational over the last 10 years.

The problem is, HTTP is a chatty, synchronous communication protocol based entirely on “request/response”. That is, there is no easy way to open continuous communication between a client and a server. You have to “ask a question” and wait for the response. A number of techniques have been developed to make HTTP less “chatty”, including long-polling, Comet, HTTP streaming and, most recently, WebSockets. While REST over HTTP has transformed “over-the-internet” communication, it is usually a poor choice for high-throughput or asynchronous communication. For example, most software programs do not communicate with their databases over HTTP; it’s typically too slow because of the HTTP overhead.

WebSockets, however, have provided a new type of communication fabric for the Internet. Because they are full duplex (a program can ask a question and listen for a response at the same time) and truly asynchronous, they can not only speed up software by “doing more things at once”, but also provide a more efficient pipe to send information through.

The challenge, though, is that WebSockets are simply a protocol for communication: they do not define the structure of the messages sent over them. All the goodness that came with REST (structure, human readability, self-description) is abandoned for the sake of efficiency. This is why Wordnik created SwaggerSocket!

How Does SwaggerSocket Work?

First, let’s see how SwaggerSocket improves performance over a typical REST implementation:

Note that this is a simple sample of REST over WebSocket performance. A follow-up post will go into the gory details of the testing methodology and scenarios, as well as the theoretical reasons why performance will be better. For the record, the above graph was produced on an Amazon m1.large EC2 instance.

The REST resource used for testing looks like this simple Swagger annotated resource:

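import javax.ws.rs.{GET, Path, Produces}
import javax.ws.rs.core.Response
import com.wordnik.swagger.annotations._
// Help, ProfileEndpointTrait and ApiResponse come from the demo project
// and swagger-core; their imports are not shown in the original sample.

// Shared request counter, bumped once per call to the resource.
object Counter {
  @volatile var count: Double = 0
  // @volatile alone does not make `count += 1` atomic, so guard the update.
  def increment: Double = synchronized { count += 1; count }
}

trait TestResource {
  @GET
  @Path("/simpleFetch")
  @ApiOperation(value = "Simple fetch method", notes = "",
    responseClass = "com.wordnik.demo.resources.ApiResponse")
  @ApiErrors(Array(new ApiError(code = 400, reason = "Bad request")))
  def getById(): Response = {
    Counter.increment
    Response.ok.entity(new ApiResponse(200, "success")).build
  }
}

// JSON-producing resource, listed by Swagger under /test.
@Path("/test.json")
@Api(value = "/test", description = "Test resource")
@Produces(Array("application/json"))
class TestResourceJSON extends Help
  with ProfileEndpointTrait
  with TestResource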

On the client side, we incrementally hit the resource using either the Jersey REST Client (for REST) or the SwaggerSocket Scala Client for WebSocket (based on AHC). As you can see, SwaggerSocket easily outperforms normal REST requests. To keep the comparison fair, the SwaggerSocket client blocks while waiting for each response, which is the same way a typical REST client behaves. Using an asynchronous programming style will not only be faster for the client, but also allows the server to degrade more gracefully under load and to perform batch operations transparently.
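To make the blocking-versus-asynchronous distinction concrete, here is a minimal Scala sketch. The `fetch` function is a hypothetical stand-in for one request/response round trip (it just sleeps and echoes the id); it is not the Jersey or SwaggerSocket client API.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

implicit val ec: ExecutionContext = ExecutionContext.global

// Hypothetical stand-in for one network round trip.
def fetch(id: Int): Future[String] = Future {
  blocking(Thread.sleep(50)) // simulated latency; `blocking` lets the pool add threads
  s"response-$id"
}

// Blocking style, as in the benchmark: wait for each response before sending the next request.
def blockingClient(ids: Seq[Int]): Seq[String] =
  ids.map(id => Await.result(fetch(id), 5.seconds))

// Asynchronous style: send every request first, then gather all responses (order preserved).
def asyncClient(ids: Seq[Int]): Seq[String] =
  Await.result(Future.sequence(ids.map(fetch)), 5.seconds)
```

Both return the same responses. With the blocking client, total time grows with the sum of the round trips, while the asynchronous client overlaps them.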

Introducing the SwaggerSocket Protocol

The SwaggerSocket Protocol is:

Pipelined: unlike HTTP (unless HTTP Pipelining is used), when you open a connection and send a request, you don’t have to wait for the response to arrive before sending another request. SwaggerSocket requests can be sent without waiting for the corresponding responses. And unlike HTTP Pipelining (which is not supported by all browsers), SwaggerSocket works with any browser that supports WebSocket.
Transparent: The SwaggerSocket Protocol implementation from Wordnik is transparent and doesn’t require modification of existing applications. For example, any existing JAX-RS/Jersey application can work with no changes. On the client side, an application can use SwaggerSocket’s JavaScript library or Scala library, or open a WebSocket and manipulate the protocol objects directly.
Asynchronous: The SwaggerSocket Protocol is fully asynchronous. A client can send several requests without waiting or blocking for the response. The server will asynchronously process the requests and send responses asynchronously.
Simple: The SwaggerSocket Protocol uses JSON for encoding the requests and responses. This makes the protocol easy to understand and to implement.
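Pipelined, asynchronous responses mean each response has to be matched back to the request that produced it. A minimal sketch, assuming each request carries a client-chosen id that the server echoes back: the `Req`/`Resp` shapes and the `PipelinedClient` class are hypothetical and are not the SwaggerSocket wire format or library.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Hypothetical request/response shapes; the real protocol defines its own fields.
final case class Req(uuid: Int, method: String, path: String)
final case class Resp(uuid: Int, status: Int, body: String)

// Client-side bookkeeping for one pipelined connection.
final class PipelinedClient(send: Req => Unit) {
  private var nextId = 0
  private val pending = mutable.Map.empty[Int, Promise[Resp]]

  // Send immediately, without waiting for earlier responses.
  def request(method: String, path: String): Future[Resp] = synchronized {
    val id = nextId
    nextId += 1
    val p = Promise[Resp]()
    pending(id) = p
    send(Req(id, method, path))
    p.future
  }

  // Responses may arrive in any order; the echoed uuid finds the waiting caller.
  def onResponse(r: Resp): Unit = synchronized {
    pending.remove(r.uuid).foreach(_.success(r))
  }
}
```

Both requests go out back to back, and each future completes when its own response arrives, regardless of order.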

How does it work?

The SwaggerSocket Protocol uses JSON to pass information between clients and servers over a WebSocket connection. A client first connects to the server and waits for an authorization token. On success, the client can start sending requests and receiving responses asynchronously.
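The connect, wait-for-token, then send sequence can be sketched as a small client state machine. Assumptions: the token is delivered through a hypothetical `onToken` callback, requests issued before it arrives are queued, and the JSON field names (`identity`, `requests`, `method`, `path`) are illustrative rather than taken from the protocol specification.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical JSON envelope; field names are illustrative only.
// No escaping is done here; a real client would use a JSON library.
def envelope(token: String, method: String, path: String): String =
  s"""{"identity":"$token","requests":[{"method":"$method","path":"$path"}]}"""

// Queues requests until the server has handed out an authorization token.
final class HandshakeClient(send: String => Unit) {
  private var token: Option[String] = None
  private val queued = ListBuffer.empty[(String, String)]

  def request(method: String, path: String): Unit = token match {
    case Some(t) => send(envelope(t, method, path))
    case None    => queued += ((method, path))
  }

  // Called once the handshake succeeds; flushes anything requested too early.
  def onToken(t: String): Unit = {
    token = Some(t)
    queued.foreach { case (m, p) => send(envelope(t, m, p)) }
    queued.clear()
  }
}
```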

Software Languages Supported

SwaggerSocket as a protocol can be implemented in nearly all programming languages. The Wordnik implementation currently supports Java, Scala, Groovy and JRuby for the server components, and ships with Scala and JavaScript client libraries. swagger-codegen will be updated to support SwaggerSocket for other clients.

How can I try SwaggerSocket?

The easiest way to try SwaggerSocket is to go to our GitHub site, download one of the samples and read our Quick Start. Details of the SwaggerSocket Protocol can be read here.

Getting involved

To get involved with SwaggerSocket, subscribe to our mailing list, follow us on Twitter or fork us on GitHub!

March 19, 2012 by tonytam - 4 Comments

With Software, Small is the new Big

For almost 2 years, Wordnik had been running its infrastructure on really fast, dedicated servers, with lots of storage, RAM and very impressive CPU and I/O. Seeing our API firing on all 8 cores using 60GB of RAM in a single process was a beautiful thing, and end users got to enjoy the same benefits: our huge corpus of MongoDB data could be hit anywhere, anytime, and fast. Using just one of these boxes, we could push thousands of API calls a second, hitting MySQL, MongoDB, memcached, Lucene, etc.

But we turned all that off almost 2 weeks ago and quietly moved to the humble virtual servers of Amazon’s AWS infrastructure. We now have our choice of relatively low-performance virtualized servers with significantly less horsepower. In the operations world of “CPU, memory, I/O” we can now “choose one of the above”. What gives? Is this all about saving money? And how will this affect Wordnik’s infrastructure?

Before I dig into the details, let me summarize a couple of facts. First, our system runs at almost exactly the same speed in EC2 as it did on those beefy servers (I’ll get to the reason why). Next, there’s no less data than before; in fact, we haven’t slowed down the growth of our corpus of text. Finally, yep, we saved money. But that wasn’t the goal (nice side effect though)!

So first off, the reason for the move is really quite simple. We needed multiple datacenters to run our system with higher availability. We also needed the ability to burst into a larger deployment for launches and heavy traffic periods. And we needed to be able to perform incremental cluster upgrades and roll out features faster. Sounds like an advertisement for a cloud services company, right?

Well, when it comes down to it, your software either works or it doesn’t, and nothing puts it to a better test than switching to infrastructure that’s intrinsically less powerful on a unit basis. And the process of switching to the cloud is certainly not free unless you’re running a “hello world” application. If you wrote software to take advantage of monster physical servers, it will almost certainly fail to run efficiently in the cloud.

When we started the process of migrating back to EC2 (yes, we started here and left it; you can read more here), it was clear that a number of things had to change, or we’d have to run the largest, most expensive EC2 instances, which, believe it or not, actually cost more to run than the big servers we were moving away from. The biggest driver was our database. We have been using MongoDB for 2 years now, but in that time a chunk of data, on the order of 100GB of transactional tables, had remained in MySQL. Simply restoring one of these tables and rebuilding indexes takes days on a virtual server (I’m sure someone out there could make it faster). We knew that this had to change.

So this data was completely migrated to MongoDB. That was no small feat, as this code was the oldest, crustiest Java relic in our software stack (we switched to Scala almost 18 months ago). One of our team members, Gregg Carrier, tirelessly ported this code to MongoDB and silently migrated the data about a week before our datacenter move. Without doing this, we couldn’t have made the move.

Also on the data front, we had a number of monster MongoDB databases, which had been operational ever since we first transitioned our corpus to MongoDB! Back to the previous point, these would have been a challenge to run with good performance on EC2 instances, given their weak I/O: in our tests, disk seeks in the cloud performed as poorly as 1/10th of what we were used to.

To address this, we made a significant architectural shift. We have split our application stack into something called Micro Services, a term that I first heard from the folks at Netflix. The premise is that you can scale your software, deployment and team better by having smaller, more focused units of software. The idea is simple: take the library (jar) analogy and push it to the nth degree. If you consider your “distributable” software artifact to be a server, you can better manage its reliability, testability and deployability, as well as produce an environment where the performance of any one portion of the stack can be understood and isolated from the rest of the system. Now the question of “whose pager should ring” when there’s an outage is easily answered! The owner of the service, of course.

This translates to the data tier as well. We have low cost servers, and they work extremely well when they stay relatively small. Make them too big and things can go sour, quickly. So from the data tier, each service gets its own data cluster. This keeps services extremely focused, compact, and fast — there’s almost no fear that some other consumer of a shared data tier is going to perform some ridiculously slow operation which craters the runtime performance. Have you ever seen what happens when a BI tool is pointed at the runtime database? This is no different.

MongoDB has replica sets, and making these smaller clusters work is trivial: maybe 1/10th the effort of setting up a MySQL cluster. You add a server, it syncs, and it becomes available to services. When you don’t need it, you remove it from the set and shut it down. It just doesn’t get much easier than this.

We got the performance of these smaller clusters quite respectable by running them on ephemeral (volatile) storage. To keep our precious data safe, we run one non-participating server on an EBS-backed volume in a different zone. Whoa, you might ask: what happens if your data gets too big for that ephemeral storage? Easy! If the data gets too big, it’s time to shard/partition. This is a self-imposed design constraint to keep the data tier both performant and manageable. If it gets too big, other problems will arise, cost being one of them.
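The “non-participating” backup member can be expressed in a MongoDB replica set configuration. This sketch assumes that “non-participating” means a hidden member with priority 0 (it replicates all data but never becomes primary and is invisible to client applications); the set name and hostnames are hypothetical. It renders the kind of document you would pass to `rs.initiate()` in the mongo shell.

```scala
// One replica set member as it would appear in an rs.initiate() config document.
final case class Member(id: Int, host: String, hiddenBackup: Boolean = false) {
  // Hidden members must have priority 0: they hold a full copy of the data
  // but never become primary and are not visible to client reads.
  def toJson: String =
    if (hiddenBackup) s"""{"_id":$id,"host":"$host","priority":0,"hidden":true}"""
    else s"""{"_id":$id,"host":"$host"}"""
}

def replicaSetConfig(name: String, members: Seq[Member]): String = {
  val body = members.map(_.toJson).mkString(",")
  s"""{"_id":"$name","members":[$body]}"""
}

// Hypothetical hosts: two ephemeral-storage members in one zone,
// plus an EBS-backed hidden member in a different zone.
val config = replicaSetConfig("words-service", Seq(
  Member(0, "ephemeral-a.us-west-1a:27017"),
  Member(1, "ephemeral-b.us-west-1a:27017"),
  Member(2, "ebs-backup.us-west-1c:27017", hiddenBackup = true)
))
```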

Finally, if you have all these services floating around, how do you communicate? Isn’t this a communication and versioning nightmare?

Yes! It definitely can be. To solve the communication issues, we developed Swagger. This defines a simple mechanism for how you communicate with our APIs—both internally and externally. If your service is going to be used in the cluster, it needs to expose this spec. Then all consumers have a contract as to how to communicate. You can see more about Swagger here.

For the versioning challenge, we developed an internal software package named Caprica. This is a provisioning and configuration management tool which provides distributed configuration management based on version and cluster compatibility. That means that a “production cluster” in us-west-1c will have different services to talk to than a “development cluster” in the same zone. I’ll cover more about Caprica in another post; it’s a really exciting way to think of infrastructure.
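Caprica itself is internal and its API is not described here, so the following is only a sketch of the lookup the paragraph describes, with hypothetical names throughout: a registry of service instances, and a resolver that returns endpoints from the consumer's own cluster whose version is compatible. The “same major version” rule is an assumed compatibility policy.

```scala
// A hypothetical registry entry for one running service instance.
final case class Instance(service: String, version: String, cluster: String, zone: String, host: String)

// Assumed policy: versions are compatible when their major numbers match.
def compatible(required: String, offered: String): Boolean =
  required.takeWhile(_ != '.') == offered.takeWhile(_ != '.')

// Endpoints a consumer may use: same service, same cluster, compatible version.
// (The zone is recorded but not used for matching in this sketch.)
def resolve(registry: Seq[Instance], service: String, requiredVersion: String,
            cluster: String): Seq[String] =
  registry.collect {
    case i if i.service == service && i.cluster == cluster &&
              compatible(requiredVersion, i.version) => i.host
  }

val registry = Seq(
  Instance("word-graph", "2.1.0", "production", "us-west-1c", "prod-wg-1"),
  Instance("word-graph", "3.0.0", "development", "us-west-1c", "dev-wg-1"),
  Instance("word-graph", "1.4.2", "production", "us-west-1c", "prod-wg-old")
)
```

All three instances sit in us-west-1c, yet a production consumer and a development consumer resolve to different hosts.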

Hopefully this has been entertaining! Find out more about swagger-izing your infrastructure in this short deck.

December 9, 2011 by tonytam - 1 Comment

What’s with the Swaggering?

If you’re a developer and have seen the Wordnik Developer documentation, you might have noticed some links to raw JSON like this. Behind all that lovely notation is the Swagger API framework. We needed to solve some recurring sources of pain at Wordnik, and we knew how our needs were going to evolve, so we took a proactive step and built Swagger.

First, our documentation was really tough to keep up to date. When you’re in a situation where your capabilities precede your documentation, you can end up in a tough spot: Your users are unable to benefit from the tools you’ve worked so hard to build, or they end up using them the wrong way. Both are bummers for developers, who typically have many choices and should get your best efforts.

As we made updates to our API, it got harder and harder to keep our client libraries current. Add an API, modify all your drivers. As with documentation, this is an unnecessarily tedious thing to do. Our developer community helped out tremendously by open-sourcing a number of different libraries, but this led to “driver drift”. Our developers shouldn’t have to worry about writing our code! It’s our job to make it easy.

Next, we needed a way to create APIs for our partners. Guess what? The same two previous issues apply. More busywork for us.

Finally, we needed a zero-code way to try out our API. A real sandbox — not white papers, video tours, slide decks. A full-featured mechanism to call the API without monkeying with code.

So that was our goal. The outcome is what we now call Swagger. So how does this thing work? Should you use it?

Our server produces a Resource Declaration. This is like a sitemap for the API: it lists the APIs available to whoever is asking. Who is asking? Well, if you pass your API key, the Swagger framework shows what is associated with that principal! That keeps sensitive APIs from being exposed. It also gives you a way to let people try out new features in an incremental fashion. It’s completely pluggable and we provide some simple demo implementations.
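A pluggable, per-principal Resource Declaration can be sketched as a filter over the full API list. Everything here is hypothetical (the `ApiEntry` type, the role names, the key-to-role map and the paths); it only illustrates the idea that the listing depends on who is asking.

```scala
// A hypothetical entry in the Resource Declaration (the API "sitemap").
final case class ApiEntry(path: String, description: String, requiredRole: Option[String])

// Hypothetical mapping from API key to the roles of its principal.
val rolesByKey: Map[String, Set[String]] = Map(
  "partner-key" -> Set("partner"),
  "internal-key" -> Set("partner", "internal")
)

val allApis = Seq(
  ApiEntry("/word.json", "Word lookups", None),
  ApiEntry("/wordGraph.json", "Word Graph", Some("partner")),
  ApiEntry("/admin.json", "Operations", Some("internal"))
)

// Only list APIs the caller may see; missing or unknown keys get the public set.
def resourceListing(apiKey: Option[String]): Seq[String] = {
  val roles = apiKey.flatMap(rolesByKey.get).getOrElse(Set.empty[String])
  allApis.filter(_.requiredRole.forall(roles.contains)).map(_.path)
}
```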

Follow one of the paths to an API and you get all the operations available to the client, an operation being an HTTP method + a resource path. Look further and you’ll see something useful — the input and output models! Now you know what you’re going to get before you call. It’s a contract between your server and the client.

So that’s all fine and dandy but what’s the use? How do we address the pain points from above?

Well, once you know both how to call an API and what you get back, you can do some interesting things. First, how about a sandbox? There’s really not much to it — you can see it in action at petstore.swagger.wordnik.com and developer.wordnik.com. Neat! Simple sandbox, calls your API exactly how you would from your code. You can try out those different levers to calling an API without editing your source. Heck, even your boss can try out the API!

Next, how about clients? Well, we wrote a code generator which creates clients in a number of languages. Don’t like our codegen style? We’re not offended! It’s template driven. Make your own templates, or even your own code generator. Best of all, change your API and rebuild your client libraries. It’s all *automatic*. And for those folks who want special APIs? Build them their own client by passing their API key in. Really, it works.

Finally, documentation. Wasn’t that the first point in this post? Yes! If you look in one of our sample Swagger apps, you’ll see how this is accomplished.

See that code? The documentation is built in. This function defines the GET method of the /findByStatus path. There is a required query param with allowable values of “available, pending, sold”, with a default value of “available”. It returns a Pet object. Best of all, it serves as both the input declaration as well as the documentation system. See here:

http://petstore.swagger.wordnik.com/#!/pet/findPetsByStatus_GET
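The findByStatus declaration described above can be sketched as data plus a validator. This is a hypothetical, simplified model (`QueryParam`, `validate` and `describe` are illustrative names, not swagger-core classes) that assumes only what the text states: one required `status` query parameter with allowable values available, pending and sold, defaulting to available. The `Pet` response model is not represented. It shows how a single declaration can drive both input checking and documentation.

```scala
// A hypothetical, simplified model of one query parameter in an operation declaration.
final case class QueryParam(name: String, required: Boolean,
                            allowableValues: Set[String], defaultValue: Option[String])

val statusParam = QueryParam("status", required = true,
  allowableValues = Set("available", "pending", "sold"), defaultValue = Some("available"))

// Input side: check a raw query value, falling back to the declared default.
def validate(p: QueryParam, raw: Option[String]): Either[String, Option[String]] =
  raw.orElse(p.defaultValue) match {
    case Some(v) if p.allowableValues.isEmpty || p.allowableValues.contains(v) => Right(Some(v))
    case Some(v) => Left(s"Invalid value '$v' for ${p.name}")
    case None if p.required => Left(s"Missing required parameter ${p.name}")
    case None => Right(None)
  }

// Documentation side: render the same declaration as readable text.
def describe(p: QueryParam): String = {
  val need = if (p.required) "required" else "optional"
  val allowed = p.allowableValues.toSeq.sorted.mkString(", ")
  val dflt = p.defaultValue.getOrElse("none")
  s"${p.name} ($need): one of $allowed; default $dflt"
}
```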

All of Swagger is open-source. Check out swagger.wordnik.com for a list of repos. More on the Swagger roadmap in an upcoming post!

August 12, 2011 by Angela Tung

Wordnik News: Swagger, Jobs, NoSQL Now

Here’s a roundup of the latest in Wordnik-related happenings.

Wordnik’s got Swagger. This week we released Swagger, a specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services. The goal of Swagger is to enable client and documentation systems to update at the same pace as the server. The documentation of methods, parameters and models is tightly integrated into the server code, allowing APIs to always stay in sync. See the write-up on ReadWriteWeb for more information.

Jobs. Wordnik’s hiring! Check out the Jobs page for the latest openings. More to come!

NoSQL Now Conference. Our own Tony Tam will be speaking at the NoSQL Now Conference in San Jose on August 25, on the topic “What Drove Wordnik Non-Relational?” Sign up now!

Reminders. Remember that Erin McKean’s TED book, Aftercrimes, Geoslavery, and Thermogeddon: Thought-Provoking Words from a Lexicographer’s Notebook, is now available. Erin’s book takes a “revealing look at a torrent of new words and phrases—in science, politics, social life—that reveal our changing societies,” and is available on Amazon for the Kindle. Also take a look at Erin’s latest Boston Globe column for more collectible words.

Finally, don’t forget about the Wordnik-powered weekend feature in The Wall Street Journal, “The Week in Words,” a field guide to unusual words in that week’s WSJ issue. Here’s last week’s column and this week’s.

July 11, 2011 by Erin McKean

Wordnik’s Word Graph Helps TaskRabbit Help You

TaskRabbit Is First to Use Word Graph API to Boost Real-Time Transactions Through Deeper Understanding of Words

SAN MATEO, CA–(Marketwire – Jul 11, 2011) – Wordnik, maker of the web’s first word navigation system, today announced its new Word Graph API for online content and commerce partners. Developed using Wordnik’s Word Graph, the world’s largest and most comprehensive graph of words and their meaning, the API provides the first-ever automated context discovery capability to help partners offer increased value to their users beyond standard word look-up. The Word Graph API builds into a partner’s content the ability to recognize the relationship between disparate words, thereby creating more accurate and deeper discovery.

The Word Graph API is currently being used by TaskRabbit, an online and mobile marketplace that brings people in a community together to get things done. A two-way marketplace, TaskRabbit connects ‘TaskPosters,’ people who need extra help, with ‘TaskRabbits,’ a network of background-checked and pre-approved individuals who have the skills and time available to complete Tasks. The API enhances three components of the TaskRabbit service: recommending which tasks should be targeted to TaskRabbits; matching relevant tasks so a user creating a new posting can instantly see related tasks (which can help determine the most appropriate description and pricing); and auto-tagging content, which was previously done manually.

“We’ve always respected Wordnik for its outstanding presentation of words and their meanings,” said Leah Busque, founder and CEO of TaskRabbit. “When we learned that Wordnik was making this technology available to companies, we were on board immediately. User-generated content is key to how TaskRabbit works and accurate results are critical for our business. The Word Graph API enables us to enhance our service and provide a great customer experience through relevant results.”

To create the Word Graph, Wordnik developed a number of proprietary techniques to both discover the meaning of new terms and analyze how they are used, which ultimately captures nearly unlimited relationships between words. By using a graph structure, Wordnik is able to identify relationships in real-time that were not possible until now, such as finding similarly used words; finding word usage and trends within text content; and performing rapid clustering of terms. These capabilities are now being leveraged to provide value to content and commerce partners — with TaskRabbit being the first.

“We are excited to partner with TaskRabbit, as it signals a new realm of opportunities for Wordnik, which we look forward to cultivating,” said Joe Hyrkin, CEO of Wordnik. “There’s a universe of online content publishers and commerce partners that can benefit almost immediately from our Word Graph’s unique capabilities. We make it easy for partners to leverage our context discovery expertise to add value to their sites, engage new users and solve complex site problems related to their text content.”

About Wordnik

Wordnik is the first word navigation system that helps consumers unlock the value of words and phrases to discover what information is most meaningful and matters to them. Unlike search engines that provide an overabundance of information or online dictionaries that are static or limited to general information, Wordnik helps consumers zero in and fully understand words and content in context. Wordnik’s team includes experts in search engine architecture, social networking, computational linguistics and lexicography. For more information, visit http://www.wordnik.com, follow us on Twitter, or Facebook. To find out more about Wordnik’s APIs, visit http://developer.wordnik.com. Wordnik investors include Roger McNamee, Steve Anderson of Baseline, Mohr Davidow Ventures, Floodgate, Radar Partners, SV Angel, and Lucas Venture Group.

About TaskRabbit

TaskRabbit, the nation’s first service networking platform, is pioneering the way people get things done. TaskRabbit leverages the latest technology and the social networking movement to bring neighbors together to live smarter and more efficiently. A two-way marketplace, TaskRabbit connects ‘TaskPosters,’ people who need help, with ‘TaskRabbits,’ a network of pre-approved and background-checked individuals, who have the time and skills needed to complete the job. Based in San Francisco, Calif., TaskRabbit is backed by notable investors, including Baseline Ventures, First Round Capital, FLOODGATE Fund, Collaborative Fund, and Shasta Ventures. The company has been featured in the Wall Street Journal as well as on NBC’s The Today Show. For more information, visit http://www.taskrabbit.com.

July 11, 2011 by Erin McKean

“New Word Graph API Takes Wordnik From Fun and Funky Apps to Some Serious Business Services”

from the post at Semantic Web:

You may know Wordnik from subscribing to its Word of the Day service (by the way, today that word is eloign). Or perhaps you know it from some of the apps that have used its API – such as Freebase WordNet Explorer, or one of the many mobile ones that let users access direct features of the system through their smart phones.

Now comes something new on the API front: Word Graph is the latest result of some three years of algorithm development around analyzing the digital text that Wordnik has collected from partners, to understand the relationship between words in order to derive meaning. Word Graph matches content based on digital text from partners who need to understand more of what their content says and is, and to help them and their services make decisions based on that understanding.

In that respect, it’s taking Wordnik’s API services closer to helping accomplish business requirements, rather than drive neat B-to-C apps, from crossword puzzles to jumble games to pronunciation voice services, where its APIs have currently mostly been employed.

The first partner to use the API is TaskRabbit, an online service that matches task creators (e.g. someone who needs child care) with task runners (e.g. babysitters). Prior to integrating the API into its business logic tier, the key to-dos of the service were all accomplished manually, says Tony Tam, co-founder and VP of Engineering at Wordnik. Submitted tasks, for instance, had to be manually categorized, but now the system has been trained, based on TaskRabbit content, to appropriately treat terms from its domains. That is, for example, to understand that babysitting and child care are roughly the same thing, and to automatically categorize together the tasks submitted with the various terms. Now it knows to show task runners who perform those services the tasks that used either term; in fact, it can find those task runners who’ve done a certain type of job (whether it’s called babysitting, child care, mother’s helper, day care, and so on) multiple times, and tell them about new tasks in the same vein. Similarly, for task posters, the API is used to match relevant tasks, so that they can quickly see how others with categorically similar requirements have posted their tasks, what they’re offering as fees, and possibly revise their own job postings to be better and more competitive matches.

“The goal is to match these task posters with the task runners as efficiently as possible based on the content of that task,” says Tam. But he sees potential for products in many different verticals to benefit from recommending or matching content based on digital text, online publishing among them, of course. What makes it different from other semantically-oriented attempts to do the same, Tam says, is that “our whole existence is built on the concept of marrying lexicography with computational linguistics.” Wordnik has captured billions of words of English over its lifetime to feed its Word Graph and has developed analytical algorithms so that it can do very strong recommendations and matching without a large training set. “So the Graph itself is one of our strongest tools in our toolbox,” he says. “We are taking a very different approach as far as how we can apply user behavior and content from the digital text on top of each other. We may be looking at similar problems but our approach is radically different.”

One of the important capabilities around its Word Graph is accounting for how dynamic language is, even when meaning seems undercut by cacography (yes, it is too a word) or something else. Tam estimates that roughly 200 words are created in the online digital set every day, perhaps unintentionally because of a misspelling, perhaps thanks to a new Twitter hashtag, or maybe in response to something taking place in society: the branding of Charlie Sheen as a ‘shenius,’ for instance, Tam offers. “That went into our Word Graph and kicked into our algorithms,” he says. “So when you analyze text with current events and words, if you don’t know that relationship, the ability to do real processing is severely hampered. That is really core to what Wordnik is doing and is essential in building out a graph of words so we can make those associations. It’s not fair for me to say I can match your content only as long as it’s perfect. By taking text shorthand, misspellings, Twitter hashtags and so on, and translating those into something that can be understood, now we can do real analysis on text.”

What Wordnik also is doing in the next little while is open sourcing its infrastructure to help solve real-world problems for API developers. Much of this will reflect the scaling expertise that Wordnik has been building, having had to deal with documents that are millions of words long and that need to be processed efficiently in real time. Tam’s own background is in that area, including expertise in federated data query technologies, and Wordnik aims to make the scale big enough that it can be used at run time for tens of millions of nodes and millions of edges. He notes that Wordnik runs one of the larger known instances of MongoDB and uses the Scala programming language, which runs on the Java Virtual Machine platform, and it also leverages cloud computing for locality requirements.
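Wordnik’s Word Graph techniques are proprietary and are not described in the article, so the following is only a toy illustration of the babysitting/child care example above. It assumes a hypothetical list of “used interchangeably” term pairs, groups them into clusters with a simple union-find, and tags a task by the cluster of the first known term it mentions.

```scala
import scala.collection.mutable

// Hypothetical "these terms are used interchangeably" edges.
val related = Seq(
  "babysitting" -> "child care",
  "child care" -> "mother's helper",
  "day care" -> "child care",
  "moving" -> "hauling"
)

// Union-find over terms: related terms end up sharing a representative.
val parent = mutable.Map.empty[String, String]
def find(t: String): String = {
  val p = parent.getOrElseUpdate(t, t)
  if (p == t) t else { val root = find(p); parent(t) = root; root }
}
def union(a: String, b: String): Unit = parent(find(a)) = find(b)
related.foreach { case (a, b) => union(a, b) }

// Tag a task with the cluster of the first known term (alphabetically) it mentions.
def category(task: String): Option[String] = {
  val text = task.toLowerCase
  parent.keys.toSeq.sorted.find(term => text.contains(term)).map(find)
}
```

Tasks phrased as “babysitting” and “child care” receive the same tag, while a moving job lands in a different cluster.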