12 Months with MongoDB

Happy Monday everyone!

As previously blogged, Wordnik is a heavy user of 10gen’s MongoDB. One year ago today we started the investigation to find an alternative to MySQL to store, find, and retrieve our corpus data. After months of experimentation in the non-relational landscape (and running a scary number of nightly builds), we settled on MongoDB. To mark the one-year anniversary of what ended up being a great move for Wordnik, I’ll describe a summary of how the migration has worked out for us.

Performance. The primary driver for migrating to MongoDB was for performance. We had issues with MySQL for both storage and retrieval, and both were alleviated by MongoDB. Some statistics:

Mongo serves an average of 500k requests/hour for us (that does include nights and weekends). We typically see 4x that during peak hours

We have > 12 billion documents in Mongo

Our storage is ~3TB per node

We easily sustain an insert speed of 8k documents/second, often burst to 50k/sec

A single java client can sustain 10MB/sec read over the backend (gigabit) network to one mongod. Four readers from the same client pull 40MB/sec over the same pipe

Every type of retrieval has become significantly faster than our MySQL implementation:

– example fetch time reduced from 400ms to 60ms
– dictionary entries from 20ms to 1ms
– document metadata from 30ms to .1ms
– spelling suggestions from 10ms to 1.2ms

One wonderful benefit to the built-in caching from Mongo is that taking our memcached layer out actually sped up calls by 1-2ms/call under load. This also frees up many GB of ram. We clearly cannot fit all our corpus data in RAM so the 60ms average for examples includes disk access.

Flexibility. We’ve been able to add a lot of flexibility to our system since we can now efficiently execute queries against attributes deep in the object graph. You’d need to design a really ugly schema to do this in mysql (although it can be done). Best of all, by essentially building indexes on object attributes, these queries are blazingly fast.

Other benefits:

We now store our audio files in MongoDB’s GridFS. Previously we used a clustered file system so files could be read and written from multiple servers. This created a huge amount of complexity from the IT operations point of view, and it meant that system backups (database + audio data) could get out of sync. Now that they’re in Mongo, we can reach them anywhere in the data center with the same mongo driver, and backups are consistent across the system.

Capped collections. We keep trend data inside capped collections, which have been wonderful for keeping datasets from unbounded growth.

Reliability. Of course, storing all your critical data in a relatively new technology has its risks. So far, we’ve done well from a reliability standpoint. Since April, we’ve had to restart Mongo twice. The first restart was to apply a patch on 1.4.2 (we’re currently running 1.4.4) to address some replication issues. The second was due to an outage in our data center. More on that in a bit.

Maintainability. This is one challenge for a new player like MongoDB. The administrative tools are pretty immature when compared with a product like MySQL. There is a blurry hand-off between engineering and IT Operations for this product, which is something worth noting. Luckily for all of us, there are plenty of hooks in Mongo to allow for good tools to be built, and without a doubt there will be a number of great applications to help manage Mongo.

The size of our database has required us to build some tools for helping to maintain Mongo, which I’ll be talking about at MongoSV in December. The bottom line is yes–you can run and maintain MongoDB, but it is important to understand the relationship between your server and your data.

The outage we had in our data center caused a major panic. We lost our DAS device during heavy writes to the server–this caused corruption on both master and slave nodes. The master was busy flushing data to disk while the slave was applying operations via oplog. When the DAS came back online, we had to run a repair on our master node which took over 24 hours. The slave was compromised yet operable–we were able to promote that to being the master while repairing the other system.

Restoring from tape was an option but keep in mind, even a fast tape drive will take a chunk of time to recover 3TB data, let alone lose the data between the last backup and the outage. Luckily we didn’t have to go down this path. We also had an in-house incremental backup + point-in-time recovery tool which we’ll be making open-source before MongoSV.

Of course, there have been a few surprises in this process, and some good learnings to share.

Data size. At the MongoSF conference in April, I whined about the 4x disk space requirements of MongoDB. Later, the 10gen folks pointed out how collection-level padding works in Mongo and for our scenario–hundreds of collections with an average of 1GB padding/collection–we were wasting a ton of disk in this alone. We also were able to embed a number of objects in subdocuments and drop indexes–this got our storage costs under control–now only about 1.5-2x that of our former MySQL deployment.

Locking. There are operations that will lock MongoDB at the database level. When you’re serving hundreds of requests a second, this can cause requests to pile up and create lots of problems. We’ve done the following optimizations to avoid locking:

If updating a record, we always query the record before issuing the update. That gets the object in RAM and the update will operate as fast as possible. The same logic has been added for master/slave deployments where the slave can be run with “–pretouch” which causes a query on the object before issuing the update

Multiple mongod processes. We have split up our database to run in multiple processes based on access patterns.

In summary, life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. We can code up tools to help out the administrative side until other options surface.

Hope this has been informative and entertaining–you can always see MongoDB in action via our public api.

Tony

31 thoughts on “12 Months with MongoDB”

Thanks for the post! I am looking forward to you guys open-sources your tools.

Now that you’ve outlined the kind of volume/time you process on your cluster, can you maybe tell a bit about the topology and maybe also hardware specs of the nodes you use?

Cheers,
Markus

Hi Markus, we use dual quad-core 2.4GHz intel cpus with 72GB ram. They are physical servers and in master-slave mode and use 5.3TB LUNs on our DAS.

Typically, we are never CPU constrained. The system is limited by the DAS speed.

Curious, when you say you always query before update, do you do two discrete operations, a query and an update, or do you roll those into one using the findandmodify operation?

@Daniel We do a query, then an update.

Hey Tony, What do you think about the hosted service for mongo, MongoHQ? Not for Wordnik specifically (I see that Wordnik uses dual quad-core 2.4GHz intel cpus with 72GB ram) but in general, for someone running their app inside inside AWS. What are your thoughts on using a managed MongoHQ instance instead of running/maintaining your own?

Coming from the world of relational databases, 24 hours offline because of datastore corruption seems pretty scary. I’m not sure how MySQL would handle that but this is one of the reasons people pay for Oracle.

In any account, this was a very interesting article and it’s great to know how your team is tackling these challenges. Good work!

I saw your post referenced on YCombinator. We have been exploring the use of Mongo vs. CouchDB. I am curious if you evaluated Couch, and if so, why you chose Mongo?

Also, have you used any of Mongo’s geospatial features and if so, how have you found them to be vis a vis MySQL or other relational geospatial technologies?

Great post, and thanks for sharing. Looking forward to the new tools emerging.

Thanks for the write-up.

Fernando, I just wanted to point out that, even if you pay for Oracle, it doesn’t make you immune to outages. I think the JPMorgan Chase outage, last month, was greater than 24 hours. No chase.com logins, no ACH, no loan applications, no private client trading portfolio access. And all the finger pointing was at Oracle.

That said, I don’t think document-oriented datastores are a one-size-fits-all solution, but I can see how it fits in nicely with wordnik’s data model. Thanks again!

Fernando,
Database server unavailable for 24 hrs? scary yes, but it happens. It all boils down on how critical the application is and accordingly one comes up with plans for backup and redundancy. Oracle DB prior to version 7, was known to fail – now it is stable as a rock. MongoDB is evolving, it too will find ways to be more stable and ‘enterprise’ ready. Do not forget this is a product of volunteers coming together to create an open-source product.

Tony – thanks for writing such a useful article, with details on throughput based on real numbers. I found it excellent for sizing.

Cheers
Atul

Tony,

You say you are running MongoDB 1.4.x. Any plans to move to version 1.6x or beyond and take advantage of the replica sets and/or sharding?

Excellent article BTW. Looking forward to your open source tools.

About outages, MTTF, etc. … In my experience the question is not whether or not a piece of technology will fail, it is intristic by its nature that it will.

The question is more like “how often”, “for how long” and “what can you do” to ease the effects caused. If you have a look at the 10gen website, you will notice that differential pricing is in place
already. With it comes different tiers/layers of support/marketing and thus different ways/opportunities to improve a product.

Now that MongoDB is still young, what we can expect is a slowdown on major features over time but at the same time a improvement in terms of edge-case failures/scenarios and increased intervals of MTTF. Also, the UI features/goodies will go up (read more reliable/comfortable command line and GUI utils etc.).

However this is nothing new or unusual as it is the usual cause with any technology (especially DBMSs). I assume people will, at some point, have the same level of confidence and trust in MongoDB as they
already place in products from Oracle these days. If 10gen manages to build out this intermediary support layer (which I believe will happen assuming the right people take on the job) then MongoDB will become a valuable choice, more often than not.

Very nice trick, the query-before-update, might help us here a lot. One question though: do you fetch all data of the query over the network or are you doing something more advanced which forces the server to put the data into RAM without the need to fetch it. Eval() is out of the question because it locks everything. I’m thinking of something like db.find({_id: {$in: […]}, ‘_dummy_field’:0}).count()

Cheers, Swen

This was a very interesting write up. Its great to see the support you’re getting from 10gen.

I dont understand query before update part – you never do any in-place update like $push or $inc in your systems? Or you query that record just so it gets in memory and then do the in-place update call?

@Mark, Yes, we’ll be upgrading very shortly.

@Marc We do use those operations but typically on single documents. So if you fetch the object first, the operation will be faster.

@Swen You just need to find the document that you’re going to modify. I should mention that if the *lookup* of the document is expensive, this trick might not help much.

@Ayush I think a shared instance might be tricky until 1.8–there are quite a few ways to require a database-level lock. But the dedicated instances probably work well.

What does Wordnik do that you have such a large database and move so much information?

Can you comment on your data modeling?
What is the average size of a document?
How extensive you do you use embedded documents? (ie, 4 levels of nesting?)

I’ve been investigating MongoDB and similar systems for some projects. I’m curious to learn more about the scalability of GridFS. How much data are you guys storing in GridFS, and what size of files?

@Kamil, we store audio files of 50-250kb. Got roughly 100k of them right now. We haven’t stressed that part of the system but we are gearing up for serving audio out to our API users, which will increase the traffic to the files substantially

Can you comment about your data modeling?

How many levels of embedded documents do you use?

On average, how large is a document in a collection?

Do you have to deal with de/serializing between your API and your web frontend?

when you said java client, did you mean thread?

@tommy, documents are pretty similar to what you see over our public api, with the exception of fields used for internal use. Some of the doucments (dictionary model http://github.com/wordnik/api-examples/blob/master/docs/dictionary.xsd) has 6 levels of hierarchy (if I remember correctly).

Our web front end uses the same REST api as the public API, so yes, we consume JSON in the templating layer. The mapping between JSON/XML in mongo and over the REST front end is done a couple different ways, via proprietary ODM. In a couple cases we work against the DBObject structure sent by the java driver, in others we rely on Jackson (http://jackson.codehaus.org/) to convert MongoDB JSON into java objects. We use two different mappers to give different behavior between the object annotations, which works out really well.

@DC, no I meant separate VM instances, not threads. Since we use the driver in the recommended manner (singleton), we wanted to benchmark the throughput without concerns of blocking in the driver. I’ll repost after I get a chance to rerun the benchmark with a multi-threaded client (same VM).

i have the same one here did it mean thread

Great numbers, thanks for publishing it.
I had one question about data distribution.
How were you sharding the data?

I am really intrigued by mono DB now.

Hey, do you have a facebook fanpage?

Yes, here it is: https://www.facebook.com/wordnik.fans

Comments are closed.