Mammatus Blog‎ > ‎

MongoDB Caveats and Warnings

posted Sep 23, 2012, 3:00 PM by Rick Hightower   [ updated Sep 23, 2012, 3:06 PM ]

Caveats

MongoDB indexes may not be as flexible as Oracle/MySQL/Postgres or other even other NoSQL solutions. The order of index matters as it uses B-TreesRealtime queries might not be as fast as Oracle/MySQL and other NoSQL solutions especially when dealing with array fields, and the query plan optimization work is not as far as long as more mature solutions. You can make sure MongoDB is using the indexes you setup quite easily with an explain function. Not to scare you, MongoDB is good enough if queries are simple and you do a little homework, and is always improving.

MongoDB does not have or integrate a full text search engine like many other NoSQL solutions do (many use Lucene under the covers for indexing), although it seems to support basic text search better than most traditional databases.

Every version of MongoDB seems to add more features and addresses shortcoming of the previous releases. MongoDB added journaling a while back so they can have single server durability; prior to this you really needed a replica or two to ensure some level of data safety. 10gen improved Mongo's replication and high availability with Replica Sets.

Another issue with current versions of MongoDB (2.0.5) is lack of concurrency due to MongoDB having a global read/write lock, which allows reads to happen concurrently while write operations happen one at a time. There are workarounds for this involving shards, and/or replica sets, but not always ideal, and does not fit the "it should just work mantra" of MongoDB. Recently at the MongoSF conference, Dwight Merriman, co-founder and CEO of 10gen, discussed the concurrency internals of MongoDB v2.2 (future release). Dwight described that MongoDB 2.2 did a major refactor to add database level concurrency, and will soon have collection level concurrency now that the hard part of the concurrency refactoring is done. Also keep in mind, writes are in RAM and eventually get synced to disk since MongoDB uses a memory mapped files. Writes are not as expensive as if you were always waiting to sync to disk. Speed can mitigate concurrency issues.

This is not to say that MongoDB will never have some shortcomings and engineering tradeoffs. Also, you can, will and sometimes should combine MongoDB with a relational database or a full text search like Solr/Lucene for some applications. For example, if you run into issue with effectively building indexes to speed some queries you might need to combine MongoDB with Memcached. None of this is completely foreign though, as it is not uncommon to pair RDBMS with Memcached or Lucene/Solr. When to use MongoDB and when to develop a hybrid solution is beyond the scope of this article. In fact, when to use a SQL/RDBMS or another NoSQL solution is beyond the scope of this article, but it would be nice to hear more discussion on this topic.

The price you pay for MongoDB, one of the youngest but perhaps best managed NoSQL solution, is lack of maturity. It does not have a code base going back three decades like RDBMS systems. It does not have tons and tons of third party management and development tools. There have been issues, there are issues and there will be issues, but MongoDB seems to work well for a broad class of applications, and is rapidly addressing many issues.

Also finding skilled developers, ops (admins, devops, etc.) that are familiar with MongoDB or other NoSQL solutions might be tough. Somehow MongoDB seems to be the most approachable or perhaps just best marketed. Having worked on projects that used large NoSQL deployments, few people on the team really understand the product (limitations, etc.), which leads to trouble. 

In short if you are kicking the tires of NoSQL, starting with MongoDB makes a lot of sense.

If you would like to learn more about MongoDB consider the following resources:

Comments