MongoDB (Single-Server) Data Durability Guide

If you are a MongoDB user or just interested in NoSQL databases in general, you may have seen the excellent “MongoDB has poor data durability by default!” (I am paraphrasing) conversation started by Mikeal Rogers.

It is an excellent topic to bring up and regardless of Mikeal’s association (he’s a CouchDB developer) I never once got the impression that he was setting up a straw man to knock it back down; he brought up some really excellent points about how MongoDB favors performance over single-server data durability by default which in turn spurred a great conversation between him, the MongDB/10gen team and some users from the community.

In this guide I want to outline a number of ways you can increase your single-server durability using MongoDB as well as some optional safer driver behaviors you can make use of to ensure fsync’ing as well as synchronous writes and lastly a tip on keeping your data more aggressively synced across all the nodes in your MongoDB cluster.

REMINDER: Before getting started, please note that any use of ‘fsync’ syncs all pending changes and not just the last change made.

  1. MongoDB Durability Overview
  2. fsync’ing MongoDB – from the command line
  3. fsync’ing MongoDB – from the command shell
  4. fsync’ing MongoDB – from a database driver
  5. MongoDB Synchronous Writes – getLastError
  6. MongoDB Synchronous Writes – Write Concern
  7. BONUS: “fsync’ing” MongoDB (1.5+) – across multiple servers
  8. HELP: How can I fixed a corrupted MongoDB database?
  9. Conclusion
  10. MongoDB 1.4 Command Line Reference

MongoDB Durability Overview

The ensuing conversation between Mikeal and the MongoDB/10gen team, like Mike and Kristina, was hugely valuable. If this type of conversation interests you as much as it does me, you can follow this thread “How reliable has MongoDB become?” and my own thread “How is a corrupt master repaired from a slave?” for more replies on the subject.

The MongoDB data-durability highlights that have come out of these conversations so far are:

  • MongoDB is not designed around single-server durability, but rather multi-server durability.
  • For better single-server durability (at the cost of performance) you can configure MongoDB to fsync at differing intervals (we’ll show you how below).
  • MongoDB will add support for better single-server durability in dev stream 1.7 and released to production in 1.8 (Issue #980). At the time of this writing it isn’t clear if this will be a “transaction log” style solution as mentioned on Mikeal’s blog comments, or if it will be some other strategy. 10gen hasn’t spec’ed it out yet.
  • The upcoming “Replica Sets” feature in MongoDB 1.5 (dev) and eventually 1.6 (production) are intended to fully address the problem of creating a high-availability cluster (2 or more) of MongoDB instances that all work together to keep the cluster operational with things like automatic failover and data recovery when a node is brought back online. For multi-server setups, this is what you want to care about.

Mongo’s core strategy of data reliability is still focused around the multi-server setup, but it is nice to see the addition of the “transaction log” going in to the server to help with single-server resilience.

fsync’ing MongoDB – from the command line

On the topic of single-server resilience, the best thing you can currently do (until transaction log support is added) is to increase the number of times you have MongoDB fsync.

fsync‘ing is an operating-system level operation that gets data out of volatile caches and commits it to the disk. Out of the box MongoDB performs an fsync every 60 seconds. What this means for you is that in a worst-case scenario, like a power outage, your server can loose up to 59.99 seconds worth of data (NOTE: I realize in a real worst-case scenario it could be more due to disk failure, but let’s keep this example manageable).

If you are working on a write-heavy server that does thousands of writes a second, loosing 59.99 seconds worth of data could be millions of records. If you deem this too heavy of a loss, you can modify how frequently MongoDB fsync’s to disk by using the –syncdelay=SECONDS command line argument like so:

mongod --syncdelay=5

Which will force the server to flush it’s caches to disk every 20 seconds regardless of what is going on.

While this might seem like a good idea, you could imagine in a high-activity production environment, fsyncing every few seconds regardless of the operation being performed and how heavy of a load the DB is under might be a bit too heavy handed; you might want to control exactly which operations get fsync’ed to disk and which ones don’t.

Luckily, you can do that!

fsync’ing MongoDB – from the command shell

As it turns out, MongoDB also has an internal “fsync” command you can execute from the MongoDB shell like so:

use admin
db.runCommand({fsync:1});

While will execute the fsync synchronously, waiting for it to complete before returning control back to the shell. If you would like to execute the fsync asynchronously which fires the command into the DB but then returns control back to the shell immediately, you can use the optional async:true argument:

use admin
db.runCommand({fsync:1,async:true});

NOTE: These examples are straight out of the MongoDB fsync documentation.

This is all well and good, but what about executing these commands from a driver connected to DB instead of needing to always have the Mongo shell up?

Great news, you can do that too!

fsync’ing MongoDB – from a database driver

MongoDB’s fsync functionality can be invoked from almost all of the DB drivers supported, we will focus on the Java MongoDB driver.

Invoking the ‘fsync’ command from the MongoDB drivers is as easy as executing the “fsync:1” command. In the case of the Java driver, you can use the com.mongodb.DB.command(DBObject) or com.mongodb.DB.command(String) methods. Using the String-based method is the easiest to write an example for and would look like:

DB conn = Mongo.connect(...);
conn.command("{fsync:1}");

You could imagine in any Java web app you could write a simpler utility method that fired off a sync command that you could call after any important write operation; allowing you to dictate when the server took it’s time to block and fsync or not.

For example, maybe after user-registration you would fsync to ensure (as best as you can) that the user account is safe and user-registration occurs infrequently enough that you wouldn’t be causing that many locks on the DB to impact performance. But in the case of say users commenting on something, you would avoid fsync’ing because it happens frequently and loosing a comment or two isn’t that bad.

Now the next logical comment likely on the tip of your brain is “That’s cool, but what about synchronous writing that I know is safe on the server before I continue?”

And you know, today is your lucky day, because that’s the next section!

MongoDB Synchronous Writes – getLastError

I think we have beaten to death the different ways you can ensure MongoDB keeps its internal cache’s consistent with persisted data on disk using fsync. Parallel to the concerns of fsync’ing is the idea of synchronous writes. More specifically, by default MongoDB operates in a “write and forget” mode to callers. You perform a write operation from a MongoDB driver and the call immediately returns, with you putting your trust in MongoDB that it got the write request and will successfully service it.

MongoDB behaving like this as a default is a great idea in my opinion. Assuming, by default, that MongoDB won’t explode into flames and will service your request makes sense when you consider the performance benefits.

For more important data that cannot be write-and-forgot-ted-ded, MongoDB can optionally support synchronous writes to the DB by way of the “getLastError” command. The getLastError command does exactly what you think it would: it gets the last error that occurred for the last operation issued for that connection (it operates on a per-connection scope).

Keeping a connection to the DB open and using the getLastErrror command immediately after a write has the effect of blocking until getLastError returns a result to the caller.

Different drivers implement the support for getLastError differently; the Python driver adds support for a boolean value to write operations causing the call to block automatically until complete; the Java driver (because it pools connections) requires you to tell it when you want to keep a particular connection open until you are done with it by way of the com.mongodb.DB.requestStart() marking method and matching com.mongodb.DB.requestDone() method.

REMINDER: While the requestStart and requestDone methods seem like transaction delimiters, they aren’t. They are just marker methods used by the MongoDB Java driver to ensure that subsequent calls get serviced through the same pooled MongoDB connection and not cycled over to another pooled connection.

Using the Java driver as an example, to ensure that our commands go to the same connection (as required by getLastError) our code would look something like this for a synchronous write:

DB conn = Mongo.connect(...);
 
conn.requestStart();
// At this point, all commands will be issued through
// the same connection from the internal Mongo driver
// connection pool
 
conn.getCollection(...).insert(...);
CommandResult result = conn.getLastError();
 
// Handle a failure case from result arg
conn.requestDone();
 
// Now the Java driver will stop shuffling all commands
// over that same connection and return it to the pool.

Again, I could imagine utility classes in a web app that provide synchronous writes to Mongo by wrapping these implementation details in a method like “DAO.saveUser(User user, boolean synchronous)” or something like that.

MongoDB Synchronous Writes – Write Concern

This tip is for the MongoDB Java Driver folks out there; I don’t know if the other drivers support this, but the examples below apply to the Java driver.

At both the database and collection level, MongoDB’s Java driver supports setting a WriteConcern value (values: NONE, NORMAL, STRICT) that indicates to the DB how it should implicitly handle write operations.

So if you need all your commands against a MongoDB source to be synchronous, you can simply set the WriteConcern on the entire DB connection to be com.mongodb.DB.WriteConcern.STRICT and not worry about handling requestStart() and requestDone() calls manually as mentioned above.

NOTE: If you still want to retrieve the getLastError value to process it, you will still need to use the code from the previous tip to denote to the Java driver that you want to share the same connection for all the commands so the error retrieved is appropriate for the command you issued.

You might be wondering why use WriteConcern at all if the previous tip does the same thing? Well, I think that depends on you.

If you’d prefer the driver did everything for you as far as managing your synchronicity, then use a WriteConcern setting; if you’d rather manage it via your own API calls that you can refactor and modify later, then use the previous tip.

The choice is up to you.

BONUS: “fsync’ing” MongoDB (1.5+) – across multiple servers

Our last bonus tip for this guide is not a single-server tip, but is so closely related to the topic of data durability that we decided to include it.

In the world of MongoDB it is common to have a few Mongo servers setup to run together; for example, in a Master-Slave configuration. Given that this entire guide has been about data durability, the topic of data consistency (in Mongo-land) can be the next question on your lips and we are going to give you a tip on how to manage that across multiple MongoDB nodes.

This guide so far has been talking about “fsync’ing” a single-server MongoDB’s cache to disk to persist your data (as well as synchronous writes); but, what if you wanted to effectively “fsync” your written data across multiple MongoDB nodes and not just to the disk of a single server?

Never fear, I shall tell you!

REMINDER: This tip requires MongoDB 1.5.0 or higher.

This replication acknowledgement command is actually an extension of the already-helpful getLastError command we covered above in the form of an optional argument: “w”.

Yep, just the letter “w”.

The way you use this command is to issue the getLastError command as you’ve already learned how to do, but include an additional “w” argument with a value > 2 where “w” represents the number of servers in your cluster to force the replication to, how cool is that?

An example usage would look like this:

DB conn = Mongo.connect(...);
 
conn.requestStart(); // ensure same connection
 
conn.getCollection(...).insert(...);
conn.getCollection(...).remove(...);
 
// Now force the write operation to complete across
// at least 2 nodes before returning.
conn.command("{getlasterror: 1, w: 2}");
 
conn.requestDone(); // release connection

This is one of the more powerful additions in MongoDB 1.5+, allowing developers to have direct control over how important data is handled by the servers. You could imagine in a cluster of 10 MongoDB servers, if a new user registers and you don’t want them to continue on into the system until that user account exists on all the servers, how handy this operation can become.

Again, wrapping this in a utility class to make using it easier is probably the way to go.

Update #1: Kristina from the MongoDB team has provided PHP and Perl code examples for this tip, thanks!

HELP: How can I fix a corrupted MongoDB database?

As part of this data-durability conversation around MongoDB, a sub-conversation around MongoDB database corruption and repair has sprung up.

Some take-aways from this sub-conversation is:

  • GOOD: Using CTRL-C to shutdown MongoDB is the preferred method, it produces a clean shutdown sequence (Picture by Sam Millman).
  • GOOD: Sending SIGINT or SIGTERM signals to the MongoDB process is the same as using CTRL-C; it produces a clean shutdown sequence.
  • BAD: Sending KILL or kill -9 signals to the MongoDB process will likely corrupt the database as MongoDB cannot shut down cleanly.

All that being said, in the off-chance that you have a power outage and your MongoDB database gets corrupted, the correct way to repair it is using the repair command.

You can use the repair command directly from the command line, like so:

mongod --repair

or you can execute it from inside the MongoDB shell, like so:

db.repairDatabase();

In either case, the repair operation will cycle through the disk contents and repair the corrupted portions of your file.

IMPORTANT: This can result in loss of corrupted data.

To clarify, you have no other choice. Your database is already messed up, running the repair command will get it back into a serviceable state, but that could include pruning data from your database that is corrupted. Just because you run repair doesn’t mean the world will automatically be rosy again, you have to always take 2nd and 3rd level precautions against data-loss.

To help cover anyone in a multi-server jam, in a Master/Slave setup where the Master fails and your Slave continues chugging along, you have to forget about recovering the Master and promote the Slave to the Master and then bing the old-Master back online as a Slave (switch their roles). At which point you’ll have to repair the slave to ensure the data store is in good condition, then likely issue an –autoresync command to get the Slave back up to speed, or a –fastsync if you have grabbed a disk-snapshot of the master and copied it over to the slave to use as a “starting point” for it’s snapshot.

Either way, data recovery is hairy business. That is why everyone in the MongoDB community is so excited for Replica Sets to show up in 1.6 stable. In that setup, the individual nodes are configured to fail-over between one another and automatically resync themselves either others in the cluster when they come back up; making your life a bit easier.

Conclusion

I hope you find this guide helpful. If you have any questions, spot any errors or have recommended additions for the guide, please leave a comment and I’ll take a look.

Happy (and safe) Mongo’ing!

MongoDB 1.4 Command Line Reference

Below is the full list of command line options for MongoDB 1.4 for easy reference.

 General options:
-h [ --help ]             show this usage information
--version                 show version information
-f [ --config ] arg       configuration file specifying additional options
-v [ --verbose ]          be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
--quiet                   quieter output
--port arg                specify port number
--logpath arg             file to send all output to instead of stdout
--logappend               append to logpath instead of over-writing
--bind_ip arg             local ip address to bind listener - all local ips
bound by default
--dbpath arg (=/data/db/) directory for datafiles
--directoryperdb          each database will be stored in a separate directory
--repairpath arg          root directory for repair files - defaults to dbpath
--cpu                     periodically show cpu and iowait utilization
--noauth                  run without security
--auth                    run with security
--objcheck                inspect client data for validity on receipt
--quota                   enable db quota management
--quotaFiles arg          number of files allower per db, requires --quota
--appsrvpath arg          root directory for the babble app server
--nocursors               diagnostic/debugging option
--nohints                 ignore query hints
--nohttpinterface         disable http interface
--rest                    turn on simple rest api
--noscripting             disable scripting engine
--noprealloc              disable data file preallocation
--smallfiles              use a smaller default file size
--nssize arg (=16)        .ns file size (in MB) for new databases
--diaglog arg             0=off 1=W 2=R 3=both 7=W+some reads
--sysinfo                 print some diagnostic system information
--upgrade                 upgrade db if needed
--repair                  run repair on all dbs
--notablescan             do not allow table scans
--syncdelay arg (=60)     seconds between disk syncs (0 for never)
--profile arg             0=off 1=slow, 2=all
--slowms arg (=100)       value of slow for profile and console log
--maxConns arg            max number of simultaneous connections
--install                 install mongodb service
--remove                  remove mongodb service
--service                 start mongodb service
Replication options:
--master              master mode
--slave               slave mode
--source arg          when slave: specify master as <server:port>
--only arg            when slave: specify a single database to replicate
--pairwith arg        address of server to pair with
--arbiter arg         address of arbiter server
--slavedelay arg      specify delay (in seconds) to be used when applying
master ops to slave
--fastsync            indicate that this instance is starting from a dbpath
snapshot of the repl peer
--autoresync          automatically resync if slave data is stale
--oplogSize arg       size limit (in MB) for op log
--opIdMem arg         size limit (in bytes) for in memory storage of op ids
Sharding options:
--configsvr           declare this is a config db of a cluster
--shardsvr            declare this is a shard db of a cluster

Tags: , , , , , , , , , , , ,

About Riyad Kalla

Software development, video games, writing, reading and anything shiny. I ultimately just want to provide a resource that helps people and if I can't do that, then at least make them laugh.

, , , , , , , , , , , ,

21 Responses to “MongoDB (Single-Server) Data Durability Guide”

  1. Sam S July 9, 2010 at 12:08 pm #

    thanks, i was saying to myself, “wtf is fsync, I’ll have to read up on it.”

    Thanks for the guide!

    • Riyad Kalla July 9, 2010 at 12:21 pm #

      Sam, that’s excellent. That’s the only reason I wrote it, to help anyone like me that have been following these conversations and scratching my head half the time.

      • Sam S July 9, 2010 at 12:25 pm #

        I gotta admit, at first I’m like what the hell is everyone talking about? I’m still learning mongodb and how it differs from other DB technology, whether it’s nosql or mysql/postgresql but I didn’t really understand why ppl were picking on mongodb. The thread in the user group gave me a better understanding and I must admit, lots of people who have valid points but at the same time give me a break.

        Why the hell would anyone run a production server off one server? I’m a sysadmin during the day and programming is a hobby. I can tell you from a sysadmin perspective, lots of replication and backups happen for my servers. I’ll be damn if I have an app or server rely on one single server.

        Anyways, I feel much better now knowing what the pros and cons are. I’ll continue to develop using mongodb and I’m not afraid of it. When replica sets come out, it’ll only get better.

        • Riyad Kalla July 9, 2010 at 12:44 pm #

          I’m right there with you Sam; especially with commodity hardware, I couldn’t imagining launching something without at least a Master/Slave setup, even on MySQL or PostgreSQL.

          I have zero faith in hard disks now adays, but it just makes me more cautious which I think is likely a good thing.

          Replica Sets absolutely sound great and I’ll admit, I didn’t understand them/the purpose/etc. until this conversation started with Mikeal’s post — once I understood the problem (complex, multi-node durability) suddenly the solution (replica sets) dinged in my head and made all the sense in the world.

          Take care!

  2. Mathias Stearn July 9, 2010 at 12:08 pm #

    Good article, but I would like to correct a few misconceptions that could cause surprises.

    –syncdelay does not in general guarantee that you will not lose data older than 60 seconds. That will likily be the case assuming that you are only inserting to a collection and not doing updates or deletes, but even in that case we don’t make any guarantees of durability in the case of unclean shutdown.

    Calling fsync on every request doesn’t only fsync those changes, it fsyncs all changes since the last fsync, so you probably don’t want to call it every on every request. Also, it still wont guarantee consistency in the case of unclean shutdown since the server could die in the middle of the writes.

    The main take away is that unless you do a clean shutdown, mongo makes *no* guarantees of durability, even for old data. Therefore you should use replication and take backups. But you should do that for any database, including mongodb after we’ve added optional single server durability. The only difference is that the suggestion in the case of unclean shutdown will then be to do a repair rather than to blow away the dbpath and resync from a replica or restore from a backup.

    • Riyad Kalla July 9, 2010 at 12:42 pm #

      Mathias,

      Good pts.

      1. I added a reminder at the top to clarify the behavior of fsync
      2. I wasn’t sure how to clarify the –syncdelay contract; I just assume it’s a timed trigger to fire the fsync command so any durability guarantee’s from it would be in line with fsync’s — is this not the right understanding?
      3. I address the issue of a corrupted database, I added a new HELP section that covers those tips and a tip you guys helped me with for a Master/Slave setup.

      Great feedback, thank you!

  3. kristina July 9, 2010 at 12:14 pm #

    Nice! Here’s the syntax for PHP & Perl:

    PHP:

    Set w at a DB or collection level:

    $collection->w = 4;

    Then do a safe insert, as normal, at it will be replicated to that many servers (see http://www.php.net/manual/en/class.mongocollection.php).

    For fsync, use the MongoDB::command method.

    ——

    Perl:

    w is set as an field on the connection: http://search.cpan.org/~kristina/MongoDB-0.35/lib/MongoDB/Connection.pm#w. Do a normal safe insert and it will be replicated to that many servers.

    For fsync, use the MongoDB::Database::run_command method.

    • Riyad Kalla July 9, 2010 at 12:51 pm #

      Kristina you are a gem; just updated the post with the update you provided.

      Thanks!

  4. Tero July 9, 2010 at 1:16 pm #

    I have to say thank you for you, Mikael, Kristina and everyone who has take part on conversation about Mongodb durability. I really like Mongodb and as few already mentioned I have learned lot of this conversation.

    Keep going on, this is how make Mongodb even greater!

  5. El Duderino July 9, 2010 at 2:19 pm #

    Thanks for the excellent and informative post. Cheers!

    • Riyad Kalla July 9, 2010 at 4:03 pm #

      Very welcome, hopefully it was helpful!

  6. Nick August 25, 2010 at 12:44 pm #

    This is kind of retarded.
    You can’t be lacking single server durability and magically gain it by adding a second server.
    CS 101: you need a basis case!

    • Riyad Kalla August 25, 2010 at 6:44 pm #

      Nick,

      I think the term “durability” is getting overloaded here; in the general sense, yes you can absolutely increase the durability of a data store by adding redundancy to it, like this guide helps you do with a Mongo cluster and that 10gen recommends when using Mongo.

      You are probably referring to a sense of single-server durability more absolute along the lines of a transaction log like in CouchDB or MySQL, and then saying that if you don’t at least have that, adding more servers doesn’t suddenly *give* you that, which I agree with — I just think the point of the article is more broad than that.

      I would also point out that Mongo is getting single-server durability (tx append log) in 1.8, so in a few months this is all moot anyway and you’ll have multi-node cluster durability as well as single-server durability with Mongo.

      In summary, I don’t find any of this “retarded”.

      • Nick August 25, 2010 at 7:26 pm #

        I think it foolhardy to think that we can overload the meaning of the word “durability” (the “D” in ACID) and weaken its promise, especially in a data storage context.

        What mongodb should be terming it’s “durability” in the “relative sense” to which you refer, could be termed better as “availability”. Adding more servers increases availability–but as you admit, without single server durability, does not increase durability.

        There is no need to qualify durability between multiple and single-server scenarios since it is one and the same.

        In short, I think it would be beneficial if the MongoDB docs, feature map, and community communications used terminology common to data storage semantics the industry is familiar with. E.g. 1.8 has plans to add durability. (period). 1.6 currently has increased availability.

        This would lessen the confusion and all the posts from people trying to understand what exactly mongo “durability” semantics promise.

        • Riyad Kalla August 25, 2010 at 7:32 pm #

          Nick,

          I appreciate the clarification; I am no DB expert and I’m afraid it shows in my mangling of the terms.

          This post though is still about improving single-server durability in MongoDB, so at least I got that right ;)

        • Mathias Stearn August 25, 2010 at 8:01 pm #

          Considering this (admittedly extreme) example which would you consider more durable:

          1) data that has been fsynced to your laptop’s hardrive
          2) data in memory on 100 servers in data centers distributed around the world

          In the first case, your data could be wiped out by a clumsy coworker’s cup of coffee, the second would require a world-wide catastrophe to loose data.

          For a more realistic example consider EC2 instance storage. It disappears after reboots (or perhaps just halts, not sure and it doesn’t really matter here). Even software written to guarantee durability simply won’t have it in any meaningful sense if it is writing to instance storage. Compare that to a MongoDB instance storing to EBS (the persistent storage on EC2) that has a snapshot taken every minute.

          Durability is really a spectrum based on the particular risks you are concerned about. ACID-style durability only helps with power-outages and OS crashes, both of which are also coved by multi-datacenter replication.

          Note that I’m not saying that there is never a case where ACID-style durability is needed, just that there are cases where other styles of durability as a holistic property of an entire system are acceptable and perhaps preferable.

          Disclamer: I work on MongoDB

    • sam s August 25, 2010 at 6:51 pm #

      Who in their right mind would run production data on a single server? That’s like driving a car w/o a spare tire.

      • Riyad Kalla August 25, 2010 at 7:01 pm #

        Sam, I don’t think many people would now-adays with hardware accessibility where it is; but that is different from the point that Nick was making — at least the way I understood it.

        Please correct it if I misunderstood it.

  7. Tomasz Nurkiewicz April 3, 2011 at 3:20 am #

    Great article, but please correct this inconsistency: “mongod –syncdelay=5 [..] will force the server to flush [...] every 20 seconds”

Trackbacks/Pingbacks

  1. NoSQL and CMS – Comparing NoSQL with CPS Requirements « The CMS Curmudgeon - July 11, 2010

    [...] consistency and durability (at the cost of performance) – these facilities are described in this great guide. 7 CouchDB stands out in this regard, with exceptional support for geographical distribution. It [...]

  2. Innumerable Advantages Of Pex Tubing « Plumbers Hove - July 17, 2010

    [...] MongoDB (Single-Server) Data Durability Guide | The Buzz Media [...]

Leave a Reply


8 − five =