MongoDB Backups: What is mongodump?

Onyancha Brian Henry
20 February 2020

Every organization should treat database backups as essential to business continuity in the event of a database catastrophe. An operational database can only be restored if a backup is in place. Backups are created by copying the database contents, using either the database's own utilities or external tooling. They allow administrators to restore the database, together with its data and logs, to an operational state. Backups may also be needed for compliance with organizational and government regulations that require access to critical business data after a technical outage.

Backups can also be used to verify data integrity and consistency by comparing earlier backups with the current database data. Any discrepancies found this way can point towards the application errors that caused them.

MongoDB has three backup strategies:

  1. mongodump, a utility bundled with MongoDB itself.
  2. Filesystem snapshots, whereby a point-in-time copy of the database files is taken and then stored in another location, such as a cloud data centre.
  3. MongoDB Management Service (MMS, since renamed MongoDB Cloud Manager), a fully managed cloud service that provides continuous backup for MongoDB.

Every approach has its pros and cons, depending on the complexity of implementation, the Recovery Time Objective, the Recovery Point Objective and cost.

In this article, we will discuss only the mongodump utility: what it entails, how to perform a backup and recovery with it, and what advantages and disadvantages it has.

What is Mongodump?

mongodump is a utility bundled with the MongoDB database. It is used to create binary exports of the database contents. It can also be downloaded as a separate tool from the MongoDB Download Center. mongodump can be used with three kinds of deployment:

  1. Standalone deployment.
  2. Replica set deployment.
  3. Sharded cluster deployment

For a complete backup and recovery strategy, mongodump is used alongside the mongorestore tool, which handles the restoration process. It is also possible to take partial backups based on a collection or a query, and the tool is useful for tasks such as changing the storage engine of a standalone or replica set, or syncing data from a production to a development environment.

In MongoDB 4.2 and later, mongodump cannot be used as part of a backup strategy for a sharded cluster that has sharded transactions in progress, because mongodump does not maintain the atomicity guarantees of transactions across shards. In this case, one can opt for a coordinated backup and restore process that does maintain those guarantees, such as MongoDB Atlas, MongoDB Cloud Manager or MongoDB Ops Manager.

How mongodump and mongorestore work

mongodump creates a binary export of the selected contents. It does this by reading from the database and writing high-fidelity BSON files, which are collected in a single dump directory. mongorestore then uses these BSON files to populate a MongoDB database.

However, although this backup strategy is simple and efficient for small MongoDB deployments, it is not ideal for backing up large systems. The reason is that mongodump degrades the performance of the mongod instance it is connected to. When backing up a large data set, mongodump reads all the data through the server, and the large number of reads can push the working set out of memory. This in turn causes page faults. A page fault occurs when a program requests data that is not currently in physical memory, which forces the operating system to fetch the data from disk and load it into RAM.

mongodump does not capture the contents of the local database. Because both mongodump and mongorestore operate against a running mongod instance rather than copying the underlying data files directly, it is not advisable to use this strategy to back up a large data set.

One can also include the --oplog flag when backing up a replica set. The advantage is that the output will also include the oplog entries for writes that occur during the mongodump operation. With this oplog.bson file in place, mongorestore can replay the captured oplog and include the corresponding data. For a backup created with the --oplog flag, the mongorestore operation should include the --oplogReplay option so that the oplog is replayed during restoration.

That said, replica set backups are usually better managed with MongoDB management services such as Ops Manager and MongoDB Cloud Manager.
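As a rough illustration of the --oplog and --oplogReplay pairing, the sketch below picks the restore command based on whether a dump directory contains an oplog.bson file. The helper name restore_cmd_for is hypothetical, and the sketch assumes a dump directory written without --gzip. It only prints the command and does not run it.

```shell
# Hypothetical helper (not part of MongoDB): $1 = dump directory.
# Prints a mongorestore command, adding --oplogReplay only when the dump
# was taken with --oplog, i.e. when oplog.bson sits at the top level.
restore_cmd_for() {
  if [ -f "$1/oplog.bson" ]; then
    printf 'mongorestore --oplogReplay "%s"\n' "$1"
  else
    printf 'mongorestore "%s"\n' "$1"
  fi
}

# Print the command for the default dump/ directory; review it before running.
restore_cmd_for dump
```

Printing the command rather than executing it keeps the sketch safe to try on a machine that has no MongoDB tools installed.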

Syntax

mongodump can be run from the system command line:

mongodump [options]

Running mongodump without any options connects to the MongoDB instance on the local machine on port 27017. It then exports the database contents using the default settings, writing them to a dump/ directory in the current working directory.

$ mongodump

To specify a host and a port with a connection string, use the --uri option:

$ mongodump --uri="mongodb://mongodb0.host.com:27017"

To specify both the hostname and the port in --host:

$ mongodump --host="mongodb0.host.com:27017"

To specify the hostname in --host and the port in --port:

$ mongodump --host="mongodb0.host.com" --port=27017

To connect to a replica set, we can use either the --uri or the --host option, listing the members separated by commas. The replica set name (here, replicaset) is given once: as the replicaSet option in the connection string, or as a prefix followed by a slash in --host:

$ mongodump --uri="mongodb://mongodb0.host.com:27017,mongodb1.host.com:27017,mongodb2.host.com:27017/?replicaSet=replicaset"

$ mongodump --host="replicaset/mongodb0.host.com:27017,mongodb1.host.com:27017,mongodb2.host.com:27017"

By default, mongodump reads from the primary node unless specified otherwise with the --readPreference option:

$ mongodump --uri="mongodb://mongodb0.host.com:27017,mongodb1.host.com:27017,mongodb2.host.com:27017" --readPreference=secondary
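Long replica set connection strings are easy to mistype, so it can help to assemble them from their parts. The following is a minimal sketch, assuming a hypothetical helper named replset_uri that takes the replica set name, the read preference and the member addresses. It relies only on the standard replicaSet and readPreference connection string options.

```shell
# Hypothetical helper (not part of MongoDB):
#   $1 = replica set name, $2 = read preference, remaining args = host:port members.
replset_uri() {
  rs_name=$1
  read_pref=$2
  shift 2
  # printf repeats its format for every argument, giving "a,b,c,".
  hosts=$(printf '%s,' "$@")
  hosts=${hosts%,}   # strip the trailing comma
  printf 'mongodb://%s/?replicaSet=%s&readPreference=%s\n' "$hosts" "$rs_name" "$read_pref"
}

uri=$(replset_uri replicaset secondary \
  mongodb0.host.com:27017 mongodb1.host.com:27017 mongodb2.host.com:27017)
echo "$uri"
# The result could then be passed on, e.g.: mongodump --uri="$uri"
```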

To connect to a sharded cluster, we point either the --uri or the --host option at a mongos instance:

$ mongodump --uri="mongodb://mongos.host.com:27017"

$ mongodump --host="mongos.host.com:27017"

Likewise, we can specify the read preference so that mongodump does not necessarily read from the primary of each shard replica set:

$ mongodump --host="mongos.host.com:27017" --readPreference=secondary

mongodump overwrites any output files that already exist in the backup folder, so before running it, make sure you no longer need those files, or rename them first.
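One common way to avoid overwriting an earlier dump is to write each backup to its own timestamped directory. The sketch below shows the idea. The helper name timestamped_dump_dir and the /backups/mongo base path are illustrative assumptions, and the mongodump command is only printed, not run.

```shell
# Hypothetical helper (not part of MongoDB): $1 = base directory for backups.
# Prints a path of the form <base>/dump-YYYYMMDD-HHMMSS.
timestamped_dump_dir() {
  printf '%s/dump-%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

out_dir=$(timestamped_dump_dir "/backups/mongo")
# Show the command that would write this backup to its own directory.
echo "mongodump --out=\"$out_dir\""
```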

mongodump does not include the contents of the local database, which is where the oplog is stored, in its output. This is why we might pass --oplog to capture the oplog entries to be replayed during restoration.

The output contains only the documents in the database and does not include index data. mongorestore therefore has to rebuild the indexes after restoring the data.

WiredTiger has been the default storage engine since version 3.2. It can compress data on disk, but mongodump outputs uncompressed data, so a dump may be considerably larger than the data files and may exceed the available storage. One should thus consider compressing the output files with the --gzip option:

$ mongodump --uri="mongodb://mongos.host.com:27017" --gzip

Additional Crucial mongodump Flag Options

--help: returns information about the options available with the mongodump utility.

--quiet: runs mongodump in a quiet mode that attempts to limit the amount of output. It does so by suppressing connection accepted events, connection closed events, replication activity and output from database commands.

--username: specifies the username with which to authenticate to the MongoDB database.

--password: specifies the password with which to authenticate to the MongoDB database.

--db: specifies which database to back up.

--collection: specifies which collection to back up. If not included, all collections in the specified database will be copied.

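--query: provides a JSON document as a query that optionally limits the documents included in the output. It has to be used with the --collection option. From version 4.2, the query must be in Extended JSON v2 format, which includes enclosing the field names and operators in quotes:

$ mongodump --db=students --collection=classA --query='{"age": {"$gt": 20}, "height": {"$lt": 50}}'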

--out: specifies the directory where mongodump will write BSON files for the dumped databases.

--archive: writes the output to the specified archive file, or to standard output if no archive file is specified:

$ mongodump --archive=<file> writes to the named file, and $ mongodump --archive writes to standard output. This option cannot be used together with the --out flag.

--oplog: creates a file named oplog.bson as part of the mongodump output. The file contains the oplog entries for operations that occurred during the mongodump operation, hence providing an effective point-in-time snapshot of the state of the mongod instance. This option works only for nodes that maintain an oplog, i.e. members of a replica set.

--excludeCollection: specifies which collection will not be included in the mongodump operation.
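Quoting the --query document correctly on the command line is a frequent source of errors, because the shell must not expand operators such as $gt. The sketch below builds the filter with printf inside single quotes. The helper name age_height_query is hypothetical, and the students database and classA collection are simply the example names used above.

```shell
# Hypothetical helper (not part of MongoDB):
#   $1 = minimum age (exclusive), $2 = maximum height (exclusive).
# The single-quoted format string keeps $gt and $lt literal.
age_height_query() {
  printf '{"age": {"$gt": %d}, "height": {"$lt": %d}}\n' "$1" "$2"
}

query=$(age_height_query 20 50)
echo "$query"
# The filter could then be passed on, e.g.:
#   mongodump --db=students --collection=classA --query="$query"
```

Passing the filter as "$query" in double quotes is safe here because the shell does not re-expand the contents of a variable.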

To restore data we use the mongorestore utility. For instance, to restore all the data in the dump directory:

$ mongorestore dump/

To restore a gzip-compressed dump together with its oplog.bson entries:

$ mongorestore --gzip --oplogReplay dump/

The --oplogReplay option restores the database to the point in time captured by the corresponding mongodump --oplog run.
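The same dump and restore pairing applies to single-file archives: whatever --archive and --gzip options were used for the dump need to be mirrored on the restore. A minimal sketch follows, assuming hypothetical helper names and an illustrative /backups/students.gz path. Both helpers only print the commands.

```shell
# Hypothetical helpers (not part of MongoDB): $1 = path of the gzip-compressed archive.
archive_dump_cmd() {
  printf 'mongodump --gzip --archive="%s"\n' "$1"
}

archive_restore_cmd() {
  printf 'mongorestore --gzip --archive="%s"\n' "$1"
}

# Print the matching pair of commands for one archive file.
archive_dump_cmd "/backups/students.gz"
archive_restore_cmd "/backups/students.gz"
```

Generating both commands from the same argument makes it harder for the dump and restore options to drift apart.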

Conclusion

mongodump is a backup utility bundled with MongoDB. It can be used to back up an entire database, a collection or the result of a query. It can also produce a consistent snapshot of the data when the oplog is included. It is a simple and straightforward approach, and because backups can be filtered according to one’s needs, it can help address the Recovery Point Objective and Recovery Time Objective. However, it is efficient only for small deployments: on large systems it places a heavy load on the mongod instance, since the many reads it issues can push the working set out of memory and degrade performance. This makes it inefficient for large systems. Besides, its complexity with a large sharded cluster may be very discouraging. The mongorestore utility is used to restore the data to a new or existing database. For large deployments, it is advisable to implement backups using MongoDB management services such as MongoDB Cloud Manager.