A Comparison of MongoDB Backup Strategies

Onyancha Brian Henry

Published February 27, 2020

Database backups are as important as the database itself, especially for ensuring business continuity after a failure that destroys the database completely. A backup is a copy of the database’s contents at a given point in time, which can be used to restore the database to an operational state.

With a proper backup strategy, you get peace of mind that in the event of a disaster the data is available to restore business operations. Taking a proactive approach to safeguarding data can also be a regulatory and compliance requirement.

Regular backups can also be used to set up new development and staging environments without unnecessarily impacting the production setup, which enables faster development and smoother product launches.

A chosen backup strategy should meet a number of backup considerations which may include:

Flexibility: Can you select only the data that is useful, or back up only the data that has changed?
Recovery Point Objective (RPO): The maximum amount of data, usually expressed as a window of time, that you are willing to lose in the recovery process. It determines how often backups must be taken.
Recovery Time Objective (RTO): How quickly the recovery process must complete. The more data involved, the longer the recovery takes.
Medium: To what medium should the backup be written?
Frequency: How often will backups be taken, and can the process run without causing significant downtime for connected applications?
Automation: Will the backups be executed manually or automatically through a scheduling tool?
Cost: What does the backup process cost?
Size: How large will the backup files be? Large backups incur additional storage costs, so you should be able to compress them.
Complexity: How complex is the strategy when handling shards, in the case of MongoDB?
Isolation: How are the backup files separated from the database, physically and logically? For example, if the backups are stored close to the database, they are likely to be destroyed at the same time as the database.

MongoDB Backup Strategies

MongoDB backup strategies vary in different scenarios depending on the type of servers, data and application constraints. The major backup strategies for MongoDB are:

Mongodump
Copying the Underlying Files
MongoDB Management Service (MMS)

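A logical backup of MySQL is typically just a .sql file, whereas a MongoDB backup is a complete folder structure that preserves the layout of the database. With mongodump, for example, each database gets its own folder containing BSON data files and JSON metadata files for its collections.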

Mongodump

Mongodump is a utility that ships with MongoDB and offers the capability to back up the data. It takes a logical backup approach: data is exported from MongoDB into backup files, which can later be restored using the mongorestore tool.

Mongodump is quite flexible about what gets backed up. You can use it to back up the entire database, selected collections or the result of a query. It captures the selected documents as a snapshot. When dumping a whole replica set member, the --oplog option also records the oplog entries written while the dump runs, so that the restored data reflects a consistent point in time. This option cannot be combined with a database or collection filter.

MongoDB also provides the mongorestore utility, which restores the dumped data to a running database. Together, these utilities make it easy to back up a database with some assurance that the backed-up data is intact. The strategy is also flexible: as mentioned, you can back up the whole database, a collection or the result of a query. Selecting a small subset of data also helps meet the RTO, since both the backup and the recovery process will be fast.
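The selection options described above map onto mongodump's --db, --collection, --query and --oplog flags. The following is a minimal sketch of how a wrapper script might assemble the command line. The function name build_mongodump_args and the example database, collection and query are hypothetical, and nothing is executed here: the function only returns the argument list.

```python
import json
from typing import List, Optional


def build_mongodump_args(
    out_dir: str,
    db: Optional[str] = None,
    collection: Optional[str] = None,
    query: Optional[dict] = None,
    oplog: bool = False,
) -> List[str]:
    """Return an argv list for mongodump (hypothetical helper, runs nothing)."""
    # mongodump only accepts --collection together with --db,
    # and --query together with --collection.
    if collection and not db:
        raise ValueError("--collection requires --db")
    if query is not None and not collection:
        raise ValueError("--query requires --collection")
    # --oplog applies to a full-instance dump only.
    if oplog and db:
        raise ValueError("--oplog cannot be combined with --db")

    args = ["mongodump", "--out", out_dir]
    if db:
        args += ["--db", db]
    if collection:
        args += ["--collection", collection]
    if query is not None:
        # The query is passed as a JSON document on the command line.
        args += ["--query", json.dumps(query)]
    if oplog:
        args.append("--oplog")
    return args


# Whole database, as in the article's own example.
print(build_mongodump_args("/var/backups/mongo", db="xyz"))
# Only the documents of one (hypothetical) collection that match a query.
print(build_mongodump_args("/var/backups/mongo", db="xyz",
                           collection="orders", query={"status": "open"}))
```

The returned list can be handed to subprocess.run once the target host and credentials have been added.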

However, this strategy is best suited to small and medium-sized deployments, not large systems. On large data sets, a complete dump at each snapshot point is resource-intensive and places a heavy load on the server. As more resources are diverted to the dump, database performance can degrade, especially when the backup runs in the same datacenter as the database itself. If you are forced to add resources, the overall operational cost of the database rises. For large systems, consider more cost-friendly strategies such as filesystem snapshots or the MongoDB Management Service.

Deploying this strategy for a large database system with shards is also considerably more complex than deploying it for small configurations.

To take a mongodump backup of a database called xyz, run the command below from the system shell (not the mongo shell):

$ mongodump --db xyz --out /var/backups/mongo

Mongodump also provides flags that reduce the size of the backup: --archive writes the dump to a single file instead of a folder, and --gzip compresses it, for example:

$ mongodump --db xyz --archive=./backup/xyz.gz --gzip

To recover this database, run the mongorestore command below, pointing it at the same archive file:

$ mongorestore --gzip --archive=./backup/xyz.gz --db xyz
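When archives are taken repeatedly, it helps to derive the dump and restore commands from a single archive path so the two cannot drift apart. The sketch below assumes a hypothetical naming scheme (database name plus a timestamp) and a hypothetical helper called archive_commands. It only builds the two argument lists, using the same flags as the commands above.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import List, Tuple


def archive_commands(db: str, backup_dir: str,
                     taken_at: datetime) -> Tuple[List[str], List[str]]:
    """Build matching mongodump/mongorestore argv lists for one archive."""
    # Hypothetical naming scheme: <db>-<timestamp>.gz inside backup_dir.
    stamp = taken_at.strftime("%Y%m%dT%H%M%S")
    archive = str(PurePosixPath(backup_dir) / f"{db}-{stamp}.gz")

    dump = ["mongodump", "--db", db, f"--archive={archive}", "--gzip"]
    restore = ["mongorestore", "--gzip", f"--archive={archive}", "--db", db]
    return dump, restore


dump, restore = archive_commands("xyz", "/var/backups/mongo",
                                 datetime(2020, 2, 27, 1, 30, 0))
print(" ".join(dump))
print(" ".join(restore))
```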

Depending on the Recovery Point Objective, you can increase or decrease the backup frequency through a scheduled plan so that it matches the required data recovery window.
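The relationship between backup frequency and the RPO can be made concrete with a little arithmetic. The sketch below is a simplified model, not a MongoDB feature: it assumes dumps start at a fixed interval and that a failure may strike just before the next dump finishes, so the worst-case loss is the interval plus the time a dump takes. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(interval: timedelta,
                         dump_duration: timedelta) -> timedelta:
    """Upper bound on lost writes under the simplified model above."""
    return interval + dump_duration


def max_interval_for_rpo(rpo: timedelta,
                         dump_duration: timedelta) -> timedelta:
    """Longest backup interval that still keeps the worst case within the RPO."""
    interval = rpo - dump_duration
    if interval <= timedelta(0):
        raise ValueError("a dump takes longer than the RPO allows")
    return interval


def is_backup_due(last_started: datetime, now: datetime,
                  interval: timedelta) -> bool:
    """True once at least one full interval has passed since the last dump began."""
    return now - last_started >= interval


# With a 4-hour RPO and dumps that take 30 minutes, back up every 3.5 hours.
interval = max_interval_for_rpo(timedelta(hours=4), timedelta(minutes=30))
print(interval)
print(is_backup_due(datetime(2020, 2, 27, 0, 0),
                    datetime(2020, 2, 27, 3, 30), interval))
```

A scheduler such as cron could call a check like is_backup_due before launching mongodump.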

Copying the Underlying Files

This is a filesystem-level backup approach that involves taking a snapshot of the underlying data files at a point in time. The database can then be cleanly restored from the state captured in the snapshot files.

Journaling has to be enabled to ensure that the snapshots are logically consistent. For a consistent snapshot of the database, you can either create a snapshot of the entire file system, or stop all write operations to the database and use standard file system copy tools to copy the files.
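The second option, stopping writes and copying the files, could be scripted roughly as follows. This is a sketch under stated assumptions: client is expected to be a pymongo MongoClient (any object exposing admin.command works), the fsync command with lock=True and the fsyncUnlock command are used to block and resume writes, and the helper names and paths are hypothetical.

```python
import shutil
from contextlib import contextmanager


@contextmanager
def writes_locked(client):
    """Flush pending writes to disk and block new ones inside the with-block."""
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the copy fails.
        client.admin.command("fsyncUnlock")


def copy_data_files(client, dbpath: str, dest: str) -> None:
    """Copy the data directory while writes are blocked.

    dest must not exist yet, because shutil.copytree creates it.
    """
    with writes_locked(client):
        shutil.copytree(dbpath, dest)
```

With pymongo installed, a call might look like copy_data_files(MongoClient(), "/var/lib/mongodb", "/mnt/backup/2020-02-27"), where both paths are placeholders for your own data directory and backup location. Writes stay blocked for the whole copy, which is the downtime cost of this method.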

This approach takes backups at the storage level, which makes it more efficient than the mongodump strategy. It is suitable for large deployments, at the expense of increased downtime for applications connected to the database.

However, this approach is not as flexible as mongodump, in that you cannot target a specific database, collection or query result. You have to back up the entire database setup, at the expense of large backup files and, because the backup operation runs for a long time, increased downtime for applications making requests to the database.

This approach becomes more complex for a sharded cluster, where backups must be coordinated across a number of replicas. You need highly skilled personnel to ensure consistency across the various setups, which in the end adds to the overall cost of operating your database.

MongoDB Management Service (MMS)

This is a fully managed service that provides continuous online backups for MongoDB. It is well suited to highly critical and frequently changing data. A Backup Agent installed within the database environment performs an initial sync to MongoDB’s redundant and secure data centers. It then streams compressed and encrypted oplog data to MMS as a continuous backup.

A snapshot and retention policy can be configured in accordance with your requirements. By default, snapshots are taken every 6 hours and the oplog data is retained for 24 hours. Snapshots are coarser-grained than a targeted dump, but you still have the flexibility to select which databases or collections to back up.

The approach is not complex for a sharded cluster, since snapshots of the cluster are also taken every 6 hours and the oplog data is retained for 24 hours, which yields point-in-time snapshots that can be used for restoration.
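To see how the 6-hour snapshots and the 24-hour oplog retention combine into a point-in-time restore, consider the toy model below. It is an illustration of the defaults quoted above, not a description of how MMS is implemented: it assumes a restore starts from the latest snapshot at or before the target time and replays the oplog forward, and that only targets within the retention window can be reached. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional

SNAPSHOT_INTERVAL = timedelta(hours=6)   # default quoted in the article
OPLOG_RETENTION = timedelta(hours=24)    # default quoted in the article


def snapshot_times(first: datetime, now: datetime) -> List[datetime]:
    """Snapshot timestamps from the first snapshot up to and including now."""
    times = []
    t = first
    while t <= now:
        times.append(t)
        t += SNAPSHOT_INTERVAL
    return times


def restore_base(target: datetime, snapshots: List[datetime],
                 now: datetime) -> Optional[datetime]:
    """Snapshot to start from, or None if the target is outside the window."""
    if target > now or now - target > OPLOG_RETENTION:
        return None
    earlier = [s for s in snapshots if s <= target]
    # Latest snapshot not after the target; oplog replay covers the rest.
    return max(earlier) if earlier else None


now = datetime(2020, 2, 27, 13, 0)
snaps = snapshot_times(datetime(2020, 2, 26, 0, 0), now)
print(restore_base(datetime(2020, 2, 27, 9, 30), snaps, now))
print(restore_base(datetime(2020, 2, 26, 10, 0), snaps, now))
```

In this model the first target is reached from the 06:00 snapshot plus three and a half hours of oplog, while the second is older than the retention window and cannot be reached.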

Database performance is also not noticeably impacted, since MMS only reads the oplog, much as a new node added to a replica set would. This means no downtime for applications connected to the database. The approach can generally be considered cost-friendly, since the enterprise subscriptions are not expensive.