A Comparison of MongoDB Backup Strategies

Onyancha Brian Henry
27 February 2020

Database backups are as important as the database itself, especially for ensuring business continuity after a failure that destroys the database completely. A backup is a copy of the database's contents at a given point in time that can be used to restore the database to an operational state.

With a proper backup strategy, you get peace of mind that, in the event of a disaster, the data is available to restore business operations. Backups may also be required by government regulation or compliance rules as part of a proactive approach to safeguarding data.

Regular backups can also be used to set up new development and staging environments without touching the production setup, enabling faster development and smoother product launches.

A chosen backup strategy should address a number of considerations, which may include:

  • Flexibility: can you select only the data that is useful, or back up only the data that has changed?
  • Recovery Point Objective (RPO): how much recent data you can afford to lose, usually expressed as the maximum acceptable time between the last backup and the failure.
  • Recovery Time Objective (RTO): how quickly the recovery must be completed. The more data involved, the longer the restore takes, and vice versa.
  • Medium: to what medium should the backup be written?
  • Frequency: how often will backups be taken, and can the process run without causing significant downtime for connected applications?
  • Automation: will the backups be executed manually or automatically through a scheduling tool?
  • Cost: what does the backup process cost?
  • Size: how large will the backup files be? Large files add storage cost, so you should be able to compress them.
  • Complexity: how complicated does the strategy become when handling shards, in the case of MongoDB?
  • Isolation: how far are the backup files from the database, physically and logically? If the backups sit close to the database, they are likely to be destroyed in the same incident that destroys the database.

MongoDB Backup Strategies

MongoDB backup strategies vary in different scenarios depending on the type of servers, data and application constraints. The major backup strategies for MongoDB are:

  1. Mongodump
  2. Copying the Underlying Files
  3. MongoDB Management Service (MMS)

A logical MySQL backup is typically a single .sql file, whereas a MongoDB backup is a complete folder structure that preserves the layout of the databases and their collections.

Mongodump

Mongodump is MongoDB's own command-line tool for backing up data. It is a logical backup approach: it exports data from MongoDB into backup files, which can later be restored with the mongorestore tool.

Mongodump is quite flexible about what gets backed up. You can use it to back up the entire database, selected collections, or the result of a query. It captures the selected documents as a snapshot. To keep that snapshot consistent while writes continue, mongodump can also dump the oplog entries recorded during the backup (the --oplog option, which applies to full dumps of a replica set member).

MongoDB also provides the mongorestore utility, which restores the dumped data into a running database. Together, these utilities make backups straightforward to take and to test by restoring them. The strategy is also flexible: as mentioned, you can back up the whole database, a collection, or the result of a query. This helps with the RTO, because a smaller selection of data means a faster backup and a faster recovery.

However, this strategy is best suited to small and medium-sized deployments, not large systems. On large data sets, a complete dump at each snapshot point is resource-intensive and puts heavy load on the server. As resources are diverted to the dump, database performance can drop, especially when the backup runs in the same datacenter as the database itself. Adding resources to compensate increases the overall operational cost of the database. For large deployments, consider more cost-friendly strategies such as filesystem snapshots or the MongoDB Management Service.

Deploying this strategy for a large sharded system is also considerably more complex than for a small configuration.
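The selection levels described above can be sketched as mongodump command lines. This is a minimal illustration, not a production script: the collection name orders, the status field, and the output directory are hypothetical, and the run helper only prints each command unless DRY_RUN=0 is set. Note that --query needs --collection, and --oplog only works for a full dump of a replica set member, so it is not combined with --db here.

```shell
#!/bin/sh
# Sketch: mongodump invocations for each selection level.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
OUT_DIR="${OUT_DIR:-/var/backups/mongo}"   # assumed backup location

run() {
  # Print the command line, then execute it only when DRY_RUN is 0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. A whole database.
run mongodump --db xyz --out "$OUT_DIR/full"

# 2. A single collection ("orders" is a hypothetical name).
run mongodump --db xyz --collection orders --out "$OUT_DIR/orders"

# 3. Only the documents matching a query (hypothetical "status" field).
run mongodump --db xyz --collection orders \
  --query '{"status":"open"}' --out "$OUT_DIR/open-orders"

# 4. The full instance plus the oplog entries written during the dump,
#    giving a consistent point-in-time snapshot on a replica set member.
run mongodump --oplog --out "$OUT_DIR/instance"
```

A dump taken with --oplog is restored with mongorestore --oplogReplay, which replays the captured oplog entries after loading the data.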

To take a mongodump backup of a database called xyz, run the command below from the system shell (not from inside the mongo shell):

$ mongodump --db xyz --out /var/backups/mongo

Mongodump also provides the --archive flag, which writes the dump to a single archive file instead of a folder, and the --gzip flag, which compresses the output to reduce the size of the backup, i.e.

$ mongodump --db xyz --archive=./backup/xyz.gz --gzip

To recover this database, run the mongorestore command below, pointing --archive at the same file that the backup wrote:

$ mongorestore --gzip --archive=./backup/xyz.gz --db xyz

Depending on the Recovery Point Objective, you can increase or decrease the backup frequency through a scheduled plan so that it matches the acceptable data recovery window.
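One way to put such a schedule into practice is a small script run from cron. The sketch below is an assumption-laden illustration rather than a recommended setup: the backup directory, the 7-day retention window, and the function names are all hypothetical. It writes one timestamped, compressed archive per run and prunes archives older than the retention window. By default it only prints the mongodump command (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch: timestamped mongodump archive with simple retention.
set -eu

BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/mongo}"  # assumed backup location
RETENTION_DAYS="${RETENTION_DAYS:-7}"             # assumed retention window
DRY_RUN="${DRY_RUN:-1}"

# Build the archive path for a given timestamp string.
archive_path() {
  printf '%s/xyz-%s.gz\n' "$BACKUP_ROOT" "$1"
}

# Delete archives last modified more than RETENTION_DAYS days ago.
prune_old() {
  find "$BACKUP_ROOT" -name 'xyz-*.gz' -type f \
    -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}

# Take one backup of database xyz, then prune old archives.
backup_once() {
  target="$(archive_path "$(date +%Y%m%dT%H%M%S)")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "mongodump --db xyz --archive=$target --gzip"
  else
    mkdir -p "$BACKUP_ROOT"
    mongodump --db xyz --archive="$target" --gzip
    prune_old
  fi
}

backup_once
```

Saved as, say, /usr/local/bin/mongo-backup.sh (a hypothetical path), a crontab entry such as `0 2 * * * DRY_RUN=0 /usr/local/bin/mongo-backup.sh` would run it nightly at 02:00. Running it every hour instead would shrink the window of data that could be lost to roughly one hour, at the cost of more load and more storage.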

Copying the Underlying Files

This is a filesystem-level backup approach: you snapshot the underlying data files at a point in time, and the database can later restore itself cleanly from the state captured in those snapshot files.

Journaling has to be enabled to ensure that the snapshots are logically consistent. To get a consistent snapshot of the database, you can either snapshot the entire file system or stop all write operations to the database and copy the files with standard file system copy tools.

Because this approach takes backups at the storage level, it is more efficient than the mongodump strategy. It is suitable for large deployments, at the expense of increased downtime for the applications connected to the database.

However, this approach is not as flexible as mongodump: you cannot target a specific database, collection, or query result. You have to back up the entire database setup, which means large backup files and, because the backup operation is long-running, more downtime for applications making requests to the database.

The approach also becomes more complex for a sharded cluster, where backups must be coordinated across a number of replicas. You need highly skilled personnel to ensure consistency across the various setups, which ultimately raises the overall cost of operating the database.
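As a rough illustration of the "stop writes, then snapshot" option, the sketch below assumes the data files live on an LVM logical volume. The volume path, snapshot name, and snapshot size are hypothetical, and the shell binary is configurable because older installations ship mongo rather than mongosh. It locks writes with db.fsyncLock(), takes the snapshot, and always unlocks afterwards. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch: lock writes, snapshot the data volume with LVM, unlock.
set -eu

DRY_RUN="${DRY_RUN:-1}"
MONGO_SHELL="${MONGO_SHELL:-mongosh}"   # older deployments use `mongo`
VOLUME="${VOLUME:-/dev/vg0/mongodb}"    # hypothetical volume holding the data files
SNAP_NAME="${SNAP_NAME:-mdb-snap01}"    # hypothetical snapshot name

run() {
  # Print the command line, then execute it only when DRY_RUN is 0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

snapshot_with_lock() {
  # Flush pending writes to disk and block new ones.
  run "$MONGO_SHELL" --quiet --eval 'db.fsyncLock()'

  # Create the snapshot; --size reserves space for changes made
  # to the origin volume while the snapshot exists.
  status=0
  run lvcreate --size 1G --snapshot --name "$SNAP_NAME" "$VOLUME" || status=$?

  # Release the lock even if the snapshot failed.
  run "$MONGO_SHELL" --quiet --eval 'db.fsyncUnlock()'
  return "$status"
}

snapshot_with_lock
```

The lock is held only for as long as the snapshot takes to create, which keeps the write outage short. The snapshot can then be copied off the host at leisure, which also helps with the isolation consideration discussed earlier.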

MongoDB Management Service (MMS)

This is a fully managed service (since renamed MongoDB Cloud Manager) that provides continuous online backups for MongoDB. It is very efficient for highly critical and frequently changing data. A Backup Agent installed within the database environment performs an initial sync to MongoDB's redundant and secure data centers. It then streams compressed and encrypted oplog data to MMS as a continuous backup.

A snapshot and retention policy can be configured in accordance with your requirements. By default, snapshots are taken every 6 hours and the oplog data is retained for 24 hours. Restores are tied to these snapshot points plus the retained oplog, but you still have the flexibility to select which databases or collections to back up.

The approach is not complex for a sharded cluster: snapshots of the cluster are likewise taken every 6 hours and the oplog is retained for 24 hours, giving a point-in-time snapshot that can be used for restoration.

Database performance is not impacted either, since MMS only reads the oplog, much as a new node added to a replica set would. This means no downtime for the applications connected to the database. According to the author, the approach is generally cost-friendly, since the enterprise subscriptions are not expensive.

A Comparison of MongoDB Backup Strategies

The comparison below summarizes how the three approaches described above measure up against each objective, to help you decide which one to go for.

Flexibility

  • Mongodump: The most flexible; you can decide which database, collection, or query result to back up.
  • Copying underlying files: A very coarse approach; it is impossible to target a specific database or collection.
  • MMS: Quite flexible; you can exclude non-mission-critical databases and collections.

Recovery Point Objective (RPO)

  • Mongodump: Because of its flexibility, you can meet the desired RPO by backing up only the data you need.
  • Copying underlying files: Limited to snapshot moments, which makes the RPO difficult to meet. All of the database data has to be backed up.
  • MMS: Limited to snapshot moments, but the lower overhead allows frequent backups.

Recovery Time Objective (RTO)

  • Mongodump: Because backups can be limited to the data that matters, the recovery time is directly proportional to the amount of data that needs to be restored.
  • Copying underlying files: The RTO depends on the overall database size. A large data set takes longer to back up and restore, and increases application downtime.
  • MMS: Backups can be limited to selected data, so the recovery time depends on the size of that selection.

Database performance

  • Mongodump: Not appropriate for large deployments because it becomes resource-intensive.
  • Copying underlying files: Impacted, because write operations may have to be stopped, and writes attempted during that downtime can be lost.
  • MMS: Not impacted, since the backups are made by reading the oplog.

Isolation

  • Mongodump: Not dependent on how far the backups are from the production database.
  • Copying underlying files: Depends on how far the backup snapshots are kept from the production database.
  • MMS: Depends on how far the backup snapshots are kept from the production database.

Deployment cost

  • Mongodump: Becomes more expensive for large deployments.
  • Copying underlying files: Relatively cheaper for large deployments.
  • MMS: Quite cheap for both large and small deployments, since subscriptions depend on data size.

Maintenance cost

  • Mongodump: Relatively cheap, since the process does not require specialist skills.
  • Copying underlying files: Complexity with shards makes it more expensive, since highly skilled personnel are needed to maintain the cluster.
  • MMS: Not expensive to maintain for a sharded cluster, because the snapshots are kept consistent across shards.

Shard deployment

  • Mongodump: Complex, especially when dealing with a large set of shards.
  • Copying underlying files: Complex, especially when dealing with a large set of shards.
  • MMS: Not very complex when backing up a sharded cluster.

Conclusion

Backups are essential for safeguarding data, ensuring business continuity after a failure that causes data loss, and meeting government regulatory guidelines. MongoDB provides three major strategies for backing up data: Mongodump, Copying the Underlying Files, and the MongoDB Management Service (MMS). Which strategy to select depends on factors such as maintenance cost, the complexity of sharded deployments, flexibility, and the impact on database performance.