pg_basebackup is a simple but powerful utility provided by PostgreSQL to take online and consistent file system-level backups. The backups created using this command contain the actual database files so you don’t need to actually restore these kinds of backups. You can just uncompress the backup files and copy them to the database data directory. You can use it to create the base backups of your database cluster which in turn can be used for the point in time recovery. You can also set up a standby replica database server using the streaming option of this command.
In this article, we will go through the process of creating backups using the pg_basebackup utility and also learn about various options that can be used for this task.
Backup Process for a PostgreSQL Cluster
You can run the following simple command to take the base backup of your database.
$ pg_basebackup -D /tmp/backups -x
Let’s see these options:
- -D is used to specify the path of the output directory.
- -x flag will include the transaction logs in the backup folder. These logs can be used to start the database server from the newly created base backup. This means that we can just extract the backup files anywhere and use it as a fully operational database.
Before we start taking the backup and run the above command, we need to make some changes in the postgresql.conf file so our database can run properly during the backup process. Change the following values in the postgresql.conf file.
wal_level = archive max_wal_senders = 4
Here, WAL stands for write-ahead logs (transaction logs) for PostgreSQL. By changing the value of wal_level to archive, the backup command will generate the logs in the format that are compatible with pg_basebackup and other replication-based backup tools. The value of max_wal_senders will determine how many clients can connect with the database and request the data files.
Apart from this change, we need to make one more change in the pg_hba.conf file adding the following line:
local replication postgres peer
pg_hba.conf file basically contains the access control list of connections. pg_basebackup command internally uses a replication protocol to take backups so we need to add the above line to allow local connections that can request the data files of the database.
Labeling Your Backups
It’s very important to create consistent and periodic base backups to keep your data safe. When you create multiple backups, it's always good to add labels to your database. You can do it using the following command:
$ pg_basebackup -D /tmp/backups -x -l "This backup was created at $(hostname) on $(date)"
Viewing the Backup Progress
If you want to know the approximate duration of the backup process then you can run the following command:
$ pg_basebackup -D /tmp/backups -x -P
When you provide this -P option, the backup process will start calculating the size of the full database. If your database is having active connections, some new data might be added in the database during the backup process, so this command won’t be able to predict the exact time duration.
Creating Backups from a Remote Server
This command can also be used to take the backups of remote PostgreSQL clusters. You can provide the hostname and port number by specifying the following options in your command.
$ pg_basebackup -D /tmp/backups -x -h 10.0.2.15 -p 5432 -U postgres
Creating gzipped tar Backup
If you create the backup in plain text format, the backup folder will occupy a lot of space on your server. Instead of plain text format, this command can generate backup files in tar format. Also, you can provide -z flag to compress the target backup so it will occupy even less space. Here is the command for it:
$ pg_basebackup -D /tmp/backups -x -Ft -z
Creating a Backup by Streaming the Transaction Logs
Pg_basebackup command can stream the transaction logs (WAL) in parallel while taking the backup. This is useful when some new data is being added to the database while the backup process is running. So, once the backup process is over, you can just extract the backup and start using it as a fully qualified database.
When you specify this option, the pg_basebackup command will open two connections to the server. One to request the data files and another to stream the WAL logs in parallel. So, make sure that max_wal_senders value is greater than 2 in your PostgreSQL configuration file.
$ pg_basebackup -D /tmp/backups -X stream -Ft -z
Also, If you take the backup using streaming enabled, the resulting backup will have all the data along with required WAL logs to start afresh replica server for your production database.
Recovery process of PostgreSQL cluster
First of all, stop your database server to start the recovery process.
$ pg_ctlcluster 10.6 main stop
Now, move the current database files to some other folder to keep this as a backup.
$ mv main main.backup
Copy the archived base backup files to the data directory of the database.
$ cp -r /tmp/backups main
Start the database server now to perform the recovery.
$ pg_ctlcluster 10.6 main start
If you want, you can monitor the recovery process by checking the logs file.
$ tail -f /var/log/postgresql/postgresql-10.6-main.log
Recovery process is fairly simple as we copied actual data files during the backup process. So, you just have to extract the backup files and move it to the data directory of the database.
Pg_basebackup is a handful and easy to use utility for making PostgreSQL backups. For the large size of databases, you should use pg_basebackup command to take backups as pg_dumpall will take a very long time to take a full backup. Also, pg_basebackup provides some useful options such as streaming of logs, parallel compression, etc. This makes it a more attractive tool than pg_dump or pg_dumpall utilities. Pg_basebackup can only be used to create backups of the entire database cluster, as it doesn’t support single database backups. For that purpose, you can use pg_dump utility.