WAL or Write-Ahead Logging as a method that enables PostgreSQL data integrity. Mostly used in replication as it enables High Availability (HA). Now, PostgreSQL replication is a broad topic to cover, so I’ll stick with the issue I’ve faced not long ago on a production environment replication server.
master and main, slave, replica and standby are used interchangeably.
As I was checking the main server error logs I’ve come across the following error message:
ERROR: requested WAL segment 00000003000000070000005D has already been removed.
- PostgreSQL Cluster
Basically, WAL files are written and sent to the standby (replica) servers in an asynchronous way. So, if you faced the same error as me, it turns out that some WAL files have been already removed from the main server before being sent to the replicas, hence you got inconsistent replica servers.
So now, you got to deal with recovering these replica servers.
Step 1. Stop the replica’s PostgreSQL service.
systemctl stop postgresql
Step 2. Just in case, create a backup from the PostgreSQL data directory. To find out the data dir path which by default is under
/var/lib/postgresql/<postgresql_version>/main, you have few options, including:
Search for any
data_directory line under
or, just run:
or, even try running a SQL query:
Step 3. Next, backup the datadir. For instance:
cd /var/lib/postgresql/10 mkdir main-backup cp -r main/* main-backup/
Step 4. Empty (YOLO) the datadir:
rm -rf /var/lib/postgresql/10/main*
Step 5. As for the “magic” restore command goes, run:
pg_basebackup -h 10.0.1.2 -U replica -Xs -P -R -D /var/lib/postgresql/10/main/
-h: master server IP address.
-U: the authentication PostgreSQL user that have permissions to execute the command.
-Xs: collect the WAL files as the backup is running in background.
-P: if you want to follow the progress.
-R: write the connection settings to a
Step 6. Cross your fingers and start the PostgreSQL service.
systemctl start postgresql
Step 7. Watch for any error logs usually saved under
You can always set
wal_keep_segments found under
postgresql.conf to some value greater than the default one. This will lead to increased number of WAL files as well as disk space.
do not try this at hime
Lastly, if you just want to mess around, haven’t tried it yet though, but if you do not care about downtime or anything at all:
- Stop the master and replica instances.
- Empty the replicas
- Copy/sync the WAL files from the main server found under
/var/lib/postgresql/<postgresql_version/main/pg_walto the inconsistent replica servers. Use rsync, scp or some alternate SSH copy utility tools.
If you have another great solution to share, feel free to leave a comment below and if you find this tutorial useful, follow our official channel on Telegram.