postgresql,

PostgreSQL 101: replications

Dec 14, 2022 · 4 mins read · Post a comment

In PostgreSQL, choosing a replication strategy is a big deal, depending on what you want to achieve. This can be anything from High Availability, Backup and Disaster, mullti-master and few others. Hence, there are few types of replication including: asynchronous and synchronous, physical and logical, WAL-shipping and streaming replication.

master, main and primary are used interchangeably.

Prerequisites

  • PostgreSQL

Solution

physical vs logical

In physical replication, the changes are sent as disk blocks, on disk block level. It doesn’t “care” about the content inside. This is usually done using the pg_basebackup tool, as it creates a copy of the primary server’s data files and transfer them to the standby server.

  • Pros: Faster, stable, mature, easier to setup, the most popular and no-brainer choice.
  • Cons: Less flexible than logical replication since it sends almost “everything” (data-related).

While in logical replication, follows the publish/subscriber model, where the main server (publisher) sends data changes to replicas (subscribers) on per-table basis, on a higher level than disk blocks. Sends one database at a time including certain row changes and committed transactions. Logical replication is a “newer” feature than physical replication, released as part of the PostgreSQL 9.4 version, almost 8 years ago.

  • Pros: Allows for fine-grained control over replication and security, you can even apply it in multi-master scenarios which is not possible with the physical one. It’s possible to imagine having a primary server as a testing DB, and a standby server being the production one. More flexible, since it allows the replica servers to be “picky” about which data to be replicated, and which not.
  • Cons: Not so widely adopted yet, more complex to setup and time-consuming.

sync vs async

In synchronous replication, the data must be written to both primary and replica servers. Basically, the main server will wait for all replica servers to confirm they have successfully written the data. So, the key point is that the main instance waits, hence no transaction is considered complete, until it’s confirmed by both primary and replica. The replication is done in real time, and ensures the replica servers are always up-to-date.

  • Pros: Consistent. Can’t lose data if master crashes though.
  • Cons: A bit slower, more complex to setup, has certain performance impact, and if there is more than one replica, only one is synchronous at a time.

While in the asynchronous replication, the master doesn’t wait for replicas to confirm the write operation. Replicas are allowed to fail behind a bit, a certain delay is guaranteed. However, if master fails, you’ll lose some data. If replicas fails behind the main server too much, because of a crash, reboot, or misconfigured max_wal_size (previously known as wal_keep_segments) parameter, you’ll have to deal with WAL has already been removed.

  • Pros: Faster than sync replication, because it doesn’t have the same performance impact. Easier to setup.
  • Cons: Not being consistent and not being up-to-date enough though.

WAL-shipping vs streaming

WAL-shipping, also called Write-Ahead log streaming is a replication strategy that sends these files stored in the pg_wal directory from master to one or more replica servers. The most primitive replication out there.

  • Pros: It’s a physical replication.
  • Cons: Replica servers need to be taken offline if you want to apply the shipped WAL files, so you can have up-to-date data. Also, this requires a manual task of copying and pasting these files, hence slow and prone to human errors.

Streaming replication on the other side, continuously sends the transaction logs from master to replicas as they are committed on master.

  • Pros: Physical replication. Replica servers are allowed to stay online, which makes it a perfect solution for High Availability. Continuous and steady uptime. Easily configured and managed. Changes are applied almost immediately. By default, think of it as a synchronous replication, even it can be configured to ship changes in asynchronous fashion too.
  • Cons: Can’t think of any.

Interesting enough, you can combine them both, by using streaming replication with the archive_command to send the WAL achieves. Next, on replicas, configure the restore_command in recovery.conf file in order to restore the WAL achieves in case of certain issues appear between the master and replicas.

Conclusion

Feel free to leave a comment below and if you find this tutorial useful, follow our official channel on Telegram.