Setup Streaming Replication in PostgreSQL




Below are the steps to set up streaming Replication in PostgreSQL:

Step1: Create the user in master using whichever slave should connect for streaming the WALs. This user must have REPLICATION ROLE.

/*****Syntax*******/
CREATE USER replicator
WITH REPLICATION
ENCRYPTED PASSWORD 'replicator';
/*******************/


Step2: Manage below parameters on the master are mandatory when setting up streaming replication. This is should be followed by a cluster restart

  1. archive_mode: 
    • Must be set to ON to enable archiving of WALs.
  2. wal_level : 
    • Must be at least set to hot_standby  until version 9.5
    • Must be at least set to replica in the later versions.
  3. max_wal_senders: 
    • Must be set to 3 if you are starting with one slave. 
    • For every slave, you may add 2 Wal senders.
  4. wal_keep_segments: 
    • Set the WAL retention in pg_xlog (until PostgreSQL 9.x) 
    • Set the WAL retention in pg_wal (from PostgreSQL 10). 
    • Every WAL requires 16MB of space unless you have explicitly modified the WAL segment size. 
    • You may start with 100 or more depending on the space and the amount of WAL that could be generated during a backup.
  5. archive_command : 
    • This parameter takes a shell command or external programs. 
    • It can be a simple copy command to copy the WAL segments to another location or a script that has the logic to archive the WALs to S3 or a remote backup server.
  6. listen_addresses: 
    • Specifies which IP interfaces could accept connections. 
    • You could specify all the TCP/IP addresses on which the server could listen to connections from the client. ‘*’ means all available IP interfaces. 
    • The default: localhost allows only local TCP/IP connections to be made to the Postgres server.
  7. hot_standby: 
    • Must be set to ON on standby/replica and has no effect on the master. 
    • However, when you set up your replication, parameters set on the master are automatically copied. 
    • This parameter is important to enabling READS on the slave. 
    • Otherwise, you cannot run your SELECT queries against the slave.


/*****Syntax**************************/
ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';
/**************************************/


Step3:

  • Add an entry to pg_hba.conf of the master to allow replication connections from the slave. 
    • We will add below line in pg_hba.conf with then slave Ip address
    • /*****Syntax*******/
    • host replication replicator 191.164.0.20/30 md5
    • /*******************/
  • The default location of pg_hba.conf is the data directory. 
  • However, you may modify the location of this file in the file  postgresql.conf. 
  • In Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default. 
  • You can get the location of postgresql.conf in Ubuntu/Debian by calling an OS command => pg_lsclusters.

To get this changes reflected reload the cluster configuration
/*****Syntax*******/
$ pg_ctl -D $PGDATA reload
Or
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"
/*******************/


Step4:

  • pg_basebackup helps us to stream the data through the  Wal sender process from the master to a slave to set up replication. 
  • Run this statement on slave to stream data directory from master to slave, Make SURE YOU CHANGE IP to your Master's IP
  • /*****Syntax*******/
  • $ pg_basebackup -h 190.160.0.20 -U replicator -p 5432 -D $PGDATA -P -Xs -R
  • /*******************/
  • We can also take a tar format backup from the master and copy that to the slave server. 

Parameters explanation:
  • -U: User with replication permission
  • -h: Hostname / IP
  • -p: Port no.
  • -R: An optional argument, it automatically creates a recovery.conf file that contains the role of the DB instance and the details of its master (Parameters standby_mode  and primary_conninfo are automatically created in the file). 
  • -D: Data directory


It is mandatory to create recovery.conf file on the slave in order to set up a streaming replication. If you are not using the backup type mentioned above, and choose to take a tar format backup on the master that can be copied to the slave, you must create this recovery.conf file manually. This recovery.conf file must exist in the data directory($PGDATA) of Slave.


Contents of the recovery.conf file is

/*****Syntax*******/
$ cat $PGDATA/recovery.conf
/*****Output*******/
standby_mode = 'on'
primary_conninfo = 'host=190.160.0.20 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
/*******************/


standby_mode: It must be set to ON for slaves in Postgres.
primary_conninfo: To stream WAL data, details of the master server are configured using this parameter.


Step5:

Start your slave once the backup and restoration are completed.

  • If you have configured a backup (remotely) using the streaming method mentioned in Step 4, 
    • It just copies all the files and directories to the data directory of the slave. 
    • This means it is both a back up of the master data directory and also provides for restore in a single step.
  • If you have taken a tar back up from the master and shipped it to the slave. 
    • You must unzip/untar the back up to the slave data directory.
    • You should create a recovery.conf as mentioned in the previous step. 
    • Now we can start your PostgreSQL instance on the slave using the below command.
    • /*****Syntax*******/
    • $ pg_ctl -D $PGDATA start
    • /*******************/

Step6:

Best practice per Postgres SME's says that  "parameter - restore_command should be set appropriately". This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave if the WAL is not available on the master.


/*****Syntax*******/
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
/*******************/

REASON FOR THIS BEST PRACTICE: 

Below issues can cause the slave to lag behind the master for a substantial time.
  • Due to network connectivity.
  • Pausing Slave server
  • Other performance issues on the server
 
It is less likely to have those WALs required by the slave available on the master’s pg_xlog or pg_wal location. Hence, it is sensible to archive the WALs to a safe location and to have the commands that are needed to restore the WAL set to restore_command parameter in the recovery.conf file of your slave. 
                   To achieve this, we have to add "restore_command " in recovery.conf file in the slave server. We can substitute the cp command with a shell command/script or a copy command that helps the slave get the appropriate WALs from the archive location.

Step7: 

To validate that PostgreSQL replication setup, we should verify Wal sender and a Wal receiver process. This should be started on the master and the slave after setting up replication. 

To check for these processes on both master and slave, we will use the below commands.


/*****Syntax on Master*******/
$ ps -eaf | grep sender
/*******************/

/*****Syntax on Slave*******/
$ ps -eaf | grep receiver
/*******************/

/*****Syntax on Slave*******/
$ ps -eaf | grep startup
/*******************/


To generate more detailed information we can run below query 


/*****Syntax on Slave*******/
select * from pg_stat_replication
/*******************/

2 comments:

  1. Thank you so much for this wonderful and unique blog post about SQL and other related services and aspects of it.This in turn is always helpful.

    SSIS Postgresql Read

    ReplyDelete
  2. In today's digital age, having a website that generates essays is a game-changer. These websites leverage advanced algorithms and AI technologies to provide users with customized essays on demand. Users can input their requirements and receive high-quality, plagiarism-free essays tailored to their needs. A website that generates essays is a valuable tool for students facing tight deadlines or those seeking inspiration and guidance. It simplifies the writing process and helps users excel in their academic endeavors.

    ReplyDelete