24.4. Alternative Method for Log Shipping

Note: At present, this section is just taken from PostgreSQL documentation and is subject to revision for Postgres-XC.

An alternative to the built-in standby mode described in the previous sections is to use a restore_command that polls the archive location. This was the only option available in versions 8.4 and below. In this setup, set standby_mode off, because you are implementing the polling required for standby operation yourself. See the pg_standby module for a reference implementation of this.

Note that in this mode, the server will apply WAL one file at a time, so if you use the standby server for queries (see Hot Standby), there is a delay between an action in the master and when the action becomes visible in the standby, corresponding the time it takes to fill up the WAL file. archive_timeout can be used to make that delay shorter. Also note that you can't combine streaming replication with this method.

The operations that occur on both primary and standby servers are normal continuous archiving and recovery tasks. The only point of contact between the two database servers is the archive of WAL files that both share: primary writing to the archive, standby reading from the archive. Care must be taken to ensure that WAL archives from separate primary servers do not become mixed together or confused. The archive need not be large if it is only required for standby operation.

The magic that makes the two loosely coupled servers work together is simply a restore_command used on the standby that, when asked for the next WAL file, waits for it to become available from the primary. The restore_command is specified in the recovery.conf file on the standby server. Normal recovery processing would request a file from the WAL archive, reporting failure if the file was unavailable. For standby processing it is normal for the next WAL file to be unavailable, so the standby must wait for it to appear. For files ending in .backup or .history there is no need to wait, and a non-zero return code must be returned. A waiting restore_command can be written as a custom script that loops after polling for the existence of the next WAL file. There must also be some way to trigger failover, which should interrupt the restore_command, break the loop and return a file-not-found error to the standby server. This ends recovery and the standby will then come up as a normal server.

Pseudocode for a suitable restore_command is:

triggered = false;
while (!NextWALFileReady() && !triggered)
{
    sleep(100000L);         /* wait for ~0.1 sec */
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
        CopyWALFileForRecovery();

A working example of a waiting restore_command is provided in the pg_standby module. It should be used as a reference on how to correctly implement the logic described above. It can also be extended as needed to support specific configurations and environments.

The method for triggering failover is an important part of planning and design. One potential option is the restore_command command. It is executed once for each WAL file, but the process running the restore_command is created and dies for each file, so there is no daemon or server process, and signals or a signal handler cannot be used. Therefore, the restore_command is not suitable to trigger failover. It is possible to use a simple timeout facility, especially if used in conjunction with a known archive_timeout setting on the primary. However, this is somewhat error prone since a network problem or busy primary server might be sufficient to initiate failover. A notification mechanism such as the explicit creation of a trigger file is ideal, if this can be arranged.

24.4.1. Implementation

Note: At present, this section is just taken from PostgreSQL documentation and is subject to revision for Postgres-XC.

The short procedure for configuring a standby server using this alternative method is as follows. For full details of each step, refer to previous sections as noted.

  1. Set up primary and standby systems as nearly identical as possible, including two identical copies of PostgreSQL at the same release level.

  2. Set up continuous archiving from the primary to a WAL archive directory on the standby server. Ensure that archive_mode, archive_command and archive_timeout are set appropriately on the primary (see Section 23.3.1).

  3. Make a base backup of the primary server (see Section 23.3.2), and load this data onto the standby.

  4. Begin recovery on the standby server from the local WAL archive, using a recovery.conf that specifies a restore_command that waits as described previously (see Section 23.3.3).

Recovery treats the WAL archive as read-only, so once a WAL file has been copied to the standby system it can be copied to tape at the same time as it is being read by the standby database server. Thus, running a standby server for high availability can be performed at the same time as files are stored for longer term disaster recovery purposes.

For testing purposes, it is possible to run both primary and standby servers on the same system. This does not provide any worthwhile improvement in server robustness, nor would it be described as HA.

24.4.2. Record-based Log Shipping

Note: The following description applies only to PostgreSQL

Streaming replication has not been tested with Postgres-XC yet. Because this version of streaming replication is based upon asynchronous log shipping, there could be a risk to have the status of coordinators and datanodes inconsistent. The development team leaves the test and the use of this entirely to users.

It is also possible to implement record-based log shipping using this alternative method, though this requires custom development, and changes will still only become visible to hot standby queries after a full WAL file has been shipped.

An external program can call the pg_xlogfile_name_offset() function (see Section 9.23) to find out the file name and the exact byte offset within it of the current end of WAL. It can then access the WAL file directly and copy the data from the last known end of WAL through the current end over to the standby servers. With this approach, the window for data loss is the polling cycle time of the copying program, which can be very small, and there is no wasted bandwidth from forcing partially-used segment files to be archived. Note that the standby servers' restore_command scripts can only deal with whole WAL files, so the incrementally copied data is not ordinarily made available to the standby servers. It is of use only when the primary dies — then the last partial WAL file is fed to the standby before allowing it to come up. The correct implementation of this process requires cooperation of the restore_command script with the data copying program.

Starting with PostgreSQL version 9.0, you can use streaming replication (see Section 24.2.5) to achieve the same benefits with less effort.