Postgres-XC is an open source project to provide a write-scalable, synchronous, symmetric, and transparent PostgreSQL cluster solution. It is a collection of tightly coupled database components that can be installed on more than one physical or virtual machine.
Write-scalable means that Postgres-XC can be configured with as many database servers as you want and can handle many more writes (updating SQL statements) than a single database server can. Symmetric means that you can have more than one database server, all of which provide a single database view. Synchronous means that any database update from any database server is immediately visible to any other transaction running on a different master. Transparent means that you do not have to worry about how your data is stored internally across more than one database server. [1]
You can configure Postgres-XC to run on more than one machine. The machines store your data in a distributed way, that is, partitioned or replicated, as you choose for each table. [2] When you issue queries, Postgres-XC determines where the target data is stored and issues corresponding queries to the servers holding that data.
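The routing decision described above can be illustrated with a minimal Python sketch. It is not Postgres-XC code: the node names, the `TABLE_LAYOUT` catalog, the CRC32-based hash, and the rule of reading a replicated table from the first node are all assumptions made for illustration only.

```python
import zlib

# Hypothetical Datanode names; a real cluster defines its own.
DATANODES = ["dn1", "dn2", "dn3"]

# Hypothetical catalog recording how each table is stored.
TABLE_LAYOUT = {
    "accounts": {"strategy": "distributed", "key": "account_id"},
    "currencies": {"strategy": "replicated"},
}


def node_for_key(value):
    """Map a distribution-key value to one node (illustrative hash only)."""
    digest = zlib.crc32(str(value).encode("utf-8"))
    return DATANODES[digest % len(DATANODES)]


def target_nodes(table, operation, key_value=None):
    """Return the nodes a statement on `table` would have to be sent to."""
    layout = TABLE_LAYOUT[table]
    if layout["strategy"] == "replicated":
        # Every node holds a full copy: one node can answer a read,
        # but a write has to reach all copies.
        return [DATANODES[0]] if operation == "read" else list(DATANODES)
    if key_value is None:
        # Distribution key unknown: every node may hold matching rows.
        return list(DATANODES)
    # Distribution key known: exactly one node holds the row.
    return [node_for_key(key_value)]
```

In this sketch, a lookup on `accounts` by a known `account_id` touches a single node, while an update to the replicated `currencies` table is sent to every node.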
In typical web systems, you can have as many web servers or application servers as you need to handle your transactions. However, you generally cannot do this for a database server, because all changing data has to be visible to all transactions. Unlike other database cluster solutions, Postgres-XC provides this capability. You can install as many database servers as you like. Each database server provides a uniform view of the data to your applications. Any database update from any server is immediately visible to applications connected to the database through other servers. This feature is called the "synchronous multi-master" capability, and it is the most significant feature of Postgres-XC.
The ultimate goal of Postgres-XC is to provide a synchronous multi-master PostgreSQL cluster with read/write scalability. That is, Postgres-XC should provide the following features:
Postgres-XC should provide multiple servers that accept transactions and statements from applications. Such a server is generally known as a "master" server. In Postgres-XC, it is called a "Coordinator".
Postgres-XC should provide more than one master.
Any master should provide a consistent database view to applications. Any update from any master must be visible in real time, as if the update had been made on a single PostgreSQL server.
Tables should be storable in the database in a replicated or distributed way (also known as fragmented or partitioned). Replication and distribution should be transparent to applications; that is, replicated and distributed tables are seen as single tables, and the location or number of copies of each record/tuple is managed by Postgres-XC and is not visible to applications.
Postgres-XC should provide a PostgreSQL-compatible API to applications.
Postgres-XC should provide a single, unified view of the underlying PostgreSQL database servers, so that SQL statements do not depend on how tables are stored in a distributed way.
This section describes the main components of Postgres-XC.
Postgres-XC is composed of three major components: the GTM (Global Transaction Manager), the Coordinator, and the Datanode. Their features are described in the following sections.
The GTM is a key component of Postgres-XC. It provides consistent transaction management and tuple visibility control.
As described later in this manual, PostgreSQL's transaction management is based on MVCC (Multi-Version Concurrency Control). Postgres-XC extracts this technology into a separate component, the GTM, so that transaction management in every Postgres-XC component is based on a single global status. Details are described in Chapter 43.
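To make the idea of a single global transaction status more concrete, the following Python sketch models a toy transaction manager that hands out transaction IDs and snapshots. It is a simplification loosely modeled on MVCC snapshots, not the actual GTM: the class and function names are hypothetical, and the visibility check assumes that the writing transaction committed and ignores aborted transactions and a transaction's own writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int           # oldest transaction ID still running
    xmax: int           # next transaction ID to be assigned
    running: frozenset  # IDs in progress when the snapshot was taken


class ToyGTM:
    """Hypothetical stand-in for a global transaction manager."""

    def __init__(self):
        self._next_gxid = 1
        self._running = set()

    def begin(self):
        """Assign a new global transaction ID and mark it as running."""
        gxid = self._next_gxid
        self._next_gxid += 1
        self._running.add(gxid)
        return gxid

    def finish(self, gxid):
        """Record that a transaction has ended (treated here as a commit)."""
        self._running.discard(gxid)

    def snapshot(self):
        """Capture which transactions are in progress right now."""
        xmax = self._next_gxid
        xmin = min(self._running, default=xmax)
        return Snapshot(xmin, xmax, frozenset(self._running))


def is_visible(writer_gxid, snap):
    """Is a row written by a committed `writer_gxid` visible in `snap`?"""
    if writer_gxid >= snap.xmax:
        return False  # writer started after the snapshot was taken
    return writer_gxid not in snap.running
```

Because every component would consult the same manager in this model, two statements that carry the same snapshot reach the same visibility decision no matter where they run.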
The Coordinator is the interface to applications. It acts like a conventional PostgreSQL backend process. However, the Coordinator does not store any actual data; actual data is stored by the Datanodes, as described below. The Coordinator receives SQL statements, gets a Global Transaction ID (GXID) and a Global Snapshot as needed, determines which Datanodes are involved, and asks them to execute (part of) the statement. When the Coordinator issues a statement to Datanodes, the statement is associated with the GXID and Global Snapshot, so that a Datanode is not confused if it receives another statement from another transaction originating from another Coordinator.
The Datanode actually stores your data. Tables may be distributed among Datanodes or replicated to all Datanodes. A Datanode does not have a global view of the whole database; it only takes care of locally stored data. An incoming statement is examined by the Coordinator, as described above, and rebuilt for execution at each Datanode involved. It is then transferred to each of those Datanodes together with the GXID and Global Snapshot as needed. A Datanode may receive requests from various Coordinators. However, because each transaction is uniquely identified and associated with a consistent (global) snapshot, a Datanode does not have to worry about which Coordinator each transaction or statement came from.
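The interaction between the three components can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the real wire protocol: `StubGTM`, `Datanode.execute`, and `Coordinator.run` are hypothetical names, and the snapshot is reduced to the set of running GXIDs.

```python
import itertools


class StubGTM:
    """Hypothetical GTM stand-in: issues a GXID together with a snapshot."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._running = set()

    def begin(self):
        gxid = next(self._counter)
        self._running.add(gxid)
        return gxid, frozenset(self._running)


class Datanode:
    """Stores data locally and executes whatever statements it is sent."""

    def __init__(self, name):
        self.name = name
        self.received = []  # log of (gxid, snapshot, statement)

    def execute(self, gxid, snapshot, statement):
        # The GXID and snapshot identify the transaction, so the Datanode
        # never needs to know which Coordinator sent the statement.
        self.received.append((gxid, snapshot, statement))
        return f"{self.name}: gxid={gxid} ran {statement!r}"


class Coordinator:
    """Accepts statements and forwards them to the Datanodes involved."""

    def __init__(self, gtm, datanodes):
        self.gtm = gtm
        self.datanodes = datanodes  # dict mapping node name to Datanode

    def run(self, statement, involved):
        gxid, snapshot = self.gtm.begin()
        return [
            self.datanodes[name].execute(gxid, snapshot, statement)
            for name in involved
        ]
```

Two `Coordinator` objects sharing one `StubGTM` and the same Datanodes can interleave statements freely in this model; each Datanode tells the transactions apart by GXID alone.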
Postgres-XC is an extension to PostgreSQL and inherits most of its features.
It is an open-source descendant of PostgreSQL and its original Berkeley code. It supports a large part of the SQL standard and offers many modern features:
Also, similar to PostgreSQL, Postgres-XC can be extended by the user in many ways, for example by adding new
And because Postgres-XC uses the same liberal license as PostgreSQL, it can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic.
[1] Of course, to get the most out of Postgres-XC, you should take into account how tables are stored internally when you design the physical database.
[2] To distinguish this from PostgreSQL's partitioning, we call it "distributed". In distributed database textbooks, it is often referred to as a "horizontal fragment".
[3] Postgres-XC's foreign key usage has some restrictions. For details, see CREATE TABLE.
[4] Postgres-XC does not support triggers in the current version. They may be supported in future releases.