Postgres-XC 1.0.1 Documentation
Prev	Fast Backward	Chapter 38. PL/pgSQL - SQL Procedural Language	Fast Forward	Next

38.10. PL/pgSQL Under the Hood

Note: The following description applies both to Postgres-XC and PostgreSQL if not described explicitly.

This section discusses some implementation details that are frequently important for PL/pgSQL users to know.

38.10.1. Variable Substitution

Note: The following description applies both to Postgres-XC and PostgreSQL if not described explicitly.

SQL statements and expressions within a PL/pgSQL function can refer to variables and parameters of the function. Behind the scenes, PL/pgSQL substitutes query parameters for such references. Parameters will only be substituted in places where a parameter or column reference is syntactically allowed. As an extreme case, consider this example of poor programming style:

INSERT INTO foo (foo) VALUES (foo);

The first occurrence of foo must syntactically be a table name, so it will not be substituted, even if the function has a variable named foo. The second occurrence must be the name of a column of the table, so it will not be substituted either. Only the third occurrence is a candidate to be a reference to the function's variable.

Note: PostgreSQL versions before 9.0 would try to substitute the variable in all three cases, leading to syntax errors.

Since the names of variables are syntactically no different from the names of table columns, there can be ambiguity in statements that also refer to tables: is a given name meant to refer to a table column, or a variable? Let's change the previous example to

INSERT INTO dest (col) SELECT foo + bar FROM src;

Here, dest and src must be table names, and col must be a column of dest, but foo and bar might reasonably be either variables of the function or columns of src.

By default, PL/pgSQL will report an error if a name in a SQL statement could refer to either a variable or a table column. You can fix such a problem by renaming the variable or column, or by qualifying the ambiguous reference, or by telling PL/pgSQL which interpretation to prefer.

The simplest solution is to rename the variable or column. A common coding rule is to use a different naming convention for PL/pgSQL variables than you use for column names. For example, if you consistently name function variables v_something while none of your column names start with v_, no conflicts will occur.

Alternatively you can qualify ambiguous references to make them clear. In the above example, src.foo would be an unambiguous reference to the table column. To create an unambiguous reference to a variable, declare it in a labeled block and use the block's label (see Section 38.2). For example,

<<block>>
DECLARE
    foo int;
BEGIN
    foo := ...;
    INSERT INTO dest (col) SELECT block.foo + bar FROM src;

Here block.foo means the variable even if there is a column foo in src. Function parameters, as well as special variables such as FOUND, can be qualified by the function's name, because they are implicitly declared in an outer block labeled with the function's name.

Sometimes it is impractical to fix all the ambiguous references in a large body of PL/pgSQL code. In such cases you can specify that PL/pgSQL should resolve ambiguous references as the variable (which is compatible with PL/pgSQL's behavior before PostgreSQL 9.0), or as the table column (which is compatible with some other systems such as Oracle).

To change this behavior on a system-wide basis, set the configuration parameter plpgsql.variable_conflict to one of error, use_variable, or use_column (where error is the factory default). This parameter affects subsequent compilations of statements in PL/pgSQL functions, but not statements already compiled in the current session. To set the parameter before PL/pgSQL has been loaded, it is necessary to have added "plpgsql" to the custom_variable_classes list in postgresql.conf. Because changing this setting can cause unexpected changes in the behavior of PL/pgSQL functions, it can only be changed by a superuser.

You can also set the behavior on a function-by-function basis, by inserting one of these special commands at the start of the function text:

#variable_conflict error
#variable_conflict use_variable
#variable_conflict use_column

These commands affect only the function they are written in, and override the setting of plpgsql.variable_conflict. An example is

CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$
    #variable_conflict use_variable
    DECLARE
        curtime timestamp := now();
    BEGIN
        UPDATE users SET last_modified = curtime, comment = comment
          WHERE users.id = id;
    END;
$$ LANGUAGE plpgsql;

In the UPDATE command, curtime, comment, and id will refer to the function's variable and parameters whether or not users has columns of those names. Notice that we had to qualify the reference to users.id in the WHERE clause to make it refer to the table column. But we did not have to qualify the reference to comment as a target in the UPDATE list, because syntactically that must be a column of users. We could write the same function without depending on the variable_conflict setting in this way:

CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$
    <<fn>>
    DECLARE
        curtime timestamp := now();
    BEGIN
        UPDATE users SET last_modified = fn.curtime, comment = stamp_user.comment
          WHERE users.id = stamp_user.id;
    END;
$$ LANGUAGE plpgsql;

Variable substitution does not happen in the command string given to EXECUTE or one of its variants. If you need to insert a varying value into such a command, do so as part of constructing the string value, or use USING, as illustrated in Section 38.5.4.

Variable substitution currently works only in SELECT, INSERT, UPDATE, and DELETE commands, because the main SQL engine allows query parameters only in these commands. To use a non-constant name or value in other statement types (generically called utility statements), you must construct the utility statement as a string and EXECUTE it.

38.10.2. Plan Caching

Note: The following description applies both to Postgres-XC and PostgreSQL if not described explicitly.

The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree the first time the function is called (within each session). The instruction tree fully translates the PL/pgSQL statement structure, but individual SQL expressions and SQL commands used in the function are not translated immediately.

As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter creates a prepared execution plan (using the SPI manager's SPI_prepare and SPI_saveplan functions). Subsequent visits to that expression or command reuse the prepared plan. Thus, a function with conditional code that contains many statements for which execution plans might be required will only prepare and save those plans that are really used during the lifetime of the database connection. This can substantially reduce the total amount of time required to parse and generate execution plans for the statements in a PL/pgSQL function. A disadvantage is that errors in a specific expression or command cannot be detected until that part of the function is reached in execution. (Trivial syntax errors will be detected during the initial parsing pass, but anything deeper will not be detected until execution.)

A saved plan will be re-planned automatically if there is any schema change to any table used in the query, or if any user-defined function used in the query is redefined. This makes the re-use of prepared plans transparent in most cases, but there are corner cases where a stale plan might be re-used. An example is that dropping and re-creating a user-defined operator won't affect already-cached plans; they'll continue to call the original operator's underlying function, if that has not been changed. When necessary, the cache can be flushed by starting a fresh database session.

Because PL/pgSQL saves execution plans in this way, SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command. To get around this restriction, you can construct dynamic commands using the PL/pgSQL EXECUTE statement — at the price of constructing a new execution plan on every execution.

Another important point is that the prepared plans are parameterized to allow the values of PL/pgSQL variables to change from one use to the next, as discussed in detail above. Sometimes this means that a plan is less efficient than it would be if generated for a specific variable value. As an example, consider

SELECT * INTO myrec FROM dictionary WHERE word LIKE search_term;

where search_term is a PL/pgSQL variable. The cached plan for this query will never use an index on word, since the planner cannot assume that the LIKE pattern will be left-anchored at run time. To use an index the query must be planned with a specific constant LIKE pattern provided. This is another situation where EXECUTE can be used to force a new plan to be generated for each execution.

The mutable nature of record variables presents another problem in this connection. When fields of a record variable are used in expressions or statements, the data types of the fields must not change from one call of the function to the next, since each expression will be planned using the data type that is present when the expression is first reached. EXECUTE can be used to get around this problem when necessary.

If the same function is used as a trigger for more than one table, PL/pgSQL prepares and caches plans independently for each such table — that is, there is a cache for each trigger function and table combination, not just for each function. This alleviates some of the problems with varying data types; for instance, a trigger function will be able to work successfully with a column named key even if it happens to have different types in different tables.

Likewise, functions having polymorphic argument types have a separate plan cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.

Plan caching can sometimes have surprising effects on the interpretation of time-sensitive values. For example there is a difference between what these two functions do:

CREATE FUNCTION logfunc1(logtxt text) RETURNS void AS $$
    BEGIN
        INSERT INTO logtable VALUES (logtxt, 'now');
    END;
$$ LANGUAGE plpgsql;

and:

CREATE FUNCTION logfunc2(logtxt text) RETURNS void AS $$
    DECLARE
        curtime timestamp;
    BEGIN
        curtime := 'now';
        INSERT INTO logtable VALUES (logtxt, curtime);
    END;
$$ LANGUAGE plpgsql;

In the case of logfunc1, the Postgres-XC main parser knows when preparing the plan for the INSERT that the string 'now' should be interpreted as timestamp, because the target column of logtable is of that type. Thus, 'now' will be converted to a constant when the INSERT is planned, and then used in all invocations of logfunc1 during the lifetime of the session. Needless to say, this isn't what the programmer wanted.

In the case of logfunc2, the Postgres-XC main parser does not know what type 'now' should become and therefore it returns a data value of type text containing the string now. During the ensuing assignment to the local variable curtime, the PL/pgSQL interpreter casts this string to the timestamp type by calling the text_out and timestamp_in functions for the conversion. So, the computed time stamp is updated on each execution as the programmer expects.

Prev	Home	Next
Trigger Procedures	Up	Tips for Developing in PL/pgSQL