Postgres-XC 1.2devel Documentation | ||||
---|---|---|---|---|
Prev | Fast Backward | Chapter 12. Full Text Search | Fast Forward | Next |
Note: The following description applies both to Postgres-XC and PostgreSQL if not described explicitly. You can read PostgreSQL as Postgres-XC except for version number, which is specific to each product.
A text search configuration specifies all options necessary to transform a
document into a tsvector: the parser to use to break text
into tokens, and the dictionaries to use to transform each token into a
lexeme. Every call of
to_tsvector
or to_tsquery
needs a text search configuration to perform its processing.
The configuration parameter
default_text_search_config
specifies the name of the default configuration, which is the
one used by text search functions if an explicit configuration
parameter is omitted.
It can be set in postgresql.conf, or set for an
individual session using the SET command.
Several predefined text search configurations are available, and you can create custom configurations easily. To facilitate management of text search objects, a set of SQL commands is available, and there are several psql commands that display information about text search objects (Section 12.10).
As an example we will create a configuration pg, starting by duplicating the built-in english configuration:
CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );
We will use a PostgreSQL-specific synonym list and store it in $SHAREDIR/tsearch_data/pg_dict.syn. The file contents look like:
postgres pg pgsql pg postgresql pg
We define the synonym dictionary like this:
CREATE TEXT SEARCH DICTIONARY pg_dict ( TEMPLATE = synonym, SYNONYMS = pg_dict );
Next we register the Ispell dictionary english_ispell, which has its own configuration files:
CREATE TEXT SEARCH DICTIONARY english_ispell ( TEMPLATE = ispell, DictFile = english, AffFile = english, StopWords = english );
Now we can set up the mappings for words in configuration pg:
ALTER TEXT SEARCH CONFIGURATION pg ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH pg_dict, english_ispell, english_stem;
We choose not to index or search some token types that the built-in configuration does handle:
ALTER TEXT SEARCH CONFIGURATION pg DROP MAPPING FOR email, url, url_path, sfloat, float;
Now we can test our configuration:
SELECT * FROM ts_debug('public.pg', ' PostgreSQL, the highly scalable, SQL compliant, open source object-relational database management system, is now undergoing beta testing of the next version of our software. ');
The next step is to set the session to use the new configuration, which was created in the public schema:
=> \dF List of text search configurations Schema | Name | Description ---------+------+------------- public | pg | SET default_text_search_config = 'public.pg'; SET SHOW default_text_search_config; default_text_search_config ---------------------------- public.pg