Hi Arthur,
Thank you for the review.
On Thu, 21 Dec 2017 17:46:42 +0300
Arthur Zakirov <a.zaki...@postgrespro.ru> wrote:
> I noticed that there are typos in the documentation. And I think it
> is necessary to keep information about previous syntax. The syntax
> will be supported anyway. For example, information about TSL_FILTER
> was removed. And it will be good to add more examples of the new
> syntax.
In the current version of the patch, configurations written in the old
syntax are rewritten into equivalent configurations in the new syntax.
Since the new syntax doesn't support TSL_FILTER, it was removed from the
documentation. It would be possible to store configurations written in
the old syntax in a special way and simulate the TSL_FILTER behavior for
them, but that would mean maintaining two different FTS behaviors
depending on which version of the syntax was used at configuration time.
Do you think we should keep both behaviors?
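
For illustration, a mapping written in the old comma-separated syntax

    ALTER TEXT SEARCH CONFIGURATION my_config
        ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;

is stored as the equivalent new-syntax expression (the same form that
\dF+ and ts_debug display):

    CASE english_ispell WHEN MATCH THEN KEEP
        ELSE CASE english_stem WHEN MATCH THEN KEEP END
    END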
> The result of ts_debug() function was changed. Is it possible to keep
> the old ts_debug() result? To be specific, 'dictionaries' column is
> text now, not array, as I understood. It will be good to keep the
> result for the sake of backward compatibility.
The types of the 'dictionaries' and 'dictionary' columns were changed to
text because, after the patch, a configuration may be not a plain array
of dictionaries but a complex expression tree. In the 'dictionaries'
column the result was a textual representation of the configuration, the
same as in the \dF+ description of the configuration. I decided to
rename the newly added column to 'configuration' and keep the
'dictionaries' column as an array of all dictionaries used in the
configuration (no matter how). I also fixed a bug in the 'command'
output of ts_debug in some cases.
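
For example, with the my_synonym configuration from the docs, the new
columns look like this (abbreviated):

    SELECT alias, dictionaries, configuration, command, lexemes
    FROM ts_debug('english', 'Paris');
    -- dictionaries:  {my_synonym,english_stem}
    -- configuration: CASE my_synonym WHEN MATCH THEN KEEP
    --                ELSE CASE english_stem WHEN MATCH THEN KEEP END END
    -- command:       my_synonym
    -- lexemes:       {paris}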
Additionally, I added some examples to the documentation covering
multilingual search and the combination of exact and linguistic-aware
search, and fixed typos.
diff --git a/doc/src/sgml/ref/alter_tsconfig.sgml b/doc/src/sgml/ref/alter_tsconfig.sgml
index ebe0b94b27..a1f483e10b 100644
--- a/doc/src/sgml/ref/alter_tsconfig.sgml
+++ b/doc/src/sgml/ref/alter_tsconfig.sgml
@@ -21,8 +21,12 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
+ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable>
+ ADD MAPPING FOR <replaceable class="parameter">token_type</replaceable> [, ... ] WITH <replaceable class="parameter">config</replaceable>
ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable>
ADD MAPPING FOR <replaceable class="parameter">token_type</replaceable> [, ... ] WITH <replaceable class="parameter">dictionary_name</replaceable> [, ... ]
+ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable>
+ ALTER MAPPING FOR <replaceable class="parameter">token_type</replaceable> [, ... ] WITH <replaceable class="parameter">config</replaceable>
ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable>
ALTER MAPPING FOR <replaceable class="parameter">token_type</replaceable> [, ... ] WITH <replaceable class="parameter">dictionary_name</replaceable> [, ... ]
ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable>
@@ -88,6 +92,17 @@ ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable> SET SCHEMA <repl
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><replaceable class="parameter">config</replaceable></term>
+ <listitem>
+ <para>
+ The dictionary tree expression. A dictionary expression
+ is a condition/command/else triple that defines how the text is
+ processed. The <literal>ELSE</literal> part is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">old_dictionary</replaceable></term>
<listitem>
@@ -133,7 +148,7 @@ ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable> SET SCHEMA <repl
</para>
</listitem>
</varlistentry>
- </variablelist>
+ </variablelist>
<para>
The <literal>ADD MAPPING FOR</literal> form installs a list of dictionaries to be
@@ -154,6 +169,53 @@ ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable> SET SCHEMA <repl
</refsect1>
+ <refsect1>
+ <title>Dictionary Map Configuration</title>
+
+ <refsect2>
+ <title>Format</title>
+ <para>
+ Formally <replaceable class="parameter">config</replaceable> is one of:
+ </para>
+ <programlisting>
+ * dictionary_name
+
+ * config { UNION | INTERSECT | EXCEPT | MAP } config
+
+ * CASE config
+ WHEN [ NO ] MATCH THEN { KEEP | config }
+ [ ELSE config ]
+ END
+ </programlisting>
+ </refsect2>
+
+ <refsect2>
+ <title>Description</title>
+ <para>
+ <replaceable class="parameter">config</replaceable> can be written
+ in three different formats. The simplest format is the name of a single
+ dictionary to use for token processing.
+ </para>
+ <para>
+ In order to use more than one dictionary
+ simultaneously, the user should connect the dictionaries with operators. The operators
+ <literal>UNION</literal>, <literal>EXCEPT</literal> and
+ <literal>INTERSECT</literal> have the same meaning as in set operations.
+ The special operator <literal>MAP</literal> takes the output of its left
+ subexpression and uses it as the input of its right subexpression.
+ </para>
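+
+ <para>
+ For example, the following expression produces both the stemmed and the
+ exact form of each token:
+ </para>
+ <programlisting>
+english_stem UNION simple
+ </programlisting>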
+ <para>
+ The third format of <replaceable class="parameter">config</replaceable> is similar to
+ a <literal>CASE/WHEN/THEN/ELSE</literal> structure. It consists of three
+ replaceable parts. The first is a configuration used to construct the set of lexemes
+ the condition is matched against. If the condition is satisfied, the command is
+ executed; otherwise the <literal>ELSE</literal> branch is executed.
+ Use the command <literal>KEEP</literal> to avoid repeating the same
+ configuration in the condition and command parts; the command may, however,
+ differ from the condition.
+ </para>
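+
+ <para>
+ For example, a case that falls back from a spelling dictionary to a stemmer:
+ </para>
+ <programlisting>
+CASE english_hunspell WHEN MATCH THEN KEEP ELSE english_stem END
+ </programlisting>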
+ </refsect2>
+ </refsect1>
+
<refsect1>
<title>Examples</title>
@@ -167,6 +229,34 @@ ALTER TEXT SEARCH CONFIGURATION <replaceable>name</replaceable> SET SCHEMA <repl
ALTER TEXT SEARCH CONFIGURATION my_config
ALTER MAPPING REPLACE english WITH swedish;
</programlisting>
+
+ <para>
+ The next example shows how to analyze documents in both English and German.
+ <literal>english_hunspell</literal> and <literal>german_hunspell</literal>
+ return a result only if a word is recognized. Otherwise, the stemmer
+ dictionaries are used to process the token.
+ </para>
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION my_config
+ ALTER MAPPING FOR asciiword, word WITH
+ CASE english_hunspell WHEN MATCH THEN KEEP ELSE english_stem END
+ UNION
+ CASE german_hunspell WHEN MATCH THEN KEEP ELSE german_stem END;
+</programlisting>
+
+ <para>
+ In order to search for both the exact and the processed forms, the vector
+ should contain the lexemes produced by <literal>simple</literal> for the exact
+ form of the word as well as the lexemes produced by a linguistic-aware dictionary
+ (e.g. <literal>english_stem</literal>) for the processed forms.
+ </para>
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION my_config
+ ALTER MAPPING FOR asciiword, word WITH english_stem UNION simple;
+</programlisting>
+
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 4dc52ec983..049c3fcff6 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -732,10 +732,11 @@ SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate a fat rats');
The <function>to_tsvector</function> function internally calls a parser
which breaks the document text into tokens and assigns a type to
each token. For each token, a list of
- dictionaries (<xref linkend="textsearch-dictionaries"/>) is consulted,
- where the list can vary depending on the token type. The first dictionary
- that <firstterm>recognizes</firstterm> the token emits one or more normalized
- <firstterm>lexemes</firstterm> to represent the token. For example,
+ condition/command pairs is consulted, where the list can vary depending
+ on the token type. Conditions and commands are expressions on dictionaries
+ (<xref linkend="textsearch-dictionaries"/>), with a matching clause in the condition.
+ The command of the first condition that evaluates to true emits one or more normalized
+ <firstterm>lexemes</firstterm> to represent the token. For example,
<literal>rats</literal> became <literal>rat</literal> because one of the
dictionaries recognized that the word <literal>rats</literal> is a plural
form of <literal>rat</literal>. Some words are recognized as
@@ -743,7 +744,7 @@ SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate a fat rats');
causes them to be ignored since they occur too frequently to be useful in
searching. In our example these are
<literal>a</literal>, <literal>on</literal>, and <literal>it</literal>.
- If no dictionary in the list recognizes the token then it is also ignored.
+ If none of the conditions is <literal>true</literal>, the token is ignored.
In this example that happened to the punctuation sign <literal>-</literal>
because there are in fact no dictionaries assigned for its token type
(<literal>Space symbols</literal>), meaning space tokens will never be
@@ -2227,14 +2228,6 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h
(notice that one token can produce more than one lexeme)
</para>
</listitem>
- <listitem>
- <para>
- a single lexeme with the <literal>TSL_FILTER</literal> flag set, to replace
- the original token with a new token to be passed to subsequent
- dictionaries (a dictionary that does this is called a
- <firstterm>filtering dictionary</firstterm>)
- </para>
- </listitem>
<listitem>
<para>
an empty array if the dictionary knows the token, but it is a stop word
@@ -2264,38 +2257,126 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h
type that the parser can return, a separate list of dictionaries is
specified by the configuration. When a token of that type is found
by the parser, each dictionary in the list is consulted in turn,
- until some dictionary recognizes it as a known word. If it is identified
- as a stop word, or if no dictionary recognizes the token, it will be
- discarded and not indexed or searched for.
- Normally, the first dictionary that returns a non-<literal>NULL</literal>
- output determines the result, and any remaining dictionaries are not
- consulted; but a filtering dictionary can replace the given word
- with a modified word, which is then passed to subsequent dictionaries.
+ until a command is selected based on its condition. If no case is
+ selected, the token will be discarded and not indexed or searched for.
+ </para>
+
+ <para>
+ A tree of cases is described as condition/command/else triples. Each
+ condition is evaluated in order to select the appropriate command to generate
+ the resulting set of lexemes.
+ </para>
+
+ <para>
+ A condition is an expression with dictionaries as operands, combined with the
+ basic set operators <literal>UNION</literal>, <literal>EXCEPT</literal>, <literal>INTERSECT</literal>
+ and the special operator <literal>MAP</literal>.
+ The special operator <literal>MAP</literal> uses the output of its left subexpression as
+ the input of its right subexpression.
+ </para>
+
+ <para>
+ The rules for writing a command are the same as for a condition, with the
+ additional keyword <literal>KEEP</literal>, which reuses the result of the condition as the output.
+ </para>
+
+ <para>
+ A comma-separated list of dictionaries is a simplified variant of a text
+ search configuration. Each dictionary is consulted to process a token, and the first
+ non-<literal>NULL</literal> output is accepted as the processing result.
</para>
<para>
- The general rule for configuring a list of dictionaries
- is to place first the most narrow, most specific dictionary, then the more
- general dictionaries, finishing with a very general dictionary, like
+ The general rule for configuring token processing
+ is to place first the case with the most narrow, most specific dictionary, then the more
+ general dictionaries, finishing with a very general dictionary, like
a <application>Snowball</application> stemmer or <literal>simple</literal>, which
- recognizes everything. For example, for an astronomy-specific search
+ recognizes everything. For example, for an astronomy-specific search
(<literal>astro_en</literal> configuration) one could bind token type
<type>asciiword</type> (ASCII word) to a synonym dictionary of astronomical
terms, a general English dictionary and a <application>Snowball</application> English
- stemmer:
+ stemmer, using the comma-separated variant of the mapping:
+ </para>
<programlisting>
ALTER TEXT SEARCH CONFIGURATION astro_en
ADD MAPPING FOR asciiword WITH astrosyn, english_ispell, english_stem;
</programlisting>
+
+ <para>
+ Another example is a configuration for both the English and German languages, using the
+ operator-based variant of the mapping:
+ </para>
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION multi_en_de
+ ADD MAPPING FOR asciiword, word WITH
+ CASE english_hunspell WHEN MATCH THEN KEEP ELSE english_stem END
+ UNION
+ CASE german_hunspell WHEN MATCH THEN KEEP ELSE german_stem END;
+</programlisting>
+
+ <para>
+ This configuration makes it possible to search a collection of multilingual
+ documents without specifying the language:
+ </para>
+
+<programlisting>
+WITH docs(id, txt) as (values (1, 'Das geschah zu Beginn dieses Monats'),
+ (2, 'with old stars and lacking gas and dust'),
+ (3, '25 light-years across, blown by winds from its central'))
+SELECT * FROM docs WHERE to_tsvector('multi_en_de', txt) @@ to_tsquery('multi_en_de', 'lack');
+ id | txt
+----+-----------------------------------------
+ 2 | with old stars and lacking gas and dust
+
+WITH docs(id, txt) as (values (1, 'Das geschah zu Beginn dieses Monats'),
+ (2, 'with old stars and lacking gas and dust'),
+ (3, '25 light-years across, blown by winds from its central'))
+SELECT * FROM docs WHERE to_tsvector('multi_en_de', txt) @@ to_tsquery('multi_en_de', 'beginnen');
+ id | txt
+----+-------------------------------------
+ 1 | Das geschah zu Beginn dieses Monats
+</programlisting>
+
+ <para>
+ A combination of a stemmer dictionary with the <literal>simple</literal> one may be used to mix
+ a search for the exact form of one word with a linguistic search for the others.
+ </para>
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION exact_and_linguistic
+ ADD MAPPING FOR asciiword, word WITH english_stem UNION simple;
+</programlisting>
+
+ <para>
+ In the following example, the <literal>simple</literal> dictionary is used to prevent a word in the query from being normalized.
</para>
+<programlisting>
+WITH docs(id, txt) as (values (1, 'Supernova star'),
+ (2, 'Supernova stars'))
+SELECT * FROM docs WHERE to_tsvector('exact_and_linguistic', txt) @@ (to_tsquery('simple', 'stars') && to_tsquery('english', 'supernovae'));
+ id | txt
+----+-----------------
+ 2 | Supernova stars
+</programlisting>
+
+ <caution>
+ <para>
+ Since a <literal>tsvector</literal> carries no information about the origin of each lexeme,
+ such queries may produce false-positive matches when a stemmed form coincides with the exact form used in a query.
+ </para>
+ </caution>
+
<para>
- A filtering dictionary can be placed anywhere in the list, except at the
- end where it'd be useless. Filtering dictionaries are useful to partially
+ Filtering dictionaries are useful to partially
normalize words to simplify the task of later dictionaries. For example,
a filtering dictionary could be used to remove accents from accented
letters, as is done by the <xref linkend="unaccent"/> module.
+ A filtering dictionary should be placed on the left side of the <literal>MAP</literal>
+ operator. If the filtering dictionary returns <literal>NULL</literal>, it passes the initial token
+ to the right subexpression.
</para>
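+
+ <para>
+ For example (a sketch, assuming the <literal>unaccent</literal> dictionary
+ from the <xref linkend="unaccent"/> module has been created):
+ </para>
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION my_config
+  ALTER MAPPING FOR asciiword, word WITH unaccent MAP english_stem;
+</programlisting>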
<sect2 id="textsearch-stopwords">
@@ -2462,9 +2543,9 @@ SELECT ts_lexize('public.simple_dict','The');
<screen>
SELECT * FROM ts_debug('english', 'Paris');
- alias | description | token | dictionaries | dictionary | lexemes
------------+-----------------+-------+----------------+--------------+---------
- asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari}
+ alias | description | token | dictionaries | configuration | command | lexemes
+-----------+-----------------+-------+----------------+---------------+--------------+---------
+ asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | english_stem | {pari}
CREATE TEXT SEARCH DICTIONARY my_synonym (
TEMPLATE = synonym,
@@ -2476,9 +2557,12 @@ ALTER TEXT SEARCH CONFIGURATION english
WITH my_synonym, english_stem;
SELECT * FROM ts_debug('english', 'Paris');
- alias | description | token | dictionaries | dictionary | lexemes
------------+-----------------+-------+---------------------------+------------+---------
- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
+ alias | description | token | dictionaries | configuration | command | lexemes
+-----------+-----------------+-------+---------------------------+---------------------------------------------+------------+---------
+ asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | CASE my_synonym WHEN MATCH THEN KEEP +| my_synonym | {paris}
+ | | | | ELSE CASE english_stem WHEN MATCH THEN KEEP+| |
+ | | | | END +| |
+ | | | | END | |
</screen>
</para>
@@ -3103,6 +3187,21 @@ CREATE TEXT SEARCH DICTIONARY english_ispell (
Now we can set up the mappings for words in configuration
<literal>pg</literal>:
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION pg
+ ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
+ word, hword, hword_part
+ WITH
+ CASE pg_dict WHEN MATCH THEN KEEP
+ ELSE
+ CASE english_ispell WHEN MATCH THEN KEEP
+ ELSE english_stem
+ END
+ END;
+</programlisting>
+
+ Or use the alternative comma-separated syntax:
+
<programlisting>
ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
@@ -3182,7 +3281,8 @@ ts_debug(<optional> <replaceable class="parameter">config</replaceable> <type>re
OUT <replaceable class="parameter">description</replaceable> <type>text</type>,
OUT <replaceable class="parameter">token</replaceable> <type>text</type>,
OUT <replaceable class="parameter">dictionaries</replaceable> <type>regdictionary[]</type>,
- OUT <replaceable class="parameter">dictionary</replaceable> <type>regdictionary</type>,
+ OUT <replaceable class="parameter">configuration</replaceable> <type>text</type>,
+ OUT <replaceable class="parameter">command</replaceable> <type>text</type>,
OUT <replaceable class="parameter">lexemes</replaceable> <type>text[]</type>)
returns setof record
</synopsis>
@@ -3226,14 +3326,20 @@ ts_debug(<optional> <replaceable class="parameter">config</replaceable> <type>re
</listitem>
<listitem>
<para>
- <replaceable>dictionary</replaceable> <type>regdictionary</type> — the dictionary
- that recognized the token, or <literal>NULL</literal> if none did
+ <replaceable>configuration</replaceable> <type>text</type> — the
+ configuration defined for this token type
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <replaceable>command</replaceable> <type>text</type> — the command that describes
+ the way the output was produced
</para>
</listitem>
<listitem>
<para>
<replaceable>lexemes</replaceable> <type>text[]</type> — the lexeme(s) produced
- by the dictionary that recognized the token, or <literal>NULL</literal> if
+ by the command selected according to the conditions, or <literal>NULL</literal> if
none did; an empty array (<literal>{}</literal>) means it was recognized as a
stop word
</para>
@@ -3246,32 +3352,32 @@ ts_debug(<optional> <replaceable class="parameter">config</replaceable> <type>re
<screen>
SELECT * FROM ts_debug('english','a fat cat sat on a mat - it ate a fat rats');
- alias | description | token | dictionaries | dictionary | lexemes
------------+-----------------+-------+----------------+--------------+---------
- asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | fat | {english_stem} | english_stem | {fat}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | cat | {english_stem} | english_stem | {cat}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | sat | {english_stem} | english_stem | {sat}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | on | {english_stem} | english_stem | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | mat | {english_stem} | english_stem | {mat}
- blank | Space symbols | | {} | |
- blank | Space symbols | - | {} | |
- asciiword | Word, all ASCII | it | {english_stem} | english_stem | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | ate | {english_stem} | english_stem | {ate}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | fat | {english_stem} | english_stem | {fat}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | rats | {english_stem} | english_stem | {rat}
+ alias | description | token | dictionaries | configuration | command | lexemes
+-----------+-----------------+-------+----------------+---------------+--------------+---------
+ asciiword | Word, all ASCII | a | {english_stem} | english_stem | english_stem | {}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | fat | {english_stem} | english_stem | english_stem | {fat}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | cat | {english_stem} | english_stem | english_stem | {cat}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | sat | {english_stem} | english_stem | english_stem | {sat}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | on | {english_stem} | english_stem | english_stem | {}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | a | {english_stem} | english_stem | english_stem | {}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | mat | {english_stem} | english_stem | english_stem | {mat}
+ blank | Space symbols | | | | |
+ blank | Space symbols | - | | | |
+ asciiword | Word, all ASCII | it | {english_stem} | english_stem | english_stem | {}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | ate | {english_stem} | english_stem | english_stem | {ate}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | a | {english_stem} | english_stem | english_stem | {}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | fat | {english_stem} | english_stem | english_stem | {fat}
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | rats | {english_stem} | english_stem | english_stem | {rat}
</screen>
</para>
@@ -3297,13 +3403,22 @@ ALTER TEXT SEARCH CONFIGURATION public.english
<screen>
SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
- alias | description | token | dictionaries | dictionary | lexemes
------------+-----------------+-------------+-------------------------------+----------------+-------------
- asciiword | Word, all ASCII | The | {english_ispell,english_stem} | english_ispell | {}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | Brightest | {english_ispell,english_stem} | english_ispell | {bright}
- blank | Space symbols | | {} | |
- asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem | {supernova}
+ alias | description | token | dictionaries | configuration | command | lexemes
+-----------+-----------------+-------------+-------------------------------+---------------------------------------------+------------------+-------------
+ asciiword | Word, all ASCII | The | {english_ispell,english_stem} | CASE english_ispell WHEN MATCH THEN KEEP +| english_ispell | {}
+ | | | | ELSE CASE english_stem WHEN MATCH THEN KEEP+| |
+ | | | | END +| |
+ | | | | END | |
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | Brightest | {english_ispell,english_stem} | CASE english_ispell WHEN MATCH THEN KEEP +| english_ispell | {bright}
+ | | | | ELSE CASE english_stem WHEN MATCH THEN KEEP+| |
+ | | | | END +| |
+ | | | | END | |
+ blank | Space symbols | | | | |
+ asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | CASE english_ispell WHEN MATCH THEN KEEP +| english_stem | {supernova}
+ | | | | ELSE CASE english_stem WHEN MATCH THEN KEEP+| |
+ | | | | END +| |
+ | | | | END | |
</screen>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 394aea8e0f..4806e0b9fc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -944,55 +944,14 @@ GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublicatio
-- Tsearch debug function. Defined here because it'd be pretty unwieldy
-- to put it into pg_proc.h
-CREATE FUNCTION ts_debug(IN config regconfig, IN document text,
- OUT alias text,
- OUT description text,
- OUT token text,
- OUT dictionaries regdictionary[],
- OUT dictionary regdictionary,
- OUT lexemes text[])
-RETURNS SETOF record AS
-$$
-SELECT
- tt.alias AS alias,
- tt.description AS description,
- parse.token AS token,
- ARRAY ( SELECT m.mapdict::pg_catalog.regdictionary
- FROM pg_catalog.pg_ts_config_map AS m
- WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid
- ORDER BY m.mapseqno )
- AS dictionaries,
- ( SELECT mapdict::pg_catalog.regdictionary
- FROM pg_catalog.pg_ts_config_map AS m
- WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid
- ORDER BY pg_catalog.ts_lexize(mapdict, parse.token) IS NULL, m.mapseqno
- LIMIT 1
- ) AS dictionary,
- ( SELECT pg_catalog.ts_lexize(mapdict, parse.token)
- FROM pg_catalog.pg_ts_config_map AS m
- WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid
- ORDER BY pg_catalog.ts_lexize(mapdict, parse.token) IS NULL, m.mapseqno
- LIMIT 1
- ) AS lexemes
-FROM pg_catalog.ts_parse(
- (SELECT cfgparser FROM pg_catalog.pg_ts_config WHERE oid = $1 ), $2
- ) AS parse,
- pg_catalog.ts_token_type(
- (SELECT cfgparser FROM pg_catalog.pg_ts_config WHERE oid = $1 )
- ) AS tt
-WHERE tt.tokid = parse.tokid
-$$
-LANGUAGE SQL STRICT STABLE PARALLEL SAFE;
-
-COMMENT ON FUNCTION ts_debug(regconfig,text) IS
- 'debug function for text search configuration';
CREATE FUNCTION ts_debug(IN document text,
OUT alias text,
OUT description text,
OUT token text,
OUT dictionaries regdictionary[],
- OUT dictionary regdictionary,
+ OUT configuration text,
+ OUT command text,
OUT lexemes text[])
RETURNS SETOF record AS
$$
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index adc7cd67a7..e74b68f1e1 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -39,9 +39,12 @@
#include "nodes/makefuncs.h"
#include "parser/parse_func.h"
#include "tsearch/ts_cache.h"
+#include "tsearch/ts_public.h"
#include "tsearch/ts_utils.h"
+#include "tsearch/ts_configmap.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/jsonb.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -935,11 +938,22 @@ makeConfigurationDependencies(HeapTuple tuple, bool removeOld,
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
Form_pg_ts_config_map cfgmap = (Form_pg_ts_config_map) GETSTRUCT(maptup);
+ TSMapElement *mapdicts = JsonbToTSMap(DatumGetJsonbP(&cfgmap->mapdicts));
+ Oid *dictionaryOids = TSMapGetDictionaries(mapdicts);
+ Oid *currentOid = dictionaryOids;
- referenced.classId = TSDictionaryRelationId;
- referenced.objectId = cfgmap->mapdict;
- referenced.objectSubId = 0;
- add_exact_object_address(&referenced, addrs);
+ while (*currentOid != InvalidOid)
+ {
+ referenced.classId = TSDictionaryRelationId;
+ referenced.objectId = *currentOid;
+ referenced.objectSubId = 0;
+ add_exact_object_address(&referenced, addrs);
+
+ currentOid++;
+ }
+
+ pfree(dictionaryOids);
+ TSMapElementFree(mapdicts);
}
systable_endscan(scan);
@@ -1091,8 +1105,7 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied)
mapvalues[Anum_pg_ts_config_map_mapcfg - 1] = cfgOid;
mapvalues[Anum_pg_ts_config_map_maptokentype - 1] = cfgmap->maptokentype;
- mapvalues[Anum_pg_ts_config_map_mapseqno - 1] = cfgmap->mapseqno;
- mapvalues[Anum_pg_ts_config_map_mapdict - 1] = cfgmap->mapdict;
+ mapvalues[Anum_pg_ts_config_map_mapdicts - 1] = JsonbPGetDatum(&cfgmap->mapdicts);
newmaptup = heap_form_tuple(mapRel->rd_att, mapvalues, mapnulls);
@@ -1195,7 +1208,7 @@ AlterTSConfiguration(AlterTSConfigurationStmt *stmt)
relMap = heap_open(TSConfigMapRelationId, RowExclusiveLock);
/* Add or drop mappings */
- if (stmt->dicts)
+ if (stmt->dicts || stmt->dict_map)
MakeConfigurationMapping(stmt, tup, relMap);
else if (stmt->tokentype)
DropConfigurationMapping(stmt, tup, relMap);
@@ -1271,6 +1284,108 @@ getTokenTypes(Oid prsId, List *tokennames)
return res;
}
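+
+/*
+ * Build a TSMapElement equivalent to consulting a single dictionary:
+ * CASE dictionary WHEN MATCH THEN KEEP END
+ */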
+static TSMapElement *
+CreateCaseForSingleDictionary(Oid dictOid)
+{
+ TSMapElement *result = palloc0(sizeof(TSMapElement));
+ TSMapElement *keepElement = palloc0(sizeof(TSMapElement));
+ TSMapElement *condition = palloc0(sizeof(TSMapElement));
+ TSMapCase *caseObject = palloc0(sizeof(TSMapCase));
+
+ keepElement->type = TSMAP_KEEP;
+ keepElement->parent = result;
+ caseObject->command = keepElement;
+ caseObject->match = true;
+
+ condition->type = TSMAP_DICTIONARY;
+ condition->parent = result;
+ condition->value.objectDictionary = dictOid;
+ caseObject->condition = condition;
+
+ result->value.objectCase = caseObject;
+ result->type = TSMAP_CASE;
+
+ return result;
+}
+
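+/*
+ * Convert a DictMapElem tree produced by the parser into a TSMapElement
+ * tree. A plain list of dictionaries is represented as a chain of
+ * CASE dict WHEN MATCH THEN KEEP ELSE ... END elements, one per dictionary.
+ */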
+static TSMapElement *
+ParseTSMapConfig(DictMapElem *elem)
+{
+ TSMapElement *result = palloc0(sizeof(TSMapElement));
+
+ if (elem->kind == DICT_MAP_CASE)
+ {
+ TSMapCase *caseObject = palloc0(sizeof(TSMapCase));
+ DictMapCase *caseASTObject = elem->data;
+
+ caseObject->condition = ParseTSMapConfig(caseASTObject->condition);
+ caseObject->command = ParseTSMapConfig(caseASTObject->command);
+
+ if (caseASTObject->elsebranch)
+ caseObject->elsebranch = ParseTSMapConfig(caseASTObject->elsebranch);
+
+ caseObject->match = caseASTObject->match;
+
+ caseObject->condition->parent = result;
+ caseObject->command->parent = result;
+
+ result->type = TSMAP_CASE;
+ result->value.objectCase = caseObject;
+ }
+ else if (elem->kind == DICT_MAP_EXPRESSION)
+ {
+ TSMapExpression *expression = palloc0(sizeof(TSMapExpression));
+ DictMapExprElem *expressionAST = elem->data;
+
+ expression->left = ParseTSMapConfig(expressionAST->left);
+ expression->right = ParseTSMapConfig(expressionAST->right);
+ expression->operator = expressionAST->oper;
+
+ result->type = TSMAP_EXPRESSION;
+ result->value.objectExpression = expression;
+ }
+ else if (elem->kind == DICT_MAP_KEEP)
+ {
+ result->value.objectExpression = NULL;
+ result->type = TSMAP_KEEP;
+ }
+ else if (elem->kind == DICT_MAP_DICTIONARY)
+ {
+ result->value.objectDictionary = get_ts_dict_oid(elem->data, false);
+ result->type = TSMAP_DICTIONARY;
+ }
+ else if (elem->kind == DICT_MAP_DICTIONARY_LIST)
+ {
+ ListCell *c;
+ TSMapElement *root = NULL;
+ TSMapElement *currentNode = NULL;
+
+ foreach(c, (List *) elem->data)
+ {
+ TSMapElement *prevNode = currentNode;
+ List *names = (List *) lfirst(c);
+ Oid oid = get_ts_dict_oid(names, false);
+
+ currentNode = CreateCaseForSingleDictionary(oid);
+
+ if (root == NULL)
+ root = currentNode;
+ else
+ {
+ prevNode->value.objectCase->elsebranch = currentNode;
+ currentNode->parent = prevNode;
+ }
+ }
+ result = root;
+ }
+ return result;
+}
+
/*
* ALTER TEXT SEARCH CONFIGURATION ADD/ALTER MAPPING
*/
@@ -1287,8 +1402,9 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
Oid prsId;
int *tokens,
ntoken;
- Oid *dictIds;
- int ndict;
+ Oid *dictIds = NULL;
+ int ndict = 0;
+ TSMapElement *config = NULL;
ListCell *c;
prsId = ((Form_pg_ts_config) GETSTRUCT(tup))->cfgparser;
@@ -1327,15 +1443,18 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
/*
* Convert list of dictionary names to array of dict OIDs
*/
- ndict = list_length(stmt->dicts);
- dictIds = (Oid *) palloc(sizeof(Oid) * ndict);
- i = 0;
- foreach(c, stmt->dicts)
+ if (stmt->dicts)
{
- List *names = (List *) lfirst(c);
+ ndict = list_length(stmt->dicts);
+ dictIds = (Oid *) palloc(sizeof(Oid) * ndict);
+ i = 0;
+ foreach(c, stmt->dicts)
+ {
+ List *names = (List *) lfirst(c);
- dictIds[i] = get_ts_dict_oid(names, false);
- i++;
+ dictIds[i] = get_ts_dict_oid(names, false);
+ i++;
+ }
}
if (stmt->replace)
@@ -1357,6 +1476,10 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
while (HeapTupleIsValid((maptup = systable_getnext(scan))))
{
Form_pg_ts_config_map cfgmap = (Form_pg_ts_config_map) GETSTRUCT(maptup);
+ Datum repl_val[Natts_pg_ts_config_map];
+ bool repl_null[Natts_pg_ts_config_map];
+ bool repl_repl[Natts_pg_ts_config_map];
+ HeapTuple newtup;
/*
* check if it's one of target token types
@@ -1380,25 +1503,21 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
/*
* replace dictionary if match
*/
- if (cfgmap->mapdict == dictOld)
- {
- Datum repl_val[Natts_pg_ts_config_map];
- bool repl_null[Natts_pg_ts_config_map];
- bool repl_repl[Natts_pg_ts_config_map];
- HeapTuple newtup;
-
- memset(repl_val, 0, sizeof(repl_val));
- memset(repl_null, false, sizeof(repl_null));
- memset(repl_repl, false, sizeof(repl_repl));
-
- repl_val[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictNew);
- repl_repl[Anum_pg_ts_config_map_mapdict - 1] = true;
-
- newtup = heap_modify_tuple(maptup,
- RelationGetDescr(relMap),
- repl_val, repl_null, repl_repl);
- CatalogTupleUpdate(relMap, &newtup->t_self, newtup);
- }
+ config = JsonbToTSMap(DatumGetJsonbP(&cfgmap->mapdicts));
+ TSMapReplaceDictionary(config, dictOld, dictNew);
+
+ memset(repl_val, 0, sizeof(repl_val));
+ memset(repl_null, false, sizeof(repl_null));
+ memset(repl_repl, false, sizeof(repl_repl));
+
+ repl_val[Anum_pg_ts_config_map_mapdicts - 1] = JsonbPGetDatum(TSMapToJsonb(config));
+ repl_repl[Anum_pg_ts_config_map_mapdicts - 1] = true;
+
+ newtup = heap_modify_tuple(maptup,
+ RelationGetDescr(relMap),
+ repl_val, repl_null, repl_repl);
+ CatalogTupleUpdate(relMap, &newtup->t_self, newtup);
+ pfree(config);
}
systable_endscan(scan);
@@ -1408,24 +1527,22 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
/*
* Insertion of new entries
*/
+ config = ParseTSMapConfig(stmt->dict_map);
+
for (i = 0; i < ntoken; i++)
{
- for (j = 0; j < ndict; j++)
- {
- Datum values[Natts_pg_ts_config_map];
- bool nulls[Natts_pg_ts_config_map];
+ Datum values[Natts_pg_ts_config_map];
+ bool nulls[Natts_pg_ts_config_map];
- memset(nulls, false, sizeof(nulls));
- values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId);
- values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]);
- values[Anum_pg_ts_config_map_mapseqno - 1] = Int32GetDatum(j + 1);
- values[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictIds[j]);
+ memset(nulls, false, sizeof(nulls));
+ values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId);
+ values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]);
+ values[Anum_pg_ts_config_map_mapdicts - 1] = JsonbPGetDatum(TSMapToJsonb(config));
- tup = heap_form_tuple(relMap->rd_att, values, nulls);
- CatalogTupleInsert(relMap, tup);
+ tup = heap_form_tuple(relMap->rd_att, values, nulls);
+ CatalogTupleInsert(relMap, tup);
- heap_freetuple(tup);
- }
+ heap_freetuple(tup);
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 84d717102d..3e5d19c5e2 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4387,6 +4387,42 @@ _copyReassignOwnedStmt(const ReassignOwnedStmt *from)
return newnode;
}
+static DictMapElem *
+_copyDictMapElem(const DictMapElem *from)
+{
+ DictMapElem *newnode = makeNode(DictMapElem);
+
+ COPY_SCALAR_FIELD(kind);
+ COPY_NODE_FIELD(data);
+
+ return newnode;
+}
+
+static DictMapExprElem *
+_copyDictMapExprElem(const DictMapExprElem *from)
+{
+ DictMapExprElem *newnode = makeNode(DictMapExprElem);
+
+ COPY_NODE_FIELD(left);
+ COPY_NODE_FIELD(right);
+ COPY_SCALAR_FIELD(oper);
+
+ return newnode;
+}
+
+static DictMapCase *
+_copyDictMapCase(const DictMapCase *from)
+{
+ DictMapCase *newnode = makeNode(DictMapCase);
+
+ COPY_NODE_FIELD(condition);
+ COPY_NODE_FIELD(command);
+ COPY_NODE_FIELD(elsebranch);
+ COPY_SCALAR_FIELD(match);
+
+ return newnode;
+}
+
static AlterTSDictionaryStmt *
_copyAlterTSDictionaryStmt(const AlterTSDictionaryStmt *from)
{
@@ -5394,6 +5430,15 @@ copyObjectImpl(const void *from)
case T_ReassignOwnedStmt:
retval = _copyReassignOwnedStmt(from);
break;
+ case T_DictMapExprElem:
+ retval = _copyDictMapExprElem(from);
+ break;
+ case T_DictMapElem:
+ retval = _copyDictMapElem(from);
+ break;
+ case T_DictMapCase:
+ retval = _copyDictMapCase(from);
+ break;
case T_AlterTSDictionaryStmt:
retval = _copyAlterTSDictionaryStmt(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2e869a9d5d..05a056b61d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2186,6 +2186,36 @@ _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt *b)
return true;
}
+static bool
+_equalDictMapElem(const DictMapElem *a, const DictMapElem *b)
+{
+ COMPARE_NODE_FIELD(data);
+ COMPARE_SCALAR_FIELD(kind);
+
+ return true;
+}
+
+static bool
+_equalDictMapExprElem(const DictMapExprElem *a, const DictMapExprElem *b)
+{
+ COMPARE_NODE_FIELD(left);
+ COMPARE_NODE_FIELD(right);
+ COMPARE_SCALAR_FIELD(oper);
+
+ return true;
+}
+
+static bool
+_equalDictMapCase(const DictMapCase *a, const DictMapCase *b)
+{
+ COMPARE_NODE_FIELD(condition);
+ COMPARE_NODE_FIELD(command);
+ COMPARE_NODE_FIELD(elsebranch);
+ COMPARE_SCALAR_FIELD(match);
+
+ return true;
+}
+
static bool
_equalAlterTSDictionaryStmt(const AlterTSDictionaryStmt *a, const AlterTSDictionaryStmt *b)
{
@@ -3532,6 +3562,15 @@ equal(const void *a, const void *b)
case T_ReassignOwnedStmt:
retval = _equalReassignOwnedStmt(a, b);
break;
+ case T_DictMapExprElem:
+ retval = _equalDictMapExprElem(a, b);
+ break;
+ case T_DictMapElem:
+ retval = _equalDictMapElem(a, b);
+ break;
+ case T_DictMapCase:
+ retval = _equalDictMapCase(a, b);
+ break;
case T_AlterTSDictionaryStmt:
retval = _equalAlterTSDictionaryStmt(a, b);
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ebfc94f896..3ab0b75ece 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -52,6 +52,7 @@
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_trigger.h"
+#include "catalog/pg_ts_config_map.h"
#include "commands/defrem.h"
#include "commands/trigger.h"
#include "nodes/makefuncs.h"
@@ -241,6 +242,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
PartitionSpec *partspec;
PartitionBoundSpec *partboundspec;
RoleSpec *rolespec;
+ DictMapElem *dmapelem;
}
%type <node> stmt schema_stmt
@@ -308,7 +310,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <ival> vacuum_option_list vacuum_option_elem
%type <boolean> opt_or_replace
opt_grant_grant_option opt_grant_admin_option
- opt_nowait opt_if_exists opt_with_data
+ opt_nowait opt_if_exists opt_with_data opt_dictionary_map_no
%type <ival> opt_nowait_or_skip
%type <list> OptRoleList AlterOptRoleList
@@ -396,8 +398,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
relation_expr_list dostmt_opt_list
transform_element_list transform_type_list
TriggerTransitions TriggerReferencing
- publication_name_list
vacuum_relation_list opt_vacuum_relation_list
+ publication_name_list
%type <list> group_by_list
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
@@ -582,6 +584,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <list> hash_partbound partbound_datum_list range_datum_list
%type <defelt> hash_partbound_elem
+%type <ival> dictionary_map_set_expr_operator
+%type <dmapelem> dictionary_map_dict dictionary_map_command_expr_paren
+ dictionary_map_set_expr dictionary_map_case
+ dictionary_map_action dictionary_map
+ opt_dictionary_map_case_else dictionary_config
+
/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
* They must be listed first so that their numeric codes do not depend on
@@ -643,13 +651,14 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
JOIN
- KEY
+ KEEP KEY
LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAP MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE
+ MONTH_P MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -10318,24 +10327,26 @@ AlterTSDictionaryStmt:
;
AlterTSConfigurationStmt:
- ALTER TEXT_P SEARCH CONFIGURATION any_name ADD_P MAPPING FOR name_list any_with any_name_list
+ ALTER TEXT_P SEARCH CONFIGURATION any_name ADD_P MAPPING FOR name_list any_with dictionary_config
{
AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
n->kind = ALTER_TSCONFIG_ADD_MAPPING;
n->cfgname = $5;
n->tokentype = $9;
- n->dicts = $11;
+ n->dict_map = $11;
+ n->dicts = NULL;
n->override = false;
n->replace = false;
$$ = (Node*)n;
}
- | ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list any_with any_name_list
+ | ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list any_with dictionary_config
{
AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
n->kind = ALTER_TSCONFIG_ALTER_MAPPING_FOR_TOKEN;
n->cfgname = $5;
n->tokentype = $9;
- n->dicts = $11;
+ n->dict_map = $11;
+ n->dicts = NULL;
n->override = true;
n->replace = false;
$$ = (Node*)n;
@@ -10387,6 +10398,111 @@ any_with: WITH {}
| WITH_LA {}
;
+opt_dictionary_map_no:
+ NO { $$ = true; }
+ | { $$ = false; }
+ ;
+
+dictionary_config:
+ dictionary_map { $$ = $1; }
+ | any_name_list ',' any_name
+ {
+ DictMapElem *n = makeNode(DictMapElem);
+ n->kind = DICT_MAP_DICTIONARY_LIST;
+ n->data = lappend($1, $3);
+ $$ = n;
+ }
+ ;
+
+dictionary_map:
+ dictionary_map_case { $$ = $1; }
+ | dictionary_map_set_expr { $$ = $1; }
+ ;
+
+dictionary_map_action:
+ KEEP
+ {
+ DictMapElem *n = makeNode(DictMapElem);
+ n->kind = DICT_MAP_KEEP;
+ n->data = NULL;
+ $$ = n;
+ }
+ | dictionary_map { $$ = $1; }
+ ;
+
+opt_dictionary_map_case_else:
+ ELSE dictionary_map { $$ = $2; }
+ | { $$ = NULL; }
+ ;
+
+dictionary_map_case:
+ CASE dictionary_map WHEN opt_dictionary_map_no MATCH THEN dictionary_map_action opt_dictionary_map_case_else END_P
+ {
+ DictMapCase *n = makeNode(DictMapCase);
+ DictMapElem *r = makeNode(DictMapElem);
+
+ n->condition = $2;
+ n->command = $7;
+ n->elsebranch = $8;
+ n->match = !$4;
+
+ r->kind = DICT_MAP_CASE;
+ r->data = n;
+ $$ = r;
+ }
+ ;
+
+dictionary_map_set_expr_operator:
+ UNION { $$ = TSMAP_OP_UNION; }
+ | EXCEPT { $$ = TSMAP_OP_EXCEPT; }
+ | INTERSECT { $$ = TSMAP_OP_INTERSECT; }
+ | MAP { $$ = TSMAP_OP_MAP; }
+ ;
+
+dictionary_map_set_expr:
+ dictionary_map_command_expr_paren { $$ = $1; }
+ | dictionary_map_case dictionary_map_set_expr_operator dictionary_map_case
+ {
+ DictMapExprElem *n = makeNode(DictMapExprElem);
+ DictMapElem *r = makeNode(DictMapElem);
+
+ n->left = $1;
+ n->oper = $2;
+ n->right = $3;
+
+ r->kind = DICT_MAP_EXPRESSION;
+ r->data = n;
+ $$ = r;
+ }
+ | dictionary_map_command_expr_paren dictionary_map_set_expr_operator dictionary_map_command_expr_paren
+ {
+ DictMapExprElem *n = makeNode(DictMapExprElem);
+ DictMapElem *r = makeNode(DictMapElem);
+
+ n->left = $1;
+ n->oper = $2;
+ n->right = $3;
+
+ r->kind = DICT_MAP_EXPRESSION;
+ r->data = n;
+ $$ = r;
+ }
+ ;
+
+dictionary_map_command_expr_paren:
+ '(' dictionary_map_set_expr ')' { $$ = $2; }
+ | dictionary_map_dict { $$ = $1; }
+ ;
+
+dictionary_map_dict:
+ any_name
+ {
+ DictMapElem *n = makeNode(DictMapElem);
+ n->kind = DICT_MAP_DICTIONARY;
+ n->data = $1;
+ $$ = n;
+ }
+ ;
/*****************************************************************************
*
@@ -15042,6 +15158,7 @@ unreserved_keyword:
| LOCK_P
| LOCKED
| LOGGED
+ | MAP
| MAPPING
| MATCH
| MATERIALIZED
@@ -15346,6 +15463,7 @@ reserved_keyword:
| INITIALLY
| INTERSECT
| INTO
+ | KEEP
| LATERAL_P
| LEADING
| LIMIT
diff --git a/src/backend/tsearch/Makefile b/src/backend/tsearch/Makefile
index 34fe4c5b3c..24e47f20f4 100644
--- a/src/backend/tsearch/Makefile
+++ b/src/backend/tsearch/Makefile
@@ -26,7 +26,7 @@ DICTFILES_PATH=$(addprefix dicts/,$(DICTFILES))
OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
- to_tsany.o ts_selfuncs.o ts_typanalyze.o ts_utils.o
+ to_tsany.o ts_selfuncs.o ts_typanalyze.o ts_utils.o ts_configmap.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/tsearch/ts_parse.c b/src/backend/tsearch/ts_parse.c
index ad5dddff4b..2b3caf95dd 100644
--- a/src/backend/tsearch/ts_parse.c
+++ b/src/backend/tsearch/ts_parse.c
@@ -16,19 +16,30 @@
#include "tsearch/ts_cache.h"
#include "tsearch/ts_utils.h"
+#include "tsearch/ts_configmap.h"
+#include "utils/builtins.h"
+#include "funcapi.h"
#define IGNORE_LONGLEXEME 1
-/*
+/*-------------------
* Lexize subsystem
+ *-------------------
*/
typedef struct ParsedLex
{
- int type;
- char *lemm;
- int lenlemm;
- struct ParsedLex *next;
+ int type; /* Token type */
+ char *lemm; /* Token itself */
+ int lenlemm; /* Length of the token string */
+ int maplen; /* Length of the map */
+ bool *accepted; /* Token is accepted by some dictionary */
+ bool *rejected; /* Token is rejected by all dictionaries */
+ bool *notFinished; /* Some dictionary has not finished processing
+ * and waits for more tokens */
+ struct ParsedLex *next; /* Next token in the list */
+ TSMapElement *relatedRule; /* Rule which is used to produce lexemes from
+ * the token */
} ParsedLex;
typedef struct ListParsedLex
@@ -37,37 +48,98 @@ typedef struct ListParsedLex
ParsedLex *tail;
} ListParsedLex;
-typedef struct
+typedef struct DictState
{
- TSConfigCacheEntry *cfg;
- Oid curDictId;
- int posDict;
- DictSubState dictState;
- ParsedLex *curSub;
- ListParsedLex towork; /* current list to work */
- ListParsedLex waste; /* list of lexemes that already lexized */
+ Oid relatedDictionary; /* DictState contains state of dictionary
+ * with this Oid */
+ DictSubState subState; /* Internal state of the dictionary used to
+ * store some state between dictionary calls */
+ ListParsedLex acceptedTokens; /* Tokens which are processed and
+ * accepted, used in last returned result
+ * by the dictionary */
+ ListParsedLex intermediateTokens; /* Tokens which are not accepted, but
+ * were processed by thesaurus-like
+ * dictionary */
+ bool storeToAccepted; /* Should current token be appended to
+ * accepted or intermediate tokens */
+ bool processed; /* Did the dictionary take control during
+ * current token processing */
+ TSLexeme *tmpResult; /* Last result returned by a thesaurus-like
+ * dictionary, if the dictionary is still
+ * waiting for more lexemes */
+} DictState;
+
+typedef struct DictStateList
+{
+ int listLength;
+ DictState *states;
+} DictStateList;
- /*
- * fields to store last variant to lexize (basically, thesaurus or similar
- * to, which wants several lexemes
- */
+typedef struct LexemesBufferEntry
+{
+ Oid dictId;
+ TSMapElement *key;
+ ParsedLex *token;
+ TSLexeme *data;
+} LexemesBufferEntry;
- ParsedLex *lastRes;
- TSLexeme *tmpRes;
+typedef struct LexemesBuffer
+{
+ int size;
+ LexemesBufferEntry *data;
+} LexemesBuffer;
+
+typedef struct ResultStorage
+{
+ TSLexeme *lexemes; /* Processed lexemes that are not yet
+ * accepted */
+ TSLexeme *accepted;
+} ResultStorage;
+
+typedef struct LexizeData
+{
+ TSConfigCacheEntry *cfg; /* Text search configuration mappings for
+ * current configuration */
+ DictStateList dslist; /* List of all currently stored states of
+ * dictionaries */
+ ListParsedLex towork; /* Current list to work */
+ ListParsedLex waste; /* List of lexemes that already lexized */
+ LexemesBuffer buffer; /* Buffer of processed lexemes. Used to avoid
+ * multiple execution of token lexize process
+ * with same parameters */
+ ResultStorage delayedResults; /* Results that should be returned but may
+ * be rejected in the future */
+ Oid skipDictionary; /* The dictionary we should skip during
+ * processing. Used to avoid infinite loop in
+ * configuration with phrase dictionary */
+ bool debugContext; /* If true, relatedRule attribute is filled */
} LexizeData;
-static void
-LexizeInit(LexizeData *ld, TSConfigCacheEntry *cfg)
+typedef struct TSDebugContext
{
- ld->cfg = cfg;
- ld->curDictId = InvalidOid;
- ld->posDict = 0;
- ld->towork.head = ld->towork.tail = ld->curSub = NULL;
- ld->waste.head = ld->waste.tail = NULL;
- ld->lastRes = NULL;
- ld->tmpRes = NULL;
-}
+ TSConfigCacheEntry *cfg; /* Text search configuration mappings for
+ * current configuration */
+ TSParserCacheEntry *prsobj; /* Parser context of current ts_debug context */
+ LexDescr *tokenTypes; /* Token types supported by current parser */
+ void *prsdata; /* Parser data of current ts_debug context */
+ LexizeData ldata; /* Lexize data of current ts_debug context */
+ int tokentype; /* Last token tokentype */
+ TSLexeme *savedLexemes; /* Last token lexemes stored for ts_debug
+ * output */
+ ParsedLex *leftTokens; /* Corresponded ParsedLex */
+} TSDebugContext;
+
+static TSLexeme *TSLexemeMap(LexizeData *ld, ParsedLex *token, TSMapExpression *expression);
+static TSLexeme *LexizeExecTSElement(LexizeData *ld, ParsedLex *token, TSMapElement *config);
+
+/*-------------------
+ * ListParsedLex API
+ *-------------------
+ */
+/*
+ * Add a ParsedLex to the end of the list
+ */
static void
LPLAddTail(ListParsedLex *list, ParsedLex *newpl)
{
@@ -81,274 +153,1277 @@ LPLAddTail(ListParsedLex *list, ParsedLex *newpl)
newpl->next = NULL;
}
-static ParsedLex *
-LPLRemoveHead(ListParsedLex *list)
-{
- ParsedLex *res = list->head;
+/*
+ * Add a copy of ParsedLex to the end of the list
+ */
+static void
+LPLAddTailCopy(ListParsedLex *list, ParsedLex *newpl)
+{
+ ParsedLex *copy = palloc0(sizeof(ParsedLex));
+
+ copy->lenlemm = newpl->lenlemm;
+ copy->type = newpl->type;
+ copy->lemm = newpl->lemm;
+ copy->relatedRule = newpl->relatedRule;
+ copy->next = NULL;
+
+ if (list->tail)
+ {
+ list->tail->next = copy;
+ list->tail = copy;
+ }
+ else
+ list->head = list->tail = copy;
+}
+
+/*
+ * Remove the head of the list. Return pointer to detached head
+ */
+static ParsedLex *
+LPLRemoveHead(ListParsedLex *list)
+{
+ ParsedLex *res = list->head;
+
+ if (list->head)
+ list->head = list->head->next;
+
+ if (list->head == NULL)
+ list->tail = NULL;
+
+ return res;
+}
+
+/*
+ * Remove all ParsedLex from the list
+ */
+static void
+LPLClear(ListParsedLex *list)
+{
+ ParsedLex *tmp,
+ *ptr = list->head;
+
+ while (ptr)
+ {
+ tmp = ptr->next;
+ pfree(ptr);
+ ptr = tmp;
+ }
+
+ list->head = list->tail = NULL;
+}
+
+/*-------------------
+ * LexizeData manipulation functions
+ *-------------------
+ */
+
+/*
+ * Initialize empty LexizeData object
+ */
+static void
+LexizeInit(LexizeData *ld, TSConfigCacheEntry *cfg)
+{
+ ld->cfg = cfg;
+ ld->skipDictionary = InvalidOid;
+ ld->towork.head = ld->towork.tail = NULL;
+ ld->waste.head = ld->waste.tail = NULL;
+ ld->dslist.listLength = 0;
+ ld->dslist.states = NULL;
+ ld->buffer.size = 0;
+ ld->buffer.data = NULL;
+ ld->delayedResults.lexemes = NULL;
+ ld->delayedResults.accepted = NULL;
+}
+
+/*
+ * Add a token to the processing queue
+ */
+static void
+LexizeAddLemm(LexizeData *ld, int type, char *lemm, int lenlemm)
+{
+ ParsedLex *newpl = (ParsedLex *) palloc(sizeof(ParsedLex));
+
+ newpl->type = type;
+ newpl->lemm = lemm;
+ newpl->lenlemm = lenlemm;
+ newpl->relatedRule = NULL;
+ LPLAddTail(&ld->towork, newpl);
+}
+
+/*
+ * Remove head of the processing queue
+ */
+static void
+RemoveHead(LexizeData *ld)
+{
+ LPLAddTail(&ld->waste, LPLRemoveHead(&ld->towork));
+}
+
+/*
+ * Set token corresponded to current lexeme
+ */
+static void
+setCorrLex(LexizeData *ld, ParsedLex **correspondLexem)
+{
+ if (correspondLexem)
+ *correspondLexem = ld->waste.head;
+ else
+ LPLClear(&ld->waste);
+
+ ld->waste.head = ld->waste.tail = NULL;
+}
+
+/*-------------------
+ * DictState manipulation functions
+ *-------------------
+ */
+
+/*
+ * Get a state of dictionary based on its oid
+ */
+static DictState *
+DictStateListGet(DictStateList *list, Oid dictId)
+{
+ int i;
+ DictState *result = NULL;
+
+ for (i = 0; i < list->listLength; i++)
+ if (list->states[i].relatedDictionary == dictId)
+ result = &list->states[i];
+
+ return result;
+}
+
+/*
+ * Remove a state of dictionary based on its oid
+ */
+static void
+DictStateListRemove(DictStateList *list, Oid dictId)
+{
+ int i;
+
+ for (i = 0; i < list->listLength; i++)
+ if (list->states[i].relatedDictionary == dictId)
+ break;
+
+ if (i != list->listLength)
+ {
+ memcpy(list->states + i, list->states + i + 1, sizeof(DictState) * (list->listLength - i - 1));
+ list->listLength--;
+ if (list->listLength == 0)
+ list->states = NULL;
+ else
+ list->states = repalloc(list->states, sizeof(DictState) * list->listLength);
+ }
+}
+
+/*
+ * Insert a state of dictionary with specified oid
+ */
+static DictState *
+DictStateListAdd(DictStateList *list, DictState *state)
+{
+ DictStateListRemove(list, state->relatedDictionary);
+
+ list->listLength++;
+ if (list->states)
+ list->states = repalloc(list->states, sizeof(DictState) * list->listLength);
+ else
+ list->states = palloc0(sizeof(DictState) * list->listLength);
+
+ memcpy(list->states + list->listLength - 1, state, sizeof(DictState));
+
+ return list->states + list->listLength - 1;
+}
+
+/*
+ * Remove states of all dictionaries
+ */
+static void
+DictStateListClear(DictStateList *list)
+{
+ list->listLength = 0;
+ if (list->states)
+ pfree(list->states);
+ list->states = NULL;
+}
+
+/*-------------------
+ * LexemesBuffer manipulation functions
+ *-------------------
+ */
+
+/*
+ * Check if there is a saved lexeme generated by specified TSMapElement
+ */
+static bool
+LexemesBufferContains(LexemesBuffer *buffer, TSMapElement *key, ParsedLex *token)
+{
+ int i;
+
+ for (i = 0; i < buffer->size; i++)
+ if (TSMapElementEquals(buffer->data[i].key, key) && buffer->data[i].token == token)
+ return true;
+
+ return false;
+}
+
+/*
+ * Get a saved lexeme generated by specified TSMapElement
+ */
+static TSLexeme *
+LexemesBufferGet(LexemesBuffer *buffer, TSMapElement *key, ParsedLex *token)
+{
+ int i;
+ TSLexeme *result = NULL;
+
+ for (i = 0; i < buffer->size; i++)
+ if (TSMapElementEquals(buffer->data[i].key, key) && buffer->data[i].token == token)
+ result = buffer->data[i].data;
+
+ return result;
+}
+
+/*
+ * Remove a saved lexeme generated by specified TSMapElement
+ */
+static void
+LexemesBufferRemove(LexemesBuffer *buffer, TSMapElement *key, ParsedLex *token)
+{
+ int i;
+
+ for (i = 0; i < buffer->size; i++)
+ if (TSMapElementEquals(buffer->data[i].key, key) && buffer->data[i].token == token)
+ break;
+
+ if (i != buffer->size)
+ {
+ memcpy(buffer->data + i, buffer->data + i + 1, sizeof(LexemesBufferEntry) * (buffer->size - i - 1));
+ buffer->size--;
+ if (buffer->size == 0)
+ buffer->data = NULL;
+ else
+ buffer->data = repalloc(buffer->data, sizeof(LexemesBufferEntry) * buffer->size);
+ }
+}
+
+/*
+ * Save a lexeme generated by the specified TSMapElement
+ */
+static void
+LexemesBufferAdd(LexemesBuffer *buffer, TSMapElement *key, ParsedLex *token, TSLexeme *data)
+{
+ LexemesBufferRemove(buffer, key, token);
+
+ buffer->size++;
+ if (buffer->data)
+ buffer->data = repalloc(buffer->data, sizeof(LexemesBufferEntry) * buffer->size);
+ else
+ buffer->data = palloc0(sizeof(LexemesBufferEntry) * buffer->size);
+
+ buffer->data[buffer->size - 1].token = token;
+ buffer->data[buffer->size - 1].key = key;
+ buffer->data[buffer->size - 1].data = data;
+}
+
+/*
+ * Remove all lexemes saved in a buffer
+ */
+static void
+LexemesBufferClear(LexemesBuffer *buffer)
+{
+ int i;
+ bool *skipEntry = palloc0(sizeof(bool) * buffer->size);
+
+ for (i = 0; i < buffer->size; i++)
+ {
+ if (buffer->data[i].data != NULL && !skipEntry[i])
+ {
+ int j;
+
+ for (j = 0; j < buffer->size; j++)
+ if (buffer->data[i].data == buffer->data[j].data)
+ skipEntry[j] = true;
+
+ pfree(buffer->data[i].data);
+ }
+ }
+
+ buffer->size = 0;
+ if (buffer->data)
+ pfree(buffer->data);
+ buffer->data = NULL;
+}
+
+/*-------------------
+ * TSLexeme util functions
+ *-------------------
+ */
+
+/*
+ * Get size of TSLexeme except empty-lexeme
+ */
+static int
+TSLexemeGetSize(TSLexeme *lex)
+{
+ int result = 0;
+ TSLexeme *ptr = lex;
+
+ while (ptr && ptr->lexeme)
+ {
+ result++;
+ ptr++;
+ }
+
+ return result;
+}
+
+/*
+ * Remove repeated lexemes. Also remove copies of whole nvariant groups.
+ */
+static TSLexeme *
+TSLexemeRemoveDuplications(TSLexeme *lexeme)
+{
+ TSLexeme *res;
+ int curLexIndex;
+ int i;
+ int lexemeSize = TSLexemeGetSize(lexeme);
+ int shouldCopyCount = lexemeSize;
+ bool *shouldCopy;
+
+ if (lexeme == NULL)
+ return NULL;
+
+ shouldCopy = palloc(sizeof(bool) * lexemeSize);
+ memset(shouldCopy, true, sizeof(bool) * lexemeSize);
+
+ for (curLexIndex = 0; curLexIndex < lexemeSize; curLexIndex++)
+ {
+ for (i = curLexIndex + 1; i < lexemeSize; i++)
+ {
+ if (!shouldCopy[i])
+ continue;
+
+ if (strcmp(lexeme[curLexIndex].lexeme, lexeme[i].lexeme) == 0)
+ {
+ if (lexeme[curLexIndex].nvariant == lexeme[i].nvariant)
+ {
+ shouldCopy[i] = false;
+ shouldCopyCount--;
+ continue;
+ }
+ else
+ {
+ /*
+ * Check for same set of lexemes in another nvariant
+ * series
+ */
+ int nvariantCountL = 0;
+ int nvariantCountR = 0;
+ int nvariantOverlap = 1;
+ int j;
+
+ for (j = 0; j < lexemeSize; j++)
+ if (lexeme[curLexIndex].nvariant == lexeme[j].nvariant)
+ nvariantCountL++;
+ for (j = 0; j < lexemeSize; j++)
+ if (lexeme[i].nvariant == lexeme[j].nvariant)
+ nvariantCountR++;
+
+ if (nvariantCountL != nvariantCountR)
+ continue;
+
+ for (j = 1; j < nvariantCountR; j++)
+ {
+ if (strcmp(lexeme[curLexIndex + j].lexeme, lexeme[i + j].lexeme) == 0
+ && lexeme[curLexIndex + j].nvariant == lexeme[i + j].nvariant)
+ nvariantOverlap++;
+ }
+
+ if (nvariantOverlap != nvariantCountR)
+ continue;
+
+ for (j = 0; j < nvariantCountR; j++)
+ shouldCopy[i + j] = false;
+ }
+ }
+ }
+ }
+
+ res = palloc0(sizeof(TSLexeme) * (shouldCopyCount + 1));
+
+ for (i = 0, curLexIndex = 0; curLexIndex < lexemeSize; curLexIndex++)
+ {
+ if (shouldCopy[curLexIndex])
+ {
+ memcpy(res + i, lexeme + curLexIndex, sizeof(TSLexeme));
+ i++;
+ }
+ }
+
+ pfree(shouldCopy);
+ return res;
+}
+
+/*
+ * Combine two lexeme lists with respect to positions
+ */
+static TSLexeme *
+TSLexemeMergePositions(TSLexeme *left, TSLexeme *right)
+{
+ TSLexeme *result = NULL;
+
+ if (left != NULL || right != NULL)
+ {
+ int left_i = 0;
+ int right_i = 0;
+ int left_max_nvariant = 0;
+ int i;
+ int left_size = TSLexemeGetSize(left);
+ int right_size = TSLexemeGetSize(right);
+
+ result = palloc0(sizeof(TSLexeme) * (left_size + right_size + 1));
+
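+ /* Renumber right-side nvariants so they don't collide with left-side groups */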
+ for (i = 0; i < left_size; i++)
+ if (left[i].nvariant > left_max_nvariant)
+ left_max_nvariant = left[i].nvariant;
+
+ for (i = 0; i < right_size; i++)
+ right[i].nvariant += left_max_nvariant;
+ if (right && right[0].flags & TSL_ADDPOS)
+ right[0].flags &= ~TSL_ADDPOS;
+
+ i = 0;
+ while (i < left_size + right_size)
+ {
+ if (left_i < left_size)
+ {
+ do
+ {
+ result[i++] = left[left_i++];
+ } while (left && left[left_i].lexeme && (left[left_i].flags & TSL_ADDPOS) == 0);
+ }
+
+ if (right_i < right_size)
+ {
+ do
+ {
+ result[i++] = right[right_i++];
+ } while (right && right[right_i].lexeme && (right[right_i].flags & TSL_ADDPOS) == 0);
+ }
+ }
+ }
+ return result;
+}
+
+/*
+ * Split lexemes generated by regular dictionaries and multi-input dictionaries
+ * and combine them with respect to positions
+ */
+static TSLexeme *
+TSLexemeFilterMulti(TSLexeme *lexemes)
+{
+ TSLexeme *result;
+ TSLexeme *ptr = lexemes;
+ int multi_lexemes = 0;
+
+ while (ptr && ptr->lexeme)
+ {
+ if (ptr->flags & TSL_MULTI)
+ multi_lexemes++;
+ ptr++;
+ }
+
+ if (multi_lexemes > 0)
+ {
+ TSLexeme *lexemes_multi = palloc0(sizeof(TSLexeme) * (multi_lexemes + 1));
+ TSLexeme *lexemes_rest = palloc0(sizeof(TSLexeme) * (TSLexemeGetSize(lexemes) - multi_lexemes + 1));
+ int rest_i = 0;
+ int multi_i = 0;
+
+ ptr = lexemes;
+ while (ptr && ptr->lexeme)
+ {
+ if (ptr->flags & TSL_MULTI)
+ lexemes_multi[multi_i++] = *ptr;
+ else
+ lexemes_rest[rest_i++] = *ptr;
+
+ ptr++;
+ }
+ result = TSLexemeMergePositions(lexemes_rest, lexemes_multi);
+ }
+ else
+ {
+ result = TSLexemeMergePositions(lexemes, NULL);
+ }
+
+ return result;
+}
+
+/*
+ * Mark lexemes as generated by a multi-input (thesaurus-like) dictionary
+ */
+static void
+TSLexemeMarkMulti(TSLexeme *lexemes)
+{
+ TSLexeme *ptr = lexemes;
+
+ while (ptr && ptr->lexeme)
+ {
+ ptr->flags |= TSL_MULTI;
+ ptr++;
+ }
+}
+
+/*-------------------
+ * Lexemes set operations
+ *-------------------
+ */
+
+/*
+ * Combine left and right lexeme lists into one.
+ * If append is true, the first right-side lexeme is flagged with TSL_ADDPOS
+ * so the right lexemes start a new position after the left ones
+ */
+static TSLexeme *
+TSLexemeUnionOpt(TSLexeme *left, TSLexeme *right, bool append)
+{
+ TSLexeme *result;
+ int left_size = TSLexemeGetSize(left);
+ int right_size = TSLexemeGetSize(right);
+ int left_max_nvariant = 0;
+ int i;
+
+ if (left == NULL && right == NULL)
+ {
+ result = NULL;
+ }
+ else
+ {
+ result = palloc0(sizeof(TSLexeme) * (left_size + right_size + 1));
+
+ for (i = 0; i < left_size; i++)
+ if (left[i].nvariant > left_max_nvariant)
+ left_max_nvariant = left[i].nvariant;
+
+ if (left_size > 0)
+ memcpy(result, left, sizeof(TSLexeme) * left_size);
+ if (right_size > 0)
+ memcpy(result + left_size, right, sizeof(TSLexeme) * right_size);
+ if (append && left_size > 0 && right_size > 0)
+ result[left_size].flags |= TSL_ADDPOS;
+
+ for (i = left_size; i < left_size + right_size; i++)
+ result[i].nvariant += left_max_nvariant;
+ }
+
+ return result;
+}
+
+/*
+ * Combine left and right lexeme lists into one
+ */
+static TSLexeme *
+TSLexemeUnion(TSLexeme *left, TSLexeme *right)
+{
+ return TSLexemeUnionOpt(left, right, false);
+}
+
+/*
+ * Remove common lexemes and return only those that appear solely in the left list
+ */
+static TSLexeme *
+TSLexemeExcept(TSLexeme *left, TSLexeme *right)
+{
+ TSLexeme *result = NULL;
+ int i,
+ j,
+ k;
+ int left_size = TSLexemeGetSize(left);
+ int right_size = TSLexemeGetSize(right);
+
+ result = palloc0(sizeof(TSLexeme) * (left_size + 1));
+
+ for (k = 0, i = 0; i < left_size; i++)
+ {
+ bool found = false;
+
+ for (j = 0; j < right_size; j++)
+ if (strcmp(left[i].lexeme, right[j].lexeme) == 0)
+ found = true;
+
+ if (!found)
+ result[k++] = left[i];
+ }
+
+ return result;
+}
+
+/*
+ * Keep only common lexemes
+ */
+static TSLexeme *
+TSLexemeIntersect(TSLexeme *left, TSLexeme *right)
+{
+ TSLexeme *result = NULL;
+ int i,
+ j,
+ k;
+ int left_size = TSLexemeGetSize(left);
+ int right_size = TSLexemeGetSize(right);
+
+ result = palloc0(sizeof(TSLexeme) * (left_size + 1));
+
+ for (k = 0, i = 0; i < left_size; i++)
+ {
+ bool found = false;
+
+ for (j = 0; j < right_size; j++)
+ if (strcmp(left[i].lexeme, right[j].lexeme) == 0)
+ found = true;
+
+ if (found)
+ result[k++] = left[i];
+ }
+
+ return result;
+}
+
+/*-------------------
+ * Result storage functions
+ *-------------------
+ */
+
+/*
+ * Add a lexeme list to the result storage
+ */
+static void
+ResultStorageAdd(ResultStorage *storage, ParsedLex *token, TSLexeme *lexs)
+{
+ TSLexeme *oldLexs = storage->lexemes;
+
+ storage->lexemes = TSLexemeUnionOpt(storage->lexemes, lexs, true);
+ if (oldLexs)
+ pfree(oldLexs);
+}
+
+/*
+ * Move all saved lexemes to the accepted list
+ */
+static void
+ResultStorageMoveToAccepted(ResultStorage *storage)
+{
+ if (storage->accepted)
+ {
+ TSLexeme *prevAccepted = storage->accepted;
+
+ storage->accepted = TSLexemeUnionOpt(storage->accepted, storage->lexemes, true);
+ if (prevAccepted)
+ pfree(prevAccepted);
+ if (storage->lexemes)
+ pfree(storage->lexemes);
+ }
+ else
+ {
+ storage->accepted = storage->lexemes;
+ }
+ storage->lexemes = NULL;
+}
+
+/*
+ * Remove all non-accepted lexemes
+ */
+static void
+ResultStorageClearLexemes(ResultStorage *storage)
+{
+ if (storage->lexemes)
+ pfree(storage->lexemes);
+ storage->lexemes = NULL;
+}
+
+/*
+ * Remove all accepted lexemes
+ */
+static void
+ResultStorageClearAccepted(ResultStorage *storage)
+{
+ if (storage->accepted)
+ pfree(storage->accepted);
+ storage->accepted = NULL;
+}
+
+/*-------------------
+ * Condition and command execution
+ *-------------------
+ */
+
+/*
+ * Process a token with the specified dictionary
+ */
+static TSLexeme *
+LexizeExecDictionary(LexizeData *ld, ParsedLex *token, TSMapElement *dictionary)
+{
+ TSLexeme *res;
+ TSDictionaryCacheEntry *dict;
+ DictSubState subState;
+ Oid dictId = dictionary->value.objectDictionary;
+
+ if (ld->skipDictionary == dictId)
+ return NULL;
+
+ if (LexemesBufferContains(&ld->buffer, dictionary, token))
+ res = LexemesBufferGet(&ld->buffer, dictionary, token);
+ else
+ {
+ char *curValLemm = token->lemm;
+ int curValLenLemm = token->lenlemm;
+ DictState *state = DictStateListGet(&ld->dslist, dictId);
+
+ dict = lookup_ts_dictionary_cache(dictId);
+
+ if (state)
+ {
+ subState = state->subState;
+ state->processed = true;
+ }
+ else
+ {
+ subState.isend = subState.getnext = false;
+ subState.private_state = NULL;
+ }
+
+ res = (TSLexeme *) DatumGetPointer(FunctionCall4(&(dict->lexize),
+ PointerGetDatum(dict->dictData),
+ PointerGetDatum(curValLemm),
+ Int32GetDatum(curValLenLemm),
+ PointerGetDatum(&subState)
+ ));
+
+ if (subState.getnext)
+ {
+ /*
+ * Dictionary wants next word, so store current context and state
+ * in the DictStateList
+ */
+ if (state == NULL)
+ {
+ state = palloc0(sizeof(DictState));
+ state->processed = true;
+ state->relatedDictionary = dictId;
+ state->intermediateTokens.head = state->intermediateTokens.tail = NULL;
+ state->acceptedTokens.head = state->acceptedTokens.tail = NULL;
+ state->tmpResult = NULL;
+
+ /*
+ * Add the state to the list and update the pointer so that we work
+ * with the copy stored in the list
+ */
+ state = DictStateListAdd(&ld->dslist, state);
+ }
+
+ state->subState = subState;
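+ /*
+ * A non-NULL result means the dictionary matched the phrase up to and
+ * including this token, so queue the token as accepted rather than
+ * intermediate
+ */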
+ state->storeToAccepted = res != NULL;
+
+ if (res)
+ {
+ if (state->intermediateTokens.head != NULL)
+ {
+ ParsedLex *ptr = state->intermediateTokens.head;
+
+ while (ptr)
+ {
+ LPLAddTailCopy(&state->acceptedTokens, ptr);
+ ptr = ptr->next;
+ }
+ state->intermediateTokens.head = state->intermediateTokens.tail = NULL;
+ }
+
+ if (state->tmpResult)
+ pfree(state->tmpResult);
+ TSLexemeMarkMulti(res);
+ state->tmpResult = res;
+ res = NULL;
+ }
+ }
+ else if (state != NULL)
+ {
+ if (res)
+ {
+ if (state)
+ TSLexemeMarkMulti(res);
+ DictStateListRemove(&ld->dslist, dictId);
+ }
+ else
+ {
+ /*
+ * Trigger post-processing in order to check tmpResult and
+ * restart processing (see LexizeExec function)
+ */
+ state->processed = false;
+ }
+ }
+ LexemesBufferAdd(&ld->buffer, dictionary, token, res);
+ }
+
+ return res;
+}
+
+/*
+ * Check whether the dictionary still waits for more tokens
+ */
+static bool
+LexizeExecDictionaryWaitNext(LexizeData *ld, Oid dictId)
+{
+ DictState *state = DictStateListGet(&ld->dslist, dictId);
+
+ if (state)
+ return state->subState.getnext;
+ else
+ return false;
+}
+
+/*
+ * Check whether the dictionary result for the current token is NULL.
+ * If the dictionary waits for more lexemes, the result is interpreted as not null.
+ */
+static bool
+LexizeExecIsNull(LexizeData *ld, ParsedLex *token, TSMapElement *config)
+{
+ bool result = false;
+
+ if (config->type == TSMAP_EXPRESSION)
+ {
+ TSMapExpression *expression = config->value.objectExpression;
+
+ result = LexizeExecIsNull(ld, token, expression->left) || LexizeExecIsNull(ld, token, expression->right);
+ }
+ else if (config->type == TSMAP_DICTIONARY)
+ {
+ Oid dictOid = config->value.objectDictionary;
+ TSLexeme *lexemes = LexizeExecDictionary(ld, token, config);
+
+ if (lexemes)
+ result = false;
+ else
+ result = !LexizeExecDictionaryWaitNext(ld, dictOid);
+ }
+ return result;
+}
+
+/*
+ * Execute a MAP operator
+ */
+static TSLexeme *
+TSLexemeMap(LexizeData *ld, ParsedLex *token, TSMapExpression *expression)
+{
+ TSLexeme *left_res;
+ TSLexeme *result = NULL;
+ int left_size;
+ int i;
+
+ left_res = LexizeExecTSElement(ld, token, expression->left);
+ left_size = TSLexemeGetSize(left_res);
+
+ if (left_res == NULL)
+ result = LexizeExecTSElement(ld, token, expression->right);
+ else
+ {
+ for (i = 0; i < left_size; i++)
+ {
+ TSLexeme *tmp_res = NULL;
+ TSLexeme *prev_res;
+ ParsedLex tmp_token;
+
+ tmp_token.lemm = left_res[i].lexeme;
+ tmp_token.lenlemm = strlen(left_res[i].lexeme);
+ tmp_token.type = token->type;
+ tmp_token.next = NULL;
+
+ tmp_res = LexizeExecTSElement(ld, &tmp_token, expression->right);
+ prev_res = result;
+ result = TSLexemeUnion(prev_res, tmp_res);
+ if (prev_res)
+ pfree(prev_res);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Execute a TSMapElement
+ * Common entry point for all possible types of TSMapElement
+ */
+static TSLexeme *
+LexizeExecTSElement(LexizeData *ld, ParsedLex *token, TSMapElement *config)
+{
+ TSLexeme *result = NULL;
+
+ if (LexemesBufferContains(&ld->buffer, config, token))
+ {
+ if (ld->debugContext)
+ token->relatedRule = config;
+ result = LexemesBufferGet(&ld->buffer, config, token);
+ }
+ else if (config->type == TSMAP_DICTIONARY)
+ {
+ if (ld->debugContext)
+ token->relatedRule = config;
+ result = LexizeExecDictionary(ld, token, config);
+ }
+ else if (config->type == TSMAP_CASE)
+ {
+ TSMapCase *caseObject = config->value.objectCase;
+ bool conditionIsNull = LexizeExecIsNull(ld, token, caseObject->condition);
+
+ if ((!conditionIsNull && caseObject->match) || (conditionIsNull && !caseObject->match))
+ {
+ if (caseObject->command->type == TSMAP_KEEP)
+ result = LexizeExecTSElement(ld, token, caseObject->condition);
+ else
+ result = LexizeExecTSElement(ld, token, caseObject->command);
+ }
+ else if (caseObject->elsebranch)
+ result = LexizeExecTSElement(ld, token, caseObject->elsebranch);
+ }
+ else if (config->type == TSMAP_EXPRESSION)
+ {
+ TSLexeme *resLeft = NULL;
+ TSLexeme *resRight = NULL;
+ TSMapElement *relatedRuleTmp;
+ TSMapExpression *expression = config->value.objectExpression;
+
+ if (ld->debugContext)
+ {
+ relatedRuleTmp = palloc0(sizeof(TSMapElement));
+ relatedRuleTmp->parent = NULL;
+ relatedRuleTmp->type = TSMAP_EXPRESSION;
+ relatedRuleTmp->value.objectExpression = palloc0(sizeof(TSMapExpression));
+ relatedRuleTmp->value.objectExpression->operator = expression->operator;
+ }
- if (list->head)
- list->head = list->head->next;
+ if (expression->operator != TSMAP_OP_MAP)
+ {
+ resLeft = LexizeExecTSElement(ld, token, expression->left);
+ if (ld->debugContext)
+ relatedRuleTmp->value.objectExpression->left = token->relatedRule;
- if (list->head == NULL)
- list->tail = NULL;
+ resRight = LexizeExecTSElement(ld, token, expression->right);
+ if (ld->debugContext)
+ relatedRuleTmp->value.objectExpression->right = token->relatedRule;
+ }
- return res;
-}
+ switch (expression->operator)
+ {
+ case TSMAP_OP_UNION:
+ result = TSLexemeUnion(resLeft, resRight);
+ break;
+ case TSMAP_OP_EXCEPT:
+ result = TSLexemeExcept(resLeft, resRight);
+ break;
+ case TSMAP_OP_INTERSECT:
+ result = TSLexemeIntersect(resLeft, resRight);
+ break;
+ case TSMAP_OP_MAP:
+ result = TSLexemeMap(ld, token, expression);
+ break;
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("text search configuration is invalid"),
+ errdetail("Text search configuration contains invalid expression operator.")));
+ break;
+ }
-static void
-LexizeAddLemm(LexizeData *ld, int type, char *lemm, int lenlemm)
-{
- ParsedLex *newpl = (ParsedLex *) palloc(sizeof(ParsedLex));
+ if (ld->debugContext)
+ token->relatedRule = relatedRuleTmp;
+ }
- newpl->type = type;
- newpl->lemm = lemm;
- newpl->lenlemm = lenlemm;
- LPLAddTail(&ld->towork, newpl);
- ld->curSub = ld->towork.tail;
+ if (!LexemesBufferContains(&ld->buffer, config, token))
+ LexemesBufferAdd(&ld->buffer, config, token, result);
+
+ return result;
}
-static void
-RemoveHead(LexizeData *ld)
+/*-------------------
+ * LexizeExec and helpers functions
+ *-------------------
+ */
+
+/*
+ * Process an EOF-like token.
+ * Return all temporary results if any are saved.
+ */
+static TSLexeme *
+LexizeExecFinishProcessing(LexizeData *ld)
{
- LPLAddTail(&ld->waste, LPLRemoveHead(&ld->towork));
+ int i;
+ TSLexeme *res = NULL;
+
+ for (i = 0; i < ld->dslist.listLength; i++)
+ {
+ TSLexeme *last_res = res;
- ld->posDict = 0;
+ res = TSLexemeUnion(res, ld->dslist.states[i].tmpResult);
+ if (last_res)
+ pfree(last_res);
+ }
+
+ return res;
}
-static void
-setCorrLex(LexizeData *ld, ParsedLex **correspondLexem)
+/*
+ * Get the last accepted results of phrase dictionaries
+ */
+static TSLexeme *
+LexizeExecGetPreviousResults(LexizeData *ld)
{
- if (correspondLexem)
- {
- *correspondLexem = ld->waste.head;
- }
- else
- {
- ParsedLex *tmp,
- *ptr = ld->waste.head;
+ int i;
+ TSLexeme *res = NULL;
- while (ptr)
+ for (i = 0; i < ld->dslist.listLength; i++)
+ {
+ if (!ld->dslist.states[i].processed)
{
- tmp = ptr->next;
- pfree(ptr);
- ptr = tmp;
+ TSLexeme *last_res = res;
+
+ res = TSLexemeUnion(res, ld->dslist.states[i].tmpResult);
+ if (last_res)
+ pfree(last_res);
}
}
- ld->waste.head = ld->waste.tail = NULL;
+
+ return res;
}
+/*
+ * Remove all dictionary states that weren't used for the current token
+ */
static void
-moveToWaste(LexizeData *ld, ParsedLex *stop)
+LexizeExecClearDictStates(LexizeData *ld)
{
- bool go = true;
+ int i;
- while (ld->towork.head && go)
+ for (i = 0; i < ld->dslist.listLength; i++)
{
- if (ld->towork.head == stop)
+ if (!ld->dslist.states[i].processed)
{
- ld->curSub = stop->next;
- go = false;
+ DictStateListRemove(&ld->dslist, ld->dslist.states[i].relatedDictionary);
+ i--; /* the next state shifted into this index, so re-check it */
}
- RemoveHead(ld);
}
}
-static void
-setNewTmpRes(LexizeData *ld, ParsedLex *lex, TSLexeme *res)
+/*
+ * Check if there are any dictionaries that didn't process the current token
+ */
+static bool
+LexizeExecNotProcessedDictStates(LexizeData *ld)
{
- if (ld->tmpRes)
- {
- TSLexeme *ptr;
+ int i;
- for (ptr = ld->tmpRes; ptr->lexeme; ptr++)
- pfree(ptr->lexeme);
- pfree(ld->tmpRes);
- }
- ld->tmpRes = res;
- ld->lastRes = lex;
+ for (i = 0; i < ld->dslist.listLength; i++)
+ if (!ld->dslist.states[i].processed)
+ return true;
+
+ return false;
}
+/*
+ * Run lexize processing over the towork queue in LexizeData
+ */
static TSLexeme *
LexizeExec(LexizeData *ld, ParsedLex **correspondLexem)
{
+ ParsedLex *token;
+ TSMapElement *config;
+ TSLexeme *res = NULL;
+ TSLexeme *prevIterationResult = NULL;
+ bool removeHead = false;
+ bool resetSkipDictionary = false;
+ bool accepted = false;
int i;
- ListDictionary *map;
- TSDictionaryCacheEntry *dict;
- TSLexeme *res;
- if (ld->curDictId == InvalidOid)
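+ /* Reset per-token flags; any state left unprocessed after this pass triggers the rollback handling below */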
+ for (i = 0; i < ld->dslist.listLength; i++)
+ ld->dslist.states[i].processed = false;
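+
+ /*
+ * skipDictionary suppresses the dictionary whose phrase was just
+ * rejected, so that re-queued tokens are not captured by it again
+ */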
+ if (ld->skipDictionary != InvalidOid)
+ resetSkipDictionary = true;
+
+ token = ld->towork.head;
+ if (token == NULL)
{
- /*
- * usual mode: dictionary wants only one word, but we should keep in
- * mind that we should go through all stack
- */
+ setCorrLex(ld, correspondLexem);
+ return NULL;
+ }
- while (ld->towork.head)
+ if (token->type >= ld->cfg->lenmap)
+ {
+ removeHead = true;
+ }
+ else
+ {
+ config = ld->cfg->map[token->type];
+ if (config != NULL)
+ {
+ res = LexizeExecTSElement(ld, token, config);
+ prevIterationResult = LexizeExecGetPreviousResults(ld);
+ removeHead = prevIterationResult == NULL;
+ }
+ else
{
- ParsedLex *curVal = ld->towork.head;
- char *curValLemm = curVal->lemm;
- int curValLenLemm = curVal->lenlemm;
+ removeHead = true;
+ if (token->type == 0) /* Processing EOF-like token */
+ {
+ res = LexizeExecFinishProcessing(ld);
+ prevIterationResult = NULL;
+ }
+ }
- map = ld->cfg->map + curVal->type;
+ if (LexizeExecNotProcessedDictStates(ld) && (token->type == 0 || config != NULL)) /* Rollback processing */
+ {
+ int i;
+ ListParsedLex *intermediateTokens = NULL;
+ ListParsedLex *acceptedTokens = NULL;
- if (curVal->type == 0 || curVal->type >= ld->cfg->lenmap || map->len == 0)
+ for (i = 0; i < ld->dslist.listLength; i++)
{
- /* skip this type of lexeme */
- RemoveHead(ld);
- continue;
+ if (!ld->dslist.states[i].processed)
+ {
+ intermediateTokens = &ld->dslist.states[i].intermediateTokens;
+ acceptedTokens = &ld->dslist.states[i].acceptedTokens;
+ if (prevIterationResult == NULL)
+ ld->skipDictionary = ld->dslist.states[i].relatedDictionary;
+ }
}
- for (i = ld->posDict; i < map->len; i++)
+ if (intermediateTokens && intermediateTokens->head)
{
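+ /* Put the tokens consumed by the rejected phrase back at the head of the towork queue */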
- dict = lookup_ts_dictionary_cache(map->dictIds[i]);
-
- ld->dictState.isend = ld->dictState.getnext = false;
- ld->dictState.private_state = NULL;
- res = (TSLexeme *) DatumGetPointer(FunctionCall4(
- &(dict->lexize),
- PointerGetDatum(dict->dictData),
- PointerGetDatum(curValLemm),
- Int32GetDatum(curValLenLemm),
- PointerGetDatum(&ld->dictState)
- ));
-
- if (ld->dictState.getnext)
+ ParsedLex *head = ld->towork.head;
+
+ ld->towork.head = intermediateTokens->head;
+ intermediateTokens->tail->next = head;
+ head->next = NULL;
+ ld->towork.tail = head;
+ removeHead = false;
+ LPLClear(&ld->waste);
+ if (acceptedTokens && acceptedTokens->head)
{
- /*
- * dictionary wants next word, so setup and store current
- * position and go to multiword mode
- */
-
- ld->curDictId = DatumGetObjectId(map->dictIds[i]);
- ld->posDict = i + 1;
- ld->curSub = curVal->next;
- if (res)
- setNewTmpRes(ld, curVal, res);
- return LexizeExec(ld, correspondLexem);
+ ld->waste.head = acceptedTokens->head;
+ ld->waste.tail = acceptedTokens->tail;
}
+ }
+ ResultStorageClearLexemes(&ld->delayedResults);
+ if (config != NULL)
+ res = NULL;
+ }
- if (!res) /* dictionary doesn't know this lexeme */
- continue;
+ if (config != NULL)
+ LexizeExecClearDictStates(ld);
+ else if (token->type == 0)
+ DictStateListClear(&ld->dslist);
+ }
- if (res->flags & TSL_FILTER)
- {
- curValLemm = res->lexeme;
- curValLenLemm = strlen(res->lexeme);
- continue;
- }
+ if (prevIterationResult)
+ res = prevIterationResult;
+ else
+ {
+ int i;
- RemoveHead(ld);
- setCorrLex(ld, correspondLexem);
- return res;
+ for (i = 0; i < ld->dslist.listLength; i++)
+ {
+ if (ld->dslist.states[i].storeToAccepted)
+ {
+ LPLAddTailCopy(&ld->dslist.states[i].acceptedTokens, token);
+ accepted = true;
+ ld->dslist.states[i].storeToAccepted = false;
+ }
+ else
+ {
+ LPLAddTailCopy(&ld->dslist.states[i].intermediateTokens, token);
}
-
- RemoveHead(ld);
}
}
- else
- { /* curDictId is valid */
- dict = lookup_ts_dictionary_cache(ld->curDictId);
+ if (removeHead)
+ RemoveHead(ld);
+
+ if (ld->dslist.listLength > 0)
+ {
/*
- * Dictionary ld->curDictId asks us about following words
+ * There is at least one thesaurus dictionary in the middle of
+ * processing. Delay return of the result to avoid wrong lexemes in
+ * case of thesaurus phrase rejection.
*/
+ ResultStorageAdd(&ld->delayedResults, token, res);
+ if (accepted)
+ ResultStorageMoveToAccepted(&ld->delayedResults);
- while (ld->curSub)
+ /*
+ * Current value of res should not be cleared, because it is stored in
+ * LexemesBuffer
+ */
+ res = NULL;
+ }
+ else
+ {
+ if (ld->towork.head == NULL)
{
- ParsedLex *curVal = ld->curSub;
-
- map = ld->cfg->map + curVal->type;
-
- if (curVal->type != 0)
- {
- bool dictExists = false;
-
- if (curVal->type >= ld->cfg->lenmap || map->len == 0)
- {
- /* skip this type of lexeme */
- ld->curSub = curVal->next;
- continue;
- }
+ TSLexeme *oldAccepted = ld->delayedResults.accepted;
- /*
- * We should be sure that current type of lexeme is recognized
- * by our dictionary: we just check is it exist in list of
- * dictionaries ?
- */
- for (i = 0; i < map->len && !dictExists; i++)
- if (ld->curDictId == DatumGetObjectId(map->dictIds[i]))
- dictExists = true;
-
- if (!dictExists)
- {
- /*
- * Dictionary can't work with current tpe of lexeme,
- * return to basic mode and redo all stored lexemes
- */
- ld->curDictId = InvalidOid;
- return LexizeExec(ld, correspondLexem);
- }
- }
+ ld->delayedResults.accepted = TSLexemeUnionOpt(ld->delayedResults.accepted, ld->delayedResults.lexemes, true);
+ if (oldAccepted)
+ pfree(oldAccepted);
+ }
- ld->dictState.isend = (curVal->type == 0) ? true : false;
- ld->dictState.getnext = false;
+ /*
+ * Add accepted delayed results to the output of the parsing. All
+ * lexemes returned during thesaurus phrase processing should be
+ * returned simultaneously, since all phrase tokens are processed as
+ * one.
+ */
+ if (ld->delayedResults.accepted != NULL)
+ {
+ /*
+ * Previous value of res should not be cleared, because it is
+ * stored in LexemesBuffer
+ */
+ res = TSLexemeUnionOpt(ld->delayedResults.accepted, res, prevIterationResult == NULL);
- res = (TSLexeme *) DatumGetPointer(FunctionCall4(
- &(dict->lexize),
- PointerGetDatum(dict->dictData),
- PointerGetDatum(curVal->lemm),
- Int32GetDatum(curVal->lenlemm),
- PointerGetDatum(&ld->dictState)
- ));
+ ResultStorageClearLexemes(&ld->delayedResults);
+ ResultStorageClearAccepted(&ld->delayedResults);
+ }
+ setCorrLex(ld, correspondLexem);
+ }
- if (ld->dictState.getnext)
- {
- /* Dictionary wants one more */
- ld->curSub = curVal->next;
- if (res)
- setNewTmpRes(ld, curVal, res);
- continue;
- }
+ if (resetSkipDictionary)
+ ld->skipDictionary = InvalidOid;
- if (res || ld->tmpRes)
- {
- /*
- * Dictionary normalizes lexemes, so we remove from stack all
- * used lexemes, return to basic mode and redo end of stack
- * (if it exists)
- */
- if (res)
- {
- moveToWaste(ld, ld->curSub);
- }
- else
- {
- res = ld->tmpRes;
- moveToWaste(ld, ld->lastRes);
- }
+ res = TSLexemeFilterMulti(res);
+ if (res)
+ res = TSLexemeRemoveDuplications(res);
- /* reset to initial state */
- ld->curDictId = InvalidOid;
- ld->posDict = 0;
- ld->lastRes = NULL;
- ld->tmpRes = NULL;
- setCorrLex(ld, correspondLexem);
- return res;
- }
+ /*
+ * Copy the result since it may be stored in LexemesBuffer and removed
+ * at the next step.
+ */
+ if (res)
+ {
+ TSLexeme *oldRes = res;
+ int resSize = TSLexemeGetSize(res);
- /*
- * Dict don't want next lexem and didn't recognize anything, redo
- * from ld->towork.head
- */
- ld->curDictId = InvalidOid;
- return LexizeExec(ld, correspondLexem);
- }
+ res = palloc0(sizeof(TSLexeme) * (resSize + 1));
+ memcpy(res, oldRes, sizeof(TSLexeme) * resSize);
}
- setCorrLex(ld, correspondLexem);
- return NULL;
+ LexemesBufferClear(&ld->buffer);
+ return res;
}
+/*-------------------
+ * ts_parse API functions
+ *-------------------
+ */
+
/*
* Parse string and lexize words.
*
@@ -357,7 +1432,7 @@ LexizeExec(LexizeData *ld, ParsedLex **correspondLexem)
void
parsetext(Oid cfgId, ParsedText *prs, char *buf, int buflen)
{
- int type,
+ int type = -1,
lenlemm;
char *lemm = NULL;
LexizeData ldata;
@@ -375,36 +1450,42 @@ parsetext(Oid cfgId, ParsedText *prs, char *buf, int buflen)
LexizeInit(&ldata, cfg);
+ type = 1;
do
{
- type = DatumGetInt32(FunctionCall3(&(prsobj->prstoken),
- PointerGetDatum(prsdata),
- PointerGetDatum(&lemm),
- PointerGetDatum(&lenlemm)));
-
- if (type > 0 && lenlemm >= MAXSTRLEN)
+ if (type > 0)
{
+ type = DatumGetInt32(FunctionCall3(&(prsobj->prstoken),
+ PointerGetDatum(prsdata),
+ PointerGetDatum(&lemm),
+ PointerGetDatum(&lenlemm)));
+
+ if (type > 0 && lenlemm >= MAXSTRLEN)
+ {
#ifdef IGNORE_LONGLEXEME
- ereport(NOTICE,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("word is too long to be indexed"),
- errdetail("Words longer than %d characters are ignored.",
- MAXSTRLEN)));
- continue;
+ ereport(NOTICE,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
+ continue;
#else
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("word is too long to be indexed"),
- errdetail("Words longer than %d characters are ignored.",
- MAXSTRLEN)));
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
#endif
- }
+ }
- LexizeAddLemm(&ldata, type, lemm, lenlemm);
+ LexizeAddLemm(&ldata, type, lemm, lenlemm);
+ }
while ((norms = LexizeExec(&ldata, NULL)) != NULL)
{
- TSLexeme *ptr = norms;
+ TSLexeme *ptr;
+
+ ptr = norms;
prs->pos++; /* set pos */
@@ -429,14 +1510,245 @@ parsetext(Oid cfgId, ParsedText *prs, char *buf, int buflen)
}
pfree(norms);
}
- } while (type > 0);
+ } while (type > 0 || ldata.towork.head);
FunctionCall1(&(prsobj->prsend), PointerGetDatum(prsdata));
}
+/*-------------------
+ * ts_debug and helper functions
+ *-------------------
+ */
+
+/*
+ * Free memory occupied by a temporary TSMapElement
+ */
+static void
+ts_debug_free_rule(TSMapElement *element)
+{
+ /* MAP expressions may leave NULL children in the debug rule tree */
+ if (element == NULL)
+ return;
+
+ if (element->type == TSMAP_EXPRESSION)
+ {
+ ts_debug_free_rule(element->value.objectExpression->left);
+ ts_debug_free_rule(element->value.objectExpression->right);
+ pfree(element->value.objectExpression);
+ pfree(element);
+ }
+}
+
+/*
+ * Initialize SRF context and text parser for ts_debug execution.
+ */
+static void
+ts_debug_init(Oid cfgId, text *inputText, FunctionCallInfo fcinfo)
+{
+ TupleDesc tupdesc;
+ char *buf;
+ int buflen;
+ FuncCallContext *funcctx;
+ MemoryContext oldcontext;
+ TSDebugContext *context;
+
+ funcctx = SRF_FIRSTCALL_INIT();
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ buf = text_to_cstring(inputText);
+ buflen = strlen(buf);
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ funcctx->user_fctx = palloc0(sizeof(TSDebugContext));
+ funcctx->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+
+ context = funcctx->user_fctx;
+ context->cfg = lookup_ts_config_cache(cfgId);
+ context->prsobj = lookup_ts_parser_cache(context->cfg->prsId);
+
+ context->tokenTypes = (LexDescr *) DatumGetPointer(OidFunctionCall1(context->prsobj->lextypeOid,
+ (Datum) 0));
+
+ context->prsdata = (void *) DatumGetPointer(FunctionCall2(&context->prsobj->prsstart,
+ PointerGetDatum(buf),
+ Int32GetDatum(buflen)));
+ LexizeInit(&context->ldata, context->cfg);
+ context->ldata.debugContext = true;
+ context->tokentype = 1;
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Get one token from the input text and add it to the processing queue.
+ */
+static void
+ts_debug_get_token(FuncCallContext *funcctx)
+{
+ TSDebugContext *context;
+ MemoryContext oldcontext;
+ int lenlemm;
+ char *lemm = NULL;
+
+ context = funcctx->user_fctx;
+
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ context->tokentype = DatumGetInt32(FunctionCall3(&(context->prsobj->prstoken),
+ PointerGetDatum(context->prsdata),
+ PointerGetDatum(&lemm),
+ PointerGetDatum(&lenlemm)));
+
+ if (context->tokentype > 0 && lenlemm >= MAXSTRLEN)
+ {
+#ifdef IGNORE_LONGLEXEME
+ ereport(NOTICE,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
+#else
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
+#endif
+ }
+
+ LexizeAddLemm(&context->ldata, context->tokentype, lemm, lenlemm);
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
+ * Parse text and print debug information, such as token type, dictionary map
+ * configuration, selected command and lexemes for each token.
+ * Arguments: regconfiguration(Oid) cfgId, text *inputText
+ */
+Datum
+ts_debug(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ TSDebugContext *context;
+ MemoryContext oldcontext;
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ Oid cfgId = PG_GETARG_OID(0);
+ text *inputText = PG_GETARG_TEXT_P(1);
+
+ ts_debug_init(cfgId, inputText, fcinfo);
+ }
+
+ funcctx = SRF_PERCALL_SETUP();
+ context = funcctx->user_fctx;
+
+ while (context->tokentype > 0 && context->leftTokens == NULL)
+ {
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ ts_debug_get_token(funcctx);
+
+ context->savedLexemes = LexizeExec(&context->ldata, &(context->leftTokens));
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ while (context->leftTokens == NULL && context->ldata.towork.head != NULL)
+ context->savedLexemes = LexizeExec(&context->ldata, &(context->leftTokens));
+
+ if (context->leftTokens && context->leftTokens->type > 0)
+ {
+ HeapTuple tuple;
+ Datum result;
+ char **values;
+ ParsedLex *lex = context->leftTokens;
+ StringInfo str = NULL;
+ TSLexeme *ptr;
+
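+ /*
+ * Seven output columns: alias, description, token, dictionaries,
+ * configuration, command and lexemes
+ */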
+ values = palloc0(sizeof(char *) * 7);
+ str = makeStringInfo();
+
+ values[0] = context->tokenTypes[lex->type - 1].alias;
+ values[1] = context->tokenTypes[lex->type - 1].descr;
+
+ values[2] = palloc0(sizeof(char) * (lex->lenlemm + 1));
+ memcpy(values[2], lex->lemm, sizeof(char) * lex->lenlemm);
+
+ initStringInfo(str);
+ appendStringInfoChar(str, '{');
+ if (lex->type < context->ldata.cfg->lenmap && context->ldata.cfg->map[lex->type])
+ {
+ Oid *dictionaries = TSMapGetDictionaries(context->ldata.cfg->map[lex->type]);
+ Oid *currentDictionary = NULL;
+
+ for (currentDictionary = dictionaries; *currentDictionary != InvalidOid; currentDictionary++)
+ {
+ if (currentDictionary != dictionaries)
+ appendStringInfoChar(str, ',');
+
+ TSMapPrintDictName(*currentDictionary, str);
+ }
+ }
+ appendStringInfoChar(str, '}');
+ values[3] = str->data;
+
+ if (lex->type < context->ldata.cfg->lenmap && context->ldata.cfg->map[lex->type])
+ {
+ initStringInfo(str);
+ TSMapPrintElement(context->ldata.cfg->map[lex->type], str);
+ values[4] = str->data;
+
+ initStringInfo(str);
+ if (lex->relatedRule)
+ {
+ TSMapPrintElement(lex->relatedRule, str);
+ values[5] = str->data;
+ str = makeStringInfo();
+ ts_debug_free_rule(lex->relatedRule);
+ lex->relatedRule = NULL;
+ }
+ }
+
+ ptr = context->savedLexemes;
+ if (context->savedLexemes)
+ appendStringInfoChar(str, '{');
+
+ while (ptr && ptr->lexeme)
+ {
+ if (ptr != context->savedLexemes)
+ appendStringInfoString(str, ", ");
+ appendStringInfoString(str, ptr->lexeme);
+ ptr++;
+ }
+ if (context->savedLexemes)
+ appendStringInfoChar(str, '}');
+ if (context->savedLexemes)
+ values[6] = str->data;
+ else
+ values[6] = NULL;
+
+ tuple = BuildTupleFromCStrings(funcctx->attinmeta, values);
+ result = HeapTupleGetDatum(tuple);
+
+ context->leftTokens = lex->next;
+ pfree(lex);
+ if (context->leftTokens == NULL && context->savedLexemes)
+ pfree(context->savedLexemes);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+
+ FunctionCall1(&(context->prsobj->prsend), PointerGetDatum(context->prsdata));
+ SRF_RETURN_DONE(funcctx);
+}
+
+/*-------------------
* Headline framework
+ *-------------------
*/
+
static void
hladdword(HeadlineParsedText *prs, char *buf, int buflen, int type)
{
@@ -532,12 +1844,12 @@ addHLParsedLex(HeadlineParsedText *prs, TSQuery query, ParsedLex *lexs, TSLexeme
void
hlparsetext(Oid cfgId, HeadlineParsedText *prs, TSQuery query, char *buf, int buflen)
{
- int type,
+ int type = -1,
lenlemm;
char *lemm = NULL;
LexizeData ldata;
TSLexeme *norms;
- ParsedLex *lexs;
+ ParsedLex *lexs = NULL;
TSConfigCacheEntry *cfg;
TSParserCacheEntry *prsobj;
void *prsdata;
@@ -551,32 +1863,36 @@ hlparsetext(Oid cfgId, HeadlineParsedText *prs, TSQuery query, char *buf, int bu
LexizeInit(&ldata, cfg);
+ type = 1;
do
{
- type = DatumGetInt32(FunctionCall3(&(prsobj->prstoken),
- PointerGetDatum(prsdata),
- PointerGetDatum(&lemm),
- PointerGetDatum(&lenlemm)));
-
- if (type > 0 && lenlemm >= MAXSTRLEN)
+ if (type > 0)
{
+ type = DatumGetInt32(FunctionCall3(&(prsobj->prstoken),
+ PointerGetDatum(prsdata),
+ PointerGetDatum(&lemm),
+ PointerGetDatum(&lenlemm)));
+
+ if (type > 0 && lenlemm >= MAXSTRLEN)
+ {
#ifdef IGNORE_LONGLEXEME
- ereport(NOTICE,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("word is too long to be indexed"),
- errdetail("Words longer than %d characters are ignored.",
- MAXSTRLEN)));
- continue;
+ ereport(NOTICE,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
+ continue;
#else
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("word is too long to be indexed"),
- errdetail("Words longer than %d characters are ignored.",
- MAXSTRLEN)));
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("word is too long to be indexed"),
+ errdetail("Words longer than %d characters are ignored.",
+ MAXSTRLEN)));
#endif
- }
+ }
- LexizeAddLemm(&ldata, type, lemm, lenlemm);
+ LexizeAddLemm(&ldata, type, lemm, lenlemm);
+ }
do
{
@@ -587,9 +1903,10 @@ hlparsetext(Oid cfgId, HeadlineParsedText *prs, TSQuery query, char *buf, int bu
}
else
addHLParsedLex(prs, query, lexs, NULL);
+ lexs = NULL;
} while (norms);
- } while (type > 0);
+ } while (type > 0 || ldata.towork.head);
FunctionCall1(&(prsobj->prsend), PointerGetDatum(prsdata));
}
@@ -642,14 +1959,14 @@ generateHeadline(HeadlineParsedText *prs)
}
else if (!wrd->skip)
{
- if (wrd->selected)
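+ /* Open the highlight only at the first word of a selected run */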
+ if (wrd->selected && (wrd == prs->words || !(wrd - 1)->selected))
{
memcpy(ptr, prs->startsel, prs->startsellen);
ptr += prs->startsellen;
}
memcpy(ptr, wrd->word, wrd->len);
ptr += wrd->len;
- if (wrd->selected)
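+ /* Close the highlight only at the last word of a selected run */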
+ if (wrd->selected && ((wrd + 1 - prs->words) == prs->curwords || !(wrd + 1)->selected))
{
memcpy(ptr, prs->stopsel, prs->stopsellen);
ptr += prs->stopsellen;
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 56d4cf03e5..068a684cae 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -20,7 +20,6 @@
#include "tsearch/ts_locale.h"
#include "tsearch/ts_utils.h"
-
/*
* Given the base name and extension of a tsearch config file, return
* its full path name. The base name is assumed to be user-supplied,
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 888edbb325..0628b9c2a9 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -828,11 +828,10 @@ static const struct cachedesc cacheinfo[] = {
},
{TSConfigMapRelationId, /* TSCONFIGMAP */
TSConfigMapIndexId,
- 3,
+ 2,
{
Anum_pg_ts_config_map_mapcfg,
Anum_pg_ts_config_map_maptokentype,
- Anum_pg_ts_config_map_mapseqno,
0
},
2
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 29cf93a4de..9adfddc213 100644
--- a/src/backend/utils/cache/ts_cache.c
+++ b/src/backend/utils/cache/ts_cache.c
@@ -39,6 +39,7 @@
#include "catalog/pg_ts_template.h"
#include "commands/defrem.h"
#include "tsearch/ts_cache.h"
+#include "tsearch/ts_configmap.h"
#include "utils/builtins.h"
#include "utils/catcache.h"
#include "utils/fmgroids.h"
@@ -51,13 +52,12 @@
/*
- * MAXTOKENTYPE/MAXDICTSPERTT are arbitrary limits on the workspace size
+ * MAXTOKENTYPE is an arbitrary limit on the workspace size
* used in lookup_ts_config_cache(). We could avoid hardwiring a limit
* by making the workspace dynamically enlargeable, but it seems unlikely
* to be worth the trouble.
*/
-#define MAXTOKENTYPE 256
-#define MAXDICTSPERTT 100
+#define MAXTOKENTYPE 256
static HTAB *TSParserCacheHash = NULL;
@@ -415,11 +415,10 @@ lookup_ts_config_cache(Oid cfgId)
ScanKeyData mapskey;
SysScanDesc mapscan;
HeapTuple maptup;
- ListDictionary maplists[MAXTOKENTYPE + 1];
- Oid mapdicts[MAXDICTSPERTT];
+ TSMapElement *mapconfigs[MAXTOKENTYPE + 1];
int maxtokentype;
- int ndicts;
int i;
+ TSMapElement *tmpConfig;
tp = SearchSysCache1(TSCONFIGOID, ObjectIdGetDatum(cfgId));
if (!HeapTupleIsValid(tp))
@@ -450,8 +449,10 @@ lookup_ts_config_cache(Oid cfgId)
if (entry->map)
{
for (i = 0; i < entry->lenmap; i++)
- if (entry->map[i].dictIds)
- pfree(entry->map[i].dictIds);
+ {
+ if (entry->map[i])
+ TSMapElementFree(entry->map[i]);
+ }
pfree(entry->map);
}
}
@@ -465,13 +466,11 @@ lookup_ts_config_cache(Oid cfgId)
/*
* Scan pg_ts_config_map to gather dictionary list for each token type
*
- * Because the index is on (mapcfg, maptokentype, mapseqno), we will
- * see the entries in maptokentype order, and in mapseqno order for
- * each token type, even though we didn't explicitly ask for that.
+ * Because the index is on (mapcfg, maptokentype), we will see the
+ * entries in maptokentype order even though we didn't explicitly ask
+ * for that.
*/
- MemSet(maplists, 0, sizeof(maplists));
maxtokentype = 0;
- ndicts = 0;
ScanKeyInit(&mapskey,
Anum_pg_ts_config_map_mapcfg,
@@ -483,6 +482,7 @@ lookup_ts_config_cache(Oid cfgId)
mapscan = systable_beginscan_ordered(maprel, mapidx,
NULL, 1, &mapskey);
+ memset(mapconfigs, 0, sizeof(mapconfigs));
while ((maptup = systable_getnext_ordered(mapscan, ForwardScanDirection)) != NULL)
{
Form_pg_ts_config_map cfgmap = (Form_pg_ts_config_map) GETSTRUCT(maptup);
@@ -492,51 +492,27 @@ lookup_ts_config_cache(Oid cfgId)
elog(ERROR, "maptokentype value %d is out of range", toktype);
if (toktype < maxtokentype)
elog(ERROR, "maptokentype entries are out of order");
- if (toktype > maxtokentype)
- {
- /* starting a new token type, but first save the prior data */
- if (ndicts > 0)
- {
- maplists[maxtokentype].len = ndicts;
- maplists[maxtokentype].dictIds = (Oid *)
- MemoryContextAlloc(CacheMemoryContext,
- sizeof(Oid) * ndicts);
- memcpy(maplists[maxtokentype].dictIds, mapdicts,
- sizeof(Oid) * ndicts);
- }
- maxtokentype = toktype;
- mapdicts[0] = cfgmap->mapdict;
- ndicts = 1;
- }
- else
- {
- /* continuing data for current token type */
- if (ndicts >= MAXDICTSPERTT)
- elog(ERROR, "too many pg_ts_config_map entries for one token type");
- mapdicts[ndicts++] = cfgmap->mapdict;
- }
+
+ maxtokentype = toktype;
+ tmpConfig = JsonbToTSMap(DatumGetJsonbP(&cfgmap->mapdicts));
+ mapconfigs[maxtokentype] = TSMapMoveToMemoryContext(tmpConfig, CacheMemoryContext);
+ TSMapElementFree(tmpConfig);
+ tmpConfig = NULL;
}
systable_endscan_ordered(mapscan);
index_close(mapidx, AccessShareLock);
heap_close(maprel, AccessShareLock);
- if (ndicts > 0)
+ if (maxtokentype > 0)
{
- /* save the last token type's dictionaries */
- maplists[maxtokentype].len = ndicts;
- maplists[maxtokentype].dictIds = (Oid *)
- MemoryContextAlloc(CacheMemoryContext,
- sizeof(Oid) * ndicts);
- memcpy(maplists[maxtokentype].dictIds, mapdicts,
- sizeof(Oid) * ndicts);
- /* and save the overall map */
+ /* save the overall map */
entry->lenmap = maxtokentype + 1;
- entry->map = (ListDictionary *)
+ entry->map = (TSMapElement **)
MemoryContextAlloc(CacheMemoryContext,
- sizeof(ListDictionary) * entry->lenmap);
- memcpy(entry->map, maplists,
- sizeof(ListDictionary) * entry->lenmap);
+ sizeof(TSMapElement *) * entry->lenmap);
+ memcpy(entry->map, mapconfigs,
+ sizeof(TSMapElement *) * entry->lenmap);
}
entry->isvalid = true;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6701aaa78..7e8dd00158 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14208,10 +14208,11 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"SELECT\n"
" ( SELECT alias FROM pg_catalog.ts_token_type('%u'::pg_catalog.oid) AS t\n"
" WHERE t.tokid = m.maptokentype ) AS tokenname,\n"
- " m.mapdict::pg_catalog.regdictionary AS dictname\n"
+ " dictionary_mapping_to_text(m.mapcfg, m.maptokentype) AS dictname\n"
"FROM pg_catalog.pg_ts_config_map AS m\n"
"WHERE m.mapcfg = '%u'\n"
- "ORDER BY m.mapcfg, m.maptokentype, m.mapseqno",
+ "GROUP BY m.mapcfg, m.maptokentype\n"
+ "ORDER BY m.mapcfg, m.maptokentype",
cfginfo->cfgparser, cfginfo->dobj.catId.oid);
res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
@@ -14225,20 +14226,14 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
char *tokenname = PQgetvalue(res, i, i_tokenname);
char *dictname = PQgetvalue(res, i, i_dictname);
- if (i == 0 ||
- strcmp(tokenname, PQgetvalue(res, i - 1, i_tokenname)) != 0)
- {
- /* starting a new token type, so start a new command */
- if (i > 0)
- appendPQExpBufferStr(q, ";\n");
- appendPQExpBuffer(q, "\nALTER TEXT SEARCH CONFIGURATION %s\n",
- fmtId(cfginfo->dobj.name));
- /* tokenname needs quoting, dictname does NOT */
- appendPQExpBuffer(q, " ADD MAPPING FOR %s WITH %s",
- fmtId(tokenname), dictname);
- }
- else
- appendPQExpBuffer(q, ", %s", dictname);
+ /* every row is a distinct token type, so start a new command */
+ if (i > 0)
+ appendPQExpBufferStr(q, ";\n");
+ appendPQExpBuffer(q, "\nALTER TEXT SEARCH CONFIGURATION %s\n",
+ fmtId(cfginfo->dobj.name));
+ /* tokenname needs quoting, dictname does NOT */
+ appendPQExpBuffer(q, " ADD MAPPING FOR %s WITH %s",
+ fmtId(tokenname), dictname);
}
if (ntups > 0)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 3fc69c46c0..279fc2d1f2 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4605,13 +4605,7 @@ describeOneTSConfig(const char *oid, const char *nspname, const char *cfgname,
" ( SELECT t.alias FROM\n"
" pg_catalog.ts_token_type(c.cfgparser) AS t\n"
" WHERE t.tokid = m.maptokentype ) AS \"%s\",\n"
- " pg_catalog.btrim(\n"
- " ARRAY( SELECT mm.mapdict::pg_catalog.regdictionary\n"
- " FROM pg_catalog.pg_ts_config_map AS mm\n"
- " WHERE mm.mapcfg = m.mapcfg AND mm.maptokentype = m.maptokentype\n"
- " ORDER BY mapcfg, maptokentype, mapseqno\n"
- " ) :: pg_catalog.text,\n"
- " '{}') AS \"%s\"\n"
+ " dictionary_mapping_to_text(m.mapcfg, m.maptokentype) AS \"%s\"\n"
"FROM pg_catalog.pg_ts_config AS c, pg_catalog.pg_ts_config_map AS m\n"
"WHERE c.oid = '%s' AND m.mapcfg = c.oid\n"
"GROUP BY m.mapcfg, m.maptokentype, c.cfgparser\n"
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index b13cf62bec..47f7f669ba 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201711301
+#define CATALOG_VERSION_NO 201712191
#endif
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ef8493674c..db487cfe57 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -260,7 +260,7 @@ DECLARE_UNIQUE_INDEX(pg_ts_config_cfgname_index, 3608, on pg_ts_config using btr
DECLARE_UNIQUE_INDEX(pg_ts_config_oid_index, 3712, on pg_ts_config using btree(oid oid_ops));
#define TSConfigOidIndexId 3712
-DECLARE_UNIQUE_INDEX(pg_ts_config_map_index, 3609, on pg_ts_config_map using btree(mapcfg oid_ops, maptokentype int4_ops, mapseqno int4_ops));
+DECLARE_UNIQUE_INDEX(pg_ts_config_map_index, 3609, on pg_ts_config_map using btree(mapcfg oid_ops, maptokentype int4_ops));
#define TSConfigMapIndexId 3609
DECLARE_UNIQUE_INDEX(pg_ts_dict_dictname_index, 3604, on pg_ts_dict using btree(dictname name_ops, dictnamespace oid_ops));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c969375981..2640ab8b1c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4925,6 +4925,12 @@ DESCR("transform jsonb to tsvector");
DATA(insert OID = 4212 ( to_tsvector PGNSP PGUID 12 100 0 0 0 f f f f t f i s 2 0 3614 "3734 114" _null_ _null_ _null_ _null_ _null_ json_to_tsvector_byid _null_ _null_ _null_ ));
DESCR("transform json to tsvector");
+DATA(insert OID = 8891 ( dictionary_mapping_to_text PGNSP PGUID 12 100 0 0 0 f f f f t f s s 2 0 25 "26 23" _null_ _null_ _null_ _null_ _null_ dictionary_mapping_to_text _null_ _null_ _null_ ));
+DESCR("returns text representation of dictionary configuration map");
+
+DATA(insert OID = 8892 ( ts_debug PGNSP PGUID 12 100 1 0 0 f f f f t t s s 2 0 2249 "3734 25" "{3734,25,25,25,25,3770,25,25,1009}" "{i,i,o,o,o,o,o,o,o}" "{cfgId,inputText,alias,description,token,dictionaries,configuration,command,lexemes}" _null_ _null_ ts_debug _null_ _null_ _null_));
+DESCR("debug function for text search configuration");
+
DATA(insert OID = 3752 ( tsvector_update_trigger PGNSP PGUID 12 1 0 0 0 f f f f f f v s 0 0 2279 "" _null_ _null_ _null_ _null_ _null_ tsvector_update_trigger_byid _null_ _null_ _null_ ));
DESCR("trigger for automatic update of tsvector column");
DATA(insert OID = 3753 ( tsvector_update_trigger_column PGNSP PGUID 12 1 0 0 0 f f f f f f v s 0 0 2279 "" _null_ _null_ _null_ _null_ _null_ tsvector_update_trigger_bycolumn _null_ _null_ _null_ ));
diff --git a/src/include/catalog/pg_ts_config_map.h b/src/include/catalog/pg_ts_config_map.h
index 3df05195be..f6790d2cd2 100644
--- a/src/include/catalog/pg_ts_config_map.h
+++ b/src/include/catalog/pg_ts_config_map.h
@@ -22,6 +22,7 @@
#define PG_TS_CONFIG_MAP_H
#include "catalog/genbki.h"
+#include "utils/jsonb.h"
/* ----------------
* pg_ts_config_map definition. cpp turns this into
@@ -30,49 +31,98 @@
*/
#define TSConfigMapRelationId 3603
+/*
+ * Create a typedef so that the same type name can be used both in the
+ * generated DB initialization script and in C source code
+ */
+typedef Jsonb jsonb;
+
CATALOG(pg_ts_config_map,3603) BKI_WITHOUT_OIDS
{
Oid mapcfg; /* OID of configuration owning this entry */
int32 maptokentype; /* token type from parser */
- int32 mapseqno; /* order in which to consult dictionaries */
- Oid mapdict; /* dictionary to consult */
+ jsonb mapdicts; /* dictionary map Jsonb representation */
} FormData_pg_ts_config_map;
typedef FormData_pg_ts_config_map *Form_pg_ts_config_map;
+typedef struct TSMapElement
+{
+ int type;
+ union
+ {
+ struct TSMapExpression *objectExpression;
+ struct TSMapCase *objectCase;
+ Oid objectDictionary;
+ void *object;
+ } value;
+ struct TSMapElement *parent;
+} TSMapElement;
+
+typedef struct TSMapExpression
+{
+ int operator;
+ TSMapElement *left;
+ TSMapElement *right;
+} TSMapExpression;
+
+typedef struct TSMapCase
+{
+ TSMapElement *condition;
+ TSMapElement *command;
+ TSMapElement *elsebranch;
+ bool match; /* If false, NO MATCH is used */
+} TSMapCase;
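+
+/*
+ * For example, "CASE ispell WHEN MATCH THEN KEEP ELSE english_stem END"
+ * is stored as a TSMapCase with condition = the ispell dictionary,
+ * command = a KEEP element, elsebranch = the english_stem dictionary
+ * and match = true.
+ */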
+
/* ----------------
- * compiler constants for pg_ts_config_map
+ * Compiler constants for pg_ts_config_map
* ----------------
*/
-#define Natts_pg_ts_config_map 4
+#define Natts_pg_ts_config_map 3
#define Anum_pg_ts_config_map_mapcfg 1
#define Anum_pg_ts_config_map_maptokentype 2
-#define Anum_pg_ts_config_map_mapseqno 3
-#define Anum_pg_ts_config_map_mapdict 4
+#define Anum_pg_ts_config_map_mapdicts 3
+
+/* ----------------
+ * Dictionary map operators
+ * ----------------
+ */
+#define TSMAP_OP_MAP 1
+#define TSMAP_OP_UNION 2
+#define TSMAP_OP_EXCEPT 3
+#define TSMAP_OP_INTERSECT 4
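+
+/*
+ * For example, "english_stem UNION simple" is stored as a TSMapExpression
+ * with operator = TSMAP_OP_UNION and the two dictionaries as its left
+ * and right operands.
+ */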
+
+/* ----------------
+ * TSMapElement object types
+ * ----------------
+ */
+#define TSMAP_EXPRESSION 1
+#define TSMAP_CASE 2
+#define TSMAP_DICTIONARY 3
+#define TSMAP_KEEP 4
/* ----------------
* initial contents of pg_ts_config_map
* ----------------
*/
-DATA(insert ( 3748 1 1 3765 ));
-DATA(insert ( 3748 2 1 3765 ));
-DATA(insert ( 3748 3 1 3765 ));
-DATA(insert ( 3748 4 1 3765 ));
-DATA(insert ( 3748 5 1 3765 ));
-DATA(insert ( 3748 6 1 3765 ));
-DATA(insert ( 3748 7 1 3765 ));
-DATA(insert ( 3748 8 1 3765 ));
-DATA(insert ( 3748 9 1 3765 ));
-DATA(insert ( 3748 10 1 3765 ));
-DATA(insert ( 3748 11 1 3765 ));
-DATA(insert ( 3748 15 1 3765 ));
-DATA(insert ( 3748 16 1 3765 ));
-DATA(insert ( 3748 17 1 3765 ));
-DATA(insert ( 3748 18 1 3765 ));
-DATA(insert ( 3748 19 1 3765 ));
-DATA(insert ( 3748 20 1 3765 ));
-DATA(insert ( 3748 21 1 3765 ));
-DATA(insert ( 3748 22 1 3765 ));
+DATA(insert ( 3748 1 "[3765]" ));
+DATA(insert ( 3748 2 "[3765]" ));
+DATA(insert ( 3748 3 "[3765]" ));
+DATA(insert ( 3748 4 "[3765]" ));
+DATA(insert ( 3748 5 "[3765]" ));
+DATA(insert ( 3748 6 "[3765]" ));
+DATA(insert ( 3748 7 "[3765]" ));
+DATA(insert ( 3748 8 "[3765]" ));
+DATA(insert ( 3748 9 "[3765]" ));
+DATA(insert ( 3748 10 "[3765]" ));
+DATA(insert ( 3748 11 "[3765]" ));
+DATA(insert ( 3748 15 "[3765]" ));
+DATA(insert ( 3748 16 "[3765]" ));
+DATA(insert ( 3748 17 "[3765]" ));
+DATA(insert ( 3748 18 "[3765]" ));
+DATA(insert ( 3748 19 "[3765]" ));
+DATA(insert ( 3748 20 "[3765]" ));
+DATA(insert ( 3748 21 "[3765]" ));
+DATA(insert ( 3748 22 "[3765]" ));
#endif /* PG_TS_CONFIG_MAP_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index c5b5115f5b..63dd5dcb3a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -381,6 +381,9 @@ typedef enum NodeTag
T_CreateEnumStmt,
T_CreateRangeStmt,
T_AlterEnumStmt,
+ T_DictMapExprElem,
+ T_DictMapElem,
+ T_DictMapCase,
T_AlterTSDictionaryStmt,
T_AlterTSConfigurationStmt,
T_CreateFdwStmt,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2eaa6b2774..f4593fbdf2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3392,6 +3392,39 @@ typedef enum AlterTSConfigType
ALTER_TSCONFIG_DROP_MAPPING
} AlterTSConfigType;
+typedef enum DictMapElemType
+{
+ DICT_MAP_CASE,
+ DICT_MAP_EXPRESSION,
+ DICT_MAP_KEEP,
+ DICT_MAP_DICTIONARY,
+ DICT_MAP_DICTIONARY_LIST
+} DictMapElemType;
+
+typedef struct DictMapElem
+{
+ NodeTag type;
+ int8 kind; /* See DictMapElemType */
+ void *data; /* Type should be detected by kind value */
+} DictMapElem;
+
+typedef struct DictMapExprElem
+{
+ NodeTag type;
+ DictMapElem *left;
+ DictMapElem *right;
+ int8 oper;
+} DictMapExprElem;
+
+typedef struct DictMapCase
+{
+ NodeTag type;
+ struct DictMapElem *condition;
+ struct DictMapElem *command;
+ struct DictMapElem *elsebranch;
+ bool match;
+} DictMapCase;
+
typedef struct AlterTSConfigurationStmt
{
NodeTag type;
@@ -3404,6 +3437,7 @@ typedef struct AlterTSConfigurationStmt
*/
List *tokentype; /* list of Value strings */
List *dicts; /* list of list of Value strings */
+ DictMapElem *dict_map;
bool override; /* if true - remove old variant */
bool replace; /* if true - replace dictionary by another */
bool missing_ok; /* for DROP - skip error if missing? */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a932400058..b409f0c02b 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -219,6 +219,7 @@ PG_KEYWORD("is", IS, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("isnull", ISNULL, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("isolation", ISOLATION, UNRESERVED_KEYWORD)
PG_KEYWORD("join", JOIN, TYPE_FUNC_NAME_KEYWORD)
+PG_KEYWORD("keep", KEEP, RESERVED_KEYWORD)
PG_KEYWORD("key", KEY, UNRESERVED_KEYWORD)
PG_KEYWORD("label", LABEL, UNRESERVED_KEYWORD)
PG_KEYWORD("language", LANGUAGE, UNRESERVED_KEYWORD)
@@ -241,6 +242,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("map", MAP, UNRESERVED_KEYWORD)
PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
diff --git a/src/include/tsearch/ts_cache.h b/src/include/tsearch/ts_cache.h
index abff0fdfcc..fe1e7bd204 100644
--- a/src/include/tsearch/ts_cache.h
+++ b/src/include/tsearch/ts_cache.h
@@ -14,6 +14,7 @@
#define TS_CACHE_H
#include "utils/guc.h"
+#include "catalog/pg_ts_config_map.h"
/*
@@ -66,6 +67,7 @@ typedef struct
{
int len;
Oid *dictIds;
+ int32 *dictOptions;
} ListDictionary;
typedef struct
@@ -77,7 +79,7 @@ typedef struct
Oid prsId;
int lenmap;
- ListDictionary *map;
+ TSMapElement **map;
} TSConfigCacheEntry;
diff --git a/src/include/tsearch/ts_public.h b/src/include/tsearch/ts_public.h
index 94ba7fcb20..7230968bfa 100644
--- a/src/include/tsearch/ts_public.h
+++ b/src/include/tsearch/ts_public.h
@@ -115,6 +115,7 @@ typedef struct
#define TSL_ADDPOS 0x01
#define TSL_PREFIX 0x02
#define TSL_FILTER 0x04
+#define TSL_MULTI 0x08
/*
* Struct for supporting complex dictionaries like thesaurus.
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 234b44fdf2..40029f396a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -1081,14 +1081,6 @@ WHERE mapcfg != 0 AND
------+--------
(0 rows)
-SELECT ctid, mapdict
-FROM pg_catalog.pg_ts_config_map fk
-WHERE mapdict != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_ts_dict pk WHERE pk.oid = fk.mapdict);
- ctid | mapdict
-------+---------
-(0 rows)
-
SELECT ctid, dictnamespace
FROM pg_catalog.pg_ts_dict fk
WHERE dictnamespace != 0 AND
diff --git a/src/test/regress/expected/tsdicts.out b/src/test/regress/expected/tsdicts.out
index 0744ef803b..f7d966f48f 100644
--- a/src/test/regress/expected/tsdicts.out
+++ b/src/test/regress/expected/tsdicts.out
@@ -420,6 +420,105 @@ SELECT ts_lexize('thesaurus', 'one');
{1}
(1 row)
+-- test dictionary pipeline in configuration
+CREATE TEXT SEARCH CONFIGURATION english_union(
+ COPY=english
+);
+ALTER TEXT SEARCH CONFIGURATION english_union ALTER MAPPING FOR
+ asciiword
+ WITH english_stem UNION simple;
+SELECT to_tsvector('english_union', 'book');
+ to_tsvector
+-------------
+ 'book':1
+(1 row)
+
+SELECT to_tsvector('english_union', 'books');
+ to_tsvector
+--------------------
+ 'book':1 'books':1
+(1 row)
+
+SELECT to_tsvector('english_union', 'booking');
+ to_tsvector
+----------------------
+ 'book':1 'booking':1
+(1 row)
+
+CREATE TEXT SEARCH CONFIGURATION english_intersect(
+ COPY=english
+);
+ALTER TEXT SEARCH CONFIGURATION english_intersect ALTER MAPPING FOR
+ asciiword
+ WITH english_stem INTERSECT simple;
+SELECT to_tsvector('english_intersect', 'book');
+ to_tsvector
+-------------
+ 'book':1
+(1 row)
+
+SELECT to_tsvector('english_intersect', 'books');
+ to_tsvector
+-------------
+
+(1 row)
+
+SELECT to_tsvector('english_intersect', 'booking');
+ to_tsvector
+-------------
+
+(1 row)
+
+CREATE TEXT SEARCH CONFIGURATION english_except(
+ COPY=english
+);
+ALTER TEXT SEARCH CONFIGURATION english_except ALTER MAPPING FOR
+ asciiword
+ WITH simple EXCEPT english_stem;
+SELECT to_tsvector('english_except', 'book');
+ to_tsvector
+-------------
+
+(1 row)
+
+SELECT to_tsvector('english_except', 'books');
+ to_tsvector
+-------------
+ 'books':1
+(1 row)
+
+SELECT to_tsvector('english_except', 'booking');
+ to_tsvector
+-------------
+ 'booking':1
+(1 row)
+
+CREATE TEXT SEARCH CONFIGURATION english_branches(
+ COPY=english
+);
+ALTER TEXT SEARCH CONFIGURATION english_branches ALTER MAPPING FOR
+ asciiword
+ WITH CASE ispell WHEN MATCH THEN KEEP
+ ELSE english_stem
+ END;
+SELECT to_tsvector('english_branches', 'book');
+ to_tsvector
+-------------
+ 'book':1
+(1 row)
+
+SELECT to_tsvector('english_branches', 'books');
+ to_tsvector
+-------------
+ 'book':1
+(1 row)
+
+SELECT to_tsvector('english_branches', 'booking');
+ to_tsvector
+----------------------
+ 'book':1 'booking':1
+(1 row)
+
-- Test ispell dictionary in configuration
CREATE TEXT SEARCH CONFIGURATION ispell_tst (
COPY=english
@@ -580,3 +679,55 @@ SELECT to_tsvector('thesaurus_tst', 'Booking tickets is looking like a booking a
'card':3,10 'invit':2,9 'like':6 'look':5 'order':1,8
(1 row)
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH english_stem UNION simple;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+ to_tsvector
+--------------------------------------------------------------------------------------
+ '1987a':6 'mysteri':2 'mysterious':2 'of':4 'ring':3 'rings':3 'supernova':5 'the':1
+(1 row)
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH CASE
+ thesaurus WHEN MATCH THEN KEEP ELSE english_stem
+END;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+ to_tsvector
+---------------------------------------
+ '1987a':6 'mysteri':2 'ring':3 'sn':5
+(1 row)
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH thesaurus UNION english_stem;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+ to_tsvector
+-----------------------------------------------------
+ '1987a':6 'mysteri':2 'ring':3 'sn':5 'supernova':5
+(1 row)
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH simple UNION thesaurus;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+ to_tsvector
+------------------------------------------------------------------------
+ '1987a':6 'mysterious':2 'of':4 'rings':3 'sn':5 'supernova':5 'the':1
+(1 row)
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH CASE
+ thesaurus WHEN MATCH THEN simple UNION thesaurus
+ ELSE simple
+END;
+SELECT to_tsvector('thesaurus_tst', 'one two');
+ to_tsvector
+------------------------
+ '12':1 'one':1 'two':2
+(1 row)
+
+SELECT to_tsvector('thesaurus_tst', 'one two three');
+ to_tsvector
+-----------------------------------
+ '123':1 'one':1 'three':3 'two':2
+(1 row)
+
+SELECT to_tsvector('thesaurus_tst', 'one two four');
+ to_tsvector
+---------------------------------
+ '12':1 'four':3 'one':1 'two':2
+(1 row)
+
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index d63fb12f1d..c0e9fc5c8f 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -36,11 +36,11 @@ WHERE cfgnamespace = 0 OR cfgowner = 0 OR cfgparser = 0;
-----+---------
(0 rows)
-SELECT mapcfg, maptokentype, mapseqno
+SELECT mapcfg, maptokentype
FROM pg_ts_config_map
-WHERE mapcfg = 0 OR mapdict = 0;
- mapcfg | maptokentype | mapseqno
---------+--------------+----------
+WHERE mapcfg = 0;
+ mapcfg | maptokentype
+--------+--------------
(0 rows)
-- Look for pg_ts_config_map entries that aren't one of parser's token types
@@ -51,8 +51,8 @@ RIGHT JOIN pg_ts_config_map AS m
ON (tt.cfgid=m.mapcfg AND tt.tokid=m.maptokentype)
WHERE
tt.cfgid IS NULL OR tt.tokid IS NULL;
- cfgid | tokid | mapcfg | maptokentype | mapseqno | mapdict
--------+-------+--------+--------------+----------+---------
+ cfgid | tokid | mapcfg | maptokentype | mapdicts
+-------+-------+--------+--------------+----------
(0 rows)
-- test basic text search behavior without indexes, then with
@@ -567,55 +567,55 @@ SELECT length(to_tsvector('english', '345 qwe@efd.r '' http://www.com/ http://ae
-- ts_debug
SELECT * from ts_debug('english', '<myns:foo-bar_baz.blurfl>abc&nm1;def©ghiõjkl</myns:foo-bar_baz.blurfl>');
- alias | description | token | dictionaries | dictionary | lexemes
------------+-----------------+----------------------------+----------------+--------------+---------
- tag | XML tag | <myns:foo-bar_baz.blurfl> | {} | |
- asciiword | Word, all ASCII | abc | {english_stem} | english_stem | {abc}
- entity | XML entity | &nm1; | {} | |
- asciiword | Word, all ASCII | def | {english_stem} | english_stem | {def}
- entity | XML entity | © | {} | |
- asciiword | Word, all ASCII | ghi | {english_stem} | english_stem | {ghi}
- entity | XML entity | õ | {} | |
- asciiword | Word, all ASCII | jkl | {english_stem} | english_stem | {jkl}
- tag | XML tag | </myns:foo-bar_baz.blurfl> | {} | |
+ alias | description | token | dictionaries | configuration | command | lexemes
+-----------+-----------------+----------------------------+----------------+---------------+--------------+---------
+ tag | XML tag | <myns:foo-bar_baz.blurfl> | {} | | |
+ asciiword | Word, all ASCII | abc | {english_stem} | english_stem | english_stem | {abc}
+ entity | XML entity | &nm1; | {} | | |
+ asciiword | Word, all ASCII | def | {english_stem} | english_stem | english_stem | {def}
+ entity | XML entity | © | {} | | |
+ asciiword | Word, all ASCII | ghi | {english_stem} | english_stem | english_stem | {ghi}
+ entity | XML entity | õ | {} | | |
+ asciiword | Word, all ASCII | jkl | {english_stem} | english_stem | english_stem | {jkl}
+ tag | XML tag | </myns:foo-bar_baz.blurfl> | {} | | |
(9 rows)
-- check parsing of URLs
SELECT * from ts_debug('english', 'http://www.harewoodsolutions.co.uk/press.aspx</span>');
- alias | description | token | dictionaries | dictionary | lexemes
-----------+---------------+----------------------------------------+--------------+------------+------------------------------------------
- protocol | Protocol head | http:// | {} | |
- url | URL | www.harewoodsolutions.co.uk/press.aspx | {simple} | simple | {www.harewoodsolutions.co.uk/press.aspx}
- host | Host | www.harewoodsolutions.co.uk | {simple} | simple | {www.harewoodsolutions.co.uk}
- url_path | URL path | /press.aspx | {simple} | simple | {/press.aspx}
- tag | XML tag | </span> | {} | |
+ alias | description | token | dictionaries | configuration | command | lexemes
+----------+---------------+----------------------------------------+--------------+---------------+---------+------------------------------------------
+ protocol | Protocol head | http:// | {} | | |
+ url | URL | www.harewoodsolutions.co.uk/press.aspx | {simple} | simple | simple | {www.harewoodsolutions.co.uk/press.aspx}
+ host | Host | www.harewoodsolutions.co.uk | {simple} | simple | simple | {www.harewoodsolutions.co.uk}
+ url_path | URL path | /press.aspx | {simple} | simple | simple | {/press.aspx}
+ tag | XML tag | </span> | {} | | |
(5 rows)
SELECT * from ts_debug('english', 'http://aew.wer0c.ewr/id?ad=qwe&dw<span>');
- alias | description | token | dictionaries | dictionary | lexemes
-----------+---------------+----------------------------+--------------+------------+------------------------------
- protocol | Protocol head | http:// | {} | |
- url | URL | aew.wer0c.ewr/id?ad=qwe&dw | {simple} | simple | {aew.wer0c.ewr/id?ad=qwe&dw}
- host | Host | aew.wer0c.ewr | {simple} | simple | {aew.wer0c.ewr}
- url_path | URL path | /id?ad=qwe&dw | {simple} | simple | {/id?ad=qwe&dw}
- tag | XML tag | <span> | {} | |
+ alias | description | token | dictionaries | configuration | command | lexemes
+----------+---------------+----------------------------+--------------+---------------+---------+------------------------------
+ protocol | Protocol head | http:// | {} | | |
+ url | URL | aew.wer0c.ewr/id?ad=qwe&dw | {simple} | simple | simple | {aew.wer0c.ewr/id?ad=qwe&dw}
+ host | Host | aew.wer0c.ewr | {simple} | simple | simple | {aew.wer0c.ewr}
+ url_path | URL path | /id?ad=qwe&dw | {simple} | simple | simple | {/id?ad=qwe&dw}
+ tag | XML tag | <span> | {} | | |
(5 rows)
SELECT * from ts_debug('english', 'http://5aew.werc.ewr:8100/?');
- alias | description | token | dictionaries | dictionary | lexemes
-----------+---------------+----------------------+--------------+------------+------------------------
- protocol | Protocol head | http:// | {} | |
- url | URL | 5aew.werc.ewr:8100/? | {simple} | simple | {5aew.werc.ewr:8100/?}
- host | Host | 5aew.werc.ewr:8100 | {simple} | simple | {5aew.werc.ewr:8100}
- url_path | URL path | /? | {simple} | simple | {/?}
+ alias | description | token | dictionaries | configuration | command | lexemes
+----------+---------------+----------------------+--------------+---------------+---------+------------------------
+ protocol | Protocol head | http:// | {} | | |
+ url | URL | 5aew.werc.ewr:8100/? | {simple} | simple | simple | {5aew.werc.ewr:8100/?}
+ host | Host | 5aew.werc.ewr:8100 | {simple} | simple | simple | {5aew.werc.ewr:8100}
+ url_path | URL path | /? | {simple} | simple | simple | {/?}
(4 rows)
SELECT * from ts_debug('english', '5aew.werc.ewr:8100/?xx');
- alias | description | token | dictionaries | dictionary | lexemes
-----------+-------------+------------------------+--------------+------------+--------------------------
- url | URL | 5aew.werc.ewr:8100/?xx | {simple} | simple | {5aew.werc.ewr:8100/?xx}
- host | Host | 5aew.werc.ewr:8100 | {simple} | simple | {5aew.werc.ewr:8100}
- url_path | URL path | /?xx | {simple} | simple | {/?xx}
+ alias | description | token | dictionaries | configuration | command | lexemes
+----------+-------------+------------------------+--------------+---------------+---------+--------------------------
+ url | URL | 5aew.werc.ewr:8100/?xx | {simple} | simple | simple | {5aew.werc.ewr:8100/?xx}
+ host | Host | 5aew.werc.ewr:8100 | {simple} | simple | simple | {5aew.werc.ewr:8100}
+ url_path | URL path | /?xx | {simple} | simple | simple | {/?xx}
(3 rows)
SELECT token, alias,
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index fcf9990f6b..320e220d06 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -541,10 +541,6 @@ SELECT ctid, mapcfg
FROM pg_catalog.pg_ts_config_map fk
WHERE mapcfg != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_ts_config pk WHERE pk.oid = fk.mapcfg);
-SELECT ctid, mapdict
-FROM pg_catalog.pg_ts_config_map fk
-WHERE mapdict != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_ts_dict pk WHERE pk.oid = fk.mapdict);
SELECT ctid, dictnamespace
FROM pg_catalog.pg_ts_dict fk
WHERE dictnamespace != 0 AND
diff --git a/src/test/regress/sql/tsdicts.sql b/src/test/regress/sql/tsdicts.sql
index a5a569e1ad..3f7df283cb 100644
--- a/src/test/regress/sql/tsdicts.sql
+++ b/src/test/regress/sql/tsdicts.sql
@@ -117,6 +117,57 @@ CREATE TEXT SEARCH DICTIONARY thesaurus (
SELECT ts_lexize('thesaurus', 'one');
+-- test dictionary pipeline in configuration
+CREATE TEXT SEARCH CONFIGURATION english_union(
+ COPY=english
+);
+
+ALTER TEXT SEARCH CONFIGURATION english_union ALTER MAPPING FOR
+ asciiword
+ WITH english_stem UNION simple;
+
+SELECT to_tsvector('english_union', 'book');
+SELECT to_tsvector('english_union', 'books');
+SELECT to_tsvector('english_union', 'booking');
+
+CREATE TEXT SEARCH CONFIGURATION english_intersect(
+ COPY=english
+);
+
+ALTER TEXT SEARCH CONFIGURATION english_intersect ALTER MAPPING FOR
+ asciiword
+ WITH english_stem INTERSECT simple;
+
+SELECT to_tsvector('english_intersect', 'book');
+SELECT to_tsvector('english_intersect', 'books');
+SELECT to_tsvector('english_intersect', 'booking');
+
+CREATE TEXT SEARCH CONFIGURATION english_except(
+ COPY=english
+);
+
+ALTER TEXT SEARCH CONFIGURATION english_except ALTER MAPPING FOR
+ asciiword
+ WITH simple EXCEPT english_stem;
+
+SELECT to_tsvector('english_except', 'book');
+SELECT to_tsvector('english_except', 'books');
+SELECT to_tsvector('english_except', 'booking');
+
+CREATE TEXT SEARCH CONFIGURATION english_branches(
+ COPY=english
+);
+
+ALTER TEXT SEARCH CONFIGURATION english_branches ALTER MAPPING FOR
+ asciiword
+ WITH CASE ispell WHEN MATCH THEN KEEP
+ ELSE english_stem
+ END;
+
+SELECT to_tsvector('english_branches', 'book');
+SELECT to_tsvector('english_branches', 'books');
+SELECT to_tsvector('english_branches', 'booking');
+
-- Test ispell dictionary in configuration
CREATE TEXT SEARCH CONFIGURATION ispell_tst (
COPY=english
@@ -188,3 +239,25 @@ ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR
SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one');
SELECT to_tsvector('thesaurus_tst', 'Supernovae star is very new star and usually called supernovae (abbreviation SN)');
SELECT to_tsvector('thesaurus_tst', 'Booking tickets is looking like a booking a tickets');
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH english_stem UNION simple;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH CASE
+ thesaurus WHEN MATCH THEN KEEP ELSE english_stem
+END;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH thesaurus UNION english_stem;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH simple UNION thesaurus;
+SELECT to_tsvector('thesaurus_tst', 'The Mysterious Rings of Supernova 1987A');
+
+ALTER TEXT SEARCH CONFIGURATION thesaurus_tst ALTER MAPPING FOR asciiword WITH CASE
+ thesaurus WHEN MATCH THEN simple UNION thesaurus
+ ELSE simple
+END;
+SELECT to_tsvector('thesaurus_tst', 'one two');
+SELECT to_tsvector('thesaurus_tst', 'one two three');
+SELECT to_tsvector('thesaurus_tst', 'one two four');
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index 1c8520b3e9..6f8af63c1a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -26,9 +26,9 @@ SELECT oid, cfgname
FROM pg_ts_config
WHERE cfgnamespace = 0 OR cfgowner = 0 OR cfgparser = 0;
-SELECT mapcfg, maptokentype, mapseqno
+SELECT mapcfg, maptokentype
FROM pg_ts_config_map
-WHERE mapcfg = 0 OR mapdict = 0;
+WHERE mapcfg = 0;
-- Look for pg_ts_config_map entries that aren't one of parser's token types
SELECT * FROM
## Short description of changes in the patch.
* `doc/src/sgml/ref/alter_tsconfig.sgml` - Update the ALTER TEXT SEARCH CONFIGURATION documentation to reflect the syntax changes.
* `doc/src/sgml/textsearch.sgml` - FTS documentation; update the chapters describing dictionaries and configurations.
* `src/backend/catalog/system_views.sql` - Remove the `ts_debug()` definition because it was moved to `ts_parse.c`.
* `src/backend/commands/tsearchcmds.c` - Changes related to execution of the ALTER TEXT SEARCH CONFIGURATION SQL command.
* `src/backend/nodes/copyfuncs.c` - Copy functions for new parsing nodes.
* `src/backend/nodes/equalfuncs.c` - Equal functions for new parsing nodes.
* `src/backend/parser/gram.y` - Changes to the ALTER TEXT SEARCH CONFIGURATION command syntax (see the example after this list).
* `src/backend/tsearch/Makefile` - Add a new source code file `ts_configmap.c`.
* `src/backend/tsearch/ts_configmap.c` - Functions to manipulate FTS configuration objects and convert them to JSONB and back.
* `src/backend/tsearch/ts_parse.c` - Process text according to the new way of configuring FTS.
* `src/backend/utils/cache/syscache.c` - Change the cache entry for the text search configuration map.
* `src/backend/utils/cache/ts_cache.c` - Change the way the text search configuration is stored in the cache.
* `src/bin/pg_dump/pg_dump.c` - Change the way FTS configurations are dumped.
* `src/bin/psql/describe.c` - Change the way psql prints FTS configurations.
* `src/include/catalog/catversion.h` - Update catalog version.
* `src/include/catalog/indexing.h` - Update index on `pg_ts_config_map` with respect to changes in schema.
* `src/include/catalog/pg_proc.h` - Add new functions for FTS configuration manipulation.
* `src/include/catalog/pg_ts_config_map.h` - Change the way the configuration is stored: the configuration map is now kept in a JSONB field.
* `src/include/nodes/nodes.h` - Add new parse node types.
* `src/include/nodes/parsenodes.h` - Add new parse node structures.
* `src/include/parser/kwlist.h` - Add two new keywords, KEEP and MAP.
* `src/include/tsearch/ts_cache.h` - Change the types of cached FTS configuration maps.
* `src/include/tsearch/ts_configmap.h` - API functions to manipulate FTS configuration structures.
* `src/include/tsearch/ts_public.h` - Add a new TSLexeme flag (TSL_MULTI) to mark lexemes produced by a phrase dictionary.
* `src/test/regress/expected/oidjoins.out` - Remove the test joining `pg_ts_config_map` and `pg_ts_dict`, because there is no join column anymore.
* `src/test/regress/expected/tsdicts.out` - New tests checking whether the new syntax works.
* `src/test/regress/expected/tsearch.out` - Update tests for the `pg_ts_config_map` table and the `ts_debug` output.
* `src/test/regress/sql/oidjoins.sql` - Remove the test joining `pg_ts_config_map` and `pg_ts_dict`, because there is no join column anymore.
* `src/test/regress/sql/tsdicts.sql` - New tests checking whether the new syntax works.
* `src/test/regress/sql/tsearch.sql` - Update tests for the `pg_ts_config_map` table and the `ts_debug` output.
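
To make the new mapping grammar concrete, here is a minimal sketch distilled from the regression tests above. The configuration name `english_demo` is hypothetical; `english_stem` and `simple` are dictionaries available in a standard installation.

```sql
-- Minimal sketch of the new mapping syntax (english_demo is a
-- throwaway name; english_stem and simple are built-in dictionaries).
CREATE TEXT SEARCH CONFIGURATION english_demo (COPY = english);

-- Emit both the stemmed and the original form of every ASCII word.
ALTER TEXT SEARCH CONFIGURATION english_demo
    ALTER MAPPING FOR asciiword
    WITH english_stem UNION simple;

SELECT to_tsvector('english_demo', 'books');
-- Result (per the tsdicts tests above): 'book':1 'books':1

-- Branching form: KEEP the output of the dictionary named after CASE
-- when it produced lexemes, otherwise fall through to the ELSE branch.
-- english_stem accepts any ASCII word, so the ELSE branch here never
-- fires; the statement only illustrates the grammar.
ALTER TEXT SEARCH CONFIGURATION english_demo
    ALTER MAPPING FOR asciiword
    WITH CASE english_stem WHEN MATCH THEN KEEP
         ELSE simple
         END;
```

UNION, INTERSECT and EXCEPT combine the lexeme sets produced by two dictionary expressions, exactly as exercised by the `english_union`, `english_intersect` and `english_except` tests above.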