> On 2021.03.20. 19:48 Gilles Darold <gil...@darold.net> wrote:
>
> This is a new version of the patch that now implements all the XQUERY
> regexp functions as described in the standard, minus the differences of
> PostgreSQL regular expressions explained in [1].
>
> The SQL standard describes functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex() which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
> reported by Chapman Flack in [2]. All these functions are implemented in
> [v2-0001-xquery-regexp-functions.patch]
Hi,

Apply, compile and (world)check are fine. I haven't found errors in
functionality.

I went through the docs, and came up with these changes in func.sgml, and
pg_proc.dat.

Useful functions - thanks!

Erik Rijkers
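PS: for readers who want a feel for the semantics the docs below describe, here is a rough Python analogue of regexp_like, regexp_count and regexp_instr. It is only a sketch under assumptions: it uses Python's re module rather than PostgreSQL's regex engine, the helper _py_flags is hypothetical, and only the i, n and m flags are handled. It is not taken from the patch.

```python
import re


def _py_flags(flags):
    """Hypothetical helper: map a few PostgreSQL flag letters to re flags."""
    out = 0
    if "i" in flags:
        out |= re.IGNORECASE
    if "n" in flags or "m" in flags:
        # Newline-sensitive matching: ^ and $ also match at line boundaries.
        out |= re.MULTILINE
    else:
        # PostgreSQL's default lets '.' match a newline; re needs DOTALL for that.
        out |= re.DOTALL
    return out


def regexp_like(string, pattern, flags=""):
    """True if pattern matches anywhere in string."""
    return re.search(pattern, string, _py_flags(flags)) is not None


def regexp_count(string, pattern, start=1, flags=""):
    """Number of non-overlapping matches, searching from 1-based start."""
    rx = re.compile(pattern, _py_flags(flags))
    return sum(1 for _ in rx.finditer(string, start - 1))


def regexp_instr(string, pattern, start=1, occurrence=1,
                 returnopt=0, flags="", group=0):
    """1-based position of the requested occurrence, or 0 if there is none.

    returnopt=0 gives the position where the match (or capture group) begins,
    returnopt=1 gives the position of the character after it.
    """
    rx = re.compile(pattern, _py_flags(flags))
    if group > rx.groups:
        return 0  # pattern has fewer capture groups than requested
    for n, m in enumerate(rx.finditer(string, start - 1), 1):
        if n == occurrence:
            if m.start(group) < 0:
                return 0  # group exists but did not take part in the match
            pos = m.end(group) if returnopt else m.start(group)
            return pos + 1  # convert 0-based offset to 1-based position
    return 0


if __name__ == "__main__":
    print(regexp_like("Hello\nworld", "^world$", "m"))
    print(regexp_count("abcabc", "b"))
    print(regexp_instr("abcabc", "b(c)", 1, 2, 0, "", 1))
```

The first call mirrors the regexp_like example in the func.sgml hunk: with the m flag the anchors match around the second line, so the result is true, which is why the documented return value there should be t rather than 3.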
--- doc/src/sgml/func.sgml.orig 2021-03-21 03:59:37.884365465 +0100
+++ doc/src/sgml/func.sgml 2021-03-21 11:37:46.880644051 +0100
@@ -3106,7 +3106,7 @@
<returnvalue>integer</returnvalue>
</para>
<para>
- Return the number of times a pattern occurs for a match of a POSIX
+ Returns the number of times a pattern occurs for a match of a POSIX
regular expression to the <parameter>string</parameter>; see
<xref linkend="functions-posix-regexp"/>.
</para>
@@ -3125,11 +3125,11 @@
<returnvalue>integer</returnvalue>
</para>
<para>
- Return the position within <parameter>string</parameter> where the
+ Returns the position within <parameter>string</parameter> where the
match of a POSIX regular expression occurs. It returns an integer
indicating the beginning or ending position of the matched substring,
depending on the value of the <parameter>returnopt</parameter> argument
- (default beginning). If no match is found, then the function returns 0;
+ (default beginning). If no match is found the function returns 0;
see <xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3147,12 +3147,12 @@
<returnvalue>boolean</returnvalue>
</para>
<para>
- Evaluate the existence of a match to a POSIX regular expression
+ Evaluates the existence of a match to a POSIX regular expression
in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
- <returnvalue>3</returnvalue>
+ <returnvalue>t</returnvalue>
</para></entry>
</row>
@@ -5773,7 +5773,7 @@
</para>
<para>
- The <function>regexp_like</function> function evaluate the existence of a match
+ The <function>regexp_like</function> function evaluates the existence of a match
to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
resulting from matching a POSIX regular expression pattern to a string. It has
the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
@@ -5782,7 +5782,7 @@
from the beginning of <replaceable>string</replaceable>.
The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_like</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
This function is similar to regexp operator <literal>~</literal> when used without
<replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
@@ -5792,9 +5792,9 @@
<para>
Some examples:
<programlisting>
-SELECT 'found' FROM t1 WHERE regexp_like('Hello'||chr(10)||'world', '^world$', 'm');
- regexp_like
---------------
+SELECT 'found' FROM (values('Hello'||chr(10)||'world')) as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column?
+----------
found
(1 row)
</programlisting>
@@ -5814,7 +5814,7 @@
<function>regexp_count</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to count all matches.
- This function returns 0 if there is no match or the number of match as
+ This function returns 0 if there is no match or the number of matches as
an integer.
</para>
@@ -5853,17 +5853,17 @@
the position of the character after the occurrence.
The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_instr</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to track all matches.
For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
- which capture in pattern is the target of the function. A capture group is a part of the pattern
- enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
- left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in the order in which their
+ left parentheses appear in <replaceable>pattern</replaceable>. If <replaceable>group</replaceable> is zero, then the position
of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
- does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
This function returns 0 if there is no match or the starting or ending position
- of match as an integer.
+ of a match as an integer.
</para>
<para>
@@ -5897,16 +5897,16 @@
indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_substr</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to track all matches.
For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
- which capture in pattern is the target of the function. A capture group is a part of the pattern
- enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
- left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in the order in which their
+ left parentheses appear in <replaceable>pattern</replaceable>. If <replaceable>group</replaceable> is zero, then the position
of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
- does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
- This function returns NULL if there is no match or the substring of match.
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns NULL if there is no match or the substring of the match.
</para>
<para>
--- src/include/catalog/pg_proc.dat.orig 2021-03-21 11:59:04.107454798 +0100
+++ src/include/catalog/pg_proc.dat 2021-03-21 12:02:24.006401415 +0100
@@ -3587,19 +3587,19 @@
{ oid => '9622', descr => 'position where the match for regexp was located',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
prosrc => 'regexp_instr' },
-{ oid => '9623', descr => 'substring that match the regexp pattern',
+{ oid => '9623', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
prosrc => 'regexp_substr_no_start' },
-{ oid => '9624', descr => 'substring that match the regexp pattern',
+{ oid => '9624', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
prosrc => 'regexp_substr_no_occurrence' },
-{ oid => '9625', descr => 'substring that match the regexp pattern',
+{ oid => '9625', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
prosrc => 'regexp_substr_no_flags' },
-{ oid => '9626', descr => 'substring that match the regexp pattern',
+{ oid => '9626', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
prosrc => 'regexp_substr_no_subexpr' },
-{ oid => '9627', descr => 'substring that match the regexp pattern',
+{ oid => '9627', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
prosrc => 'regexp_substr' },
{ oid => '9628', descr => 'evaluate match(es) for regexp',