On 9/17/19 12:09 PM, Erik Rijkers wrote:
> On 2019-09-17 17:38, Jonathan S. Katz wrote:
>> [regex.patch]

Thanks for the review!

> "Several other parts of the SQL standard
> also define LIKE_REGEX equivalents that refer
> to this implementation, including the
> SQL/JSON path like_regex filter."
> 
> As I understand this text, 'concept' seems better.
> I'd drop 'also', too.

I rewrote this to be:

"Several other parts of the SQL standard refer to the LIKE_REGEX
specification to define similar operations, including..."

> 2.
> 'whereas the POSIX will those'  should be
> 'whereas POSIX will regard those'
>  or maybe 'read those'

I used "treat those"

> 
> 3.
> +     The SQL/JSON standard borrows its definition for how regular
> expressions
> +     from the <literal>LIKE_REGEX</literal> operator, which in turns
> uses the
> +     XQuery standard.
> That sentence needs the verb 'work', no?  'for how regular expressions
> work [..]'
> Or alternatively drop 'how'.

I dropped the "how".

v2 attached. Thanks!

Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..c867ea13de 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
    </para>
    </sect3>
 
+   <sect3 id="posix-vs-like_regexp">
+   <title>Differences with <literal>LIKE_REGEX</literal></title>
+
+   <para>
+    The <literal>LIKE_REGEX</literal> operator is specified starting with
+    the SQL:2008 standard to provide a more robust specification for
+    comparisons using regular expressions. Several other parts of the SQL
+    standard refer to the <literal>LIKE_REGEX</literal> specification
+    to define similar operations, including the
+    <link linkend="jsonpath-regular-expressions">SQL/JSON path
+     <literal>like_regex</literal></link> filter.
+   </para>
+   <para>
+    The SQL standard states that regular expressions are evaluated according to
+    the XQuery standard for regular expressions. While POSIX regular
+    expressions are similar to XQuery regular expressions, there exist some
+    differences where the behavior deviates from what is defined for
+    <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for
+    <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded 
strings,
+    whereas POSIX regular expressions can work on strings of any encoding.
+   </para>
+   <para>
+    Other differences include:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Character class subtraction is not supported (for example, using the
+       following to search for only consonants:
+       <literal>[a-z-[aeiou]]</literal>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The <literal>LIKE_REGEX</literal> specification states that a single
+       <literal>.</literal> should match a Windows newline
+       (<literal>\r\n</literal>) whereas POSIX will treat those as two separate
+       characters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The format <literal>#NN</literal> where <literal>NN</literal>
+       represents two hex digits used for character class elements is not
+       supported. The same character class elements can be used with POSIX by
+       specifying <literal>\xNN</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Character class elements using <literal>\p{UnicodeProperty}</literal>
+       or the inverse <literal>\P{UnicodeProperty}</literal> are not supported.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Character class shorthands <literal>\i</literal>,
+       <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal>
+       are not supported.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The specification for <literal>LIKE_REGEX</literal> may allow for more
+       characters for the <literal>\w</literal> character class shorthand, and
+       by extensions, excludes more characters with the complement
+       <literal>\W</literal>. As PostgreSQL depends on the underlying system's
+       locale, this may cause the behavior of <literal>\w</literal> and
+       <literal>\W</literal> to be equivalent to what POSIX provides.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The <literal>x</literal> flag in PostgreSQL extends on the specification
+       for <literal>LIKE_REGEX</literal> by allowing for comments specified
+       with <literal>#</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+  </sect3>
+
 <!-- end re_syntax.n man page -->
 
   </sect2>
@@ -11872,6 +11954,38 @@ table2-mapping
 
    </sect3>
 
+   <sect3 id="jsonpath-regular-expressions">
+    <title>Regular Expressions</title>
+
+    <para>
+     SQL/JSON path expressions support the ability to match text using regular
+     expressions with the <literal>like_regex</literal> filter. For example,
+     the following SQL/JSON path query would case-insensitively match all
+     strings in an array that start with a vowel:
+<programlisting>
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+</programlisting>
+    </para>
+
+    <para>
+     The SQL/JSON standard borrows its definition for regular expressions
+     from the <literal>LIKE_REGEX</literal> operator, which in turns uses the
+     XQuery standard. PostgreSQL does not support the
+     <literal>LIKE_REGEX</literal> operator as it currently implements
+     <link linkend="functions-posix-regexp">POSIX regular expressions</link>.
+    </para>
+
+    <para>
+     For its implementation of the SQL/JSON path <literal>like_regex</literal>
+     filter, PostgreSQL uses its POSIX implementation to evaluate the
+     regular expressions. While similar to the SQL standard specification for
+     the <literal>LIKE_REGEX</literal> operator, there are some noted
+     differences that you can read about in
+     <xref linkend="posix-vs-like_regexp" />.
+    </para>
+
+   </sect3>
+
    <sect3 id="functions-sqljson-path-operators">
    <title>SQL/JSON Path Operators and Methods</title>
 
@@ -12114,9 +12228,10 @@ table2-mapping
         <entry><literal>like_regex</literal></entry>
         <entry>
           Tests pattern matching with POSIX regular expressions
-          (see <xref linkend="functions-posix-regexp"/>).  Supported flags
-          are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
-          <literal>x</literal>, and <literal>q</literal>.</entry>
+          (see <xref linkend="jsonpath-regular-expressions"/> for additional
+          details).  Supported flags are <literal>i</literal>,
+          <literal>s</literal>, <literal>m</literal>,
+          and <literal>q</literal>.</entry>
         <entry><literal>["abc", "abd", "aBdC", "abdacb", 
"babc"]</literal></entry>
         <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag 
"i")</literal></entry>
         <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to