On 9/17/19 12:09 PM, Erik Rijkers wrote: > On 2019-09-17 17:38, Jonathan S. Katz wrote: >> [regex.patch]
Thanks for the review! > "Several other parts of the SQL standard > also define LIKE_REGEX equivalents that refer > to this implementation, including the > SQL/JSON path like_regex filter." > > As I understand this text, 'concept' seems better. > I'd drop 'also', too. I rewrote this to be: "Several other parts of the SQL standard refer to the LIKE_REGEX specification to define similar operations, including..." > 2. > 'whereas the POSIX will those' should be > 'whereas POSIX will regard those' > or maybe 'read those' I used "treat those" > > 3. > + The SQL/JSON standard borrows its definition for how regular > expressions > + from the <literal>LIKE_REGEX</literal> operator, which in turns > uses the > + XQuery standard. > That sentence needs the verb 'work', no? 'for how regular expressions > work [..]' > Or alternatively drop 'how'. I dropped the "how". v2 attached. Thanks! Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2b4fe0cb59..c867ea13de 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </para> </sect3> + <sect3 id="posix-vs-like_regexp"> + <title>Differences with <literal>LIKE_REGEX</literal></title> + + <para> + The <literal>LIKE_REGEX</literal> operator is specified starting with + the SQL:2008 standard to provide a more robust specification for + comparisons using regular expressions. Several other parts of the SQL + standard refer to the <literal>LIKE_REGEX</literal> specification + to define similar operations, including the + <link linkend="jsonpath-regular-expressions">SQL/JSON path + <literal>like_regex</literal></link> filter. + </para> + <para> + The SQL standard states that regular expressions are evaluated according to + the XQuery standard for regular expressions. While POSIX regular + expressions are similar to XQuery regular expressions, there exist some + differences where the behavior deviates from what is defined for + <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for + <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded strings, + whereas POSIX regular expressions can work on strings of any encoding. + </para> + <para> + Other differences include: + <itemizedlist> + <listitem> + <para> + Character class subtraction is not supported (for example, using the + following to search for only consonants: + <literal>[a-z-[aeiou]]</literal>). + </para> + </listitem> + <listitem> + <para> + The <literal>LIKE_REGEX</literal> specification states that a single + <literal>.</literal> should match a Windows newline + (<literal>\r\n</literal>) whereas POSIX will treat those as two separate + characters. + </para> + </listitem> + <listitem> + <para> + The format <literal>#NN</literal> where <literal>NN</literal> + represents two hex digits used for character class elements is not + supported. The same character class elements can be used with POSIX by + specifying <literal>\xNN</literal>. + </para> + </listitem> + <listitem> + <para> + Character class elements using <literal>\p{UnicodeProperty}</literal> + or the inverse <literal>\P{UnicodeProperty}</literal> are not supported. + </para> + </listitem> + <listitem> + <para> + Character class shorthands <literal>\i</literal>, + <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal> + are not supported. + </para> + </listitem> + <listitem> + <para> + The specification for <literal>LIKE_REGEX</literal> may allow for more + characters for the <literal>\w</literal> character class shorthand, and + by extensions, excludes more characters with the complement + <literal>\W</literal>. As PostgreSQL depends on the underlying system's + locale, this may cause the behavior of <literal>\w</literal> and + <literal>\W</literal> to be equivalent to what POSIX provides. + </para> + </listitem> + <listitem> + <para> + The <literal>x</literal> flag in PostgreSQL extends on the specification + for <literal>LIKE_REGEX</literal> by allowing for comments specified + with <literal>#</literal>. + </para> + </listitem> + </itemizedlist> + </para> + + </sect3> + <!-- end re_syntax.n man page --> </sect2> @@ -11872,6 +11954,38 @@ table2-mapping </sect3> + <sect3 id="jsonpath-regular-expressions"> + <title>Regular Expressions</title> + + <para> + SQL/JSON path expressions support the ability to match text using regular + expressions with the <literal>like_regex</literal> filter. For example, + the following SQL/JSON path query would case-insensitively match all + strings in an array that start with a vowel: +<programlisting> +'$[*] ? (@ like_regex "^[aeiou]" flag "i")' +</programlisting> + </para> + + <para> + The SQL/JSON standard borrows its definition for regular expressions + from the <literal>LIKE_REGEX</literal> operator, which in turns uses the + XQuery standard. PostgreSQL does not support the + <literal>LIKE_REGEX</literal> operator as it currently implements + <link linkend="functions-posix-regexp">POSIX regular expressions</link>. + </para> + + <para> + For its implementation of the SQL/JSON path <literal>like_regex</literal> + filter, PostgreSQL uses its POSIX implementation to evaluate the + regular expressions. While similar to the SQL standard specification for + the <literal>LIKE_REGEX</literal> operator, there are some noted + differences that you can read about in + <xref linkend="posix-vs-like_regexp" />. + </para> + + </sect3> + <sect3 id="functions-sqljson-path-operators"> <title>SQL/JSON Path Operators and Methods</title> @@ -12114,9 +12228,10 @@ table2-mapping <entry><literal>like_regex</literal></entry> <entry> Tests pattern matching with POSIX regular expressions - (see <xref linkend="functions-posix-regexp"/>). Supported flags - are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>, - <literal>x</literal>, and <literal>q</literal>.</entry> + (see <xref linkend="jsonpath-regular-expressions"/> for additional + details). Supported flags are <literal>i</literal>, + <literal>s</literal>, <literal>m</literal>, + and <literal>q</literal>.</entry> <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry> <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry> <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
signature.asc
Description: OpenPGP digital signature