Le 30/07/2021 à 23:38, Tom Lane a écrit : > Gilles Darold <gil...@darold.net> writes: >> Le 26/07/2021 à 21:56, Tom Lane a écrit : >>> I'm inclined to just drop the regexp_replace additions. I don't think >>> that the extra parameters Oracle provides here are especially useful. >>> They're definitely not useful enough to justify creating compatibility >>> hazards for. >> I would not say that being able to replace the Nth occurrence of a >> pattern matching is not useful but i agree that this is not a common >> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form >> and I have though that we can not have compatibility issues because of >> the different data type at the 4th parameter. > Well, here's an example of the potential issues: > > [...]
Thanks for pointing me this case, I did not think that the prepared statement could lead to this confusion. >> Anyway, maybe we can just >> rename the function even if I would prefer that regexp_replace() be >> extended. For example: >> regexp_replace(source, pattern, replacement [, flags ]); >> regexp_substitute(source, pattern, replacement [, position ] [, >> occurrence ] [, flags ]); > Hmm. Of course the entire selling point of this patch seems to be > bug-compatibility with Oracle, so using different names is largely > defeating the point :-( > > Maybe we should just hold our noses and do it. The point that > you'd get a recognizable failure if the wrong function were chosen > reassures me a little bit. We've seen a lot of cases where this > sort of ambiguity results in the system just silently doing something > different from what you expected, and I was afraid that that could > happen here. I join a new version of the patch that include a check of the option parameter in the basic form of regexp_replace() and return an error in ambiguous cases. PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', $1); EXECUTE rr(1); ERROR: ambiguous use of the option parameter in regex_replace(), value: 1 HINT: you might set the occurrence parameter to force the use of the extended form of regex_replace() This is done by checking if the option parameter value is an integer and throw the error in this case. I don't think of anything better. Best regards, -- Gilles Darold
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index a5b6adc4bb..02d1f72e1e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3108,6 +3108,66 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>regexp_count</primary> + </indexterm> + <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] ) + <returnvalue>integer</returnvalue> + </para> + <para> + Returns the number of times a pattern occurs for a match of a POSIX + regular expression to the <parameter>string</parameter>; see + <xref linkend="functions-posix-regexp"/>. + </para> + <para> + <literal>regexp_count('123456789012', '\d{3}', 3)</literal> + <returnvalue>3</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>regexp_instr</primary> + </indexterm> + <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] ) + <returnvalue>integer</returnvalue> + </para> + <para> + Returns the position within <parameter>string</parameter> where the + match of a POSIX regular expression occurs. It returns an integer + indicating the beginning or ending position of the matched substring, + depending on the value of the <parameter>returnopt</parameter> argument + (default beginning). If no match is found the function returns 0; + see <xref linkend="functions-posix-regexp"/>. + </para> + <para> + <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal> + <returnvalue>7</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>regexp_like</primary> + </indexterm> + <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Evaluates the existence of a match to a POSIX regular expression + in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>. + </para> + <para> + <literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal> + <returnvalue>t</returnvalue> + </para></entry> + </row> + + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> @@ -3156,7 +3216,7 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue> <indexterm> <primary>regexp_replace</primary> </indexterm> - <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] ) + <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] ) <returnvalue>text</returnvalue> </para> <para> @@ -3171,6 +3231,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>regexp_substr</primary> + </indexterm> + <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] ) + <returnvalue>text</returnvalue> + </para> + <para> + Return the substring within <parameter>string</parameter> corresponding to the + match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>. + </para> + <para> + <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal> + <returnvalue>55</returnvalue> + </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> @@ -5392,6 +5470,18 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea <indexterm> <primary>regexp_split_to_array</primary> </indexterm> + <indexterm> + <primary>regexp_like</primary> + </indexterm> + <indexterm> + <primary>regexp_count</primary> + </indexterm> + <indexterm> + <primary>regexp_instr</primary> + </indexterm> + <indexterm> + <primary>regexp_substr</primary> + </indexterm> <para> <xref linkend="functions-posix-table"/> lists the available @@ -5650,6 +5740,8 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; It has the syntax <function>regexp_replace</function>(<replaceable>source</replaceable>, <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable> + <optional>, <replaceable>position</replaceable> </optional> + <optional>, <replaceable>occurrence</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>). The <replaceable>source</replaceable> string is returned unchanged if there is no match to the <replaceable>pattern</replaceable>. If there is a @@ -5663,12 +5755,19 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; substring matching the entire pattern should be inserted. Write <literal>\\</literal> if you need to put a literal backslash in the replacement text. - The <replaceable>flags</replaceable> parameter is an optional text - string containing zero or more single-letter flags that change the + <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting + from an optional <replaceable>position</replaceable> or from the beginning + of <replaceable>source</replaceable> by default. Optional <replaceable>occurrence</replaceable> + parameter indicates which occurrence of <replaceable>pattern</replaceable> in + <replaceable>source</replaceable> should be replaced. Default value for <replaceable>occurrence</replaceable> + is 1, replace only the first occurrence. The <replaceable>flags</replaceable> parameter + is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag <literal>i</literal> specifies case-insensitive matching, while flag <literal>g</literal> specifies replacement of each matching - substring rather than only the first one. Supported flags (though - not <literal>g</literal>) are + substring rather than only the first one. When <replaceable>occurrence</replaceable> + is set flag <literal>g</literal> has no effect. If <replaceable>occurrence</replaceable> + is set to zero, all occurrences are replaced which is similar to flag <literal>g</literal>. + Supported flags (though not <literal>g</literal>) are described in <xref linkend="posix-embedded-options-table"/>. </para> @@ -5681,6 +5780,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g') <lineannotation>fooXX</lineannotation> regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g') <lineannotation>fooXarYXazY</lineannotation> +regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i') + <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation> +regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i') + <lineannotation>A PostgrXSQL function</lineannotation> </programlisting> </para> @@ -5766,6 +5869,163 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo; in practice. Other software systems such as Perl use similar definitions. </para> + <para> + The <function>regexp_like</function> function evaluates the existence of a match + to a POSIX regular expression in <parameter>string</parameter>; returns a boolean + resulting from matching a POSIX regular expression pattern to a string. It has + the syntax <function>regexp_like</function>(<replaceable>string</replaceable>, + <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>). + <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting + from the beginning of <replaceable>string</replaceable>. + The <replaceable>flags</replaceable> parameter is an optional text string + containing zero or more single-letter flags that change the function's behavior. + <function>regexp_like</function> accepts all the flags + shown in <xref linkend="posix-embedded-options-table"/>. + This function is similar to regexp operator <literal>~</literal> when used without + <replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when + used with <replaceable>flags</replaceable> <literal>'i'</literal> only. + </para> + + <para> + Some examples: +<programlisting> +SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm'); + ?column? +---------- + found +(1 row) +</programlisting> + </para> + + + <para> + The <function>regexp_count</function> function returns the number of + captured substring(s) resulting from matching a POSIX regular + expression pattern to a string. It has the syntax <function>regexp_count</function>( + <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, + <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>). + <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting + from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable> + by default. The <replaceable>flags</replaceable> parameter is an optional text string + containing zero or more single-letter flags that change the function's behavior. + <function>regexp_count</function> accepts all the flags + shown in <xref linkend="posix-embedded-options-table"/>. + The <literal>g</literal> flag is forced internally to count all matches. + This function returns 0 if there is no match or the number of matches as + an integer. + </para> + + <para> + Some examples: +<programlisting> +SELECT regexp_count('123123123123', '\d{3}', 1); + regexp_count +-------------- + 4 +(1 row) + +SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); + regexp_count +-------------- + 1 +(1 row) +</programlisting> + </para> + + <para> + The <function>regexp_instr</function> function returns the beginning or ending + position of the matched substring resulting from matching a POSIX regular + expression pattern to a string. It has the syntax <function>regexp_instr</function>( + <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, + <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional> + <optional>, <replaceable>returnopt</replaceable> </optional> + <optional>, <replaceable>flags</replaceable> </optional> + <optional>, <replaceable>group</replaceable> </optional>). + <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting + from an optional <replaceable>position</replaceable> or from the beginning + of <replaceable>string</replaceable> by default. Optional <replaceable>occurrence</replaceable> parameter + indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable> + should be searched. When optional <replaceable>returnopt</replaceable> parameter is set to 0 (default) + the function returns the position of the first character of the occurrence. When set to 1 returns + the position of the character after the occurrence. + The <replaceable>flags</replaceable> parameter is an optional text string + containing zero or more single-letter flags that change the function's behavior. + <function>regexp_instr</function> accepts all the flags + shown in <xref linkend="posix-embedded-options-table"/>. + The <literal>g</literal> flag is forced internally to track all matches. + For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating + which capture in <replaceable>pattern</replaceable> is the target of the function. + A capture group is a part of the pattern enclosed in parentheses. Capture groups can be nested. + They are numbered in the order in which their left parentheses appear in <replaceable>pattern</replaceable>. + If <replaceable>group</replaceable> is zero, then the position + of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable> + does not have at least <replaceable>group</replaceable> capture groups, the function returns zero. + This function returns 0 if there is no match or the starting or ending position + of a match as an integer. + </para> + + <para> + Some examples: +<programlisting> +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6); + regexp_instr +-------------- + 32 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3); + regexp_instr +-------------- + 5 +(1 row) +</programlisting> + </para> + + <para> + The <function>regexp_substr</function> function returns the + matched substring resulting from matching a POSIX regular + expression pattern to a string. It has the syntax <function>regexp_substr</function>( + <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, + <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional> + <optional>, <replaceable>flags</replaceable> </optional> + <optional>, <replaceable>group</replaceable> </optional>). + <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting + from an optional <replaceable>position</replaceable> or from the beginning + of <replaceable>string</replaceable> by default. Optional parameter <replaceable>occurrence</replaceable> + indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable> + should be searched. The <replaceable>flags</replaceable> parameter is an optional text string + containing zero or more single-letter flags that change the function's behavior. + <function>regexp_substr</function> accepts all the flags + shown in <xref linkend="posix-embedded-options-table"/>. + The <literal>g</literal> flag is forced internally to track all matches. + For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating + which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group + is a part of the pattern enclosed in parentheses. Capture groups can be nested. They are numbered in + the order in which their left parentheses appear in <replaceable>pattern</replaceable>. + If <replaceable>group</replaceable> is zero, then the position + of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable> + does not have at least <replaceable>group</replaceable> capture groups, the function returns zero. + This function returns NULL if there is no match or the substring of the match. + </para> + + <para> + Some examples: +<programlisting> +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2); + regexp_substr +--------------- + , FR +(1 row) + +SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3); + regexp_substr +--------------- + 55 +(1 row) +</programlisting> + </para> + + <!-- derived from the re_syntax.n man page --> <sect3 id="posix-syntax-details"> diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c index a32c5c82ab..f34cce18ef 100644 --- a/src/backend/utils/adt/regexp.c +++ b/src/backend/utils/adt/regexp.c @@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern, static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx); static Datum build_regexp_split_result(regexp_matches_ctx *splitctx); +static text *enclose_with_parenthesis(text *str); /* * RE_compile_and_cache - compile a RE, caching if possible @@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS) 0, NULL)); } - /* * textregexsubstr() * Return a substring matched by a regular expression. @@ -646,6 +646,32 @@ textregexreplace(PG_FUNCTION_ARGS) regex_t *re; pg_re_flags flags; + + /* + * When the function is called with four parameters in a prepared + * statement the basic regexp_replace() function can be called + * instead of the extended form. This is because the parameter type + * is unknown at prepare time and the basic form is chosen. Check + * that the fourth parameter is not an integer otherwise return an + * error that we call the extended form. + */ + if (PG_NARGS() == 4) + { + char *badopt; + + (void) strtol(TextDatumGetCString(opt), &badopt, 10); + if (strcmp(badopt, "") == 0) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("ambiguous use of the option parameter" + " in regex_replace(), value: %s", + TextDatumGetCString(opt)), + errhint("you might set the occurrence parameter to force" + " the use of the extended form of regex_replace()"))); + } + } + parse_re_flags(&flags, opt); re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION()); @@ -1063,6 +1089,508 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS) return regexp_matches(fcinfo); } +/* + * regexp_like() + * Return the true if a pattern match within a string. + */ +Datum +regexp_like(PG_FUNCTION_ARGS) +{ + text *str = PG_GETARG_TEXT_PP(0); + text *pattern = PG_GETARG_TEXT_PP(1); + text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2); + + pg_re_flags re_flags; + regexp_matches_ctx *matchctx; + + /* Determine options */ + parse_re_flags(&re_flags, flags); + + matchctx = setup_regexp_matches(str, pattern, &re_flags, + PG_GET_COLLATION(), true, false, false); + + if (matchctx->nmatches > 0) + PG_RETURN_BOOL(true); + + PG_RETURN_BOOL(false); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_like_no_flags(PG_FUNCTION_ARGS) +{ + return regexp_like(fcinfo); +} + +/* + * regexp_count() + * Return the number of matches of a pattern within a string. + */ +Datum +regexp_count(PG_FUNCTION_ARGS) +{ + text *orig_str = NULL; + text *pattern = PG_GETARG_TEXT_PP(1); + int start = 1; + text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3); + + pg_re_flags re_flags; + regexp_matches_ctx *matchctx; + + if (PG_NARGS() > 2) + start = PG_GETARG_INT32(2); + + if (start < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "start_position", start))); + + /* regexp_count(string, pattern, start[, flags]) */ + if (start > 1) + orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len, + PG_GETARG_DATUM(0), + Int32GetDatum(start) + )); + else + orig_str = PG_GETARG_TEXT_PP(0); + + /* Determine options */ + parse_re_flags(&re_flags, flags); + + /* this function require flag 'g' */ + re_flags.glob = true; + + matchctx = setup_regexp_matches(orig_str, pattern, &re_flags, + PG_GET_COLLATION(), true, false, false); + + PG_RETURN_INT32(matchctx->nmatches); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_count_no_start(PG_FUNCTION_ARGS) +{ + return regexp_count(fcinfo); +} + + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_count_no_flags(PG_FUNCTION_ARGS) +{ + return regexp_count(fcinfo); +} + +/* + * textregexreplace_extended() + * Return a string matched by a regular expression, with replacement. + * Extends textregexreplace by allowing a start position and the + * choice of the occurrence to replace (0 means all occurrences). + */ +Datum +textregexreplace_extended(PG_FUNCTION_ARGS) +{ + text *s = NULL; + text *p = PG_GETARG_TEXT_PP(1); + text *pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1)); + text *r = PG_GETARG_TEXT_PP(2); + int start = 1; + int occurrence = 1; + int pos = 0; + text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5); + StringInfoData str; + regex_t *re; + pg_re_flags re_flags; + regexp_matches_ctx *matchctx; + text *after = NULL; + + /* start position */ + if (PG_NARGS() > 3) + start = PG_GETARG_INT32(3); + if (start < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "start_position", start))); + if (start > 1) + s = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len, + PG_GETARG_DATUM(0), + Int32GetDatum(start) + )); + else + s = PG_GETARG_TEXT_PP(0); + + /* occurrence to replace */ + if (PG_NARGS() > 4) + occurrence = PG_GETARG_INT32(4); + + if (occurrence < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence))); + + /* Determine options */ + parse_re_flags(&re_flags, flags); + re_flags.glob = true; + + /* lookup for pattern */ + matchctx = setup_regexp_matches(s, pattern, &re_flags, + PG_GET_COLLATION(), true, + false, re_flags.glob); + + /* If no match is found, then the function returns the original string */ + if (matchctx->nmatches == 0) + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); + + /* Get the position of the occurence to replace */ + if (PG_NARGS() > 4 && occurrence > 0) + { + /* When occurrence exceed matches return the original string */ + if (occurrence > matchctx->nmatches) + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); + + pos = (occurrence*matchctx->npatterns*2)-(matchctx->npatterns*2); + pos = matchctx->match_locs[pos]+start; + } + else + pos = start; + + /* Normal case without explicit VARIADIC marker */ + initStringInfo(&str); + + /* Get the string before the occurrence starting */ + if (pos > 1) + { + text *before = DatumGetTextPP(DirectFunctionCall3(text_substr, + PG_GETARG_DATUM(0), + Int32GetDatum(1), + Int32GetDatum(pos - 1))); + appendStringInfoString(&str, TextDatumGetCString(before)); + } + + /* all occurences must be replaced? */ + if (occurrence == 0) + re_flags.glob = true; + else + re_flags.glob = false; + + /* Compile the regular expression */ + re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION()); + + /* Get the substring starting at the right occurrence position */ + after = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len, + PG_GETARG_DATUM(0), + Int32GetDatum(pos) + )); + + appendStringInfoString(&str, TextDatumGetCString(replace_text_regexp( + after, + (void *) re, + r, + re_flags.glob))); + + PG_RETURN_TEXT_P(CStringGetTextDatum(str.data)); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +textregexreplace_extended_no_occurrence(PG_FUNCTION_ARGS) +{ + return textregexreplace_extended(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +textregexreplace_extended_no_flags(PG_FUNCTION_ARGS) +{ + return textregexreplace_extended(fcinfo); +} + +/* + * Return text string between parenthesis + */ +static text * +enclose_with_parenthesis(text *str) +{ + int len = VARSIZE_ANY_EXHDR(str); + text *result; + char *ptr; + + result = palloc(len + VARHDRSZ + 2); + SET_VARSIZE(result, len + VARHDRSZ + 2); + ptr = VARDATA(result); + memcpy(ptr, "(", 1); + memcpy(ptr+1, VARDATA_ANY(str), len); + memcpy(ptr+len+1, ")", 1); + + return result; +} + +/* + * regexp_instr() + * Return the position within the string where the match was located + */ +Datum +regexp_instr(PG_FUNCTION_ARGS) +{ + text *orig_str = NULL; + text *pattern = NULL; + text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5); + int start = 1; + int occurrence = 1; + int return_opt = 0; + int subexpr = 0; + int pos; + + pg_re_flags re_flags; + regexp_matches_ctx *matchctx; + + /* regexp_instr(string, pattern, start) */ + if (PG_NARGS() > 2) + start = PG_GETARG_INT32(2); + + if (start < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "start_position", start))); + + if (start > 1) + orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len, + PG_GETARG_DATUM(0), + Int32GetDatum(start) + )); + else + /* regexp_instr(string, pattern) */ + orig_str = PG_GETARG_TEXT_PP(0); + + if (orig_str == NULL) + PG_RETURN_NULL(); + + /* regexp_instr(string, pattern, start, occurrence) */ + if (PG_NARGS() > 3) + occurrence = PG_GETARG_INT32(3); + + if (occurrence <= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence))); + + /* regexp_instr(string, pattern, start, occurrence, return_opt) */ + if (PG_NARGS() > 4) + return_opt = PG_GETARG_INT32(4); + + if (return_opt != 0 && return_opt != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "return_option", return_opt))); + + /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */ + if (PG_NARGS() > 6) + subexpr = PG_GETARG_INT32(6); + if (subexpr < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "group", subexpr))); + + /* + * If subexpr is zero (default), then the position of the entire + * substring that matches the pattern is returned. Otherwise we + * will exactly register the subexpressions given in the pattern. + * Enclose pattern between parenthesis to register the position + * of the entire substring. + */ + pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1)); + + /* Determine options */ + parse_re_flags(&re_flags, flags); + /* this function require flag 'g' */ + re_flags.glob = true; + + matchctx = setup_regexp_matches(orig_str, pattern, &re_flags, + PG_GET_COLLATION(), true, false, false); + + /* If no match is found, then the function returns 0 */ + if (matchctx->nmatches == 0) + PG_RETURN_INT32(0); + + /* When occurrence exceed matches return 0 */ + if (occurrence > matchctx->nmatches) + PG_RETURN_INT32(0); + + /* When subexpr exceed number of subexpression return 0 */ + if (subexpr > matchctx->npatterns - 1) + PG_RETURN_INT32(0); + + /* + * Returns the position of the first character of the occurrence + * or for subexpression in this occurrence. + */ + pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2); + if (return_opt == 1) + pos += 1; + + PG_RETURN_INT32(matchctx->match_locs[pos]+start); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_instr_no_start(PG_FUNCTION_ARGS) +{ + return regexp_instr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_instr_no_occurrence(PG_FUNCTION_ARGS) +{ + return regexp_instr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_instr_no_return_opt(PG_FUNCTION_ARGS) +{ + return regexp_instr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_instr_no_flags(PG_FUNCTION_ARGS) +{ + return regexp_instr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_instr_no_subexpr(PG_FUNCTION_ARGS) +{ + return regexp_instr(fcinfo); +} + +/* + * regexp_substr() + * Return the substring within the string that match a regular + * expression pattern + */ +Datum +regexp_substr(PG_FUNCTION_ARGS) +{ + text *orig_str = NULL; + text *pattern = NULL; + text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4); + int start = 1; + int occurrence = 1; + int subexpr = 0; + int so, eo, pos; + + pg_re_flags re_flags; + regexp_matches_ctx *matchctx; + + if (PG_NARGS() > 2) + start = PG_GETARG_INT32(2); + + if (start < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "start_position", start))); + + /* regexp_substr(string, pattern, start) */ + if (start > 1) + orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len, + PG_GETARG_DATUM(0), + Int32GetDatum(start) + )); + else + /* regexp_substr(string, pattern) */ + orig_str = PG_GETARG_TEXT_PP(0); + + if (orig_str == NULL) + PG_RETURN_NULL(); + + /* regexp_substr(string, pattern, start, occurrence) */ + if (PG_NARGS() > 3) + occurrence = PG_GETARG_INT32(3); + + if (occurrence <= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence))); + + /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */ + if (PG_NARGS() > 5) + subexpr = PG_GETARG_INT32(5); + if (subexpr < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for parameter \"%s\": %d", "group", subexpr))); + + /* + * If subexpr is zero (default), then the position of the entire + * substring that matches the pattern is returned. Otherwise we + * will exactly register the subexpressions given in the pattern. + * Enclose pattern between parenthesis to register the position + * of the entire substring. + */ + pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1)); + + /* Determine options */ + parse_re_flags(&re_flags, flags); + /* this function require flag 'g' */ + re_flags.glob = true; + + matchctx = setup_regexp_matches(orig_str, pattern, &re_flags, + PG_GET_COLLATION(), true, false, false); + + /* If no match is found, then the function returns NULL */ + if (matchctx->nmatches == 0) + PG_RETURN_NULL(); + + /* When occurrence exceed matches return NULL */ + if (occurrence > matchctx->nmatches) + PG_RETURN_NULL(); + + /* When subexpr exceed number of subexpression return NULL */ + if (subexpr > matchctx->npatterns - 1) + PG_RETURN_NULL(); + + /* Returns the substring corresponding to the occurrence. */ + pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2); + so = matchctx->match_locs[pos]+1; + eo = matchctx->match_locs[pos+1]+1; + + PG_RETURN_DATUM(DirectFunctionCall3(text_substr, + PointerGetDatum(matchctx->orig_str), + Int32GetDatum(so), + Int32GetDatum(eo - so))); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_substr_no_start(PG_FUNCTION_ARGS) +{ + return regexp_substr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_substr_no_occurrence(PG_FUNCTION_ARGS) +{ + return regexp_substr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_substr_no_flags(PG_FUNCTION_ARGS) +{ + return regexp_substr(fcinfo); +} + +/* This is separate to keep the opr_sanity regression test from complaining */ +Datum +regexp_substr_no_subexpr(PG_FUNCTION_ARGS) +{ + return regexp_substr(fcinfo); +} + /* * setup_regexp_matches --- do the initial matching for regexp_match * and regexp_split functions diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 8cd0252082..f45c98c05f 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -3579,6 +3579,63 @@ proname => 'regexp_matches', prorows => '10', proretset => 't', prorettype => '_text', proargtypes => 'text text text', prosrc => 'regexp_matches' }, +{ oid => '9614', descr => 'count match(es) for regexp', + proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text', + prosrc => 'regexp_count_no_start' }, +{ oid => '9615', descr => 'count match(es) for regexp', + proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4', + prosrc => 'regexp_count_no_flags' }, +{ oid => '9616', descr => 'count match(es) for regexp', + proname => 'regexp_count', prorettype => 'int4', + proargtypes => 'text text int4 text', prosrc => 'regexp_count' }, +{ oid => '9617', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text', + prosrc => 'regexp_instr_no_start' }, +{ oid => '9618', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4', + prosrc => 'regexp_instr_no_occurrence' }, +{ oid => '9619', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4', + prosrc => 'regexp_instr_no_return_opt' }, +{ oid => '9620', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4', + prosrc => 'regexp_instr_no_flags' }, +{ oid => '9621', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text', + prosrc => 'regexp_instr_no_subexpr' }, +{ oid => '9622', descr => 'position where the match for regexp was located', + proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4', + prosrc => 'regexp_instr' }, +{ oid => '9623', descr => 'substring that matches the regexp pattern', + proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text', + prosrc => 'regexp_substr_no_start' }, +{ oid => '9624', descr => 'substring that matchies the regexp pattern', + proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4', + prosrc => 'regexp_substr_no_occurrence' }, +{ oid => '9625', descr => 'substring that matches the regexp pattern', + proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4', + prosrc => 'regexp_substr_no_flags' }, +{ oid => '9626', descr => 'substring that matches the regexp pattern', + proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text', + prosrc => 'regexp_substr_no_subexpr' }, +{ oid => '9627', descr => 'substring that matches the regexp pattern', + proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4', + prosrc => 'regexp_substr' }, +{ oid => '9628', descr => 'evaluate match(es) for regexp', + proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text', + prosrc => 'regexp_like_no_flags' }, +{ oid => '9629', descr => 'evaluate match(es) for regexp', + proname => 'regexp_like', prorettype => 'bool', + proargtypes => 'text text text', prosrc => 'regexp_like' }, +{ oid => '9606', descr => 'replace text using regexp', + proname => 'regexp_replace', prorettype => 'text', + proargtypes => 'text text text int4 int4 text', prosrc => 'textregexreplace_extended' }, +{ oid => '9607', descr => 'replace text using regexp', + proname => 'regexp_replace', prorettype => 'text', + proargtypes => 'text text text int4 int4', prosrc => 'textregexreplace_extended_no_flags' }, +{ oid => '9608', descr => 'replace text using regexp', + proname => 'regexp_replace', prorettype => 'text', + proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' }, { oid => '2088', descr => 'split string by field_sep and return field_num', proname => 'split_part', prorettype => 'text', proargtypes => 'text text int4', prosrc => 'split_part' }, diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out index 91aa819804..b4e3501556 100644 --- a/src/test/regress/expected/strings.out +++ b/src/test/regress/expected/strings.out @@ -905,6 +905,757 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov ERROR: regexp_split_to_table() does not support the "global" option SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g'); ERROR: regexp_split_to_array() does not support the "global" option +-- regexp_like tests +SELECT regexp_like('a'||CHR(10)||'d', 'a.d'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm'); + regexp_like +------------- + f +(1 row) + +SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n'); + regexp_like +------------- + f +(1 row) + +SELECT regexp_like('Steven', '^Ste(v|ph)en$'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar'); + regexp_like +------------- + f +(1 row) + +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('GREEN', '([aeiou])\1'); + regexp_like +------------- + f +(1 row) + +SELECT regexp_like('GREEN', '([aeiou])\1', 'i'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i'); + regexp_like +------------- + f +(1 row) + +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in'); + regexp_like +------------- + t +(1 row) + +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im'); + regexp_like +------------- + t +(1 row) + +SELECT REGEXP_LIKE('abc', 'a b c'); + regexp_like +------------- + f +(1 row) + +SELECT REGEXP_LIKE('abc', 'a b c','x'); + regexp_like +------------- + t +(1 row) + +-- count all matches for regexp +SELECT regexp_count('123123123123123', '(12)3'); + regexp_count +-------------- + 5 +(1 row) + +-- count all matches with start position +SELECT regexp_count('123123123123', '123', 0); + regexp_count +-------------- + 4 +(1 row) + +SELECT regexp_count('123123123123', '123', 1); + regexp_count +-------------- + 4 +(1 row) + +SELECT regexp_count('123123123123', '123', 3); + regexp_count +-------------- + 3 +(1 row) + +SELECT regexp_count('123123123123', '123', -3); +ERROR: invalid value for parameter "start_position": -3 +-- count all matches in NULL string with a start position +SELECT regexp_count(NULL, '123', 3); + regexp_count +-------------- + \N +(1 row) + +-- count all matches with a start position greater than string length +SELECT regexp_count('123', '123', 10); + regexp_count +-------------- + 0 +(1 row) + +-- count all matches from different regexp +SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]'); + regexp_count | regexp_count +--------------+-------------- + 3 | 3 +(1 row) + +SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]'); + regexp_count | regexp_count +--------------+-------------- + 1 | 3 +(1 row) + +SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}'); + regexp_count | regexp_count +--------------+-------------- + 1 | 0 +(1 row) + +SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}'); + regexp_count | regexp_count +--------------+-------------- + 0 | 1 +(1 row) + +SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]'); + regexp_count | regexp_count +--------------+-------------- + 0 | 1 +(1 row) + +SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}'); + regexp_count | regexp_count +--------------+-------------- + 1 | 1 +(1 row) + +-- count matches with newline case insensivity +SELECT regexp_count('a'||CHR(10)||'d', 'a.d'); + regexp_count +-------------- + 1 +(1 row) + +SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's'); + regexp_count +-------------- + 1 +(1 row) + +-- count matches with newline case sensivity +SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n'); + regexp_count +-------------- + 0 +(1 row) + +-- count not multiline matches +SELECT regexp_count('a'||CHR(10)||'d', '^d$'); + regexp_count +-------------- + 0 +(1 row) + +-- count multiline matches +SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm'); + regexp_count +-------------- + 1 +(1 row) + +SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1 + regexp_count +-------------- + 1 +(1 row) + +-- Count the number of occurrences of the substring an in the string. +SELECT regexp_count('a man, a plan, a canal: Panama', 'an'); + regexp_count +-------------- + 4 +(1 row) + +-- Find the number of occurrences of the substring an in the string starting with the fifth character. +SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5); + regexp_count +-------------- + 3 +(1 row) + +-- Find the number of occurrences of a substring containing a lower-case character +-- followed by an. In the first example, do not use a modifier. In the second example, +-- use the i modifier to force the regular expression to ignore case. +SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an'); + regexp_count +-------------- + 3 +(1 row) + +SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i'); + regexp_count +-------------- + 4 +(1 row) + +DROP TABLE IF EXISTS regexp_temp; +NOTICE: table "regexp_temp" does not exist, skipping +CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20)); +INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'john...@example.com'); +INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe'); +-- count matches case sensitive +SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp; + fullname | regexp_count +----------+-------------- + John Doe | 1 + Jane Doe | 2 +(2 rows) + +SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp; + fullname | regexp_count +----------+-------------- + John Doe | 1 + Jane Doe | 1 +(2 rows) + +SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp; + fullname | regexp_count +----------+-------------- + John Doe | 0 + Jane Doe | 0 +(2 rows) + +-- count matches case insensitive +SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp; + fullname | regexp_count +----------+-------------- + John Doe | 1 + Jane Doe | 2 +(2 rows) + +SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp; + fullname | regexp_count +----------+-------------- + John Doe | 1 + Jane Doe | 1 +(2 rows) + +-- return the start position of the 6th occurence starting at beginning of the string +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6); + regexp_instr +-------------- + 32 +(1 row) + +-- return the start position of the 5th occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0); + regexp_instr +-------------- + 32 +(1 row) + +-- return the ending position of the 5th occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0); + regexp_instr +-------------- + 32 +(1 row) + +-- return the ending position of the 2nd occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i'); + regexp_instr +-------------- + 30 +(1 row) + +-- return the starting position corresponding to the different capture group +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0); + regexp_instr +-------------- + 1 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1); + regexp_instr +-------------- + 1 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2); + regexp_instr +-------------- + 4 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3); + regexp_instr +-------------- + 5 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4); + regexp_instr +-------------- + 7 +(1 row) + +-- return the starting position corresponding to a non existant capture group +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5); + regexp_instr +-------------- + 0 +(1 row) + +-- Same but with the ending position +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0); + regexp_instr +-------------- + 9 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1); + regexp_instr +-------------- + 4 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2); + regexp_instr +-------------- + 9 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3); + regexp_instr +-------------- + 7 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4); + regexp_instr +-------------- + 9 +(1 row) + +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5); + regexp_instr +-------------- + 0 +(1 row) + +-- start position of a valid email +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp; + email | valid_email +---------------------+------------- + john...@example.com | 1 + janedoe | 0 +(2 rows) + +-- ending position of a valid email +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp; + email | valid_email +---------------------+------------- + john...@example.com | 20 + janedoe | 0 +(2 rows) + +-- start position of first capture group in the email (the dot part) +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp; + email | regexp_instr +---------------------+-------------- + john...@example.com | 16 + janedoe | 0 +(2 rows) + +-- ending position of first capture group in the email (the dot part) +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp; + email | regexp_instr +---------------------+-------------- + john...@example.com | 20 + janedoe | 0 +(2 rows) + +-- test negative values +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1); +ERROR: invalid value for parameter "start_position": -1 +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1); +ERROR: invalid value for parameter "occurrence": -1 +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1); +ERROR: invalid value for parameter "return_option": -1 +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1); +ERROR: invalid value for parameter "group": -1 +-- Find the first occurrence of a sequence of letters starting with the letter e +-- and ending with the letter y in the phrase "easy come, easy go." +SELECT regexp_instr('easy come, easy go','e\w*y'); + regexp_instr +-------------- + 1 +(1 row) + +-- Find the first sequence of letters starting with the letter e and ending with +-- the letter y in the string "easy come, easy go" starting at the second character +SELECT regexp_instr('easy come, easy go','e\w*y',2); + regexp_instr +-------------- + 12 +(1 row) + +-- Find the second sequence of letters starting with the letter e and ending with +-- the letter y in the string "easy come, easy go" starting at the first character. +SELECT regexp_instr('easy come, easy go','e\w*y',1,2); + regexp_instr +-------------- + 12 +(1 row) + +-- Find the position of the first character after the first whitespace in the string "easy come, easy go." +SELECT regexp_instr('easy come, easy go','\s',1,1,1); + regexp_instr +-------------- + 6 +(1 row) + +-- Find the position of the start of the third word in a string by capturing each +-- word as a subexpression, and returning the third subexpression's start position. +SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3); + regexp_instr +-------------- + 9 +(1 row) + +-- return the substring matching the regexp +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+'); + regexp_substr +---------------- + , zipcode town +(1 row) + +-- return the substring matching the regexp +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24); + regexp_substr +--------------- + , FR +(1 row) + +-- return the substring matching the regexp at the first occurrence +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1); + regexp_substr +---------------- + , zipcode town +(1 row) + +-- return the substring matching the regexp at the second occurrence +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2); + regexp_substr +--------------- + , FR +(1 row) + +-- case sensitivity substring search +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1); + regexp_substr +--------------- + \N +(1 row) + +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i'); + regexp_substr +---------------- + , zipcode town +(1 row) + +-- case sensitivity substring search with no capture group +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0); + regexp_substr +---------------- + , zipcode town +(1 row) + +-- case sensitivity substring search with non existing capture group +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0); + regexp_substr +---------------- + , zipcode town +(1 row) + +-- return the substring matching the regexp at different occurrence and capture group +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4); + regexp_substr +--------------- + 78 +(1 row) + +SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3); + regexp_substr +--------------- + 55 +(1 row) + +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0); + regexp_substr +--------------- + 12345678 +(1 row) + +-- test negative values +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4); +ERROR: invalid value for parameter "start_position": -1 +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4); +ERROR: invalid value for parameter "occurrence": -1 +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4); +ERROR: invalid value for parameter "group": -4 +-- Select the first substring of letters that end with "thy." +SELECT regexp_substr('healthy, wealthy, and wise','\w+thy'); + regexp_substr +--------------- + healthy +(1 row) + +-- Select the first substring of letters that ends with "thy" starting at the second character in the string. +SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2); + regexp_substr +--------------- + ealthy +(1 row) + +-- Return the contents of the third captured subexpression, which captures the third word in the string. +SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3); + regexp_substr +--------------- + three +(1 row) + +DROP TABLE IF EXISTS regexp_temp; +-- Regression tests for extended regexp_replace() function with start position and occurrence +SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1); + regexp_replace +---------------- + (512) 123-4567 +(1 row) + +SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0); + regexp_replace +------------------------------- + (512) 123-4567 (612) 123-4567 +(1 row) + +SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0); + regexp_replace +--------------------- + number your street,+ + zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9); + regexp_replace +-------------------------------------------- + number your street, zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0); + regexp_replace +---------------------------------------- + number your street, zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2); + regexp_replace +--------------------------------------------- + number your street, zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm'); + regexp_replace +--------------------------------------------- + number your street, zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i'); + regexp_replace +------------------------------------------------- + number yo[r] street, zipcode town, FR +(1 row) + +SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i'); + regexp_replace +---------------------------------------------- + number your s[t], zipcode town, FR +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2); + regexp_replace +----------------------- + A PXstgreSQL function +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i'); + regexp_replace +----------------------- + X PXstgrXSQL fXnctXXn +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i'); + regexp_replace +----------------------- + X PostgreSQL function +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i'); + regexp_replace +----------------------- + A PXstgreSQL function +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i'); + regexp_replace +----------------------- + A PostgrXSQL function +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i'); + regexp_replace +----------------------- + A PostgreSQL function +(1 row) + +SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9); + regexp_replace +----------------------- + A PostgreSQL function +(1 row) + +-- Invalid parameter values +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i'); +ERROR: invalid value for parameter "start_position": -1 +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i'); +ERROR: invalid value for parameter "occurrence": -1 +-- Modifier 'g' should not be taken in account, we have an occurrence to replace +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g'); + regexp_replace +----------------------- + A PXstgreSQL function +(1 row) + +-- Find groups of "word characters" (letters, numbers and underscore) ending with +-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing. +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g'); + regexp_replace +---------------- + , , and wise +(1 row) + +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0); + regexp_replace +---------------- + , , and wise +(1 row) + +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g'); + regexp_replace +---------------- + , , and wise +(1 row) + +-- Find groups of word characters ending with "thy" and replace with "something." +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g'); + regexp_replace +-------------------------------- + something, something, and wise +(1 row) + +-- Find groups of word characters ending with "thy" and replace with the string +-- "something" starting at the third character in the string. +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0); + regexp_replace +---------------------------------- + hesomething, something, and wise +(1 row) + +-- Replace the second group of word characters ending with "thy" with "something." +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2); + regexp_replace +------------------------------ + healthy, something, and wise +(1 row) + +-- Find groups of word characters ending with "thy" capturing the letters before +-- the "thy", and replace with the captured letters plus the letters "ish." +SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g'); + regexp_replace +---------------------------- + healish, wealish, and wise +(1 row) + +SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0); + regexp_replace +---------------------------- + healish, wealish, and wise +(1 row) + +-- Ambiguous use of option parameter with regexp_replace() +PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', $1); +EXECUTE rr(1); +ERROR: ambiguous use of the option parameter in regex_replace(), value: 1 +HINT: you might set the occurrence parameter to force the use of the extended form of regex_replace() +DEALLOCATE rr; -- change NULL-display back \pset null '' -- E021-11 position expression diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql index 2c502534c2..f50131896f 100644 --- a/src/test/regress/sql/strings.sql +++ b/src/test/regress/sql/strings.sql @@ -255,6 +255,202 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo; SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g'); +-- regexp_like tests +SELECT regexp_like('a'||CHR(10)||'d', 'a.d'); +SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm'); +SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n'); +SELECT regexp_like('Steven', '^Ste(v|ph)en$'); +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar'); +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar'); +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm'); +SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n'); +SELECT regexp_like('GREEN', '([aeiou])\1'); +SELECT regexp_like('GREEN', '([aeiou])\1', 'i'); +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i'); +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i'); +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in'); +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in'); +SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im'); +SELECT REGEXP_LIKE('abc', 'a b c'); +SELECT REGEXP_LIKE('abc', 'a b c','x'); + +-- count all matches for regexp +SELECT regexp_count('123123123123123', '(12)3'); +-- count all matches with start position +SELECT regexp_count('123123123123', '123', 0); +SELECT regexp_count('123123123123', '123', 1); +SELECT regexp_count('123123123123', '123', 3); +SELECT regexp_count('123123123123', '123', -3); +-- count all matches in NULL string with a start position +SELECT regexp_count(NULL, '123', 3); +-- count all matches with a start position greater than string length +SELECT regexp_count('123', '123', 10); +-- count all matches from different regexp +SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]'); +SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]'); +SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}'); +SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}'); +SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]'); +SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}'); +-- count matches with newline case insensivity +SELECT regexp_count('a'||CHR(10)||'d', 'a.d'); +SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's'); +-- count matches with newline case sensivity +SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n'); +-- count not multiline matches +SELECT regexp_count('a'||CHR(10)||'d', '^d$'); +-- count multiline matches +SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm'); +SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1 +-- Count the number of occurrences of the substring an in the string. +SELECT regexp_count('a man, a plan, a canal: Panama', 'an'); +-- Find the number of occurrences of the substring an in the string starting with the fifth character. +SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5); +-- Find the number of occurrences of a substring containing a lower-case character +-- followed by an. In the first example, do not use a modifier. In the second example, +-- use the i modifier to force the regular expression to ignore case. +SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an'); +SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i'); + +DROP TABLE IF EXISTS regexp_temp; +CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20)); +INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'john...@example.com'); +INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe'); +-- count matches case sensitive +SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp; +SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp; +SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp; +-- count matches case insensitive +SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp; +SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp; + +-- return the start position of the 6th occurence starting at beginning of the string +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6); +-- return the start position of the 5th occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0); +-- return the ending position of the 5th occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0); +-- return the ending position of the 2nd occurence starting after the first word +SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i'); +-- return the starting position corresponding to the different capture group +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4); +-- return the starting position corresponding to a non existant capture group +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5); +-- Same but with the ending position +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5); +-- start position of a valid email +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp; +-- ending position of a valid email +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp; +-- start position of first capture group in the email (the dot part) +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp; +-- ending position of first capture group in the email (the dot part) +SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp; +-- test negative values +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1); +SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1); + +-- Find the first occurrence of a sequence of letters starting with the letter e +-- and ending with the letter y in the phrase "easy come, easy go." +SELECT regexp_instr('easy come, easy go','e\w*y'); +-- Find the first sequence of letters starting with the letter e and ending with +-- the letter y in the string "easy come, easy go" starting at the second character +SELECT regexp_instr('easy come, easy go','e\w*y',2); +-- Find the second sequence of letters starting with the letter e and ending with +-- the letter y in the string "easy come, easy go" starting at the first character. +SELECT regexp_instr('easy come, easy go','e\w*y',1,2); +-- Find the position of the first character after the first whitespace in the string "easy come, easy go." +SELECT regexp_instr('easy come, easy go','\s',1,1,1); +-- Find the position of the start of the third word in a string by capturing each +-- word as a subexpression, and returning the third subexpression's start position. +SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3); + +-- return the substring matching the regexp +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+'); +-- return the substring matching the regexp +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24); +-- return the substring matching the regexp at the first occurrence +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1); +-- return the substring matching the regexp at the second occurrence +SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2); +-- case sensitivity substring search +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1); +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i'); +-- case sensitivity substring search with no capture group +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0); +-- case sensitivity substring search with non existing capture group +SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0); +-- return the substring matching the regexp at different occurrence and capture group +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4); +SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3); +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0); +-- test negative values +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4); +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4); +SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4); +-- Select the first substring of letters that end with "thy." +SELECT regexp_substr('healthy, wealthy, and wise','\w+thy'); +-- Select the first substring of letters that ends with "thy" starting at the second character in the string. +SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2); +-- Return the contents of the third captured subexpression, which captures the third word in the string. +SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3); + +DROP TABLE IF EXISTS regexp_temp; + +-- Regression tests for extended regexp_replace() function with start position and occurrence +SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1); +SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0); +SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0); +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9); +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0); +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2); +SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm'); +SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i'); +SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9); +-- Invalid parameter values +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i'); +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i'); +-- Modifier 'g' should not be taken in account, we have an occurrence to replace +SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g'); +-- Find groups of "word characters" (letters, numbers and underscore) ending with +-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing. +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g'); +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0); +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g'); +-- Find groups of word characters ending with "thy" and replace with "something." +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g'); +-- Find groups of word characters ending with "thy" and replace with the string +-- "something" starting at the third character in the string. +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0); +-- Replace the second group of word characters ending with "thy" with "something." +SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2); +-- Find groups of word characters ending with "thy" capturing the letters before +-- the "thy", and replace with the captured letters plus the letters "ish." +SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g'); +SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0); +-- Ambiguous use of option parameter with regexp_replace() +PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', $1); +EXECUTE rr(1); +DEALLOCATE rr; + -- change NULL-display back \pset null ''