Hi! Sorry for very long reply.
On Thu, Aug 16, 2018 at 11:44 PM David G. Johnston <david.g.johns...@gmail.com> wrote: > If the new behavior is an error I don't really have a problem since the need > to fix one's queries will be obvious. > > "So length of last group of spaces/separators in the pattern should be > greater or equal to length of spaces/separators in the input string. > Other previous groups are ignored in Oracle. And that seems > ridiculous for me." > > What do you believe should (or does) happen? Multiple groups always fail or > something else? How does this interplay with the detection of the negative > timezone offset? I'm not finding the behavior ridiculous at first blush; not > to the extent to avoid emulating it in a function whose purpose is emulation. > Being more lenient than Oracle seems undesirable. Regardless of the choice > made here it should be memorialized in the regression tests. The current version of patch doesn't really distinguish spaces and delimiters in format string in non-FX mode. So, spaces and delimiters are forming single group. For me Oracle behavior is ridiculous at least because it doesn't allow cases when input string exactly matches format string. This one fails: SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual But both this two are working: SELECT to_timestamp('2018- -01 02', 'YYYY---MM DD') FROM dual SELECT to_timestamp('2018- -01 02', 'YYYY MM DD') FROM dual Regarding TZH, Oracle takes into account total number of characters between placeholders as we do. So, there is no change in this aspect. > The couple of regression tests that change do so for the better. It would be > illuminating to set this up as two patches though, one introducing all of the > new regression tests against the current code and then a second patch with > the changed behavior with only the affected tests. OK, here you go. 0001-to_timestamp-regression-test-v17.patch introduces changes in regression tests and their output for current master, while 0002-to_timestamp-format-checking-v17.patch contain changes to to_timestamp itself. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index 63e39198e68..e585b07c18c 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS'); Sat Feb 16 08:14:30 0097 PST (1 row) +SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS'); + to_timestamp +------------------------------ + Fri Mar 18 23:38:15 2011 PDT +(1 row) + SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD'); to_timestamp ------------------------------ Sat Jan 12 00:00:00 1985 PST (1 row) +SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD'); + to_timestamp +------------------------------ + Sat Jan 12 00:00:00 1985 PST +(1 row) + +SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD'); + to_timestamp +------------------------------ + Wed Jan 02 00:00:00 1985 PST +(1 row) + SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16', - '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD'); + '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD'); to_timestamp ------------------------------ Sun May 16 00:00:00 1976 PDT @@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD'); (1 row) SELECT to_timestamp('15 "text between quote marks" 98 54 45', - E'HH24 "\\text between quote marks\\"" YY MI SS'); + E'HH24 "\\"text between quote marks\\"" YY MI SS'); to_timestamp ------------------------------ Thu Jan 01 15:54:45 1998 PST @@ -2810,6 +2828,24 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay'); SELECT to_timestamp('97/Feb/16', 'YYMonDD'); ERROR: invalid value "/Fe" for "Mon" DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD'); + to_timestamp +------------------------------ + Sun Feb 16 00:00:00 1997 PST +(1 row) + +SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); + to_timestamp +------------------------------ + Sun Feb 16 00:00:00 1997 PST +(1 row) + +SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD'); + to_timestamp +------------------------------ + Sun Feb 16 00:00:00 1997 PST +(1 row) + SELECT to_timestamp('19971116', 'YYYYMMDD'); to_timestamp ------------------------------ @@ -2999,6 +3035,48 @@ SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); Sun Dec 18 03:38:15 2011 PST (1 row) +SELECT to_timestamp('2000+ JUN', 'YYYY/MON'); + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + +SELECT to_timestamp(' 2000 +JUN', 'YYYY/MON'); +ERROR: invalid value "+JU" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON'); + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + +SELECT to_timestamp('2000 +JUN', 'YYYY//MON'); +ERROR: invalid value "+JU" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 + JUN', 'YYYY MON'); +ERROR: invalid value "+ J" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 ++ JUN', 'YYYY MON'); +ERROR: invalid value "+ J" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +ERROR: invalid value "+ J" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +ERROR: invalid value "+ J" for "MON" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 -10', 'YYYY TZH'); + to_timestamp +------------------------------ + Sat Jan 01 02:00:00 2000 PST +(1 row) + +SELECT to_timestamp('2000 -10', 'YYYY TZH'); + to_timestamp +------------------------------ + Fri Dec 31 06:00:00 1999 PST +(1 row) + SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index ebb196a1cfc..807037be76e 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS'); SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS'); +SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS'); + SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD'); +SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD'); + +SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD'); + SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16', - '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD'); + '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD'); SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD'); SELECT to_timestamp('15 "text between quote marks" 98 54 45', - E'HH24 "\\text between quote marks\\"" YY MI SS'); + E'HH24 "\\"text between quote marks\\"" YY MI SS'); SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY'); @@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay'); SELECT to_timestamp('97/Feb/16', 'YYMonDD'); +SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD'); + +SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); + +SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD'); + SELECT to_timestamp('19971116', 'YYYYMMDD'); SELECT to_timestamp('20000-1116', 'YYYY-MMDD'); @@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); +SELECT to_timestamp('2000+ JUN', 'YYYY/MON'); +SELECT to_timestamp(' 2000 +JUN', 'YYYY/MON'); +SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON'); +SELECT to_timestamp('2000 +JUN', 'YYYY//MON'); +SELECT to_timestamp('2000 + JUN', 'YYYY MON'); +SELECT to_timestamp('2000 ++ JUN', 'YYYY MON'); +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +SELECT to_timestamp('2000 -10', 'YYYY TZH'); +SELECT to_timestamp('2000 -10', 'YYYY TZH'); + SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD');
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index bb794e044f4..6f28fe1e753 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -6262,16 +6262,54 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); <listitem> <para> <function>to_timestamp</function> and <function>to_date</function> - skip multiple blank spaces in the input string unless the - <literal>FX</literal> option is used. For example, - <literal>to_timestamp('2000 JUN', 'YYYY MON')</literal> works, but + skip multiple blank spaces at the beginning of the input string and + around date and time values unless the <literal>FX</literal> option is used. For example, + <literal>to_timestamp(' 2000 JUN', 'YYYY MON')</literal> and + <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but <literal>to_timestamp('2000 JUN', 'FXYYYY MON')</literal> returns an error - because <function>to_timestamp</function> expects one space only. + because <function>to_timestamp</function> expects a single space only. <literal>FX</literal> must be specified as the first item in the template. </para> </listitem> + <listitem> + <para> + A separator (a space or a non-letter/non-digit character) in the template string of + <function>to_timestamp</function> and <function>to_date</function> + matches any single separator in the input string or is skipped, + unless the <literal>FX</literal> option is used. + For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and + <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but + <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal> + returns an error because the number of separators in the input string + exceeds the number of separators in the template. + </para> + <para> + If <literal>FX</literal> is specified, separators in the + input and template strings must match exactly. For example, + <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal> + returns an error because a space is expected in the input string. + </para> + </listitem> + + <listitem> + <para> + <literal>TZH</literal> template pattern can match a signed number. + Without the <literal>FX</literal> option, it may lead to ambiguity in + interpretation of the minus sign, which can also be interpreted as a separator. + This ambiguity is resolved as follows. If the number of separators before + <literal>TZH</literal> in the template string is less than the number of + separators before the minus sign in the input string, the minus sign + is interpreted as part of <literal>TZH</literal>. + Otherwise, the minus sign is considered to be a separator between values. + For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches + <literal>-10</literal> to <literal>TZH</literal>, but + <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> + matches <literal>10</literal> to <literal>TZH</literal>. + </para> + </listitem> + <listitem> <para> Ordinary text is allowed in <function>to_char</function> @@ -6287,6 +6325,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); string; for example <literal>"XX"</literal> skips two input characters (whether or not they are <literal>XX</literal>). </para> + <tip> + <para> + Prior to <productname>PostgreSQL</productname> 12, it was possible to + skip arbitrary text in the input string using non-letter or non-digit + characters. For example, + <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to + work. Now you can only use letter characters for this purpose. For example, + <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and + <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal> + skip <literal>y</literal>, <literal>m</literal>, and + <literal>d</literal>. + </para> + </tip> </listitem> <listitem> diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 30696e3575d..bf4e6067e85 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -165,6 +165,8 @@ typedef struct #define NODE_TYPE_END 1 #define NODE_TYPE_ACTION 2 #define NODE_TYPE_CHAR 3 +#define NODE_TYPE_SEPARATOR 4 +#define NODE_TYPE_SPACE 5 #define SUFFTYPE_PREFIX 1 #define SUFFTYPE_POSTFIX 2 @@ -955,6 +957,7 @@ typedef struct NUMProc static const KeyWord *index_seq_search(const char *str, const KeyWord *kw, const int *index); static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type); +static bool is_separator_char(const char *str); static void NUMDesc_prepare(NUMDesc *num, FormatNode *n); static void parse_format(FormatNode *node, const char *str, const KeyWord *kw, const KeySuffix *suf, const int *index, int ver, NUMDesc *Num); @@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type) return NULL; } +static bool +is_separator_char(const char *str) +{ + /* ASCII printable character, but not letter or digit */ + return (*str > 0x20 && *str < 0x7F && + !(*str >= 'A' && *str <= 'Z') && + !(*str >= 'a' && *str <= 'z') && + !(*str >= '0' && *str <= '9')); +} + /* ---------- * Prepare NUMDesc (number description struct) via FormatNode struct * ---------- @@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw, if (*str == '\\' && *(str + 1) == '"') str++; chlen = pg_mblen(str); - n->type = NODE_TYPE_CHAR; + + if (ver == DCH_TYPE && is_separator_char(str)) + n->type = NODE_TYPE_SEPARATOR; + else if (isspace((unsigned char) *str)) + n->type = NODE_TYPE_SPACE; + else + n->type = NODE_TYPE_CHAR; + memcpy(n->character, str, chlen); n->character[chlen] = '\0'; n->key = NULL; @@ -2987,25 +3007,69 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) int len, value; bool fx_mode = false; + /* number of extra skipped characters (more than given in format string) */ + int extra_skip = 0; for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++) { - if (n->type != NODE_TYPE_ACTION) + /* + * Ignore spaces at the beginning of the string and before fields when + * not in FX (fixed width) mode. + */ + if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) && + (n->type == NODE_TYPE_ACTION || n == node)) + { + while (*s != '\0' && isspace((unsigned char) *s)) + { + s++; + extra_skip++; + } + } + + if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR) { /* - * Separator, so consume one character from input string. Notice - * we don't insist that the consumed character match the format's - * character. + * In non FX (fixed format) mode we don't insist that the consumed + * character matches the format's character. */ - s += pg_mblen(s); + if (!fx_mode) + { + extra_skip--; + if (isspace((unsigned char) *s) || is_separator_char(s)) + { + s++; + extra_skip++; + } + } + else + { + /* + * In FX mode we insist that whitespaces and separator + * characters from the format string match the same in input + * string. + */ + if ((n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s)) || + (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("unexpected character \"%.*s\", expected character \"%s\"", + pg_mblen(s), s, n->character), + errhint("In FX mode, punctuation in the input string " + "must exactly match the format string."))); + s++; + } continue; } - - /* Ignore spaces before fields when not in FX (fixed width) mode */ - if (!fx_mode && n->key->id != DCH_FX) + else if (n->type != NODE_TYPE_ACTION) { - while (*s != '\0' && isspace((unsigned char) *s)) - s++; + /* + * Text character, so consume one character from input string. + * Notice we don't insist that the consumed character match the + * format's character. + * Text field ignores FX mode. + */ + s += pg_mblen(s); + continue; } from_char_set_mode(out, n->key->date_mode); @@ -3086,10 +3150,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) n->key->name))); break; case DCH_TZH: - out->tzsign = *s == '-' ? -1 : +1; - + /* + * Value of TZH might be negative. And the issue is that we + * might swallow minus sign as the separator. So, if we have + * skipped more characters than specified in the format string, + * then we consider prepending last skipped minus to TZH. + */ if (*s == '+' || *s == '-' || *s == ' ') + { + out->tzsign = *s == '-' ? -1 : +1; s++; + } + else + { + if (extra_skip > 0 && *(s - 1) == '-') + out->tzsign = -1; + else + out->tzsign = +1; + } from_char_parse_int_len(&out->tzh, &s, 2, n); break; @@ -3261,6 +3339,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) SKIP_THth(s, n->suffix); break; } + + /* Ignore all spaces after fields */ + if (!fx_mode) + { + extra_skip = 0; + while (*s != '\0' && isspace((unsigned char) *s)) + { + s++; + extra_skip++; + } + } } } diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index e585b07c18c..976f4783e1e 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -2790,7 +2790,7 @@ SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD'); SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD'); to_timestamp ------------------------------ - Wed Jan 02 00:00:00 1985 PST + Sat Jan 12 00:00:00 1985 PST (1 row) SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16', @@ -2835,11 +2835,8 @@ SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD'); (1 row) SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); - to_timestamp ------------------------------- - Sun Feb 16 00:00:00 1997 PST -(1 row) - +ERROR: unexpected character "/", expected character ":" +HINT: In FX mode, punctuation in the input string must exactly match the format string. SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD'); to_timestamp ------------------------------ @@ -3002,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); to_timestamp ------------------------------ - Sun Dec 18 03:38:15 2011 PST + Sun Dec 18 23:38:15 2011 PST (1 row) SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); @@ -3032,7 +3029,7 @@ SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); to_timestamp ------------------------------ - Sun Dec 18 03:38:15 2011 PST + Sun Dec 18 23:38:15 2011 PST (1 row) SELECT to_timestamp('2000+ JUN', 'YYYY/MON'); @@ -3042,8 +3039,11 @@ SELECT to_timestamp('2000+ JUN', 'YYYY/MON'); (1 row) SELECT to_timestamp(' 2000 +JUN', 'YYYY/MON'); -ERROR: invalid value "+JU" for "MON" -DETAIL: The given value did not match any of the allowed values for this field. + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON'); to_timestamp ------------------------------ @@ -3051,20 +3051,32 @@ SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON'); (1 row) SELECT to_timestamp('2000 +JUN', 'YYYY//MON'); -ERROR: invalid value "+JU" for "MON" -DETAIL: The given value did not match any of the allowed values for this field. + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + SELECT to_timestamp('2000 + JUN', 'YYYY MON'); -ERROR: invalid value "+ J" for "MON" -DETAIL: The given value did not match any of the allowed values for this field. + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + SELECT to_timestamp('2000 ++ JUN', 'YYYY MON'); -ERROR: invalid value "+ J" for "MON" -DETAIL: The given value did not match any of the allowed values for this field. + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); ERROR: invalid value "+ J" for "MON" DETAIL: The given value did not match any of the allowed values for this field. SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); -ERROR: invalid value "+ J" for "MON" -DETAIL: The given value did not match any of the allowed values for this field. + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + SELECT to_timestamp('2000 -10', 'YYYY TZH'); to_timestamp ------------------------------ @@ -3092,13 +3104,13 @@ SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ - 12-08-2011 + 12-18-2011 (1 row) SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ - 02-18-2011 + 12-18-2011 (1 row) SELECT to_date('2011 12 18', 'YYYY MM DD');