Hi, I was randomly testing some date related stuff on PG & observed that the outputs were wrong.
e.g. postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY'); to_date ------------ 3910-01-01 <--------- Look at this (1 row) postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY'); to_date ------------ 2010-01-01 (1 row) User can provide the year in any format i.e. 1-digit to 4-digit, in this case the internal representation of century should be logically correct. Considering this situation, I am drawing the table below & logging my observation here. This might help us to demonstrate where we are lacking in PG. *Data Format PostgreSQL EDBAS* *TO_DATE('01-jan-1', 'DD-MON-Y') 2001-01-01 01-JAN-2001* *TO_DATE('01-jan-1', 'DD-MON-YY') 2001-01-01 01-JAN-2001* *TO_DATE('01-jan-1', 'DD-MON-YYY') 2001-01-01 01-JAN-2001* *TO_DATE('01-jan-1', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001* In this case, all the cases seems correct. Also the YEAR part in the output is seems logical. *Data Format PostgreSQL EDBAS* *TO_DATE('01-jan-10', 'DD-MON-Y') 2010-01-01 Error * *TO_DATE('01-jan-10', 'DD-MON-YY') 2010-01-01 01-JAN-2010* *TO_DATE('01-jan-10', 'DD-MON-YYY') 2010-01-01 01-JAN-2010* *TO_DATE('01-jan-10', 'DD-MON-YYYY') 0010-01-01 01-JAN-0010* In this case, it seems in last 3 cases PG is behaving correctly. Whereas in 1st case the output is not correct since the Format ('Y') is lesser than the actual input ('10'). But PG is ignoring this condition and throwing whatever is input. The output year is might not be the year, what user is expecting. Hence PG should throw an error. *Data Format PostgreSQL EDBAS* *TO_DATE('01-jan-067', 'DD-MON-Y') 2067-01-01 Error* *TO_DATE('01-jan-111', 'DD-MON-YY') 2011-01-01 Error* *TO_DATE('01-jan-678', 'DD-MON-YYY') 1678-01-01 01-JAN-2678* *TO_DATE('01-jan-001', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001* In this case, only last case seems correct in PG. Rest other cases are might not be logical, rather the output is vague. In PG, I haven't seen any document which is saying something like this, if year is 111...999 then the century would be 2000 and 001...099 then then century would be 1000. However, the 1st and 2nd case should throw an error since the output format ('Y' & 'YY') are really not matching with the Input ('067' & '111'), respectively. *Data Format PostgreSQL EDBAS* *TO_DATE('01-jan-2010', 'DD-MON-Y') 4010-01-01 Error* *TO_DATE('01-jan-2010', 'DD-MON-YY') 3910-01-01 Error * *TO_DATE('01-jan-2010', 'DD-MON-YYY') 3010-01-01 Error * *TO_DATE('01-jan-2010', 'DD-MON-YYYY') 2010-01-01 01-JAN-2010* In this case, PG is giving wrong output in first 3 cases. Those need to get rectified. Again it should throw error in these cases, because Output Format is not matching with Input Data. The exception here is the 2nd case, where century is well-defined. After observing the all above cases, the summary would be, if the output format is lesser than the actual input value, then it should throw an error. Considering this thumb rule, we can fix the wrong outputs in PG. I have made necessary changes to the code & attaching the patch with this email. In the patch, I have written one centralize function which will decide the century depends upon the given Input format. Thoughts ? Any other ideas on this ? -- Piyush S Newe Principal Engineer EnterpriseDB office: +91 20 3058 9500 www.enterprisedb.com Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index aba1145..ad42126 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -989,6 +989,7 @@ static DCHCacheEntry *DCH_cache_getnew(char *str); static NUMCacheEntry *NUM_cache_search(char *str); static NUMCacheEntry *NUM_cache_getnew(char *str); static void NUM_cache_remove(NUMCacheEntry *ent); +static int DecideCentury(int *in, int len); /* ---------- @@ -2733,21 +2734,14 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) case DCH_IYYY: from_char_parse_int(&out->year, &s, n); out->yysz = 4; + DecideCentury(&out->year, out->yysz); s += SKIP_THth(n->suffix); break; case DCH_YYY: case DCH_IYY: from_char_parse_int(&out->year, &s, n); out->yysz = 3; - - /* - * 3-digit year: '100' ... '999' = 1100 ... 1999 '000' ... - * '099' = 2000 ... 2099 - */ - if (out->year >= 100) - out->year += 1000; - else - out->year += 2000; + DecideCentury(&out->year, out->yysz); s += SKIP_THth(n->suffix); break; case DCH_YY: @@ -2755,14 +2749,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) from_char_parse_int(&out->year, &s, n); out->yysz = 2; - /* - * 2-digit year: '00' ... '69' = 2000 ... 2069 '70' ... '99' - * = 1970 ... 1999 - */ - if (out->year < 70) - out->year += 2000; - else - out->year += 1900; + DecideCentury(&out->year, out->yysz); s += SKIP_THth(n->suffix); break; case DCH_Y: @@ -2770,10 +2757,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) from_char_parse_int(&out->year, &s, n); out->yysz = 1; - /* - * 1-digit year: always +2000 - */ - out->year += 2000; + DecideCentury(&out->year, out->yysz); s += SKIP_THth(n->suffix); break; case DCH_RM: @@ -5161,3 +5145,63 @@ float8_to_char(PG_FUNCTION_ARGS) NUM_TOCHAR_finish; PG_RETURN_TEXT_P(result); } + +/* + * In case of different Inputs & different Year formats, engine itself + * has to decide the century for the given year. This function will take + * care of the input & accordingly append the century their if not provided. + * + * Input = Transformed output + * 0 ... 9 = 2000 ... 2009 + * 00 ... 69 = 2000 ... 2069 + * 70 ... 99 = 1900 ... 1999 + * 000 ... 999 = 2000 ... 2999 + * + * In case input is greater than 9999 it throws an error. The basic rule of + * the function is " If the format is lesser than the actual value, then it + * should throw an error. Only in case of 'YY', we need not throw an error, + * unless the year is later than 9999". + */ + +static int DecideCentury(int *in, int len) +{ + if (len == 1) + { + if (*in > 0 && *in < 9) + *in += 2000; + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("date format picture ends before converting entire input string %d ", len))); + } + else if (len == 2) + { + if (*in > 0 && *in < 69) + *in += 2000; + else if (*in > 70 && *in < 99) + *in += 1900; + else if (*in > 9999) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("date format picture ends before converting entire input string %d ", len))); + + } + else if (len == 3) + { + if (*in > 0 && *in < 999) + *in += 2000; + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("date format picture ends before converting entire input string %d ", len))); + } + else if (len == 4) + { + if (*in > 9999) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("date format picture ends before converting entire input string %d ", len))); + } + + return 0; +}
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers