Hi, I wanted to take this opportunity to provide a brief summary of outstanding work.
> Also there are several trap-producing cases with date types: > SELECT to_date('100000000', 'CC'); > SELECT to_timestamp('1000000000,999', 'Y,YYY'); > SELECT make_date(-2147483648, 1, 1); This is resolved with Matthew's patches, which I've rebased, squashed and attached to this email. They still require a review. ---- > SET temp_buffers TO 1000000000; > > CREATE TEMP TABLE t(i int PRIMARY KEY); > INSERT INTO t VALUES(1); > > #4 0x00007f385cdd37f3 in __GI_abort () at ./stdlib/abort.c:79 > #5 0x00005620071c4f51 in __addvsi3 () > #6 0x0000562007143f3c in init_htab (hashp=0x562008facb20, nelem=610070812) at dynahash.c:720 > > (gdb) f 6 > #6 0x0000560915207f3c in init_htab (hashp=0x560916039930, nelem=1000000000) at dynahash.c:720 > 720 hctl->high_mask = (nbuckets << 1) - 1; > (gdb) p nbuckets > $1 = 1073741824 I've taken a look at this and my current proposal is to convert `nbuckets` to 64 bit integer which would prevent the overflow. I'm hoping to look into if this is feasible soon. ---- > CREATE FUNCTION check_foreign_key () RETURNS trigger AS .../refint.so' LANGUAGE C; > CREATE TABLE t (i int4 NOT NULL); > CREATE TRIGGER check_fkey BEFORE DELETE ON t FOR EACH ROW EXECUTE PROCEDURE > check_foreign_key (2147483647, 'cascade', 'i', "ft", "i"); > INSERT INTO t VALUES (1); > DELETE FROM t; > > #4 0x00007f57f0bef7f3 in __GI_abort () at ./stdlib/abort.c:79 > #5 0x00007f57f1671351 in __addvsi3 () from .../src/test/regress/refint.so > #6 0x00007f57f1670234 in check_foreign_key (fcinfo=0x7ffebf523650) at refint.c:321 > > (gdb) f 6 > #6 0x00007f3400ef9234 in check_foreign_key (fcinfo=0x7ffd6e16a600) at refint.c:321 > 321 nkeys = (nargs - nrefs) / (nrefs + 1); > (gdb) p nargs > $1 = 3 > (gdb) p nrefs > $2 = 2147483647 I have not looked into this yet, though I was unable to reproduce it immediately. test=# CREATE FUNCTION check_foreign_key () RETURNS trigger AS '.../refint.so' LANGUAGE C; ERROR: could not access file ".../refint.so": No such file or directory I think I just have to play around with the path. ---- >> Moreover, I tried to use "-ftrapv" on 32-bit Debian and came across >> another failure: >> select '9223372036854775807'::int8 * 2147483648::int8; >> server closed the connection unexpectedly >> ... >> #4 0xb722226a in __GI_abort () at ./stdlib/abort.c:79 >> #5 0x004cb2e1 in __mulvdi3.cold () >> #6 0x00abe7ab in pg_mul_s64_overflow (a=9223372036854775807, b=2147483648, result=0xbff1da68) >> at ../../../../src/include/common/int.h:264 >> #7 0x00abfbff in int8mul (fcinfo=0x14d9d04) at int8.c:496 >> #8 0x00782675 in ExecInterpExpr (state=0x14d9c4c, econtext=0x14da15c, isnull=0xbff1dc3f) at execExprInterp.c:765 > > Hm. It looks like that is pointing to __builtin_mul_overflow(), which > seems strange. Agreed that this looks strange. The docs [0] seem to indicate that this shouldn't happen. > These built-in functions promote the first two operands into infinite > precision signed type and perform addition on those promoted > operands. ... > As the addition is performed in infinite signed precision, these > built-in functions have fully defined behavior for all argument > values. ... > The first built-in function allows arbitrary integral types for > operands and the result type must be pointer to some integral type > other than enumerated or boolean type The docs for the mul functions say that they behave the same as addition. Alexander, is it possible that you're compiling with something other than GCC? ---- >>> #6 0x00005576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at bitmapset.c:691 >>> 691 if (result >= 0 || HAS_MULTIPLE_ONES(w)) >> >> At a glance, this appears to be caused by the RIGHTMOST_ONE macro: >> >> #define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x))) > > I believe hand-rolling the two's complement calculation should be > sufficient to avoid depending on -fwrapv here. godbolt.org indicates that > it produces roughly the same code, too. > > diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c > index cd05c642b0..d37a997c0e 100644 > --- a/src/backend/nodes/bitmapset.c > +++ b/src/backend/nodes/bitmapset.c > @@ -67,7 +67,7 @@ > * we get zero. > *---------- > */ > -#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x))) > +#define RIGHTMOST_ONE(x) ((bitmapword) (x) & (~((bitmapword) (x)) + 1)) This approach seems to resolve the issue locally for me, and I think it falls out cleanly from the comment in the code above. Thanks, Joseph Koshakow [0] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
From 15e5a79f198a029fc6436c409a00a1592e23a46d Mon Sep 17 00:00:00 2001 From: Matthew Kim <38759997+friendlymatthew@users.noreply.github.com> Date: Tue, 9 Jul 2024 18:25:10 -0400 Subject: [PATCH v24] Remove dependence on -fwrapv semantics in some date/time functions. This commit updates a few date/time functions, such as to_timestamp() and make_date(), to no longer rely on signed integer wrapping for correctness. This is intended to move us closer towards removing -fwrapv, which may enable some compiler optimizations. However, there is presently no plan to actually remove that compiler option in the near future. --- src/backend/utils/adt/date.c | 5 +++- src/backend/utils/adt/formatting.c | 35 +++++++++++++++++++++++--- src/test/regress/expected/date.out | 2 ++ src/test/regress/expected/horology.out | 4 +++ src/test/regress/sql/date.sql | 1 + src/test/regress/sql/horology.sql | 2 ++ 6 files changed, 45 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c index 9c854e0e5c..baa677fb96 100644 --- a/src/backend/utils/adt/date.c +++ b/src/backend/utils/adt/date.c @@ -257,7 +257,10 @@ make_date(PG_FUNCTION_ARGS) if (tm.tm_year < 0) { bc = true; - tm.tm_year = -tm.tm_year; + if (pg_mul_s32_overflow(tm.tm_year, -1, &tm.tm_year)) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("date field value out of range"))); } dterr = ValidateDate(DTK_DATE_M, false, false, bc, &tm); diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 68069fcfd3..e2cb57a9e1 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -77,6 +77,7 @@ #include "catalog/pg_collation.h" #include "catalog/pg_type.h" +#include "common/int.h" #include "common/unicode_case.h" #include "common/unicode_category.h" #include "mb/pg_wchar.h" @@ -3809,7 +3810,15 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, ereturn(escontext,, (errcode(ERRCODE_INVALID_DATETIME_FORMAT), errmsg("invalid input string for \"Y,YYY\""))); - years += (millennia * 1000); + if (pg_mul_s32_overflow(millennia, 1000, &millennia)) + ereturn(escontext,, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("invalid input string for \"Y,YYY\""))); + if (pg_add_s32_overflow(years, millennia, &years)) + ereturn(escontext,, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("invalid input string for \"Y,YYY\""))); + if (!from_char_set_int(&out->year, years, n, escontext)) return; out->yysz = 4; @@ -4797,11 +4806,31 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, if (tmfc.bc) tmfc.cc = -tmfc.cc; if (tmfc.cc >= 0) + { /* +1 because 21st century started in 2001 */ - tm->tm_year = (tmfc.cc - 1) * 100 + 1; + /* tm->tm_year = (tmfc.cc - 1) * 100 + 1; */ + if (pg_mul_s32_overflow((tmfc.cc - 1), 100, &tm->tm_year) || + pg_add_s32_overflow(tm->tm_year, 1, &tm->tm_year)) + { + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("date out of range: \"%s\"", + text_to_cstring(date_txt)))); + } + } else + { /* +1 because year == 599 is 600 BC */ - tm->tm_year = tmfc.cc * 100 + 1; + /* tm->tm_year = tmfc.cc * 100 + 1; */ + if (pg_mul_s32_overflow(tmfc.cc, 100, &tm->tm_year) || + pg_add_s32_overflow(tm->tm_year, 1, &tm->tm_year)) + { + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("date out of range: \"%s\"", + text_to_cstring(date_txt)))); + } + } fmask |= DTK_M(YEAR); } diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out index f5949f3d17..9293e045b0 100644 --- a/src/test/regress/expected/date.out +++ b/src/test/regress/expected/date.out @@ -1532,3 +1532,5 @@ select make_time(10, 55, 100.1); ERROR: time field value out of range: 10:55:100.1 select make_time(24, 0, 2.1); ERROR: time field value out of range: 24:00:2.1 +SELECT make_date(-2147483648, 1, 1); +ERROR: date field value out of range diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index 241713cc51..df02d268c0 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -3448,6 +3448,8 @@ SELECT i, to_timestamp('2018-11-02 12:34:56.123456', 'YYYY-MM-DD HH24:MI:SS.FF' SELECT i, to_timestamp('2018-11-02 12:34:56.123456789', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i; ERROR: date/time field value out of range: "2018-11-02 12:34:56.123456789" +SELECT to_timestamp('1000000000,999', 'Y,YYY'); +ERROR: invalid input string for "Y,YYY" SELECT to_date('1 4 1902', 'Q MM YYYY'); -- Q is ignored to_date ------------ @@ -3778,6 +3780,8 @@ SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be 02-01-0001 BC (1 row) +SELECT to_date('100000000', 'CC'); +ERROR: date out of range: "100000000" -- to_char's TZ format code produces zone abbrev if known SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); to_char diff --git a/src/test/regress/sql/date.sql b/src/test/regress/sql/date.sql index 1c58ff6966..9a4e5832b9 100644 --- a/src/test/regress/sql/date.sql +++ b/src/test/regress/sql/date.sql @@ -373,3 +373,4 @@ select make_date(2013, 13, 1); select make_date(2013, 11, -1); select make_time(10, 55, 100.1); select make_time(24, 0, 2.1); +SELECT make_date(-2147483648, 1, 1); diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index e5cf12ff63..db532ee3c0 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -558,6 +558,7 @@ SELECT i, to_timestamp('2018-11-02 12:34:56.1234', 'YYYY-MM-DD HH24:MI:SS.FF' || SELECT i, to_timestamp('2018-11-02 12:34:56.12345', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i; SELECT i, to_timestamp('2018-11-02 12:34:56.123456', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i; SELECT i, to_timestamp('2018-11-02 12:34:56.123456789', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i; +SELECT to_timestamp('1000000000,999', 'Y,YYY'); SELECT to_date('1 4 1902', 'Q MM YYYY'); -- Q is ignored SELECT to_date('3 4 21 01', 'W MM CC YY'); @@ -660,6 +661,7 @@ SELECT to_date('2016 365', 'YYYY DDD'); -- ok SELECT to_date('2016 366', 'YYYY DDD'); -- ok SELECT to_date('2016 367', 'YYYY DDD'); SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be +SELECT to_date('100000000', 'CC'); -- to_char's TZ format code produces zone abbrev if known SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); -- 2.34.1