On Tue, Jun 06, 2023 at 10:16:09AM -0400, Evan Jones wrote: > I did a quick look at the places found with "git grep isspace" yesterday. I > agree with the comment from commit 9ae2661: "I've left alone isspace() > calls in places that aren't really expecting any non-ASCII input > characters, such as float8in()." There are a number of other calls where I > think it would likely be safe, and possibly even a good idea, to replace > isspace() with scanner_isspace(). However, I couldn't find any where I > could cause a bug like the one I hit in hstore parsing.
Yes, I agree with this feeling. Like 9ae2661, I can't get really excited about plastering more of that, especially if it were for timezone value input or dictionary options. One area with a new isspace() since 2017 is multirangetypes.c, but it is just a copy of rangetypes.c. > Original mailing list post for commit 9ae2661 in case it is helpful for > others: https://www.postgresql.org/message-id/10129.1495302...@sss.pgh.pa.us I have reproduced the original problem reported on macOS 13.4, which is close to the top of what's available. Passing to pg_regress some options to use something else than UTF-8 leads to a failure in the tests, so we need a split like fussyztrmatch to test that: REGRESS_OPTS='--encoding=SQL_ASCII --no-locale' make check An other error pattern without a split could be found on Windows, as of: select E'key\u0105=>value'::hstore; - hstore ------------------ - "keyÄ…"=>"value" -(1 row) - +ERROR: character with byte sequence 0xc4 0x85 in encoding "UTF8" has no equivalent in encoding "WIN1252" +LINE 1: select E'key\u0105=>value'::hstore; We don't do that for unaccent, actually, leading to similar failures.. I'll launch a separate thread about that shortly. With that fixed, the fix has been applied and backpatched. Thanks for the report, Evan! -- Michael
signature.asc
Description: PGP signature