This patch fixes a rare parsing bug with Unicode characters on Mac OS X.
The problem is that isspace() on Mac OS X changes its behaviour depending
on the locale. Use scanner_isspace() instead, which only returns true for
ASCII whitespace. Other parts of the Postgres code appear to have run into
this already, since a number of places use scanner_isspace() instead.
However, there are still many other calls to isspace(). I'll try to take a
quick look to see whether any of them have the same bug.
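A minimal sketch of an ASCII-only whitespace test in the spirit of
scanner_isspace(). The name ascii_isspace is hypothetical, and the exact
character set (space, tab, newline, carriage return, form feed) is an
assumption modelled on scanner_isspace(); check its definition in the
Postgres source before relying on it. Unlike isspace(), its answer cannot
change with the locale, so a byte such as 0x85 is never treated as
whitespace.

```c
#include <stdbool.h>

/* Hypothetical ASCII-only whitespace test, modelled on scanner_isspace.
   The character set below is an assumption. It deliberately ignores the
   locale, so bytes >= 0x80 (e.g. UTF-8 continuation bytes) never match. */
static bool ascii_isspace(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\r' || ch == '\f';
}
```

Taking a plain char and comparing it against fixed ASCII values also avoids
the undefined behaviour of passing a negative char value to isspace().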

The bug is that in the following hstore value, the key "keyą" is parsed
incorrectly: the second byte of the UTF-8 encoded "ą" disappears, leaving
the key as "key\xc4":

select E'keyą=>value'::hstore;
     hstore
-----------------
 "keyą"=>"value"
(1 row)

select 'keyą=>value'::hstore::text::bytea;
              bytea
----------------------------------
 \x226b6579c4223d3e2276616c756522
(1 row)
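The bytea output above can be decoded by hand to confirm what was stored.
The sketch below uses a hypothetical helper, decode_bytea_hex, and assumes
psql's hex bytea output format with an optional \x prefix. Decoding
\x226b6579c4223d3e2276616c756522 gives the 15 bytes of "key\xc4"=>"value",
whereas a correct key "keyą" would contain both UTF-8 bytes of "ą", 0xC4
and 0x85.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit (including '\0'). */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode psql-style bytea hex output into out.
   Returns the number of bytes written, or -1 on malformed input
   (odd length, non-hex digit) or if out would overflow. */
static long decode_bytea_hex(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;

    if (s[0] == '\\' && s[1] == 'x')
        s += 2;                     /* skip the "\x" prefix */
    while (s[0] != '\0') {
        int hi = hex_digit(s[0]);
        int lo = hex_digit(s[1]);   /* -1 if s[1] is the terminator */
        if (hi < 0 || lo < 0 || n >= cap)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        s += 2;
    }
    return (long)n;
}
```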

The correct result should be:

     hstore
-----------------
 "keyą"=>"value"
(1 row)

This query has been added to the regression test. It works on Linux, but
fails on Mac OS X.
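To see why the same input behaves differently on the two platforms, here
is a simplified model of reading an unquoted key. It is not hstore's actual
parser: read_unquoted_key, ascii_space and mac_like_space are hypothetical,
and mac_like_space assumes, based on the missing 0x85 byte in the bytea
output, that isspace() under the Mac OS X UTF-8 locale classifies 0x85 as
whitespace. With the ASCII-only predicate the key keeps both bytes of "ą";
with the Mac-like predicate it is cut off after 0xC4.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*space_pred)(unsigned char ch);

/* ASCII-only whitespace test, in the spirit of scanner_isspace. */
static bool ascii_space(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\f';
}

/* Hypothetical stand-in for isspace() under a UTF-8 locale on Mac OS X:
   assumed to also treat 0x85 (NEL in Latin-1) as whitespace. */
static bool mac_like_space(unsigned char ch)
{
    return ascii_space(ch) || ch == 0x85;
}

/* Simplified model: copy an unquoted key from s into out (NUL-terminated),
   stopping at '=' or at the first byte the predicate calls whitespace.
   cap must be at least 1. Returns the key length in bytes. */
static size_t read_unquoted_key(const char *s, space_pred is_space,
                                char *out, size_t cap)
{
    size_t n = 0;

    while (s[n] != '\0' && s[n] != '=' &&
           !is_space((unsigned char)s[n]) && n + 1 < cap) {
        out[n] = s[n];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

For the input "key\xc4\x85=>value", read_unquoted_key() returns 5 with
ascii_space, but 4 with mac_like_space, leaving the truncated key "key\xc4".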

For a more detailed explanation of how isspace() behaves on Mac OS X, see:
https://github.com/evanj/isspace_locale

Thanks!

Evan Jones

Attachment: hstore-isspace.patch