On 8/9/24 8:24 PM, Jeff Davis wrote:
> On Fri, 2024-08-09 at 13:41 +0200, Andreas Karlsson wrote:
>> I am leaning towards writing our own pure ascii functions for this.
>
> That makes sense for a lot of call sites, but it could cause breakage
> if we aren't careful.
>
>> Since we do not support any non-ascii-compatible encodings anyway, I
>> do not see the point in having locale support at most of these call
>> sites.
>
> An ascii-compatible encoding just means that the code points in the
> ascii range are represented as ascii. I'm not clear on whether
> functions like isspace() can return different results for code points
> in the ascii range, but it sounds plausible; toupper() can return
> different results for 'i' in tr_TR.
>
> Also, what about byte values 128-255, outside the ascii range, which
> are still valid input to isspace()?
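One way to check this empirically is to compare a libc locale's isspace() and toupper() against ASCII-only definitions for every byte value 0-255. The sketch below is a hypothetical probe: count_ctype_differences, ascii_space and ascii_upper are illustrative names, not PostgreSQL functions, and which locales are installed (and what they report) depends on the platform, so no particular result is implied for tr_TR or any other locale.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* ASCII-only reference: the six characters the C locale treats as space. */
static int
ascii_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* ASCII-only reference: map 'a'..'z' to 'A'..'Z', leave every other byte alone. */
static int
ascii_upper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/*
 * Count byte values 0..255 for which isspace()/toupper() under the given
 * LC_CTYPE locale disagree with the ASCII-only definitions above.  If
 * `report` is nonzero, print each differing byte.  Returns -1 if the
 * locale cannot be set (e.g. it is not installed).
 */
static int
count_ctype_differences(const char *locale_name, int report)
{
    int diffs = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (int c = 0; c <= 255; c++)
    {
        int space_differs = (isspace(c) != 0) != (ascii_space(c) != 0);
        int upper_differs = toupper(c) != ascii_upper(c);

        if (space_differs || upper_differs)
        {
            diffs++;
            if (report)
                printf("byte 0x%02X: isspace=%d toupper=0x%02X\n",
                       c, isspace(c) != 0, toupper(c));
        }
    }

    /* switch LC_CTYPE back to the C locale afterwards */
    setlocale(LC_CTYPE, "C");
    return diffs;
}
```

For example, count_ctype_differences("tr_TR.ISO-8859-9", 1) would list any bytes where that locale departs from ASCII rules, if it is installed; under "C" it should report no differences at all.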
My idea was that in a lot of those cases we only try to parse e.g. 0-9
as digits and only ever '.' as the decimal separator, so we should just
make that obvious by either using the C locale or writing our own
ascii-only functions. These strings are primarily meant to be read by
machines, not humans.
Andreas