UTF-8 is just an encoding of Unicode, not a character set. All of ISO-8859-1 is
part of Unicode: its 256 characters are exactly U+0000 through U+00FF.
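A quick way to see this, sketched in Python (where "latin-1" is the codec name for ISO-8859-1): decoding every possible byte gives the code point with the same numeric value.

```python
# Decode every byte value 0x00-0xFF as ISO-8859-1 ("latin-1" in Python).
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each byte maps to the Unicode code point with the same numeric value,
# so ISO-8859-1 is simply the first 256 code points of Unicode.
identical = all(ord(ch) == b for ch, b in zip(text, all_bytes))
print(len(text), identical)  # 256 True
```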

Of course, while U+0000 through U+007F take one octet each in UTF-8, the
characters from U+0080 through U+00FF require two octets.
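A small Python sketch of that size difference; the helper name utf8_len is just for illustration.

```python
def utf8_len(cp: int) -> int:
    """Number of octets UTF-8 uses for a single code point."""
    return len(chr(cp).encode("utf-8"))

# ASCII range: one octet each. Upper half of ISO-8859-1: two octets each.
ascii_lengths = {utf8_len(cp) for cp in range(0x00, 0x80)}
upper_lengths = {utf8_len(cp) for cp in range(0x80, 0x100)}
print(ascii_lengths, upper_lengths)  # {1} {2}

# e.g. U+00E9 (e-acute) is the single byte 0xE9 in ISO-8859-1,
# but the two octets 0xC3 0xA9 in UTF-8.
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
```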

And, yes, UTF-8 is clearly the way forward, although there may be some bumps in 
the road. 


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [IBM-MAIN@LISTSERV.UA.EDU] on behalf of 
Andrew Rowley [and...@blackhillsoftware.com]
Sent: Wednesday, July 12, 2023 9:02 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Python 3.11 on z/OS - UTF-8 errors

On 13/07/2023 10:01 am, David Crayford wrote:
> We specify
> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> in
> our Maven builds as most of the time we are building off host on
> machines with UTF8 locales. However, we tag our files ISO8859-1 on z/OS

...

> If we cared about the euro sign we could change it to ISO8859-15 which
> is still an 8-bit character set. It’s those pesky codes above 0x7F in
> UTF-8 that cause the issues.

The euro sign was just an example; there are plenty of other characters
that UTF-8 can encode and an 8-bit character set cannot. If you convert to
an 8-bit character set, does it mean that any literals with codes above
0x7F are silently broken? Or does git fail to check out?
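Both outcomes are possible in principle; which one you get depends on the converter's error handling. A sketch using Python's own codecs (this illustrates codec behaviour only, not what git or z/OS conversion actually does; the literal is a made-up example):

```python
# Hypothetical source literal: euro sign, a right arrow, and Polish "l with stroke".
literal = "price: 10\u20ac, arrow: \u2192, Polish: \u0142"

# The euro sign fits in ISO-8859-15 (byte 0xA4); the arrow and U+0142 do not.
print("\u20ac".encode("iso8859_15"))  # b'\xa4'

# Strict conversion fails loudly...
strict_failed = False
try:
    literal.encode("iso8859_15")
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict conversion failed:", exc.reason)

# ...while a lenient error handler silently substitutes '?'.
lossy = literal.encode("iso8859_15", errors="replace")
print(lossy.decode("iso8859_15"))  # price: 10€, arrow: ?, Polish: ?
```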

Either way, sourceEncoding=UTF-8 seems like a good answer to why you
might want to actually have the files encoded in UTF-8. Anything else
would seem to be courting unpredictable errors.
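One of the less predictable failure modes is that a mismatch often raises no error at all. A Python sketch, assuming a file written in a UTF-8 locale is later read by something that believes it is ISO-8859-1 (the scenario is illustrative):

```python
# Bytes as written by an editor or build running in a UTF-8 locale.
utf8_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Reading them as ISO-8859-1 never fails, because every byte is valid
# in that code page, but the text comes out wrong ("mojibake").
misread = utf8_bytes.decode("latin-1")
print(misread)  # cafÃ©

# Reading with the encoding actually used round-trips cleanly.
print(utf8_bytes.decode("utf-8"))  # café
```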

--
Andrew Rowley
Black Hill Software

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
