UTF-8 is just an encoding of Unicode, not a character set. All of ISO-8859-1 is
part of Unicode: its 256 characters are U+0000 through U+00FF.
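A quick way to check that claim is the Python sketch below. It uses only the standard `iso8859-1` codec; the function name is just for illustration. It confirms that every one of the 256 byte values decodes to the Unicode code point with the same number.

```python
def latin1_is_unicode_prefix() -> bool:
    """True if every ISO-8859-1 byte b decodes to the code point with the same value."""
    all_bytes = bytes(range(256))            # 0x00 .. 0xFF
    decoded = all_bytes.decode("iso8859-1")  # one character per byte
    # Iterating over a bytes object yields ints, so compare ord(ch) with b directly.
    return len(decoded) == 256 and all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

print(latin1_is_unicode_prefix())  # True
```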

Of course, each character from U+0080 through U+00FF takes two octets in UTF-8,
whereas U+0000 through U+007F (plain ASCII) takes one.
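To see why, look at the UTF-8 bit layout for code points U+0080 to U+07FF. The two-octet form is 110xxxxx 10xxxxxx, so there is no single-byte slot for anything above 0x7F. The Python sketch below hand-encodes that form with a hypothetical helper, `utf8_two_octets`, and prints what Python's own codec produces for a few sample characters.

```python
def utf8_two_octets(cp: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF using the two-octet form 110xxxxx 10xxxxxx."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("the two-octet form covers U+0080..U+07FF only")
    lead = 0xC0 | (cp >> 6)      # 110 followed by the top 5 bits
    trail = 0x80 | (cp & 0x3F)   # 10 followed by the low 6 bits
    return bytes([lead, trail])

for cp in (0x41, 0x80, 0xE9, 0xFF):
    encoded = chr(cp).encode("utf-8")   # Python's built-in UTF-8 codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')} ({len(encoded)} octet(s))")
# U+0041 -> 41 (1 octet(s))
# U+0080 -> c2 80 (2 octet(s))
# U+00E9 -> c3 a9 (2 octet(s))
# U+00FF -> c3 bf (2 octet(s))

# The hand encoder agrees with the codec across the whole upper half of ISO-8859-1.
assert all(utf8_two_octets(cp) == chr(cp).encode("utf-8") for cp in range(0x80, 0x100))
```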

And, yes, UTF-8 is clearly the way forward, although there may be some bumps in 
the road. 


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [[email protected]] on behalf of 
Andrew Rowley [[email protected]]
Sent: Wednesday, July 12, 2023 9:02 PM
To: [email protected]
Subject: Re: Python 3.11 on z/OS - UTF-8 errors

On 13/07/2023 10:01 am, David Crayford wrote:
> We specify
> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> in
> our Maven builds as most of the time we are building off host on
> machines with UTF8 locales. However, we tag our files ISO8859-1 on z/OS

...

> If we cared about the euro sign we could change it to ISO8859-15 which
> is still an 8-bit character set. It’s those pesky codes above 0x7F in
> UTF-8 that cause the issues.

The euro was just an example; there are plenty of other non-ASCII characters. If
you convert to an 8-bit character set, does that mean any literals with codes
above 0x7F are silently broken? Or does git fail to check out?
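The "silently broken" question can be made concrete in Python terms. This is illustrative only: it shows how Python's codecs behave, not what git or Maven actually do on z/OS. A strict conversion to an 8-bit set stops at the first character the target set lacks. With errors="replace", the conversion always succeeds and quietly turns those characters into "?". The helper name `convert_to_8bit` and the sample string are made up for the example.

```python
def convert_to_8bit(text: str, codec: str):
    """Return (first unmappable character or None, text after a lossy round trip)."""
    try:
        text.encode(codec)                 # strict mode: raises on the first unmappable character
        first_bad = None
    except UnicodeEncodeError as exc:
        first_bad = exc.object[exc.start]  # the character that could not be encoded
    # errors="replace" never raises; unmappable characters silently become "?"
    lossy = text.encode(codec, errors="replace").decode(codec)
    return first_bad, lossy

literal = "Price: 5 € / naïve / Ω"
for codec in ("iso8859-1", "iso8859-15"):
    print(codec, convert_to_8bit(literal, codec))
# iso8859-1 ('€', 'Price: 5 ? / naïve / ?')
# iso8859-15 ('Ω', 'Price: 5 € / naïve / ?')
```

Note that ISO8859-15 fixes the euro sign but still has no Greek letters, so switching 8-bit sets only moves the problem.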

Either way, sourceEncoding=UTF-8 seems like a good reason to actually keep the
files encoded in UTF-8. Anything else would seem to be courting unpredictable
errors.
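Part of what makes these errors unpredictable is that the two directions of a mismatch fail differently. The Python sketch below assumes strict decoding, which is Python's default, and uses a hypothetical helper, `decode_or_error`. UTF-8 bytes read as ISO8859-1 never raise; they just turn into mojibake. ISO8859-1 bytes read as UTF-8 usually raise UnicodeDecodeError.

```python
def decode_or_error(data: bytes, codec: str) -> str:
    """Strictly decode data; on failure, report where decoding stopped instead of raising."""
    try:
        return data.decode(codec)  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError at offset {exc.start}"

text = "naïve"
utf8_bytes = text.encode("utf-8")        # ï -> c3 af
latin1_bytes = text.encode("iso8859-1")  # ï -> ef

# UTF-8 bytes read as ISO8859-1: every byte value is valid, so no error, just mojibake.
print(decode_or_error(utf8_bytes, "iso8859-1"))  # naÃ¯ve

# ISO8859-1 bytes read as UTF-8: 0xEF announces a three-octet sequence, but the
# next byte (0x76, 'v') is not a continuation byte, so strict decoding fails.
print(decode_or_error(latin1_bytes, "utf-8"))    # UnicodeDecodeError at offset 2
```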

--
Andrew Rowley
Black Hill Software

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
