Re: UTF-8 woes on z/OS, a solution - comments invited

John McKown Wed, 06 Sep 2017 06:18:38 -0700

On Wed, Sep 6, 2017 at 12:42 AM, Peter Hunkeler <[email protected]> wrote:

> >>If for some odd reason you absolutely insist on an EBCDIC-ish approach
> >>then you can do what the Japanese have done for decades: Shift Out (SO),
> >>Shift In (SI). Refer to CCSID 930 and CCSID 1390 for inspiration. You'd
> >>probably use one of the EBCDIC Latin 1+euro codepages as a starting
> >>point, such as 1140, then SO/SI from there to pick up the exceptional
> >>characters.
> >>
> >The worst of both worlds.
>
> It's repeating history. The origin of all that code page mess was
> companies (not countries at that time) starting to build their own custom
> code page for any character in need that was not in the (single) EBCDIC
> code page. Later, some standardization was done and country code pages
> evolved.
>
> While it was justifiable at that time, it is not today. Do not start this
> mess again by doing your own code page thing in your programs. Go Unicode,
> UTF-8 or UTF-16, whatever suits best.
>


I agree with the sentiment. On Linux/Intel, I set my locale to en_US.utf8.
Go _requires_ its source to be in UTF-8, and Python 3 assumes UTF-8 unless
the file declares another encoding. But I wonder how well UTF-8 is really
supported by z/OS _applications_. I'm still stuck on z/OS 1.13 and COBOL
4.2, so I will ask.
Can I directly (and correctly) process UTF-8 coded characters in a COBOL 6
program? Even the multibyte characters? What about DFSORT? From the manual
at
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.icea100/ice2ca_DFSORT_data_formats.htm
it appears to support UTF8, UTF16, and UTF32. But I'd love to see an
example of how that works. In particular, how do you say "this file is in
UTF8. Sort on the 3rd through the 10th characters."? The problem, to me, is
how do I say "the 3rd through the 10th characters"? If the data is all in
UTF8, then the 3rd character need not start in the 3rd byte. And the number
of bytes is not necessarily 8; it could be anywhere from 8 to 32, depending
on which characters are present.
Also, according to the same manual (different page), a "character string"
is always in EBCDIC. So I guess if you want to include based on a UTF8
string, you need to use hex encoding.
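
To make the byte-offset problem concrete, here is a small Python sketch. It
is not DFSORT and says nothing about how DFSORT handles this; the function
names and sample records are made up for illustration. It treats a
"character" as one Unicode code point (not a grapheme cluster) and shows
where the 3rd through 10th characters land in bytes, plus the hex digits you
would need for a UTF8 string constant.

```python
def char_range_to_byte_span(record, first_char, last_char):
    """Return (byte_offset, byte_length) of characters first_char..last_char.

    Character positions are 1-based and inclusive; byte_offset is 0-based.
    A "character" here is one Unicode code point (an assumption).
    """
    text = record.decode("utf-8")
    prefix = text[: first_char - 1].encode("utf-8")
    field = text[first_char - 1 : last_char].encode("utf-8")
    return len(prefix), len(field)


def utf8_hex(s):
    """Return the UTF-8 bytes of s as uppercase hex digits."""
    return s.encode("utf-8").hex().upper()


# Hypothetical sample records, 12 characters each.
A_UMLAUT = "\u00c4"  # 2 bytes in UTF-8
EURO = "\u20ac"      # 3 bytes in UTF-8

ascii_record = "ABCDEFGHIJKL".encode("utf-8")
mixed_record = (A_UMLAUT + EURO + "CDEFGHIJKL").encode("utf-8")
euro_record = ("AB" + EURO * 8 + "KL").encode("utf-8")

for rec in (ascii_record, mixed_record, euro_record):
    print(len(rec), char_range_to_byte_span(rec, 3, 10))

print(utf8_hex(EURO))
```

The same "3rd through 10th characters" field starts at byte offset 2 in the
first record but offset 5 in the second, and is 8 bytes long in the first
but 24 in the third. A fixed (position, length) pair in bytes cannot
describe it for all three.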



>
>
> --
> Peter Hunkeler
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
>



-- 
UNIX was not designed to stop you from doing stupid things, because that
would also stop you from doing clever things. -- Doug Gwyn

Maranatha! <><
John McKown

