Re: UTF-8 woes on z/OS, a solution - comments invited

Don Poitras Wed, 06 Sep 2017 06:35:58 -0700

For collating, I think most people use the ICU libraries. I know the C++
version has been used on z/OS by lots of folks and some searching found
a COBOL page. I have no idea if z/OS COBOL 4.2 can use it.


http://userguide.icu-project.org/usefrom/cobol

In article <CAAJSdjgW65o4C2Cv8wS=vfk6amxgbmels_ekc2_jhkdffrz...@mail.gmail.com> 
you wrote:
> On Wed, Sep 6, 2017 at 12:42 AM, Peter Hunkeler <[email protected]> wrote:

> > >>If for some odd reason you absolutely insist on an EBCDIC-ish approach
> > >>then you can do what the Japanese have done for decades: Shift Out (SO),
> > >>Shift In (SI). Refer to CCSID 930 and CCSID 1390 for inspiration. You'd
> > >>probably use one of the EBCDIC Latin 1+euro codepages as a starting
> > >>point, such as 1140, then SO/SI from there to pick up the exceptional
> > >>characters.
> > >>
> > >The worst of both worlds.
> >
> > It's repeating history. The origin of all that code page mess was
> > companies (not countries at that time) starting to build their own custom
> > code page for any character in need that was not in the (single) EBCDIC
> > code page. Later, some standardization was done and country code pages
> > evolved.
> >
> > While it was justifiable at that time, it is not today. Do not start this
> > mess again by doing your own code page thing in your programs. Go Unicode,
> > UTF-8 or UTF-16, whatever suits best.
> >

> I agree with the sentiment. On Linux/Intel, I set my locale to en_US.utf8.
> The "Go" and "Python3" language definitions _require_ their source to be in
> UTF-8. But I wonder how well UTF-8 is really supported by z/OS
> _applications_. I'm still stuck on z/OS 1.13 and COBOL 4.2, so I will ask.
> Can I directly (and correctly) process UTF-8 coded characters in a COBOL 6
> program? Even the multibyte characters?? What about DFSORT? From the manual
> at
> https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.icea100/ice2ca_DFSORT_data_formats.htm
> it appears to support UTF8, UTF16, and UTF32. But I'd love to see an
> example of how that works. In particular, how do you say "this file is in
> UTF8. Sort on the 3rd through the 10th characters."? The problem, to me, is
> how do I say "the 3rd through the 10th characters"? If the data is all in
> UTF8, then the 3rd character need not start in the 3rd byte. And the number
> of bytes is not necessarily 8, but could be from 8 to 32 bytes depending.
> Also, according to the same manual (different page), a "character string"
> is always in EBCDIC. So I guess if you want to include based on a UTF8
> string, you need to use hex encoding.
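The byte-offset problem described above can be made concrete with a minimal
Python sketch. This is not DFSORT syntax and says nothing about how DFSORT
handles it; the function names and the sample record are hypothetical, chosen
only for illustration. "Character" here means a Unicode code point, so a base
letter followed by a combining accent would count as two.

```python
def char_range_to_byte_range(data, first_char, last_char):
    """Map 1-based characters first_char..last_char of a UTF-8 record
    to a (start, end) byte range, with end exclusive."""
    text = data.decode("utf-8")
    # Bytes occupied by everything before the first wanted character.
    start = len(text[:first_char - 1].encode("utf-8"))
    # Bytes occupied by everything up to and including the last one.
    end = len(text[:last_char].encode("utf-8"))
    return start, end


def utf8_hex(s):
    """Return the UTF-8 bytes of s as uppercase hex digits."""
    return s.encode("utf-8").hex().upper()


# Hypothetical record mixing 1-, 2-, 3- and 4-byte characters.
record = "a\u00f1\u20ac\U0001F600bcdefghij".encode("utf-8")

start, end = char_range_to_byte_range(record, 3, 10)
sort_key = record[start:end]
```

For this record the 3rd character starts at byte offset 3 rather than 2, and
the eight characters occupy 13 bytes rather than 8. utf8_hex() yields the hex
digits that a hex-encoded comparison string would have to carry in place of an
EBCDIC character literal.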



> >
> >
> > --
> > Peter Hunkeler
> Maranatha! <><
> John McKown

-- 
Don Poitras - SAS Development  -  SAS Institute Inc. - SAS Campus Drive
[email protected]           (919) 531-5637                Cary, NC 27513

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
