For collating, I think most people use the ICU libraries. I know the C/C++ version (ICU4C) has been used on z/OS by lots of folks. I have no idea whether z/OS COBOL 4.2 can use it, but some searching turned up a COBOL page:
http://userguide.icu-project.org/usefrom/cobol

In article <CAAJSdjgW65o4C2Cv8wS=vfk6amxgbmels_ekc2_jhkdffrz...@mail.gmail.com> you wrote:
> On Wed, Sep 6, 2017 at 12:42 AM, Peter Hunkeler <[email protected]> wrote:
>
> > >>If for some odd reason you absolutely insist on an EBCDIC-ish approach then
> > >>you can do what the Japanese have done for decades: Shift Out (SO), Shift
> > >>In (SI). Refer to CCSID 930 and CCSID 1390 for inspiration. You'd probably
> > >>use one of the EBCDIC Latin 1+euro codepages as a starting point, such as
> > >>1140, then SO/SI from there to pick up the exceptional characters.
> > >>
> > >The worst of both worlds.
> >
> > It's repeating history. The origin of all that code page mess was
> > companies (not countries at that time) starting to build their own custom
> > code page for any character in need that was not in the (single) EBCDIC
> > code page. Later, some standardization was done and country code pages
> > evolved.
> >
> > While it was justifiable at that time, it is not today. Do not start this
> > mess again by doing your own code page thing in your programs. Go Unicode,
> > UTF-8 or UTF-16, whatever suits best.
> >
>
> I agree with the sentiment. On Linux/Intel, I set my locale to en_US.utf8.
> The "Go" and "Python3" language definitions _require_ their source to be in
> UTF-8. But I wonder how well UTF-8 is really supported by z/OS
> _applications_. I'm still stuck on z/OS 1.13 and COBOL 4.2, so I will ask.
> Can I directly (and correctly) process UTF-8 coded characters in a COBOL 6
> program? Even the multibyte characters?? What about DFSORT? From the manual
> at
> https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.icea100/ice2ca_DFSORT_data_formats.htm
> it appears to support UTF8, UTF16, and UTF32. But I'd love to see an
> example of how that works. In particular, how do you say "this file is in
> UTF8. Sort on the 3rd through the 10th characters."?
> The problem, to me, is how do I say "the 3rd through the 10th characters"?
> If the data is all in UTF8, then the 3rd character need not start in the
> 3rd byte. And the number of bytes is not necessarily 8, but could be from
> 8 to 32 bytes depending. Also, according to the same manual (different
> page), a "character string" is always in EBCDIC. So I guess if you want to
> include based on a UTF8 string, you need to use hex encoding.
>
> > --
> > Peter Hunkeler
>
> Maranatha! <><
> John McKown

-- 
Don Poitras - SAS Development - SAS Institute Inc. - SAS Campus Drive
[email protected] (919) 531-5637 Cary, NC 27513
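On the "3rd through the 10th characters" question: the mapping from character positions to byte positions in UTF-8 data can be sketched in a few lines of Python. This is not DFSORT or COBOL code. It assumes "character" means a Unicode code point (not a user-perceived grapheme), and `utf8_char_span` and `hex_literal` are hypothetical helper names made up for illustration. `hex_literal` only renders UTF-8 bytes in the X'...' hexadecimal-constant style that the hex-encoding remark points toward.

```python
# Sketch: map a 1-based character range onto byte offsets in UTF-8 data.
# "Character" here means a Unicode code point, not a user-perceived
# grapheme; combining sequences count as more than one character.


def utf8_char_span(data: bytes, first: int, last: int) -> tuple[int, int]:
    """Return (byte_offset, byte_length) covering characters first..last.

    first and last are 1-based and inclusive; byte_offset is 0-based.
    Raises UnicodeDecodeError if data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    if not 1 <= first <= last <= len(text):
        raise ValueError("character range out of bounds")
    # Bytes taken by the characters before the range give the offset,
    # bytes taken by the range itself give the length.
    offset = len(text[: first - 1].encode("utf-8"))
    length = len(text[first - 1 : last].encode("utf-8"))
    return offset, length


def hex_literal(s: str) -> str:
    """Render the UTF-8 bytes of s in the X'...' hexadecimal constant style."""
    return "X'" + s.encode("utf-8").hex().upper() + "'"


if __name__ == "__main__":
    # U+00DC, U+00EF, U+00F6 and U+00E9 each take 2 bytes in UTF-8.
    record = "\u00dcn\u00efc\u00f6d\u00e9 text".encode("utf-8")
    print(utf8_char_span(record, 3, 10))  # (3, 11): starts at the 4th byte
    print(utf8_char_span(b"abcdefghij", 3, 10))  # (2, 8) for pure ASCII
    print(hex_literal("caf\u00e9"))  # X'636166C3A9'
```

Because the offset and length come out different for each record whenever multibyte characters precede or fall inside the range, a single fixed (position, length) key only lines up with characters 3 to 10 when every record shares the same byte layout. A fixed-width form such as UTF-32 (4 bytes per code point) would avoid that, at the cost of space.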

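The same "the nth character is not the nth byte" problem also shows up in the SO/SI approach the thread argues against, and there it is stateful: a byte's meaning depends on whether an SO (X'0E') or an SI (X'0F') came before it, so counting characters means walking the record from its start. The Python sketch below assumes well-formed mixed data in which the shift bytes never occur inside a double-byte character (treat that as an assumption about the code pages involved). It does no code page conversion, and `split_mixed` and `char_count` are hypothetical names.

```python
# Sketch: walk mixed single-/double-byte EBCDIC data delimited by SO/SI.
# Assumes well-formed input in which X'0E' and X'0F' never occur inside a
# double-byte character; no code page conversion is attempted.

SO = 0x0E  # Shift Out: following bytes are double-byte characters
SI = 0x0F  # Shift In: following bytes are single-byte characters


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split data into ("SBCS" | "DBCS", run_bytes) segments, dropping SO/SI."""
    segments: list[tuple[str, bytes]] = []
    mode, run = "SBCS", bytearray()
    for b in data:
        if b == SO or b == SI:
            new_mode = "DBCS" if b == SO else "SBCS"
            if new_mode == mode:
                raise ValueError(f"redundant shift byte X'{b:02X}' in {mode} mode")
            if run:
                segments.append((mode, bytes(run)))
            mode, run = new_mode, bytearray()
        else:
            run.append(b)
    if mode == "DBCS":
        raise ValueError("data ends in double-byte mode without SI")
    if run:
        segments.append((mode, bytes(run)))
    if any(m == "DBCS" and len(r) % 2 for m, r in segments):
        raise ValueError("double-byte run has an odd number of bytes")
    return segments


def char_count(data: bytes) -> int:
    """Count characters: 1 per SBCS byte, 1 per DBCS byte pair, 0 for SO/SI."""
    return sum(len(r) // 2 if m == "DBCS" else len(r) for m, r in split_mixed(data))


if __name__ == "__main__":
    # Two single-byte characters, two double-byte characters, one single-byte.
    record = bytes([0xC1, 0xC2, SO, 0x45, 0x41, 0x45, 0x42, SI, 0xC3])
    print(split_mixed(record))
    print(len(record), char_count(record))  # 9 bytes, 5 characters
```

Each character position depends on every shift byte before it, which is part of why the thread prefers a Unicode encoding over rebuilding this scheme.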