Re: UTF-8 woes on z/OS, a solution - comments invited

Robert Prins Mon, 04 Sep 2017 11:57:19 -0700

On 2017-09-04 17:55, Charles Mills wrote:

I don't understand the problem.


That's correct.

Yes, ü is two bytes (not characters as you wrote!) in UTF-8.


You're correct again.

But if the translation is working correctly and the code page is specified
correctly it should become one byte in EBCDIC, and assuming the report
program treats it as a literal of some sort -- does not expect to deduce
meaning from each byte -- it should be perfectly happy with S?d (pretending
? is an EBCDIC ü) as a district or whatever name. The report columns should
be correct, and it should come back to UTF-8 land as ü, with the proper
number of padding blanks.

It sounds like you are incorrectly translating ü to *two* EBCDIC characters,
and that is the root of your problem. See if you can't translate to an
EBCDIC code page that includes ü.
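A minimal sketch of the round trip described above. It assumes Python's "cp500" codec (EBCDIC International) as a stand-in for whatever code page the z/OS side really uses; the 7-byte field width is made up for illustration.

```python
# Assumption: "cp500" stands in for the real z/OS EBCDIC code page.
name = "S\u00fcd"                      # "Süd", with ü as one code point

utf8 = name.encode("utf-8")            # ü takes two bytes here
ebcdic = name.encode("cp500")          # ü takes one byte here

# Pad the EBCDIC field to a fixed width with EBCDIC blanks,
# as a report program working on fixed columns might do.
FIELD_WIDTH = 7                        # hypothetical column width
field = ebcdic.ljust(FIELD_WIDTH, " ".encode("cp500"))

# Translate back to UTF-8 for the PC side.
back = field.decode("cp500").encode("utf-8")

print(len(utf8), len(ebcdic), len(field), len(back))
```

As long as every character in the file exists in the chosen EBCDIC code page, the padded field keeps its width in characters on the way back.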


I can probably find a set of code-pages that correctly translate the two byte
UTF-8 "ü" character to a one byte EBCDIC "ü" character, but how would those same
two code-pages translate the Polish "ł", the Danish "ø", the Baltic "ė", and the
Greek "Θ", which appear in the same PC-side file to one single character... And
back to the correct UTF-8 character...


That makes the problem maybe more understandable?
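To make the point concrete, here is a small sketch that checks which of the sample characters each single-byte EBCDIC codec can encode. The codec list is only what Python ships with, used as an assumption; it is not the set of code pages available to the z/OS translation services.

```python
# Assumption: these Python codecs stand in for candidate EBCDIC code pages.
EBCDIC_CODECS = ["cp037", "cp273", "cp500", "cp875", "cp1026", "cp1140"]

SAMPLES = {
    "\u00fc": "German u-umlaut",
    "\u0142": "Polish l-stroke",
    "\u00f8": "Danish o-slash",
    "\u0117": "Baltic e-dot",
    "\u0398": "Greek capital theta",
}

def encodable(ch: str, codec: str) -> bool:
    """Return True if the codec has a byte for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# For each codec, the sample characters it can represent.
coverage = {
    codec: [ch for ch in SAMPLES if encodable(ch, codec)]
    for codec in EBCDIC_CODECS
}

# Codecs that can represent every sample character.
full = [codec for codec, chars in coverage.items() if len(chars) == len(SAMPLES)]

for codec, chars in coverage.items():
    print(codec, [SAMPLES[ch] for ch in chars])
print("codecs covering all samples:", full)
```

None of the codecs in this list covers all five characters, which is the difficulty with picking one EBCDIC code page for a mixed-language file.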

Robert

Charles
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Robert Prins
Sent: Monday, September 4, 2017 12:34 PM
To: [email protected]
Subject: UTF-8 woes on z/OS, a solution - comments invited

OK, I solved the problem, but maybe someone here can come up with something
a bit more efficient...
There is a file in the non-z/OS world, that used to be pure ASCII (actually
CP437/850), but that has now been converted to UTF-8, due to further
internationalisation requirements. Said file was uploaded to z/OS, processed
into a set of datasets containing various reports, and those reports were
later downloaded to the non-z/OS world, using the same process that was used
to upload them, which could be one of two, IND$FILE, or FTP.

Both FTP and IND$FILE uploads had (and still have) no problems with
CP437/850/UTF-8 data, and although an ü might not have displayed as such on
z/OS, it would have transferred back to the same ü. However, an ü in UTF-8
now consists of two characters, and that means that, replacing spaces with
'=' characters, the original
|=Süd====| |=Nord===|

report lines now come out as

|=Süd===| |=Nord===|

when opened in the non z/OS world with an UTF-8 aware application.
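One plausible way to reproduce the one-column shift shown above, assuming the report program pads each field to a fixed number of bytes while the UTF-8 bytes pass through untranslated. The function names and the 8-position field width are hypothetical, chosen only to match the sample lines.

```python
def pad_bytes(text: str, width: int) -> str:
    """Pad to a fixed width counted in UTF-8 bytes (what a byte-oriented
    report program effectively does with untranslated UTF-8 data)."""
    raw = text.encode("utf-8")
    return raw.ljust(width, b" ").decode("utf-8")

def pad_chars(text: str, width: int) -> str:
    """Pad to a fixed width counted in characters (code points)."""
    return text.ljust(width)

def cell(padded: str) -> str:
    """Render a report cell, showing blanks as '=' like the sample lines."""
    return "|" + padded.replace(" ", "=") + "|"

WIDTH = 8  # hypothetical field width, taken from the sample lines

for name in (" S\u00fcd", " Nord"):
    print(cell(pad_bytes(name, WIDTH)), cell(pad_chars(name, WIDTH)))
```

The byte-counted version loses one padding blank for every two-byte character in the field, so the cell comes out one column short; the character-counted version keeps the columns aligned.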
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN



--
Robert AH Prins
robert(a)prino(d)org

