On 2019-02-10 20:16, Seymour J Metz wrote:
Every release that I have ever used has had issues with translation, ever since the original ASCII support for tape.

Do you mean UTF-8 or UTF-EBCDIC (https://en.wikipedia.org/wiki/UTF-EBCDIC)?

UTF-8, I didn't even know there was something like UTF-EBCDIC!

Flow is UTF-8 (on a white box) -> IND$FILE -> lots of gibberish on z/OS -> process -> processed gibberish -> IND$FILE -> nice unchanged UTF-8, at least when extracted with XMIT Manager and its CP1046.9921875 codepage

Processing on z/OS doesn't touch the gibberish itself; it only has to calculate the actual length in UTF-8 characters, and that's done using the translate and sum PL/I builtins, using metadata in the original (white box) input file.
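
For anyone who wants to see the translate-and-sum idea outside PL/I, here is a minimal Python sketch. It is not the actual z/OS code: it works on the original UTF-8 bytes (on the host the table would have to be indexed by the translated byte values instead), it ignores the input-file metadata mentioned above, and the names LEAD_BYTE_FLAGS and utf8_char_count are made up for illustration.

```python
# 256-entry translate table: 1 for every byte that starts a character,
# 0 for UTF-8 continuation bytes (0x80-0xBF).
LEAD_BYTE_FLAGS = bytes(0 if 0x80 <= b <= 0xBF else 1 for b in range(256))


def utf8_char_count(data: bytes) -> int:
    """Count characters by translating each byte to 0/1 and summing."""
    return sum(data.translate(LEAD_BYTE_FLAGS))


# "Århus" as UTF-8 is 6 bytes (C3 85 72 68 75 73) but 5 characters.
print(utf8_char_count(b"\xc3\x85rhus"))  # prints 5
```

The point is only that the character count is the byte count minus the continuation bytes, so one table lookup per byte plus a sum is enough.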

Robert
--
Robert AH Prins
robert(a)prino(d)org

From: IBM Mainframe Discussion List <[email protected]> on behalf of Robert 
Prins <[email protected]>
Sent: Saturday, February 9, 2019 7:30 PM
To: [email protected]
Subject: XMIT Manager and CP1047 (or rather CP1046.9921875)

Over the last week or so I've been having a discussion with Denis Molony about
his XmitApp, a platform-agnostic (it's written in Java) viewer for XMIT files;
see <https://github.com/dmolony/Xmit>

As is, it (currently) only shows the contents of the xmit file in one of the
panels, and he's hit a snag. One of my PDS's he's using contains text that comes
from uploaded-to-z/OS UTF-8 encoded text (which basically means all UTF-8
characters are mangled beyond recognition on z/OS). It's processed on z/OS, and
the results, also containing UTF-8 encoded text, are downloaded to Windoze
(unmangling the mangled mess again), but XmitApp using CP1047 screws up
codepoints 0x15 and 0x25, and if you take a look at those two code points on
<https://en.wikipedia.org/wiki/EBCDIC_1047>, shiver...

0x15 is NL (Newline)  (Unicode 0085)
0x25 is LF (Linefeed) (Unicode 000A)

I never realised that EBCDIC had two of the same, just different...

Extract the files with Neil Johnston-Ward's XMIT Manager from the cbttape.org
site @ <http://www.cbttape.org/njw/index.html>, and the UTF-8 encoded characters
show up OK. Do it with the official CP1047 and they don't.

So load XMIT Manager.exe into a hex-editor, I'm (still) using HxD 1.7.7.0 from
<https://mh-nexus.de/en/hxd/>, and look for the translate table NJW uses (just
do a find for 'abcdef') and you'll see that he has swapped the ASCII characters
for the 0x15 and 0x25 code points from those in the "official" CP1047...
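
To make the effect of that swap concrete, here is a small Python sketch. It models only the two disputed code points and passes every other byte through unchanged, so it is not a real code page. It also ASSUMES that the upload stored the UTF-8 byte 0x85 as EBCDIC 0x25; I have not verified which table IND$FILE actually uses. The names OFFICIAL_CP1047, SWAPPED_VARIANT and decode_disputed are hypothetical.

```python
# EBCDIC code point -> ASCII/Latin-1 byte, for the two disputed positions only.
OFFICIAL_CP1047 = {0x15: 0x85, 0x25: 0x0A}  # NL -> U+0085, LF -> U+000A
SWAPPED_VARIANT = {0x15: 0x0A, 0x25: 0x85}  # the swap described for XMIT Manager


def decode_disputed(host_bytes: bytes, table: dict) -> bytes:
    """Translate only EBCDIC 0x15/0x25 via `table`; leave other bytes alone."""
    return bytes(table.get(b, b) for b in host_bytes)


# "Å" is C3 85 in UTF-8. ASSUMPTION: on the host the 0x85 byte sits at
# EBCDIC 0x25; 0xC3 stands in for whatever the first byte became.
host = bytes([0xC3, 0x25])

with_swap = decode_disputed(host, SWAPPED_VARIANT)
with_official = decode_disputed(host, OFFICIAL_CP1047)

print(with_swap)      # prints b'\xc3\x85' (valid UTF-8 for "Å")
print(with_official)  # prints b'\xc3\n'   (0x85 became a line feed)
```

Under that assumption the official table turns a UTF-8 continuation byte into 0x0A, which would explain why the multi-byte characters break while plain ASCII text survives.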

Denis has found an APAR dating back to 2010,
<https://www-01.ibm.com/support/docview.wss?uid=swg1IZ70874>, that seems to
confirm that, for Java in mixed environments, i.e. z/OS vs little white boxes,
NJW is correct in swapping them.

Can anyone provide any more insights? For what it's worth, I'm currently
restricted to doing the round-trip transfers using IND$FILE (Upload as ASCII,
download of XMIT (obviously) binary), but I would appreciate it if anyone could
check what happens if they are done using FTP or the WSA. I've attached, in the hope
it survives, utf-8.zip.txt with a bit of UTF-8 encoded data to experiment with.
It's all the UTF-8 encoded data that's in use in the test file, and consists of
European (and a few Japanese) place names.



--
Robert AH Prins
robert.ah.prins(a)gmail.com

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
