Every release that I have ever used has had issues with translation, ever since 
the original ASCII support for tape.

Do you mean UTF-8 or UTF-EBCDIC (https://en.wikipedia.org/wiki/UTF-EBCDIC)?


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> on behalf of 
Robert Prins <robert.ah.pr...@gmail.com>
Sent: Saturday, February 9, 2019 7:30 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: XMIT Manager and CP1047 (or rather CP1046.9921875)

Over the last week or so I've been having a discussion with Denis Molony about
his XmitApp, a platform-agnostic (it's written in Java) viewer for XMIT files;
see <https://github.com/dmolony/Xmit>

As it stands, it only shows the contents of the XMIT file in one of its panels,
and he's hit a snag. One of my PDSs that he's using contains text that comes
from UTF-8 encoded text uploaded to z/OS (which basically means that all
non-ASCII UTF-8 characters are mangled beyond recognition on z/OS). It's
processed on z/OS, and the results, which also contain UTF-8 encoded text, are
downloaded to Windoze (unmangling the mangled mess again), but XmitApp, using
CP1047, screws up code points 0x15 and 0x25. If you take a look at those two
code points on <https://en.wikipedia.org/wiki/EBCDIC_1047>, shiver...

0x15 is NL (Newline)  (Unicode 0085)
0x25 is LF (Linefeed) (Unicode 000A)

I never realised that EBCDIC had two of the same, just different...

Extract the files with Neil Johnston-Ward's XMIT Manager from the cbttape.org
site @ <http://www.cbttape.org/njw/index.html>, and the UTF-8 encoded characters
show up OK. Do it with the official CP1047 and they don't.
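A minimal Java sketch of why this happens and how a viewer could undo it. It
assumes, judging only by the XMIT Manager behaviour, that the upload table agrees
with CP1047 everywhere except that PC bytes 0x0A and 0x85 land on EBCDIC 0x15 and
0x25 the other way round. Byte 0x85 is a common UTF-8 continuation byte ("ą" is
C4 85), so official CP1047 turns it into a line feed. The class and method names
are made up for illustration; this is not XmitApp's code.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Repair {
    /**
     * Rebuilds the original PC bytes from text a CP1047 decoder produced and
     * decodes them as UTF-8. Sketch assumptions: every char is in U+0000..U+00FF
     * (true for CP1047 output) and the upload table matched CP1047 apart from
     * the 0x15/0x25 swap.
     */
    static String recoverUtf8(String cp1047Text, boolean swapNlLf) {
        byte[] pcBytes = new byte[cp1047Text.length()];
        for (int i = 0; i < cp1047Text.length(); i++) {
            char c = cp1047Text.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("not CP1047 output: U+" + Integer.toHexString(c));
            }
            if (swapNlLf && c == '\n') {
                c = '\u0085'; // EBCDIC 0x25 actually carried PC byte 0x85
            } else if (swapNlLf && c == '\u0085') {
                c = '\n';     // EBCDIC 0x15 actually carried PC byte 0x0A
            }
            pcBytes[i] = (byte) c; // U+00xx becomes byte 0xxx
        }
        return new String(pcBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Official CP1047 output for UTF-8 "ą" (C4 85) uploaded with PC 0x85 on
        // EBCDIC 0x25: U+00C4 followed by U+000A.
        String decoded = "\u00C4\n";
        System.out.println(recoverUtf8(decoded, false).equals("\uFFFD\n")); // true
        System.out.println(recoverUtf8(decoded, true).equals("\u0105"));     // true
    }
}
```

Because CP1047 maps its 256 code points onto U+0000..U+00FF, each decoded char
can be turned straight back into the PC byte; only the two swapped entries need
fixing before the UTF-8 decode.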

So load XMIT Manager.exe into a hex editor (I'm still using HxD 1.7.7.0 from
<https://mh-nexus.de/en/hxd/>) and look for the translate table NJW uses (just
do a find for 'abcdef'). You'll see that he has swapped the ASCII characters for
the 0x15 and 0x25 code points relative to those in the "official" CP1047...
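The hex-editor hunt can also be scripted. This is a rough sketch that assumes the
table is a plain 256-byte EBCDIC-to-ASCII array indexed by EBCDIC code point, so
ASCII "abcdef" sits at offsets 0x81..0x86. That layout is an assumption, not
something verified against XMIT Manager.exe. Other "abcdef" hits, such as
hex-digit strings, are skipped by checking where 'j', 's' and '0' land. With the
official CP1047 entries quoted above, an unswapped table would print 0x85 for
0x15 and 0x0A for 0x25.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FindXlateTable {
    /** Offset of the first occurrence of needle in hay at or after from, or -1. */
    static int indexOf(byte[] hay, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i + needle.length <= hay.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /**
     * Looks for a 256-byte EBCDIC-to-ASCII table: EBCDIC 'a'..'f' are 0x81..0x86,
     * so the table starts 0x81 bytes before an "abcdef" hit.
     * Returns {table[0x15], table[0x25]}, or null if no plausible table is found.
     */
    static int[] nlLfEntries(byte[] image) {
        byte[] needle = "abcdef".getBytes(StandardCharsets.US_ASCII);
        for (int hit = indexOf(image, needle, 0); hit >= 0; hit = indexOf(image, needle, hit + 1)) {
            int start = hit - 0x81;
            if (start < 0 || start + 256 > image.length) {
                continue;
            }
            // Sanity checks: in EBCDIC, 'j' is 0x91, 's' is 0xA2 and '0' is 0xF0.
            if (image[start + 0x91] == 'j' && image[start + 0xA2] == 's'
                    && image[start + 0xF0] == '0') {
                return new int[] {image[start + 0x15] & 0xFF, image[start + 0x25] & 0xFF};
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] image = Files.readAllBytes(Paths.get(args[0]));
        int[] e = nlLfEntries(image);
        if (e == null) {
            System.out.println("no EBCDIC-to-ASCII table found");
        } else {
            System.out.printf("0x15 -> 0x%02X, 0x25 -> 0x%02X%n", e[0], e[1]);
        }
    }
}
```

Run it as, e.g., java FindXlateTable "XMIT Manager.exe".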

Denis has found an APAR dating back to 2010,
<https://www-01.ibm.com/support/docview.wss?uid=swg1IZ70874>, that seems to
confirm that, for Java in mixed environments (i.e. z/OS vs. little white
boxes), NJW is correct in swapping them.
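Without relying on what the APAR says, it's easy to check which way round a
particular JVM decodes the two code points. A small probe, assuming the JVM ships
the extended charsets (OpenJDK calls this one "IBM1047", alias "Cp1047"); pass
another charset name as the first argument to compare. It deliberately prints
whatever the JVM reports rather than asserting a mapping.

```java
import java.nio.charset.Charset;

public class Cp1047Probe {
    /** Decodes EBCDIC 0x15 and 0x25 with the named charset and returns the two chars. */
    static char[] decodeNlLf(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        String s = new String(new byte[] {0x15, 0x25}, cs);
        return s.toCharArray();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "IBM1047";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this JVM");
            return;
        }
        char[] c = decodeNlLf(name);
        System.out.printf("%s: 0x15 -> U+%04X, 0x25 -> U+%04X%n", name, (int) c[0], (int) c[1]);
    }
}
```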

Can anyone provide any more insights? For what it's worth, I'm currently
restricted to doing the round-trip transfers using IND$FILE (upload as ASCII,
download of the XMIT file as binary, obviously), but I would appreciate it if
anyone could check what happens when they are done using FTP or the WSA. I've
attached, in the hope it survives, utf-8.zip.txt with a bit of UTF-8 encoded
data to experiment with.
It's all the UTF-8 encoded data that's in use in the test file, and consists of
European (and a few Japanese) place names.
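For anyone trying FTP or the WSA, a sketch of a check on the file that comes
back: is it still valid UTF-8, and where does it first differ from what was sent?
The two file names are whatever you pass on the command line. A text-mode
transfer may legitimately change line ends (CRLF vs. LF), so a difference is not
necessarily corruption; a 0x0A where 0x85 belongs inside a multi-byte character
is the symptom described above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class RoundTripCheck {
    /** True if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] sent = Files.readAllBytes(Paths.get(args[0]));
        byte[] back = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("returned file is valid UTF-8: " + isValidUtf8(back));
        int at = Arrays.mismatch(sent, back); // -1 when identical (Java 9+)
        if (at < 0) {
            System.out.println("byte-for-byte identical");
        } else if (at < Math.min(sent.length, back.length)) {
            System.out.printf("first difference at offset %d: 0x%02X vs 0x%02X%n",
                    at, sent[at] & 0xFF, back[at] & 0xFF);
        } else {
            System.out.printf("one file is a prefix of the other (lengths %d and %d)%n",
                    sent.length, back.length);
        }
    }
}
```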

Robert
--
Robert AH Prins
robert.ah.prins(a)gmail.com

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
