CECP code pages (such as 037 and 500) and the OpenEdition EBCDIC code page 1047 
have the same character set as ISO-8859-1 (the first 256 code points of 
Unicode), also known as IBM extended ASCII code page 819.  The Euro EBCDIC 
code pages are very 
similar except that the Euro symbol replaces the general currency symbol.
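Python's codec library happens to include several of these EBCDIC code pages, so the relationship can be checked directly.  A small sketch using cp037 and its Euro variant cp1140 (1047 itself is not in Python's standard codec set):

```python
# All 256 ISO-8859-1 (Latin-1) characters round-trip through CECP 037,
# confirming the two code pages cover the same character set.
latin1 = bytes(range(256)).decode("latin-1")
assert latin1.encode("cp037").decode("cp037") == latin1

# The Euro variant 1140 differs from 037 at exactly one byte,
# where the Euro sign replaces the general currency symbol.
diff = [b for b in range(256)
        if bytes([b]).decode("cp037") != bytes([b]).decode("cp1140")]
print([hex(b) for b in diff])        # ['0x9f']
print(bytes(diff).decode("cp037"))   # ¤ (general currency symbol)
print(bytes(diff).decode("cp1140"))  # € (Euro sign)
```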

Supporting any multi-byte code page in old EBCDIC SBCS contexts is extremely 
tricky.  I added SuperC support for ASCII files (supporting CECP and Euro 
EBCDIC code pages) back around 2012, but I couldn't work out a practical way to 
extend it to UTF-8 because that introduces a difference between characters and 
columns.  I did at least manage to extend HLASM in 2023 to understand EBCDIC 
code pages, supporting ASCII or Unicode (including UTF-8) constants generated 
from any EBCDIC SBCS character in the CECP or Euro code pages (and some extras, 
such as supporting EBCDIC Latin-9 code page 924 mapped to the corresponding 
ASCII code page 923 or to Unicode).  You can even, for example, assemble a single 
program in EBCDIC 1047 containing messages in multiple western languages 
showing the correct characters in the source (which could be stored and edited 
in ISO 8859-1 but systematically converted to EBCDIC 1047 for building) but 
generating the binary data for each one in the relevant local EBCDIC code page. 
 The fact that assembler expression evaluation can be deferred arbitrarily (to 
allow for forward references) and evaluated multiple times means that code page 
conversion for ASCII and Unicode self-defining terms must remain fixed for a 
whole assembly, but the code pages used for DC constants can be modified 
locally using ACONTROL.  
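The characters-versus-columns problem that blocked UTF-8 support in SuperC can be illustrated outside that context.  In any single-byte code page, a character's byte offset equals its column; under UTF-8 the two diverge as soon as a non-ASCII character appears.  A sketch using a hypothetical assembler source line (not actual SuperC behaviour):

```python
line = "LABEL    DC    C'café'"   # hypothetical fixed-column source line
# In a single-byte code page, column position == byte offset.
assert len(line.encode("latin-1")) == len(line)
# Under UTF-8, 'é' occupies two bytes, so byte offsets no longer line
# up with columns, and fixed-column tools must track both separately.
utf8 = line.encode("utf-8")
print(len(line), len(utf8))       # 22 23
```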

I'd have personally liked to do that enhancement much earlier (around 2013), 
but our most outspoken customers were strongly biased towards English-language 
environments, which made it difficult to prioritise.

There is of course a fundamental limitation that source code in the EBCDIC 
environment cannot contain Unicode characters that are not available in that 
environment, so the idea of EBCDIC-based compilers supporting such characters 
directly is a non-starter.  High Level Assembler evolved from Assembler H V2 
(from around 1981) and relies heavily on EBCDIC SBCS conventions, including the 
usual OC with spaces for upper case when processing symbol names or keywords.  
And of course EBCDIC itself is based on the BCD code used for punched cards and 
accounting machines decades earlier, and meant that early System/360 machines 
were compatible with previous hardware such as the 1403 printer.  So there's a 
lot of history behind the limitations of EBCDIC.
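The OC-with-spaces convention works because EBCDIC places each lowercase letter exactly X'40' below its uppercase form, and X'40' is the EBCDIC space, so ORing a field with spaces folds it to upper case while leaving blanks, capitals, and digits untouched.  A sketch emulating the OC in Python (not HLASM itself):

```python
def ebcdic_upper(field: bytes) -> bytes:
    """Emulate OC field,=spaces: OR each byte with X'40' (EBCDIC space).
    Valid for symbol-name characters (letters, digits, blanks)."""
    return bytes(b | 0x40 for b in field)

name = "Label01".encode("cp037")            # EBCDIC CECP 037 bytes
print(ebcdic_upper(name).decode("cp037"))   # LABEL01
```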

EBCDIC is of course a software problem, in that although z/OS and z/VM are 
deeply entangled with it, the IBM Z hardware isn't EBCDIC-specific, and it can 
for example run Linux using ASCII-based encodings including Unicode.  (When 
HLASM runs under Linux, it translates ASCII input files to EBCDIC and works 
internally in EBCDIC, exactly as for z/OS and z/VM, optionally translating the 
listing back to ASCII).

As always, compatibility and innovation may not work well together.  Throughout 
my career, I tried to combine both, which could be exceptionally challenging at 
times but ensured that all users could continue to move forward without leaving 
anything behind or encountering roadblocks.  I felt that this approach was one 
of IBM's greatest strengths, although not always flawlessly executed.  However, 
I get the impression that many vendors now try to get ahead rapidly by starting 
afresh, ignoring compatibility and leaving existing users behind; only if 
enough users eventually make a fuss might some form of migration advice or 
tooling be provided.  When that happens, the users left behind are likely to 
look elsewhere.  I hope IBM continues to remember that innovation without the 
support of compatibility or seamless migration may be the end of the line for 
existing customers.

Jonathan Scott

-----Original Message-----
From: IBM Mainframe Assembler List <ASSEMBLER-LIST@LISTSERV.UGA.EDU> On Behalf 
Of Paul Gilmartin
Sent: 26 August 2025 17:46
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Is HLASM efficient WAS: Telum and SpyreWAS: Vector instruction 
performance

On 8/26/25 10:14, Seymour J Metz wrote:
> 26? 52? That's very Anglocentric. Why not any alphabetic Unicode character? 
> EBCDIC was great in its day, but these days 256 code points is not nearly 
> enough.
>     ...
Yes, but they can be overloaded (037, 500, 1047, ...)

Linux allows ISO-8859 in pathnames.  MacOS enforces UTF-8.  You may argue (I 
expect you will) with that design decision.  But at least it's enforced 
uniformly at the filesystem level, not chaotically, as by MVS in middleware.

More than 256?  Unicode?  In JCL, TSO TMP, Data Management,..., That's a great 
Idea!  Have you submitted it?

--
gil
