On 2017-09-05 15:41, Walt Farrell wrote:
On Tue, 5 Sep 2017 10:19:45 -0500, Paul Gilmartin <[email protected]> wrote:
What language(s) cleanly handle vertical alignment of formatted text output
when the text contains UTF-16 supplementary characters (those outside the BMP,
encoded as surrogate pairs)? Here's an example of /bin/printf's failure for
similar input with UTF-8 on macOS:
The script:

    printf "%-22s+++\n" "Hello World."
    printf "%-22s+++\n" "Привет мир."
    printf "%-22s+++\n" "Bonjour le monde."

writes:

    Hello World.          +++
    Привет мир.  +++
    Bonjour le monde.     +++
I wish the "+++" would line up (at least in a monospaced font). What sort
of PICTURE would handle that without restricting the text to the BMP?
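A quick way to see why the "+++" drifts is to compare byte length with character length. The Python sketch below is illustrative only. It assumes, consistent with the output shown, that /bin/printf measures "%-22s" in bytes. It also assumes that every code point in these samples occupies one screen column. plus_column is a made-up helper name.

```python
WIDTH = 22  # field width used in the printf example


def plus_column(s: str, width: int = WIDTH) -> int:
    """Return the 1-based column where "+++" lands after "%-<width>s".

    Assumes printf pads by UTF-8 byte count, and that every code point
    in s occupies one column on screen (true for these samples).
    """
    n_bytes = len(s.encode("utf-8"))  # what a byte-counting printf measures
    pad = max(width - n_bytes, 0)     # blanks printf appends
    return len(s) + pad + 1           # columns used by s, then the pad


if __name__ == "__main__":
    for s in ["Hello World.", "Привет мир.", "Bonjour le monde."]:
        print(f"{s!r}: {len(s.encode('utf-8'))} bytes, {len(s)} chars, "
              f"'+++' at column {plus_column(s)}")
```

"Привет мир." is 11 characters but 20 UTF-8 bytes, so a byte-counting printf adds only 2 blanks. Its "+++" therefore starts at column 14 instead of 23.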
It would take more than a simple script like that, but it can be done with
a program. I have a Python program that does it, for example. The key is
understanding that some characters don't take up any space when printed
(combining characters, for example), and therefore don't contribute to the
display width of the output string. When those characters are present, you
need to pad the end with extra blanks if you want a fixed-width output string.
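Walt's program isn't shown, so the following is only a rough Python sketch of the idea he describes. It treats characters whose Unicode general category is Mn, Me or Cf (combining and enclosing marks, format characters such as ZERO WIDTH JOINER) as taking no column. It counts everything else as one column. That category set and the helper names display_width and pad_to_width are assumptions for illustration. East Asian wide characters are not handled. Python strings index by code point, so a non-BMP character counts once, not as two surrogates.

```python
import unicodedata

# Assumption: these general categories occupy no column when printed.
# Mn = nonspacing mark (e.g. COMBINING ACUTE ACCENT), Me = enclosing mark,
# Cf = format character (e.g. ZERO WIDTH JOINER).
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def display_width(s: str) -> int:
    """Count code points that take up a column, skipping zero-width ones.

    Every other code point is assumed to be exactly one column wide,
    so East Asian wide characters are not handled.
    """
    return sum(1 for ch in s
               if unicodedata.category(ch) not in ZERO_WIDTH_CATEGORIES)


def pad_to_width(s: str, width: int) -> str:
    """Left-justify s in a field of `width` columns by appending blanks."""
    return s + " " * max(width - display_width(s), 0)


if __name__ == "__main__":
    # "e" followed by COMBINING ACUTE ACCENT renders as one column, like "é".
    for text in ["Hello World.", "Привет мир.", "Cafe\u0301 au lait."]:
        print(pad_to_width(text, 22) + "+++")
```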
And that is exactly what I'm doing with my translate/sum method. I know that
any byte shown in orange in
<https://en.wikipedia.org/wiki/UTF-8#Codepage_layout> (the continuation bytes,
X'80' to X'BF') does not start a new printing character (yes, there are a few
exceptions that I do not cater for, on the assumption that the non-z/OS file
contains correct UTF-8), so the translate just sets those bytes to zero.
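Robert's code is PL/I and isn't shown. As a rough Python analogue of the translate/sum idea, bytes.translate() can map every continuation byte (0x80-0xBF) to 0 and every other byte to 1. Summing the result then gives the number of code points in valid UTF-8. The table and function names are made up for illustration. Like the method described, this assumes correct UTF-8 and still counts combining marks as one column each.

```python
# Hypothetical 256-entry translate table: continuation bytes (0x80-0xBF)
# map to 0, every other byte maps to 1. Summing the translated bytes
# therefore counts UTF-8 sequences, i.e. code points, in valid UTF-8.
COUNT_TABLE = bytes(0 if 0x80 <= b <= 0xBF else 1 for b in range(256))


def utf8_char_count(data: bytes) -> int:
    """Count code points in valid UTF-8 without decoding it."""
    return sum(data.translate(COUNT_TABLE))


def pad_utf8(data: bytes, width: int) -> bytes:
    """Append blanks so the UTF-8 field is `width` characters wide."""
    return data + b" " * max(width - utf8_char_count(data), 0)


if __name__ == "__main__":
    for text in ["Hello World.", "Привет мир.", "Bonjour le monde."]:
        line = pad_utf8(text.encode("utf-8"), 22) + b"+++"
        print(line.decode("utf-8"))
```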
As I wrote, it works like a charm, but it may not be the most efficient way of
doing things. Even so, given the (still) limited amount of UTF-8 text that has
to undergo this kind of processing, it's probably way faster than converting
the entire file to a wide-character format and using PL/I WCHARs and the
ULENGTH() builtin, which must do something pretty similar internally anyway.
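Paul's original question was about UTF-16, and the same counting idea carries over there. A supplementary (non-BMP) character is stored as a high surrogate followed by a low surrogate (0xDC00-0xDFFF). Counting every 16-bit unit except low surrogates therefore gives code points. The Python sketch below assumes well-formed big-endian UTF-16 without a byte-order mark. It is not a claim about how ULENGTH() is actually implemented, and utf16_char_count is a hypothetical name.

```python
import struct


def utf16_char_count(data: bytes) -> int:
    """Count code points in big-endian UTF-16 without decoding.

    Every 16-bit unit counts as one, except low (trailing) surrogates
    0xDC00-0xDFFF, which only complete a pair started by a high surrogate.
    Assumes well-formed UTF-16BE with no byte-order mark.
    """
    units = struct.unpack(f">{len(data) // 2}H", data)
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


if __name__ == "__main__":
    text = "Hi \U0001F600!"  # U+1F600 needs a surrogate pair in UTF-16
    data = text.encode("utf-16-be")
    # 16-bit units versus code points
    print(len(data) // 2, utf16_char_count(data))
```

Here the string occupies six 16-bit units but holds five code points. That is the discrepancy a naive UTF-16 length would introduce into column alignment.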
Robert
--
Robert AH Prins
robert(a)prino(d)org
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN