On 2017-09-05 15:41, Walt Farrell wrote:
On Tue, 5 Sep 2017 10:19:45 -0500, Paul Gilmartin <[email protected]> wrote:

What language(s) cleanly handle vertical alignment of formatted text output
when the text contains UTF-16 supplementary (surrogate-pair, not in the BMP)
characters? Here's an example of /bin/printf's failure for similar input
with UTF-8 on macOS:

The script:

    printf "%-22s+++\n" "Hello World."
    printf "%-22s+++\n" "Привет мир."
    printf "%-22s+++\n" "Bonjour le monde."

writes:

    Hello World.          +++
    Привет мир.  +++
    Bonjour le monde.     +++

I wish the "+++" would line up (at least in a monospaced font). What sort
of PICTURE would work for such, not restricting to BMP?
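The output above is consistent with printf padding to 22 *bytes* rather than 22 characters: in UTF-8 each Cyrillic letter takes two bytes, so "Привет мир." is 11 characters but 20 bytes and receives only 2 blanks. A minimal Python sketch that mimics byte-based padding and contrasts it with padding by code points (both function names are hypothetical, for illustration only):

```python
def pad_by_bytes(text: str, width: int) -> str:
    # Mimics a byte-counting "%-22s": pad until the UTF-8 encoding is `width` bytes.
    nbytes = len(text.encode("utf-8"))
    return text + " " * max(0, width - nbytes)


def pad_by_code_points(text: str, width: int) -> str:
    # len() of a Python str counts code points, so ljust pads by code points.
    return text.ljust(width)


samples = ["Hello World.", "Привет мир.", "Bonjour le monde."]
for s in samples:
    print(pad_by_bytes(s, 22) + "+++")        # misaligned, like printf above
for s in samples:
    print(pad_by_code_points(s, 22) + "+++")  # aligned for these samples
```

Counting code points fixes the Cyrillic line, and a non-BMP character also counts as one code point in a Python str (not two UTF-16 units). It still goes wrong for combining marks and for characters that display two columns wide.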

It would take more than a simple script like that, but with programming it
can be done. I have a Python program that does it, for example. The key is
understanding that some characters don't take up any space when printed
(combining characters, for example), and therefore don't contribute to the
length of the output string. When those characters are present you need to
pad the end with blanks if you want a fixed width output string.
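A rough Python sketch of that idea (not the Python program mentioned above, which isn't shown). It assumes, as a simplification, that nonspacing (Mn) and enclosing (Me) marks occupy no column and every other code point occupies exactly one; the names `display_width` and `pad_display` are hypothetical:

```python
import unicodedata

# Assumption: nonspacing (Mn) and enclosing (Me) marks take no column;
# every other code point is counted as one column. East Asian wide
# characters and most emoji (usually two columns) are NOT handled.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me"}


def display_width(text: str) -> int:
    return sum(0 if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else 1
               for ch in text)


def pad_display(text: str, width: int) -> str:
    # Append blanks until the *displayed* width reaches `width`.
    return text + " " * max(0, width - display_width(text))


# "Cafe\u0301" is "Café" spelled with a combining acute accent (U+0301):
# 5 code points, but only 4 columns when printed.
for s in ["Hello World.", "Cafe\u0301", "Привет мир."]:
    print(pad_display(s, 22) + "+++")
```

Here the combining accent gets an extra blank of padding, which is exactly the "pad the end with blanks" step described above.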

And that is exactly what I'm doing with my translate/sum method. I know that any byte shown in orange in <https://en.wikipedia.org/wiki/UTF-8#Codepage_layout> (a continuation byte, X'80' to X'BF') does not start a new printing character, so the translate just sets those bytes to zero and every other byte to one, and the sum then gives the number of characters. (And yes, there are a few exceptions that I do not cater for, and I assume that the non-z/OS file contains correct UTF-8.)
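A Python analogue of the translate/sum idea, assuming valid UTF-8 input. This is an illustration only, not the actual PL/I code; the table and function names are hypothetical:

```python
# Translation table: UTF-8 continuation bytes (X'80'-X'BF', bit pattern
# 10xxxxxx) map to 0; every other byte value (ASCII or lead byte) maps to 1.
COUNT_TABLE = bytes(0 if 0x80 <= b <= 0xBF else 1 for b in range(256))


def utf8_char_count(data: bytes) -> int:
    # TRANSLATE, then sum: each byte that starts a character contributes 1.
    return sum(data.translate(COUNT_TABLE))


def pad_utf8(data: bytes, width: int) -> bytes:
    # Pad with blanks to `width` characters (code points), not bytes.
    return data + b" " * max(0, width - utf8_char_count(data))


for s in ["Hello World.", "Привет мир.", "Bonjour le monde."]:
    print(pad_utf8(s.encode("utf-8"), 22).decode("utf-8") + "+++")
```

Because a 4-byte UTF-8 sequence has one lead byte and three continuation bytes, a non-BMP character also counts as 1. Combining characters still count as 1 each, which matches the exceptions the method deliberately ignores.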

As I wrote, it works like a charm, but it may not be the most efficient way of doing things. Still, given the (still) limited amount of UTF-8 text that has to undergo this kind of processing, it's probably way faster than converting the entire file into a wide-character (UTF-16) format and using PL/I WCHARs and the ULENGTH() builtin, which must do something pretty similar internally anyway.
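Converting to a wide-character form does not make counting trivial either. In UTF-16 a character outside the BMP (the case in the original question) occupies a surrogate pair, so counting 2-byte units over-counts, and the same skip-the-trailing-unit trick is needed. A Python sketch, assuming big-endian UTF-16 without a BOM; the function name is hypothetical, and this makes no claim about how ULENGTH is actually implemented:

```python
import struct


def utf16_code_point_count(data: bytes) -> int:
    # Interpret `data` as big-endian UTF-16 code units (assumption: no BOM).
    units = struct.unpack(f">{len(data) // 2}H", data)
    # A low (trailing) surrogate, X'DC00'-X'DFFF', only completes a pair begun
    # by a high surrogate, so it does not start a new character.
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


text = "Привет \U0001F600"          # U+1F600 needs a surrogate pair
encoded = text.encode("utf-16-be")
print(len(encoded) // 2)                # 9 UTF-16 code units
print(utf16_code_point_count(encoded))  # 8 code points
```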

Robert
--
Robert AH Prins
robert(a)prino(d)org

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
