[issue43950] Include column offsets for bytecode instructions

Terry J. Reedy Mon, 19 Jul 2021 14:03:31 -0700

Terry J. Reedy <tjre...@udel.edu> added the comment:

The effort to match caret lines to general Unicode text is similar to a
previous issue that was closed as futile.  (But I could not find it.)  It has
a downside that should be considered.


The fundamental problem is that there is no fixed-pitch font for Unicode, let
alone any font covering all of Unicode.  Nor is there a single/double width
definition, or font, for all of Unicode.  Some character sets are not amenable
to such treatment.

To see the problem more easily, open, for instance, IDLE's options/settings
dialog, showing the fonts tab and a multi-script sample.  On Windows, select
what I believe is the most 'fixed' font, Courier New.  ASCII, Latin1, IPA,
Greek, Cyrillic, Hebrew, and Arabic are all rendered in the same fixed pitch.
But the last 4 Cyrillic characters of "...ЪъЭэѠѤѬӜ" are extremely cramped and
may be rendered differently from the rest.  The East Asian characters are in
a different fixed pitch, about 1.6 times the ASCII, etc. fixed pitch.  (So the
double-wide 2 is 1.6 rounded up.  And with some fonts, the East Asian scripts
are not all the same pitch.)  The South Asian scripts are variable pitch and,
for the sample chars, average wider than 1 (their lines have 20 chars, like
the ASCII, etc. lines).  Tamil, especially, has a wide range of widths, with
the widest as wide as the East Asian chars.
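The "double-wide" convention is commonly taken from the Unicode East Asian
Width property (UAX #11), which Python exposes as
unicodedata.east_asian_width().  The sketch below is a hypothetical helper,
not how CPython's traceback code measures width.  It assumes the simple rule
that Wide ("W") and Fullwidth ("F") code points take 2 cells and everything
else takes 1, which is exactly the kind of approximation described above.

```python
import unicodedata


def naive_cell_width(ch: str) -> int:
    """Estimate the terminal cells taken by one code point.

    Assumption: East Asian Wide ("W") and Fullwidth ("F") count as 2,
    everything else as 1. Ambiguous ("A") width, combining marks, and
    variable-pitch scripts such as Tamil are not handled.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_text_width(text: str) -> int:
    """Sum of the per-code-point estimates for a whole string."""
    return sum(naive_cell_width(ch) for ch in text)


if __name__ == "__main__":
    for sample in ("abc", "ЪъЭэ", "漢字", "தமிழ்"):
        print(repr(sample), "code points:", len(sample),
              "estimated cells:", naive_text_width(sample))
```

This rule gives each ideograph 2 cells, although the measured ratio in
Courier New is closer to 1.6.  The Tamil sample is counted as one cell per
code point, even though some of those code points are combining marks and the
rendered glyphs vary in width.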

On Windows, on my machine, pasting the sample text between quotes results in
the Greek chars, the last 4 Cyrillic chars, and all Asian chars (including
Hebrew and Arabic) being replaced by replacement chars.  (I thought that this
was better once, but maybe I misremember.)  While one can get script-specific
fonts, the fixed-pitch South Asian fonts I tried on Mac were hardly readable.
My conclusion is that people using certain scripts, and anyone wanting a wide
variety of scripts, need to use a GUI-based editor and shell rather than a
fixed-pitch terminal/console.

As long as the caret line has 1 char per code char, a GUI program can use it
to mark code characters, and do so differently for '~' and '^'.  If some of
these chars are doubled to cover wide code chars, exact character information
is lost.  If you go ahead with this, please use a third character, such as
'-', for the added padding.  GUI programs could then ignore these, given that
they can otherwise get the start character information.
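A minimal sketch of the suggestion above, assuming the same East Asian Width
rule for deciding which characters are wide.  The helper names
padded_caret_line() and markers_from_caret_line() are hypothetical, not part
of CPython's traceback module.  Each code character still gets exactly one
'~' or '^'; any extra cell a wide character needs is filled with '-'.  A GUI
that already knows the start column (for example from the instruction's
column offsets) can then drop the indent and every '-' to get back one marker
per code character.

```python
import unicodedata


def cell_width(ch: str) -> int:
    # Assumed rule: East Asian Wide/Fullwidth -> 2 cells, else 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def padded_caret_line(code: str, start: int, markers: str) -> str:
    """Hypothetical helper: `markers` holds one '~' or '^' per code
    character, beginning at character index `start` of `code`.
    Extra cells for wide characters are padded with '-'."""
    # Indent by display cells, not by character count.
    indent = " " * sum(cell_width(ch) for ch in code[:start])
    parts = []
    for ch, mark in zip(code[start:], markers):
        # The real marker first, then '-' for each additional cell.
        parts.append(mark + "-" * (cell_width(ch) - 1))
    return indent + "".join(parts)


def markers_from_caret_line(caret_line: str) -> str:
    """Recover exactly one marker per code character by dropping the
    indent and every '-' padding character. The start column is
    assumed to be known separately."""
    return caret_line.lstrip(" ").replace("-", "")


if __name__ == "__main__":
    code = "r = 名前 / 0"
    line = padded_caret_line(code, 4, "~~~^~~")
    print(code)
    print(line)                           # "    ~-~-~^~~"
    print(markers_from_caret_line(line))  # "~~~^~~"
```

The padded line lines up under the code in a terminal that gives the
ideographs 2 cells, while the stripped form keeps the one-marker-per-character
mapping a GUI program needs.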

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43950>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
