On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> But of course other people's experience may vary. I'm interested in
>> learning about the library you use to process graphemes in your software.
>
> For me, the issue is where do I produce a line break in my text output?
> Currently, I'm just counting codepoints to estimate the width of the
> output.

Well, that's just flat out wrong, then. Counting graphemes isn't going to make it any better. Grab a well-known library like Pango and let it do your measurements for you, *in pixels*. Or better still, just poke your text to a dedicated text-display widget and let it display it correctly.

Back in the early 2000s, I built a program that displayed text in a monospaced font, and it was riddled with assumptions that "one byte == one character == N pixels of width" (for some value of N that changed only when you changed font). It was easier to throw it out completely and start over than to try to "bolt on" true Unicode support. The replacement program uses GTK and Pango to do all its display work, and while it still has a lot of complexities (because it has to handle colour codes, highlighting, point-to-word, and such, all of which get very complicated when you mix LTR and RTL text), at least it can 100% dependably say "wrap to this point".

For the convenience of the human using it, it specifies a wrap width in characters, but in the fine print, the wrap width is defined as "the width of that many of the letter 'n' in the chosen font". At no point do I ever count bytes, code units, code points, grapheme clusters, or blue-faced baboons, to try to pretend that I know the width of the string. All of them are wrong for the wrapping of text.

ChrisA
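
As a rough illustration of why none of those counts works as a width, here is a minimal Python sketch using only the standard library. The sample strings and the helper names `counts` and `has_wide` are made up for this example, and "non-combining code points" is only a crude stand-in for grapheme clusters, not real grapheme segmentation.

```python
import unicodedata

# Hypothetical sample strings, chosen only for illustration.
samples = {
    "ascii": "naive",
    "combining": "nai\u0308ve",    # 'i' followed by COMBINING DIAERESIS
    "wide": "\u65e5\u672c\u8a9e",  # three CJK ideographs
}

def counts(text):
    """Return (UTF-8 bytes, code points, non-combining code points)."""
    n_bytes = len(text.encode("utf-8"))
    n_points = len(text)
    # Crude approximation of "characters the reader sees":
    # skip code points that have a non-zero combining class.
    n_base = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return n_bytes, n_points, n_base

def has_wide(text):
    """True if any code point is East Asian Wide or Fullwidth."""
    return any(unicodedata.east_asian_width(ch) in ("W", "F") for ch in text)

for name, text in samples.items():
    print(name, counts(text), "wide" if has_wide(text) else "narrow")
```

The three counts disagree with each other for the combining example, and they all agree for the CJK example even though those characters are flagged as wide and usually take more room than three Latin letters. In a proportional font the counts say even less, which is why the measurement belongs to the rendering library.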