On 2 April 2017 at 00:22, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Apr 2, 2017 at 8:16 AM, Mikhail V <mikhail...@gmail.com> wrote:
>> For multiple-alphabet rendering I will use some
>> custom text format, e.g. with tags
>> <s="Voynich"> ... </s>, and for latin
>> <s="Latin">...</s> and etc.
>>
>> Simple and effective.
>
> For multi-alphabet rendering, I would rather use an even simpler
> format: Remove the tags and use a consistent encoding.

No, a flat encoding would not be simpler. It would be simpler only
if you took a text with several alphabets and mixed the data randomly.
In real situations, data chunks that use different glyph sets for
representation are not mixed in a random manner.
Also, for various processing purposes a tagged structure will be far
more effective, e.g. if I want to extract all chunks in alphabet A
into a single list of strings, or use advanced search, etc.

>
> Have you ever actually *used* a system of tagged encodings? It is an
> abomination.

Not with such encodings (except in my own experiments), but in some
sense I use it every day. E.g. in rich text formats (Word, InDesign)
you have bold or italic text: in the internal representation it is
like tagged text, and it addresses different glyph sets. Yes, those
are [probably] Unicode values, but the application deals with those
tags at rendering and copy-pasting. IOW, such applications technically
could cope with encodings in a similar manner without problems.
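
To make the "extract all chunks in alphabet A" case concrete, here is a rough sketch. It assumes the <s="Name">...</s> tag syntax from the example quoted above, with no nesting and no escaping of tag characters inside a chunk. The function names parse_chunks and chunks_in are made up for illustration and are not from any existing library.

```python
import re

# Assumed tag syntax, taken from the example in the thread: <s="Name">...</s>
# Nested tags and escaped angle brackets are NOT handled in this sketch.
CHUNK_RE = re.compile(r'<s="([^"]+)">(.*?)</s>', re.DOTALL)


def parse_chunks(text):
    """Return (alphabet, chunk) pairs in document order."""
    return [(m.group(1), m.group(2)) for m in CHUNK_RE.finditer(text)]


def chunks_in(text, alphabet):
    """Return every chunk tagged with the given alphabet, as a list of strings."""
    return [chunk for name, chunk in parse_chunks(text) if name == alphabet]


sample = '<s="Latin">hello </s><s="Voynich">abc</s><s="Latin">world</s>'
print(chunks_in(sample, "Latin"))    # ['hello ', 'world']
print(chunks_in(sample, "Voynich"))  # ['abc']
```

With a flat encoding the same query would mean classifying every code point by range. Here the alphabet name is carried by the markup, so the filter is a plain string comparison.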