On Sun, Apr 2, 2017 at 9:25 AM, Mikhail V <mikhail...@gmail.com> wrote:
> On 2 April 2017 at 00:22, Chris Angelico <ros...@gmail.com> wrote:
>> On Sun, Apr 2, 2017 at 8:16 AM, Mikhail V <mikhail...@gmail.com> wrote:
>>> For multiple-alphabet rendering I will use some
>>> custom text format, e.g. with tags
>>> <s="Voynich"> ... </s>, and for latin
>>> <s="Latin">...</s> and etc.
>>>
>>> Simple and effective.
>>
>> For multi-alphabet rendering, I would rather use an even simpler
>> format: Remove the tags and use a consistent encoding.
>
> No, flat encoding would not be simpler, it would be simpler only and only
> if you take a text with several alphabets, and mix the data randomly.
> In real situation, data chunks that use different glyph sets for
> representation are not mixed in a random manner.
> Also for different processing purposes tagged structure will be way
> more effective, e.g. if I want to extract all chunks in alphabet A
> in a single list with strings, or use advanced search, etc.

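On the "extract all chunks in alphabet A" case: with a single consistent
encoding (Unicode), that kind of extraction may still be possible without
tags. A minimal sketch, assuming Python 3. The helper names (script_of,
chunks_by_script, extract) are made up for illustration, and the first word
of the Unicode character name is used as a crude stand-in for the script,
since the standard library's unicodedata module does not expose the real
Script property. Scripts that are not in Unicode at all are outside what
this can handle.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Return a rough script label for a letter, or None for non-letters.

    Assumption: the first word of the Unicode name ("LATIN", "CYRILLIC",
    "GREEK", ...) approximates the script. This is a heuristic, not the
    official Unicode Script property.
    """
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def chunks_by_script(text):
    """Yield (label, chunk) for each run of consecutive letters that share a label."""
    for label, run in groupby(text, key=script_of):
        if label is not None:  # skip runs of spaces, digits, punctuation
            yield label, "".join(run)


def extract(text, label):
    """Collect every chunk with the given label into a single list of strings."""
    return [chunk for found, chunk in chunks_by_script(text) if found == label]


if __name__ == "__main__":
    sample = "Hello мир and κοσμος world"
    print(extract(sample, "LATIN"))
    print(extract(sample, "CYRILLIC"))
    print(extract(sample, "GREEK"))
```

Because non-letters end a run, each word comes out as its own chunk; merging
neighbouring words of the same script would take a little more logic.
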
https://github.com/Rosuav/LetItTrans/blob/master/25%20languages.srt

Not exactly random, but that's a single file, a single document, using
characters from several different scripts. And this is far from the
only case of this sort of thing happening.

ChrisA