On Thu, Sep 27, 2018 at 10:06:25PM +0200, Laslo Hunhold wrote:
> ...
> The function bound() just operates on relatively small LUTs and is
> pretty efficient. If we implement a font drawing library in some way,
> we will have to think about how we do this special handling right.
> Extended grapheme clusters fortunately really stand for themselves and
> can be a good "atom" to base font rendering on.

Agreed: the "atom" would be this "extended grapheme cluster", and from
this point of view, a terminal would be a grid of "spaces" and
"extended grapheme clusters".

> ...
> Javascript has its purposes if applied lightly and always as an
> afterthought (i.e. the page works 100% without Javascript).

Unfortunately, I am still working out some issues before suing the
French administration for that...

> This is not a bash or anything but really just due to the fact that all
> this processing on higher layers is a question of efficiency,
> especially when e.g. the UNIX system tools are used with plain ASCII
> data 99% of the time, not requiring all the UTF-8 processing.

For pure system tools, of course. But then I would need an i18n
terminal for mutt, lynx, etc.

> I would not favor such a solution, but this is just my opinion.

Same here, for the reasons above.

> ...
> I've not yet dared to touch NFD or generally normalization and string
> comparison, but for simple stream-based operations and to get a grasp
> of a stream and where the bounds for extended grapheme clusters are
> you, by definition of bound(), only need to know the current and
> previous code point to know when a "drawn character" is finished.
>
> Still even there we would need bounds, as Unicode sets no limit for the
> size of an extended grapheme cluster. But this is a "problem" of the
> implementing application itself and not of the library, which I strive
> to have no memory allocations at all.

Well, there is something in Unicode about stream-safe text. Basically,
it is a buffer of 128 bytes (32 code points) with a continuation mark
if an "extended grapheme cluster" is not finished at the end of the
buffer. It seems to relate only to on-the-fly stream normalization,
though.

I did not go that deep into computing "extended grapheme cluster"
boundaries. It seems that everything we need is there, but it raises
many more questions, for instance:
- how resilient is this finite state machine to garbage data?
- can we locate "extended grapheme cluster" boundaries on
  non-normalized Unicode?
- can we normalize an "extended grapheme cluster" on the fly?
- etc...

Regards,

-- 
Sylvain
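
P.S. On the first question (garbage data), one possible approach is to
keep the garbage out of the boundary state machine entirely: the UTF-8
decoder in front of it maps every malformed sequence to U+FFFD, so the
state machine only ever sees valid code points. Below is a minimal
sketch of such a decoder. The names decode_cp() and CP_INVALID are
made up for illustration and are not taken from any existing library.
I have not checked how bound() itself behaves on such input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CP_INVALID 0xFFFDu /* U+FFFD REPLACEMENT CHARACTER */

/*
 * Hypothetical helper: decode one code point from s[0..n-1] into *cp
 * and return the number of bytes consumed (0 only when n == 0).
 * Malformed, truncated, overlong, surrogate and out-of-range
 * sequences all yield U+FFFD, and at least one byte is consumed, so
 * a caller looping over a buffer always makes progress.
 */
size_t
decode_cp(const unsigned char *s, size_t n, uint32_t *cp)
{
	uint32_t c, min;
	size_t len, i;

	assert(s != NULL && cp != NULL);

	if (n == 0) {
		*cp = CP_INVALID;
		return 0;
	}

	c = s[0];
	if (c < 0x80) {                    /* plain ASCII, the fast path */
		*cp = c;
		return 1;
	} else if ((c & 0xE0) == 0xC0) {   /* 2-byte lead */
		len = 2; min = 0x80;    c &= 0x1F;
	} else if ((c & 0xF0) == 0xE0) {   /* 3-byte lead */
		len = 3; min = 0x800;   c &= 0x0F;
	} else if ((c & 0xF8) == 0xF0) {   /* 4-byte lead */
		len = 4; min = 0x10000; c &= 0x07;
	} else {                           /* stray continuation or bad lead */
		*cp = CP_INVALID;
		return 1;
	}

	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			/* truncated or broken: resynchronise at s[i] */
			*cp = CP_INVALID;
			return i;
		}
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* reject overlong forms, surrogates and values above U+10FFFF */
	if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		c = CP_INVALID;

	*cp = c;
	return len;
}
```

A caller would loop with "off += decode_cp(buf + off, len - off, &cp)"
and hand each cp to the boundary check. Whether a U+FFFD produced this
way should start a new cluster is a separate question that this sketch
does not answer.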