Re: [dev] [st] Proposal of changing internal representation

Dimitris Papastamos Sat, 23 Aug 2014 08:48:29 -0700

On Sat, Aug 23, 2014 at 05:35:54PM +0200, Roberto E. Vargas Caballero wrote:
> If the character is multibyte, we decode it again! So for
> multibyte characters we:
> 
>       - decode
>       - encode
>       - decode
> 
> It is slow and really ugly. But we have this problem not only in
> tputc. We have a function utf8len:
> 
> 
>       size_t
>       utf8len(char *c) {
>               return utf8decode(c, &(long){0}, UTF_SIZ);
>       }
> 
> That decodes the string again, because in some places we need the
> size of the utf8 string.


I am not an st developer and not familiar with the code, but the above
approach seems quite crazy...

> I think we should decode the utf8 character on input, store it
> as raw unicode in 4 bytes, and encode it again on output (usually in
> getsel or in the printer functions). The memory usage is going to be
> the same, because we store the utf8 string with 'char c[UTF_SIZ]',
> where UTF_SIZ is 4 (although it should be bigger, because if we accept
> unicode of 32 bits then we can receive utf8 strings of 6 bytes).

Sounds pretty sensible to me.
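
To make the idea concrete, here is a minimal sketch of "decode once on
input, store the code point, encode only on output". It is not st's
actual code: the names Cell, utf8size, utf8enc and utf8dec are
hypothetical, and it assumes the 4-byte UTF-8 limit of RFC 3629 rather
than the older 6-byte form mentioned above. The point is that the
encoded length can be computed from the stored code point, so nothing
needs to be decoded a second time.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF_SIZ     4       /* max bytes per code point (RFC 3629) */
#define UTF_INVALID 0xFFFD  /* U+FFFD REPLACEMENT CHARACTER */

/* Hypothetical cell: one decoded code point instead of char c[UTF_SIZ]. */
typedef struct {
	uint32_t u;
} Cell;

/* Bytes needed to encode u, or 0 if u cannot be encoded. */
size_t
utf8size(uint32_t u) {
	if (u < 0x80) return 1;
	if (u < 0x800) return 2;
	if (u >= 0xD800 && u <= 0xDFFF) return 0; /* surrogates */
	if (u < 0x10000) return 3;
	if (u <= 0x10FFFF) return 4;
	return 0;
}

/* Encode u into buf (at least UTF_SIZ bytes); returns bytes written.
 * Unencodable values are written as U+FFFD. */
size_t
utf8enc(uint32_t u, char *buf) {
	static const unsigned char lead[] = {0, 0x00, 0xC0, 0xE0, 0xF0};
	size_t i, len = utf8size(u);

	if (len == 0) {
		u = UTF_INVALID;
		len = 3;
	}
	for (i = len - 1; i > 0; i--) {
		buf[i] = (char)(0x80 | (u & 0x3F));
		u >>= 6;
	}
	buf[0] = (char)(lead[len] | u);
	return len;
}

/* Decode one code point from the n bytes at s into *u.
 * Returns bytes consumed, or 0 if the sequence is still incomplete. */
size_t
utf8dec(const char *s, size_t n, uint32_t *u) {
	const unsigned char *p = (const unsigned char *)s;
	size_t i, len;
	uint32_t cp;

	if (n == 0) return 0;
	if (p[0] < 0x80) { *u = p[0]; return 1; }
	else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
	else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
	else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
	else { *u = UTF_INVALID; return 1; } /* stray or invalid lead byte */

	if (n < len) return 0;
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80) { *u = UTF_INVALID; return i; }
		cp = (cp << 6) | (p[i] & 0x3F);
	}
	/* Overlong forms, surrogates and out-of-range values all fail here. */
	if (utf8size(cp) != len) cp = UTF_INVALID;
	*u = cp;
	return len;
}
```

With something like this, the input path would call utf8dec once per
character and store the result in Cell.u, while getsel and the printer
functions would call utf8enc when they need bytes. A helper such as
utf8len would reduce to utf8size(cell.u).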
