Re: [RP] [PATCH 1/3 v2] Limit width of formatted text by characters rather than bytes

Will Storey Tue, 29 Aug 2017 22:24:01 -0700

On Mon 2017-08-28 20:50:19 +0200, Jeremie Courreges-Anglas wrote:
> 
> Hi Will,


Hi!

Thank you for looking at this.

> First, thanks for your submission.  You're dealing with a known problem.
> 
> The direction taken so far in ratpoison was: don't deal with wide
> characters, only handle UTF-8 in a rather dumb but at least simple way.
> 
> Rationale:
> - the wide characters API has a lot of gotchas.  I won't detail them
>   here but what to do in case of an invalid sequence often remains an
>   open question.  Here, I can see that you return a partial length
>   early.  I'm not sure this is desirable.

I see. I'm not very familiar with the wchar.h API, and I was not aware
ratpoison already had some UTF-8 handling of its own!

Regarding returning early on invalid sequences: another option would be to
replace them with U+FFFD (REPLACEMENT CHARACTER).
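To make the U+FFFD idea concrete, here is a rough sketch. It is not tested against ratpoison, and the names lead_len and utf8_sanitize are made up for illustration. It assumes a NUL-terminated string and only checks lead bytes and continuation bytes; overlong 3/4-byte forms and surrogates are not rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the sequence announced by UTF-8 lead
 * byte c, or 0 if c cannot start a sequence (continuation bytes,
 * 0xC0/0xC1, 0xF5..0xFF). Overlong 3/4-byte forms and surrogates are
 * not rejected in this sketch. */
static size_t
lead_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return 2;
    if (c >= 0xE0 && c <= 0xEF)
        return 3;
    if (c >= 0xF0 && c <= 0xF4)
        return 4;
    return 0;
}

/* Hypothetical: return a newly allocated copy of s in which every byte
 * that does not begin a complete sequence is replaced by U+FFFD
 * (EF BF BD). The caller frees the result; NULL on allocation failure. */
char *
utf8_sanitize(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(3 * n + 1);   /* worst case: each byte -> 3 bytes */
    char *o = out;
    size_t i = 0;

    if (out == NULL)
        return NULL;
    while (i < n) {
        size_t len = lead_len((unsigned char)s[i]);
        size_t k;

        /* Require exactly len - 1 continuation bytes (10xxxxxx). A NUL
         * fails this test, so we never read past the terminator. */
        for (k = 1; k < len; k++)
            if (((unsigned char)s[i + k] & 0xC0) != 0x80)
                break;
        if (len == 0 || k < len) {
            memcpy(o, "\xEF\xBF\xBD", 3);  /* emit U+FFFD */
            o += 3;
            i += 1;                         /* resync on the next byte */
        } else {
            memcpy(o, s + i, len);          /* copy the valid sequence */
            o += len;
            i += len;
        }
    }
    *o = '\0';
    return out;
}
```

This substitutes one U+FFFD per offending byte, so a truncated sequence like E2 82 becomes two replacement characters. That is simpler than, and not identical to, the "maximal subpart" substitution the Unicode Standard recommends.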

> - UTF-8 is easy and looks like the sanest choice for a multibyte locale.
>   No offense, but other less commonly used locales are just a pain to
>   handle.  Think state-dependent encodings.

Well, even with UTF-8 it is not easy to do everything perfectly! For
instance, I'm not sure how wchar.h deals with combining characters in odd
positions. That might be worth looking at if we ever revisit using it.
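To illustrate the combining-character concern: "é" can also be spelled as "e" followed by U+0301 COMBINING ACUTE ACCENT (CC 81 in UTF-8). That is two code points but one glyph on screen, so a plain character count can still overstate the displayed width. A quick hand-rolled illustration (function names are made up; it assumes valid UTF-8, and the combining check only covers the U+0300..U+036F block):

```c
#include <stddef.h>

/* Hypothetical: number of code points in a valid UTF-8 string, counting
 * every byte that is not a continuation byte (10xxxxxx). */
size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical: same count, but skip code points from the Combining
 * Diacritical Marks block (U+0300..U+036F), encoded as CC 80..CC BF and
 * CD 80..CD AF. Only a partial heuristic: other combining blocks,
 * zero-width and double-width characters are not handled. */
size_t
utf8_count_skip_combining(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                       /* continuation byte */
        if (*p == 0xCC && (p[1] & 0xC0) == 0x80)
            continue;                       /* U+0300..U+033F */
        if (*p == 0xCD && p[1] >= 0x80 && p[1] <= 0xAF)
            continue;                       /* U+0340..U+036F */
        n++;
    }
    return n;
}
```

For "e\xCC\x81" the first function counts 2 while the second counts 1; for the precomposed "caf\xC3\xA9" both count 4.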

> So while technically speaking the wide characters API looks like the
> obvious choice, I think its cost is a bit high.  Consistency is good.
> If we start using the wide chars API somewhere, it should be used in all
> places where it makes sense.  I'm not sure this is an easy task even in
> ratpoison. :)
>
> Handling only UTF-8 as a multibyte locale, the tentative diff below
> seems to do the job.  *WARNING*: I have barely tested it with your html
> testcase.
> 
> Feedback / test reports welcome.

Cool! Thanks for writing that. I've tried it out and it works well.

I agree about only worrying about UTF-8 support.

After looking at the UTF8 macros, one thought I have is that we could make
them more conservative about what they accept. For example, consuming two,
three, or four bytes only when the lead byte indicates a sequence of that
length, rather than consuming continuation bytes with no limit. I suppose
it depends on how far we want to go in writing our own UTF-8 decoding.
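Something like the following is what I have in mind. It is a sketch written from scratch, not a patch against the existing macros (which aren't quoted here), and utf8_lead_len and utf8_prefix_bytes are hypothetical names. It finds how many bytes make up at most N characters, which is what limiting formatted text by characters needs, and never consumes more than the lead byte announces.

```c
#include <stddef.h>

/* Hypothetical: sequence length announced by a UTF-8 lead byte, or 0 if
 * the byte cannot start a sequence. Overlong 3/4-byte forms and
 * surrogates are not checked in this sketch. */
static size_t
utf8_lead_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return 2;
    if (c >= 0xE0 && c <= 0xEF)
        return 3;
    if (c >= 0xF0 && c <= 0xF4)
        return 4;
    return 0;
}

/* Hypothetical: return the number of bytes of s making up at most
 * max_chars characters. A byte that does not start a complete sequence
 * of the announced length counts as one (invalid) character, so at most
 * four bytes are ever consumed per character. */
size_t
utf8_prefix_bytes(const char *s, size_t max_chars)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t off = 0, chars = 0;

    while (p[off] != '\0' && chars < max_chars) {
        size_t len = utf8_lead_len(p[off]);
        size_t k;

        /* Check exactly len - 1 continuation bytes; a NUL stops us. */
        for (k = 1; k < len; k++)
            if ((p[off + k] & 0xC0) != 0x80)
                break;
        if (len == 0 || k < len)
            len = 1;            /* malformed: consume a single byte */
        off += len;
        chars++;
    }
    return off;
}
```

Truncating for display would then mean copying utf8_prefix_bytes(s, n) bytes, which can never split a well-formed sequence in half.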

Anyway, I think it is a big improvement as is.

It might still be good to have a few unit tests, though. Would you be okay
with tests in the form of my third patch?

Thanks again!

_______________________________________________
Ratpoison-devel mailing list
Ratpoison-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel