On Sun, Aug 18, 2013 at 10:01:51PM +0300, Daniel Shahaf wrote:
> Ivan Zhakov wrote on Sun, Aug 18, 2013 at 22:04:58 +0400:
> > >   * r1514785
> > >     ra_serf: Improve SSL certificate verification failure message.
> > > @@ -211,6 +210,8 @@ Candidate changes:
> > >       informative. Regression from Subversion 1.7.x
> > >     Votes:
> > >       +1: ivan, stefan2
> > > +     danielsh: I believe chopping off the last 2 bytes is wrong, _(", ") would
> > > +       be longer than two bytes in Japanese locale.
> > 
> > Actually no, because we use UTF-8 internally, so ', ' will always be two
> > bytes long.
> 
> Yes, ", " will be two bytes long, but _(", ") may be any number of
> bytes.  It is not guaranteed that the localised version ends with an
> ASCII comma and an ASCII space; it might end with a character whose
> representation has three bytes.
> 

Case in point: 

    >>> unicodedata.lookup('ARABIC COMMA').encode('utf-8')
    b'\xd8\x8c'

If we add an Arabic localisation, the localised string would end with the bytes
D8 8C 20 00 (ARABIC COMMA, space, terminating NUL), and chopping off two bytes
would result in a bytestring that ends with D8 00, which is invalid UTF-8.
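A short Python sketch of the same failure, in the spirit of the REPL snippet above. It is an illustration, not the actual ra_serf code: the helper names and reason strings are made up. The message is built by appending the separator after every item, and the trailing separator is then removed either by a fixed two bytes or by its actual encoded length. Python `bytes` objects carry no terminating NUL, so the broken result simply ends with a lone D8 lead byte instead of D8 00.

```python
import unicodedata

# Hypothetical translated separator _(", ") for an Arabic localisation:
# ARABIC COMMA (U+060C, two bytes in UTF-8) followed by an ASCII space.
ARABIC_SEP = unicodedata.lookup('ARABIC COMMA') + ' '
ENGLISH_SEP = ', '


def build_message(reasons, sep):
    """Append `sep` after every reason and return the result as UTF-8 bytes."""
    buf = bytearray()
    for reason in reasons:
        buf += reason.encode('utf-8')
        buf += sep.encode('utf-8')
    return bytes(buf)


def chop_fixed(buf, n=2):
    """Buggy variant: assume the trailing separator is always n bytes long."""
    return buf[:-n]


def chop_separator(buf, sep):
    """Safer variant: drop exactly the bytes of the encoded trailing separator."""
    encoded = sep.encode('utf-8')
    if buf.endswith(encoded):
        return buf[:-len(encoded)]
    return buf


def is_valid_utf8(buf):
    """Return True if `buf` decodes cleanly as UTF-8."""
    try:
        buf.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


if __name__ == '__main__':
    reasons = ['expired', 'issuer not trusted']  # hypothetical reason strings
    for name, sep in (('English', ENGLISH_SEP), ('Arabic', ARABIC_SEP)):
        msg = build_message(reasons, sep)
        print(name, is_valid_utf8(chop_fixed(msg)),
              is_valid_utf8(chop_separator(msg, sep)))
    # English True True
    # Arabic False True
```

With the English separator both variants agree; with the Arabic one only the length-aware chop leaves valid UTF-8.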

Daniel

> > The string will be converted to the required console locale if needed.
> > The code could be improved, btw: remove ', ' and ': ' from the localized
> > strings and add them separately, to prevent translators from accidentally
> > breaking the output.
> > But that should not prevent backporting this change, IMHO.
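Ivan's suggestion of keeping the separators out of the translatable strings might look roughly like the following sketch. The header text and the `translate` hook are hypothetical placeholders, not Subversion's real message catalogue or its `_()` macro. Because `', '.join()` only inserts separators between items, there is no trailing separator to chop, so its byte length never matters.

```python
from typing import Callable, Iterable


def identity(msgid: str) -> str:
    """Stand-in for a gettext-style lookup; returns the msgid untranslated."""
    return msgid


def format_failure_message(
    reasons: Iterable[str],
    translate: Callable[[str], str] = identity,
) -> str:
    """Build '<header>: <reason>, <reason>, ...' with untranslated separators.

    Only whole phrases go through translate(); ': ' and ', ' are added by
    the code, so a translator cannot drop or alter them, and nothing has
    to be chopped off the end afterwards.
    """
    header = translate("Server SSL certificate verification failed")
    return header + ": " + ", ".join(translate(r) for r in reasons)


if __name__ == "__main__":
    print(format_failure_message(["certificate has expired",
                                  "issuer is not trusted"]))
```

The trade-off is that the punctuation itself can no longer be localised, so an Arabic translation would still get an ASCII comma between items.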
