On Tue, Sep 10, 2013 at 07:44:39PM +0200, Volker Lendecke wrote:
> On Tue, Sep 10, 2013 at 09:48:57AM -0700, Jeremy Allison wrote:
> > It's an old, old check back from when SJIS and EUC were
> > common multi-byte systems.
> >
> > SJIS especially has the property that the second byte
> > can contain a value <127 as part of the 2-byte char
> > set. So if CH_UNIX is set to a char set with such a
> > property we can't walk it as bytes, but must see if
> > a pair of values [0] (> 0x80) [1] (any value) can be
> > converted into a valid multi-byte char, in which case
> > we ignore it (otherwise we might look at the second
> > byte value of ':' or something and consider it invalid).
> >
> > I thought about removing this and re-writing it, but
> > it made my brain hurt (and might break some very old
> > systems :-). So moving to next_codepoint() which checks
> > the next char len without causing the conversion error
> > messages seemed the simplest fix :-).
>
> Thanks! +1 from me.
Actually, your question made me think about this some more, and I think I can simplify it easily: no character whose encoded length is greater than one byte can be one of the invalid characters, because those are all ASCII (< 0x80).

So here is the fix I'd like to commit to master. After that I'll create a bug and back-port it for 4.1.0, 4.0.next and 3.6.next.

Please re-review (sorry :-).

Jeremy
-- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba