Heyho

On Thu, Mar 20, 2014 at 5:39 PM, Damian Okrasa <dokr...@gmail.com> wrote:
> Hey,
>
> this patch replaces current utf decoder with a new one, which is ~50
> lines shorter and should be easier to understand. Parsing 5 and 6
> sequences, if necessary, requires trivial modification of UTF_SIZ
> constant and utfbyte, utfmask, utfmin, utfmax arrays.
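
The table-driven idea described in the quoted patch can be sketched in C roughly as follows. This is not Damian's patch, only a minimal illustration under assumptions. The names UTF_SIZ, utfbyte, utfmask, utfmin and utfmax come from the patch description. Their contents, the Rune typedef, UTF_INVALID and the functions decodebyte and utf8decode are hypothetical choices made for this sketch. Each table index n describes the lead byte of an n-byte sequence, and index 0 describes continuation bytes. Decoding then reduces to matching bytes against masks, accumulating 6 payload bits per continuation byte, and range-checking the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune;              /* hypothetical code point type */

#define UTF_INVALID 0xFFFD          /* replacement character for bad input */
#define UTF_SIZ 4                   /* longest sequence this sketch accepts */

/* Hypothetical table contents. Index 0 = continuation byte (10xxxxxx),
 * index n >= 1 = lead byte of an n-byte sequence. */
static const unsigned char utfbyte[UTF_SIZ + 1] = {0x80, 0x00, 0xC0, 0xE0, 0xF0};
static const unsigned char utfmask[UTF_SIZ + 1] = {0xC0, 0x80, 0xE0, 0xF0, 0xF8};
/* Smallest and largest code point that may legally use n bytes. */
static const Rune utfmin[UTF_SIZ + 1] = {0, 0, 0x80, 0x800, 0x10000};
static const Rune utfmax[UTF_SIZ + 1] = {0x10FFFF, 0x7F, 0x7FF, 0xFFFF, 0x10FFFF};

/* Classify one byte: *type becomes 0 for a continuation byte, n for an
 * n-byte lead byte, or UTF_SIZ + 1 if no pattern matches.
 * Returns the payload bits left after stripping the pattern bits. */
static Rune decodebyte(unsigned char c, size_t *type)
{
	for (*type = 0; *type <= UTF_SIZ; ++*type)
		if ((c & utfmask[*type]) == utfbyte[*type])
			return c & ~utfmask[*type];
	return 0;
}

/* Decode one sequence from s[0..n). Stores the code point, or UTF_INVALID,
 * in *u and returns the number of bytes consumed. Returns 0 when the
 * buffer ends in the middle of a sequence (the caller needs more input). */
static size_t utf8decode(const unsigned char *s, size_t n, Rune *u)
{
	size_t len, type, i;
	Rune r;

	assert(s != NULL && u != NULL);
	*u = UTF_INVALID;
	if (n == 0)
		return 0;
	r = decodebyte(s[0], &len);
	if (len == 0 || len > UTF_SIZ)  /* stray continuation or invalid lead */
		return 1;
	for (i = 1; i < n && i < len; ++i) {
		r = (r << 6) | decodebyte(s[i], &type);
		if (type != 0)          /* sequence cut short: resync at s[i] */
			return i;
	}
	if (i < len)                    /* incomplete sequence */
		return 0;
	/* Reject overlong encodings, values above U+10FFFF and surrogates;
	 * *u stays UTF_INVALID in that case. */
	if (r < utfmin[len] || r > utfmax[len] || (r >= 0xD800 && r <= 0xDFFF))
		return len;
	*u = r;
	return len;
}
```

For example, the bytes C3 A9 decode to U+00E9 with 2 bytes consumed, while the overlong pair C0 80 consumes 2 bytes and yields U+FFFD. In this layout, accepting 5- and 6-byte sequences would mean raising UTF_SIZ to 6 and appending the 111110xx and 1111110x lead-byte patterns and their ranges to the four arrays, which is in line with the patch description. Note that RFC 3629 limits well-formed UTF-8 to 4 bytes.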

I can't yet claim to fully understand the code, but according to my testing
with https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt the behavior
of the decoder has not changed a bit, which I'll assume is a good thing.

"Benchmarking" the decoder with

    time for i in `seq 10000`; do cat UTF-8-test.txt; done;

did not reveal any significant difference either.

I will stare at the code some more, but so far it looks good to me.

Cheers,
Silvan