On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote:
> I can't see where you're checking for overlong UTF-8 sequences, for example.
The check is in there, where the header byte is handled:
+ } else if ((e & 0xe0) == 0xc0) { /* 11 bit code point */
+ state = 1;
+ c = (e & 0x1f) << 6;
[snip]
+ /*
+ * Check that the header byte has some non-zero data
+ * after masking off the length marker. If not it is
+ * an invalid encoding.
+ */
+ if (c == 0) {
+ bad_encoding:
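
For comparison, a common way to reject overlong forms is to decode the whole sequence and then compare the result against the smallest code point that needs that many bytes. This matters for two-byte sequences in particular, where both the 0xc0 and 0xc1 header bytes can only encode code points below 0x80. What follows is a minimal sketch written independently of the patch, not a description of what the patch does; utf8_decode_one() and min_cp are names made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: decode one UTF-8
 * sequence from s, of which n bytes are available.  Returns the
 * number of bytes consumed and stores the code point in *out, or
 * returns 0 on an invalid, truncated, or overlong encoding.
 */
static size_t
utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
	/* Smallest code point that needs 1, 2, 3 and 4 bytes. */
	static const uint32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	size_t len, i;

	if (n == 0)
		return 0;

	if (s[0] < 0x80) {			/* 7 bit code point */
		len = 1;
		c = s[0];
	} else if ((s[0] & 0xe0) == 0xc0) {	/* 11 bit code point */
		len = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {	/* 16 bit code point */
		len = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {	/* 21 bit code point */
		len = 4;
		c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation or invalid header */

	if (n < len)
		return 0;	/* truncated sequence */

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}

	/* Overlong: the value would have fit in a shorter sequence. */
	if (c < min_cp[len - 1])
		return 0;
	/* Surrogates and values past U+10FFFF are not valid either. */
	if (c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return 0;

	*out = c;
	return len;
}
```

With this sketch, the two bytes c0 80 and c1 bf are both rejected, while c3 a9 decodes to U+00E9.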
That being said, I find the state variable dance in utf8_decode() very ugly
and confusing -- but then I'm not a developer, so I'd better shut up.