Re: UTF-8 string filtering

pizdelect Sat, 05 Sep 2015 06:41:36 -0700

On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote:
> I can't see where you're checking for overlong UTF-8 sequences, for example.


It is in there, in the check on the header byte:

+                       } else if ((e & 0xe0) == 0xc0) { /* 11 bit code point */
+                               state = 1;                                    
+                               c = (e & 0x1f) << 6;                        
[snip]
+                       /*                                                 
+                        * Check that the header byte has some non-zero data
+                        * after masking off the length marker. If not it is
+                        * an invalid encoding.                 
+                        */                                              
+                       if (c == 0) {                                          
+ bad_encoding:                       

That being said, I find that state variable dance in utf8_decode() very ugly
and confusing -- but then I'm not a developer, so I'd better shut up.
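
For what it's worth, here is a minimal standalone sketch of how an overlong
check can look without a state variable. This is not the code from the patch:
utf8_decode_one() and the min_cp table are hypothetical names I made up for
illustration. Instead of testing the header byte for zero, it compares the
decoded value against the smallest code point that needs that many bytes.
Note that for a two-byte header the zero test on its own covers 0xc0 but not
0xc1, since (0xc1 & 0x1f) << 6 is 0x40, so the minimum-value comparison is the
more general form.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the code from the patch under review.
 * Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is invalid.
 */
static size_t
utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* Smallest code point that legitimately needs 'len' bytes. */
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	size_t len, i;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {			/* plain ASCII */
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {	/* 11 bit code point */
		len = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {	/* 16 bit code point */
		len = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {	/* 21 bit code point */
		len = 4;
		c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation or invalid header byte */

	if (n < len)
		return 0;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}

	if (c < min_cp[len])
		return 0;	/* overlong encoding */
	if (c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return 0;	/* out of range or UTF-16 surrogate */

	*cp = c;
	return len;
}
```

With this, both "\xc0\x80" and "\xc1\xbf" are rejected as overlong, while
"\xc3\xa9" decodes to U+00E9.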
