Re: Validate string as UTF-8?

Tony Nelson Sun, 06 Nov 2005 12:51:22 -0800

In article <[EMAIL PROTECTED]>,
 "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:


> Tony Nelson wrote:
> 
> > I'd like to have a fast way to validate large amounts of string data as
> > being UTF-8.
> 
> define "validate".

All data conforms to the UTF-8 encoding format.  I can stand it if someone 
has made data that impersonates UTF-8 but isn't really Unicode.


> > I don't see a fast way to do it in Python, though:
> >
> >     unicode(s, 'utf-8').encode('utf-8')
> 
> if "validate" means "make sure the byte stream doesn't use invalid
> sequences", a plain
> 
>     unicode(s, "utf-8")
> 
> should be sufficient.
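In Python 3, `unicode(s, 'utf-8')` corresponds to calling `.decode('utf-8')` on a `bytes` object. Here is a minimal sketch of the check described above, assuming Python 3. The helper name `is_valid_utf8` is hypothetical. Decoding alone raises `UnicodeDecodeError` on malformed input, so the extra `.encode('utf-8')` round trip is unnecessary.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 byte sequence (hypothetical helper)."""
    try:
        # Strict decoding raises on invalid start bytes, truncated
        # sequences and overlong forms; re-encoding adds no extra check.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("h\u00e9llo".encode("utf-8")))  # well-formed input
    print(is_valid_utf8(b"\xff\xfe"))                  # 0xFF never appears in UTF-8
```

Note that this decodes the whole buffer once and then discards the result. If the decoded text is needed anyway, keeping the return value of `decode` avoids decoding twice.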

You are correct.  I misunderstood what was happening in my code.  I 
apologise for wasting bandwidth and your time (and I wasted my own time 
as well).

Indeed, unicode(s, 'utf-8') will catch the problem and is fast enough 
for my purpose, adding about 25% to the time to load a file.
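For large files, the same strict check can be run chunk by chunk, so the whole file never has to sit in memory as one string. The sketch below assumes Python 3 and uses the standard library's `codecs.getincrementaldecoder`. The function name `is_valid_utf8_stream` and the 1 MiB default chunk size are illustrative choices, not anything from this thread.

```python
import codecs
import io
from typing import BinaryIO


def is_valid_utf8_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """Validate a binary stream as UTF-8 in chunks (hypothetical helper)."""
    # The incremental decoder buffers a multi-byte sequence that is split
    # across a chunk boundary, so chunking does not produce false errors.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            decoder.decode(chunk)
        # final=True makes a sequence left incomplete at end of input an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Each 2-byte character straddles a 3-byte chunk boundary here.
    print(is_valid_utf8_stream(io.BytesIO("\u00e9".encode("utf-8") * 10), chunk_size=3))
    # Input ending in the middle of a 3-byte sequence.
    print(is_valid_utf8_stream(io.BytesIO(b"abc\xe2\x82")))
```

With a real file, open it in binary mode, e.g. `with open(path, "rb") as f: is_valid_utf8_stream(f)`.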
________________________________________________________________________
TonyN.:'                        [EMAIL PROTECTED]
      '                                  <http://www.georgeanelson.com/>
-- 
http://mail.python.org/mailman/listinfo/python-list
