RE: ignoring chinese characters parsing xml file

Ryan Ginstrom Mon, 22 Oct 2007 15:05:02 -0700

> On Behalf Of Fabian Lopez
> like ^�u�u啖啖才是�w.���扉L锍才是�� or ヘアアイロン... The problem is that
> I get


Just thought I'd point out here that the second string is Japanese, not
Chinese.

From your second post, it appears that you've parsed the text without
problems -- it's when you go to print it out that you get the error. This
is no doubt because your default encoding can't handle Chinese/Japanese
characters. I can imagine several ways to fix this, including encoding the
text in utf-8 for printout.
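
To illustrate the utf-8 suggestion, here is a minimal sketch. The helper
names (to_utf8, to_ascii_safe) are made up for this example and are not from
your code, and I'm assuming your parsed text is already a Unicode string:

```python
def to_utf8(text):
    """Encode a Unicode string explicitly as UTF-8 bytes for output."""
    return text.encode("utf-8")


def to_ascii_safe(text):
    """Encode to ASCII, replacing each unencodable character with '?'."""
    return text.encode("ascii", "replace")


# u"\u30d8\u30a2" is the first two characters of the Japanese example.
sample = u"abc \u30d8\u30a2"
utf8_bytes = to_utf8(sample)        # bytes that can be written to a file or pipe
ascii_bytes = to_ascii_safe(sample)  # lossy fallback for ASCII-only output
```

Whether the utf-8 bytes display correctly still depends on what your
terminal expects, so the lossy ASCII fallback may be the safer choice for
quick debugging output.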

If you really want to strip out Asian characters, here's a way:

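def strip_asian(text):
    """Returns the Unicode string text, minus any Asian characters"""
    # U+3000 is where the CJK Symbols and Punctuation block starts;
    # everything at or above that code point is dropped.
    return u''.join([x for x in text if ord(x) < 0x3000])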
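
A quick usage sketch of the function above, repeated here so the snippet
runs on its own. The sample strings are my own, not taken from your XML.
Note that the cutoff is a blunt one: it also drops anything else at or above
U+3000 (fullwidth forms, for instance), while accented Latin letters below
that point are kept.

```python
def strip_asian(text):
    """Returns the Unicode string text, minus any Asian characters"""
    return u''.join([x for x in text if ord(x) < 0x3000])


# Katakana (U+30D8, U+30A2) is removed; ASCII and Latin-1 letters survive.
mixed = u"hair \u30d8\u30a2 iron"
cleaned = strip_asian(mixed)

# U+00E9 (e with acute accent) is below the cutoff, so it is kept.
accented = strip_asian(u"caf\u00e9")
```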



Regards,
Ryan Ginstrom

-- 
http://mail.python.org/mailman/listinfo/python-list
