Re: ignoring chinese characters parsing xml file

2007-10-23 Thread Fabian López
Thanks, I have tried everything you told me. The error was in the print statement, so I decided to catch the exception whenever I got a UnicodeEncodeError, that is, whenever there were Chinese/Japanese characters, since those don't interest me, and it worked. Ryan's strip_asian function didn't work well here, but it
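A minimal sketch of the "catch the exception and skip" idea described above, written for Python 3 rather than the Python 2 used in the thread. The helper name `encodable`, the sample list, and the choice of `cp1252` as a stand-in for the console's 'charmap' codec are all assumptions made for illustration, not code from the original posts.

```python
def encodable(text, encoding="cp1252"):
    """Return True if text can be encoded with the given codec.

    The codec name is an assumption; use whatever your console reports.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical attribute values; the second is the Japanese sample from the thread.
names = ["my weblog", "ヘアアイロン"]

# Keep only the values the target codec can represent, and skip the rest.
kept = [name for name in names if encodable(name)]
```

Testing the encoding up front, instead of wrapping the print itself, keeps the skip decision separate from the output step, so the same check can be reused when writing to a file.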

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread limodou
On 10/23/07, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Fabian López wrote:
> > Thanks Mark, the code is like this. The attrib name is the problem:
> >
> > from lxml import etree
> >
> > context = etree.iterparse("file.xml")
> > for action, elem in context:
> >     if elem.tag == "weblog":
> >

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread limodou
On 10/23/07, Fabian López <[EMAIL PROTECTED]> wrote:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position
> The thing is th

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Stefan Behnel
Fabian López wrote:
> Thanks Mark, the code is like this. The attrib name is the problem:
>
> from lxml import etree
>
> context = etree.iterparse("file.xml")
> for action, elem in context:
>     if elem.tag == "weblog":
>         print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

RE: ignoring chinese characters parsing xml file

2007-10-22 Thread Ryan Ginstrom
> On Behalf Of Fabian Lopez
> like ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get

Just thought I'd point out here that the second string is Japanese, not Chinese.

From your second post, it appears that you've parsed the text without problems -- it's when you go to print them out
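To illustrate the point that the failure happens at print time rather than at parse time, here is a hedged Python 3 sketch that makes a string safe for a limited console codec before printing it. The function name `console_safe` and the default `ascii` codec are assumptions for the example. `str.encode` with the `replace` or `ignore` error handlers is standard Python.

```python
def console_safe(text, encoding="ascii", errors="replace"):
    """Round-trip text through a codec so that printing it cannot fail.

    With errors="replace", each unencodable character becomes "?".
    With errors="ignore", unencodable characters are dropped.
    """
    return text.encode(encoding, errors).decode(encoding)


# The Japanese sample from the thread has six characters.
replaced = console_safe("name: ヘアアイロン")
dropped = console_safe("name: ヘアアイロン", errors="ignore")

print(replaced)  # name: ??????
print(dropped)
```

Unlike catching the exception and skipping the whole record, this keeps every record and only loses the characters the console cannot show.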

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
Thanks Mark, the code is like this. The attrib name is the problem:

    from lxml import etree

    context = etree.iterparse("file.xml")
    for action, elem in context:
        if elem.tag == "weblog":
            print action, elem.tag, elem.attrib["name"], elem.attrib["url"], elem.attrib["rssUrl"]

And the xml fi

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Marc 'BlackJack' Rintsch
On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:
> I am parsing an XML file that includes Chinese characters, like ^
> uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in
> position..

You say you are *parsing*

ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
Hi, I am parsing an XML file that includes Chinese characters, like ^ uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like: UnicodeEncodeError: 'charmap' codec can't encode characters in position The thing is that I would like to ignore it and parse all the characters less