Thanks, I have tried everything you suggested. The error came from the print
statement. So I decided to catch the UnicodeEncodeError, which is raised
whenever an entry contains Chinese/Japanese characters; those entries don't
interest me, and it worked.
Ryan's strip_asian function didn't work well here, but it
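
For anyone finding this thread later, here is a minimal sketch of the "skip what cannot be printed" idea described above. It is written for Python 3 and is not the exact code used in this thread: the helper names `encodable` and `printable_rows`, the sample rows, and the choice of cp1252 as a stand-in for the console encoding are all assumptions made for illustration. Instead of wrapping print itself, it tests each field against the target encoding first.

```python
def encodable(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def printable_rows(rows, encoding="cp1252"):
    """Keep only the rows whose fields can all be written in encoding."""
    return [row for row in rows
            if all(encodable(field, encoding) for field in row)]


# Hypothetical sample data; the second row holds the Japanese string
# quoted in the original post.
rows = [
    ("Example blog", "http://example.com/"),
    ("ヘアアイロン", "http://example.org/"),
]

for row in printable_rows(rows):
    print(" ".join(row))
```

Rows with characters outside the target encoding are dropped silently, which matches the stated goal of ignoring those entries.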
On 10/23/07, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Fabian López wrote:
> > Thanks Mark, the code is like this. The attrib name is the problem:
> >
> > from lxml import etree
> >
> > context = etree.iterparse("file.xml")
> > for action, elem in context:
> >     if elem.tag == "weblog":
> >
On 10/23/07, Fabian López <[EMAIL PROTECTED]> wrote:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position
> The thing is th
Fabian López wrote:
> Thanks Mark, the code is like this. The attrib name is the problem:
>
> from lxml import etree
>
> context = etree.iterparse("file.xml")
> for action, elem in context:
>     if elem.tag == "weblog":
>         print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
> On Behalf Of Fabian Lopez
> like ^uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that
> I get
Just thought I'd point out here that the second string is Japanese, not
Chinese.
From your second post, it appears that you've parsed the text without
problems -- it's when you go to print them out
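
A small sketch of that distinction, assuming Python 3: parsing yields ordinary Unicode strings, and the failure only appears when those strings are encoded for output. The standard library's xml.etree.ElementTree is used here as a stand-in for lxml so the snippet has no dependencies, the one-element document is made up, and cp1252 is an assumed console encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document using the attribute names from the thread.
doc = '<weblog name="ヘアアイロン" url="http://example.com/"/>'

# Parsing succeeds: the attribute comes back as a normal Unicode string.
elem = ET.fromstring(doc)
name = elem.attrib["name"]

# The error only shows up when the string is encoded for output.
# Assumption: cp1252 stands in for the console's 'charmap' encoding.
try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None

print("parsed %d characters, encode failed: %s" % (len(name), failure is not None))
```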
Thanks Mark, the code is like this. The attrib name is the problem:
from lxml import etree
context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"], \
            elem.attrib["rssUrl"]
And the xml fi
On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:
> I am parsing an XML file that includes Chinese characters, like ^
> uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in
> position..
You say you are *parsing*
Hi,
I am parsing an XML file that includes Chinese characters, like ^
uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position
The thing is that I would like to ignore it and parse all the characters
less
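
One possible way to ignore such characters, sketched for Python 3: encode with the "ignore" error handler and decode again, which drops anything the target encoding cannot represent. The function name `drop_unencodable`, the sample string, and the cp1252 default are assumptions for illustration, not something taken from this thread.

```python
def drop_unencodable(text, encoding="cp1252"):
    """Return text without the characters that encoding cannot represent."""
    # errors="ignore" silently skips characters the codec cannot encode.
    return text.encode(encoding, errors="ignore").decode(encoding)


# Hypothetical sample: only the Japanese characters are removed.
cleaned = drop_unencodable("blog ヘアアイロン 2007")
print(repr(cleaned))
```

Note that this removes individual characters rather than whole entries, so a name made up only of such characters becomes an empty string.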