Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread rparimi
On Mar 8, 12:42 am, Stefan Behnel wrote: > rpar...@gmail.com wrote: > > I am trying to process an xml file that contains unicode characters > > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the > > entire content of the website into an xml file. Using > > xml.dom.minidom, I wro

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Martin v. Löwis
> For the described problem, maybe. But certainly not for the application. > The background was parsing the XML dump of an entire web site, which I > would expect to be larger than what minidom is designed to handle > gracefully. Switching to cElementTree before major code gets written is > almost

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Stefan Behnel
Martin v. Löwis wrote: >> Regarding minidom, you might be happier with the xml.etree package that >> comes with Python2.5 and later (it's also available for older versions). >> It's a lot easier to use, more memory friendly and also much faster. > > OTOH, choice of XML library is completely irrelev

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Martin v. Löwis
> Regarding minidom, you might be happier with the xml.etree package that > comes with Python2.5 and later (it's also available for older versions). > It's a lot easier to use, more memory friendly and also much faster. OTOH, choice of XML library is completely irrelevant for the issue at hand. If

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-07 Thread Stefan Behnel
rpar...@gmail.com wrote: > I am trying to process an xml file that contains unicode characters > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the > entire content of the website into an xml file. Using > xml.dom.minidom, I wrote a few lines of python code to parse out the > xm
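
For readers following the minidom versus xml.etree exchange in this thread, here is a minimal sketch of reading Devanagari text with both libraries. It is not code from any of the posters. The sample document, the `title` element name, and the helper function names are assumptions made up for illustration, and the sketch is written for Python 3 rather than the Python 2.5 discussed above.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical stand-in for a small piece of an exported XML file.
TITLE = "व्याकरणम्"
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<rss><channel><item><title>" + TITLE + "</title></item></channel></rss>"
).encode("utf-8")


def titles_with_etree(data):
    """Return the text of every <title> element using xml.etree."""
    root = ET.fromstring(data)
    return [el.text for el in root.iter("title")]


def titles_with_minidom(data):
    """Return the text of every <title> element using xml.dom.minidom."""
    doc = minidom.parseString(data)
    titles = []
    for el in doc.getElementsByTagName("title"):
        # Join the text nodes directly under each <title> element.
        text = "".join(
            node.data for node in el.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append(text)
    return titles


if __name__ == "__main__":
    # Both parsers decode the UTF-8 bytes into the same Unicode strings.
    print(titles_with_etree(SAMPLE) == titles_with_minidom(SAMPLE))
```

In this sketch both parsers return identical Unicode strings for the same input bytes, which is one way to see why the choice of library may matter more for ease of use and memory than for the handling of the characters themselves.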