Re: xpathEval fails for large files
On Jul 23, 2:03 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Fredrik Lundh wrote:
> > Kanchana wrote:
> >> I tried to extract some data with xpathEval. The path matches more
> >> than 100,000 elements.
> >>
> >>     doc = libxml2.parseFile("test.xml")
> >>     ctxt = doc.xpathNewContext()
> >>     result = ctxt.xpathEval('//src_ref/@editions')
> >>     doc.freeDoc()
> >>     ctxt.xpathFreeContext()
> >>
> >> This gets stuck on the following line and results in high CPU usage:
> >>
> >>     result = ctxt.xpathEval('//src_ref/@editions')
> >>
> >> Any suggestions to resolve this?
> >
> > What happens if you just search for "//src_ref"? What happens if you
> > use libxml's command line tools to do the same search?
> >
> >> Is there any better alternative to handle large documents?
> >
> > The raw libxml2 API is pretty hopeless; there's a much nicer binding
> > called lxml:
> >
> > http://codespeak.net/lxml/
> >
> > That won't help if the problem is with libxml2 itself, though.
>
> It may still help a bit, as lxml's setup of libxml2 is pretty memory
> friendly and hand-tuned in a lot of places. But it's definitely worth
> trying both cElementTree and lxml to see which works better for you.
> Depending on your data, this may be fastest in lxml 2.1:
>
>     import lxml.etree
>
>     doc = lxml.etree.parse("test.xml")
>     for el in doc.iter("src_ref"):
>         attrval = el.get("editions")
>         if attrval is not None:
>             pass  # do something with attrval
>
> Stefan

The original file was 18MB and contained 288,328 attribute values for
that particular path. I wonder whether the for loop will cause a
problem when iterating 288,328 times.
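If memory rather than the XPath evaluation turns out to be the
bottleneck, the cElementTree route Stefan mentions can stream the
document instead of building the whole tree at once. A minimal sketch
(the file name, element tag, and attribute name are taken from the
thread; the counting is just illustrative):

    import xml.etree.cElementTree as etree

    count = 0
    # iterparse() yields ("end", element) pairs as the parser finishes
    # each element, so the full 18MB tree never sits in memory at once.
    for event, el in etree.iterparse("test.xml"):
        if el.tag == "src_ref":
            attrval = el.get("editions")
            if attrval is not None:
                count += 1  # do something with attrval instead
        el.clear()  # release the element's children once processed
    print(count)

Note that lxml also accepts the original expression directly:
doc.xpath('//src_ref/@editions') returns the matching attribute values
as a list of strings, so the query itself need not be rewritten.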
Re: xpathEval fails for large files
On Jul 23, 11:05 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Kanch wrote:
> > The original file was 18MB and contained 288,328 attribute values
> > for that particular path.
>
> You didn't say how many elements there are in total, but I wouldn't
> expect that to be a problem unless you have very little free memory
> (say, way below 256MB). I just tried with lxml 2.1 and a 40MB XML file
> with 300,000 elements, and it lets the whole Python interpreter take
> up some 140MB of memory in total. Looping over all elements by calling
> "list(root.iter())" takes a bit more than one second on my laptop.
> That suggests that /any/ solution involving lxml (or cElementTree)
> will do just fine for you.
>
> > I wonder whether the for loop will cause a problem when iterating
> > 288,328 times.
>
> You are heavily underestimating the power of Python here.
>
> Stefan

Hi, thanks for the help. lxml will suit my work. I have not been
working with Python for that long. :)

Kanch
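Stefan's measurement is easy to reproduce. A small timing sketch along
the lines of his test (the file name and tag are assumed from the
thread, not from his actual test file):

    import time
    import lxml.etree

    doc = lxml.etree.parse("test.xml")  # parse the whole file up front
    root = doc.getroot()

    # Stefan's measurement: materialise every element in the tree once.
    start = time.time()
    all_elements = list(root.iter())
    print("%d elements walked in %.2f s"
          % (len(all_elements), time.time() - start))

    # The actual task: collect the 'editions' attribute of each src_ref.
    start = time.time()
    editions = [el.get("editions") for el in doc.iter("src_ref")
                if el.get("editions") is not None]
    print("%d attribute values in %.2f s"
          % (len(editions), time.time() - start))

Even at 288,328 iterations, a plain Python for loop over an in-memory
tree is cheap; the slowdown in the original post came from libxml2's
XPath evaluation, not from the loop itself.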