Re: xpathEval fails for large files

2008-07-22 Thread Kanch
On Jul 23, 2:03 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Fredrik Lundh wrote:
> > Kanchana wrote:
>
> >> I tried to extract some data with xpathEval. The path matches more
> >> than 100,000 elements.
>
> >> import libxml2
> >>
> >> doc = libxml2.parseFile("test.xml")
> >> ctxt = doc.xpathNewContext()
> >> result = ctxt.xpathEval('//src_ref/@editions')
> >> ctxt.xpathFreeContext()  # free the context before the document it references
> >> doc.freeDoc()
>
> >> This gets stuck at the following line and results in high CPU
> >> usage:
> >> result = ctxt.xpathEval('//src_ref/@editions')
>
> >> Any suggestions to resolve this.
>
> > what happens if you just search for "//src_ref"?  what happens if you
> > use libxml's command line tools to do the same search?
>
> >> Is there any better alternative to handle large documents?
>
> > the raw libxml2 API is pretty hopeless; there's a much nicer binding
> > called lxml:
>
> > http://codespeak.net/lxml/
>
> > that won't help if the problem is in libxml2 itself, though
>
> It may still help a bit as lxml's setup of libxml2 is pretty memory friendly
> and hand-tuned in a lot of places. But it's definitely worth trying with both
> cElementTree and lxml to see what works better for you. Depending on your
> data, this may be fastest in lxml 2.1:
>
> import lxml.etree
>
> doc = lxml.etree.parse("test.xml")
> for el in doc.iter("src_ref"):
>     attrval = el.get("editions")
>     if attrval is not None:
>         pass  # do something with attrval
>
> Stefan
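
For inputs too large to hold comfortably in memory, lxml also provides
iterparse, which streams the file instead of building the whole tree
first. A minimal sketch along those lines, assuming lxml 2.x and that
only the "editions" attribute values are needed:

import lxml.etree

editions = []
# the tag filter restricts events to the elements we care about;
# by default, iterparse fires on "end" events
for event, el in lxml.etree.iterparse("test.xml", tag="src_ref"):
    val = el.get("editions")
    if val is not None:
        editions.append(val)
    # release the element's content, then drop already-processed
    # siblings so memory use stays roughly flat for huge files
    el.clear()
    while el.getprevious() is not None:
        del el.getparent()[0]

The clear-and-delete steps at the end follow the usual iterparse
memory-saving pattern; without them the tree still accumulates in
memory as the file is read.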

The original file was 18MB and contained 288328 attributes matching
that path. I wonder whether the for loop will cause a problem
iterating 288328 times.
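
A loop of that size is not a problem in itself. A quick check with
the standard library's timeit module (the 288328 figure comes from
the message above):

import timeit

# one pass over 288328 items, doing trivial work per item
timer = timeit.Timer("for x in data: y = x",
                     setup="data = list(range(288328))")
print("loop over 288328 items took %.4f seconds" % timer.timeit(number=1))

On typical hardware this finishes in a small fraction of a second;
the cost in the lxml version is the per-element XML work, not the
loop itself.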


Re: xpathEval fails for large files

2008-07-23 Thread Kanch
On Jul 23, 11:05 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Kanch wrote:
> > The original file was 18MB and contained 288328 attributes matching
> > that path.
>
> You didn't say how many elements there are in total, but I wouldn't expect
> that to be a problem, unless you have very little free memory (say, way below
> 256MB). I just tried with lxml 2.1 and a 40MB XML file with 300 000 elements
> and it lets the whole Python interpreter take up some 140MB of memory in
> total. Looping over all elements by calling "list(root.iter())" takes a bit
> more than one second on my laptop. That suggests that /any/ solution involving
> lxml (or cElementTree) will do just fine for you.
>
> > I wonder whether the for loop will cause a problem iterating 288328
> > times.
>
> You are heavily underestimating the power of Python here.
>
> Stefan
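
A rough way to reproduce that measurement on your own file (the file
name is a placeholder):

import time
import lxml.etree

doc = lxml.etree.parse("test.xml")
root = doc.getroot()

start = time.time()
elements = list(root.iter())   # materialise every element once
elapsed = time.time() - start
print("%d elements traversed in %.2f seconds" % (len(elements), elapsed))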

Hi,

Thanks for the help. lxml will suit my work. I have not been working
with Python for that long. :)

Kanch