Re: XML parsing per record

2005-04-23 Thread Kent Johnson
Willem Ligtenberg wrote: Is there an easy way to couple data together? I have discovered an irritating feature in the XML file. Sometimes this is a database reference: UCSC 1234. And sometimes: UCS

Re: XML parsing per record

2005-04-22 Thread Fredrik Lundh
Willem Ligtenberg wrote: By the way, I know about findall, but when I iterate through it like: for x in function: print 'function', x I get: function function But of course I want the information in there... for x in function: print 'function', x.text -- http://mail.python.org/mailman/l
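
A minimal Python 3 sketch of the difference Fredrik points out. The record below is hypothetical (the real file's layout is not shown in the thread): printing an element only shows the Element object, while `.text` gives the character data inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with a repeated <function> element (placeholder values).
record = ET.fromstring(
    "<gene>"
    "<function>first value</function>"
    "<function>second value</function>"
    "</gene>"
)

for x in record.findall("function"):
    print("function", x)       # the Element itself, e.g. <Element 'function' at 0x...>
    print("function", x.text)  # the text inside the element

# Collect just the text of every <function> element.
texts = [x.text for x in record.findall("function")]
```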

Re: XML parsing per record

2005-04-22 Thread Willem Ligtenberg
As you can read in my other post, my problem was with iterating through the list. I didn't know that you should use e.text; I only printed e, not e.text. I did read the documentation, but must admit not all of it. Anyway, thank you very much! On Fri, 22 Apr 2005 15:47:08 +0200, Fredr

Re: XML parsing per record

2005-04-22 Thread Fredrik Lundh
Willem Ligtenberg wrote: As I'm trying to write the code using cElementTree, I've stumbled across one problem. Sometimes there are multiple values to retrieve from one record for the same element, like this: ATP-binding cassette, subfamily G, member 1 ATP-binding cassette 8 How do you get not only the
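
Fredrik's answer is cut off in this excerpt, so here is only a sketch of the usual distinction, assuming the two strings quoted above come from two sibling elements with a hypothetical `description` tag: `findtext()` (like `find()`) returns only the first match, while `findall()` returns every match in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag name "description" is an assumption.
record = ET.fromstring(
    "<gene>"
    "<description>ATP-binding cassette, subfamily G, member 1</description>"
    "<description>ATP-binding cassette 8</description>"
    "</gene>"
)

first = record.findtext("description")                    # only the first match
every = [e.text for e in record.findall("description")]  # all matches, in order
```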

Re: XML parsing per record

2005-04-22 Thread Willem Ligtenberg
By the way, I know about findall, but when I iterate through it like: for x in function: print 'function', x I get: function function But of course I want the information in there... On Fri, 22 Apr 2005 15:22:17 +0200, Willem Ligtenberg wrote: > As I'm trying to write the code using cE

Re: XML parsing per record

2005-04-22 Thread Willem Ligtenberg
As I'm trying to write the code using cElementTree, I've stumbled across one problem. Sometimes there are multiple values to retrieve from one record for the same element, like this: ATP-binding cassette, subfamily G, member 1 ATP-binding cassette 8 How do you get not only the first, but the rest as w

Re: XML parsing per record

2005-04-22 Thread Willem Ligtenberg
This is all the info I need from the xml file: ID --> 320632 Name --> Pzp Startbase --> 126957426 126989473 51860
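
A sketch of pulling such fields out of one record into a dict, assuming hypothetical tag names `id`, `name` and `startbase` (the real element names and nesting are not shown in the thread). `findtext()` returns `None` for a missing element, so absent fields do not raise.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; tag names are assumptions for illustration.
record = ET.fromstring(
    "<gene><id>320632</id><name>Pzp</name><startbase>126957426</startbase></gene>"
)

def summarize(rec):
    """Collect the wanted fields of one record, with None for missing ones."""
    return {
        "ID": rec.findtext("id"),
        "Name": rec.findtext("name"),
        "Startbase": rec.findtext("startbase"),
    }

info = summarize(record)
for key, value in info.items():
    print(key, "-->", value)
```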

Re: XML parsing per record

2005-04-21 Thread William Park
Willem Ligtenberg <[EMAIL PROTECTED]> wrote: > On Sun, 17 Apr 2005 02:16:04 +, William Park wrote: > > Care to post more details? > > The XML file I need to parse contains information about genes. > So the first element is a gene and then there are a lot of sub-elements with > sub-elements. I onl

Re: XML parsing per record

2005-04-21 Thread Paul McGuire
Don't assume that just because you have a 2.4G XML file, you have 2.4G of data. Looking at these verbose tags, plus the fact that the XML is pretty-printed (all those leading spaces - not even tabs! - add up), I'm guessing you only have about 5-10% actual data, and the rest is just XML tagging

Re: XML parsing per record

2005-04-21 Thread Simon Brunning
On 4/21/05, Willem Ligtenberg <[EMAIL PROTECTED]> wrote: > Sorry, I just decided that I want to use your solution, but I am wondering: > is cElementTree in expat or is that something different? Nope, cElementTree is very much its own man. See . -- Cheers, Si
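
cElementTree was a separate download at the time. It was later bundled with Python 2.5 as xml.etree.cElementTree, deprecated in Python 3.3 and removed in 3.9, where xml.etree.ElementTree uses its C accelerator automatically when available. A common import pattern that works across those versions:

```python
# Prefer the separate C module where it exists (Python 2.5 to 3.8);
# otherwise fall back to ElementTree, which in Python 3.3+ already
# uses the C accelerator when it is available.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

root = ET.fromstring("<gene><name>Pzp</name></gene>")
name = root.findtext("name")
```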

Re: XML parsing per record

2005-04-21 Thread Willem Ligtenberg
Sorry, I just decided that I want to use your solution, but I am wondering: is cElementTree in expat or is that something different? On Wed, 20 Apr 2005 08:03:00 -0400, Kent Johnson wrote: > Willem Ligtenberg wrote: >>>Willem Ligtenberg <[EMAIL PROTECTED]> wrote: >>> I want to parse a very large

Re: XML parsing per record

2005-04-21 Thread Willem Ligtenberg
I'll first try it using SAX, because I want to have as few dependencies as possible. I already have BioPython as a dependency, and I personally don't like installing lots of packages just to get a program to work, so I don't want to impose that on other people. But thanks anyway and I might go for the
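
A minimal sketch of the SAX route using only the standard library's xml.sax, assuming (hypothetically) that each record is a `<gene>` element with simple child elements. The handler buffers character data because SAX may deliver one text run across several `characters()` calls.

```python
import io
import xml.sax

class GeneHandler(xml.sax.ContentHandler):
    """Build one dict per hypothetical <gene> record and pass it to a callback."""

    def __init__(self, on_record):
        super().__init__()
        self.on_record = on_record
        self.record = None  # dict for the <gene> being parsed, else None
        self.field = None   # name of the child element being read
        self.buffer = []    # text pieces for the current field

    def startElement(self, name, attrs):
        if name == "gene":
            self.record = {}
        elif self.record is not None:
            self.field = name
            self.buffer = []

    def characters(self, content):
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "gene":
            self.on_record(self.record)  # hand off the finished record
            self.record = None
        elif self.field == name:
            self.record[name] = "".join(self.buffer).strip()
            self.field = None

records = []
doc = b"<genes><gene><id>320632</id><name>Pzp</name></gene><gene><id>1</id></gene></genes>"
xml.sax.parse(io.BytesIO(doc), GeneHandler(records.append))
```

Only the current record is held in memory, so the same handler works on a file of any size; pass a filename to `xml.sax.parse` instead of the in-memory sample.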

Re: XML parsing per record

2005-04-20 Thread Kent Johnson
Willem Ligtenberg wrote: Willem Ligtenberg <[EMAIL PROTECTED]> wrote: I want to parse a very large (2.4 gig) XML file (bioinformatics, of course :)) But I have no clue how to do that. Most things I see read the entire XML file at once. That isn't going to work here, of course. So I would like to parse

Re: XML parsing per record

2005-04-20 Thread Willem Ligtenberg
On Sun, 17 Apr 2005 02:16:04 +, William Park wrote: > Willem Ligtenberg <[EMAIL PROTECTED]> wrote: >> I want to parse a very large (2.4 gig) XML file (bioinformatics, >> of course :)) But I have no clue how to do that. Most things I see read >> the entire XML file at once. That isn't going to wo

Re: XML parsing per record

2005-04-17 Thread Fredrik Lundh
William Park wrote: You may want to try Expat (www.libexpat.org) or a Python wrapper for it. Python comes with a low-level expat wrapper (pyexpat). However, if you want performance, cElementTree (which also uses expat) is a lot faster than pyexpat. (See my other post for links to benchmarks and code.)
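
For comparison, a sketch of the low-level pyexpat approach: the file is fed to the parser in fixed-size chunks, and callbacks fire as elements are seen, so memory use does not grow with file size. Counting start tags of a given name is just an illustrative task; the tag name is a parameter, not something from the original file.

```python
import io
import xml.parsers.expat

def count_elements(stream, tag, chunk_size=64 * 1024):
    """Stream XML bytes through pyexpat and count start tags named `tag`."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # False: more data follows
    parser.Parse(b"", True)         # signal end of input
    return count
```

For a real file, pass an open binary file object (`open(path, "rb")`); chunks may split tags anywhere, and expat handles that internally.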

Re: XML parsing per record

2005-04-16 Thread William Park
Willem Ligtenberg <[EMAIL PROTECTED]> wrote: > I want to parse a very large (2.4 gig) XML file (bioinformatics, > of course :)) But I have no clue how to do that. Most things I see read > the entire XML file at once. That isn't going to work here, of course. > > So I would like to parse an XML file one

Re: XML parsing per record

2005-04-16 Thread Fredrik Lundh
Kent Johnson wrote: So I would like to parse an XML file one record at a time and then be able to store the information in another object. You might be interested in this recipe using ElementTree: http://online.effbot.org/2004_12_01_archive.htm#element-generator if you have ElementTree 1.2.5 or late
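
The linked recipe is not reproduced in this excerpt. As a hedged sketch of the same general idea (a generator that yields one record at a time from an incremental parse), here is a version built on `iterparse` from Python 3's xml.etree.ElementTree; the record tag is a hypothetical parameter.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Yield each complete <tag> element, then drop finished records.

    Clearing the root after each record keeps memory roughly bounded;
    without it every parsed record stays attached to the tree.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's start
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # detach records that have already been handled

doc = b"<genes><gene><id>1</id></gene><gene><id>2</id></gene></genes>"
ids = [rec.findtext("id") for rec in iter_records(io.BytesIO(doc), "gene")]
```

Process each record (or copy what you need) inside the loop, since the generator clears the tree as soon as it resumes.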

Re: XML parsing per record

2005-04-16 Thread Kent Johnson
Willem Ligtenberg wrote: I want to parse a very large (2.4 gig) XML file (bioinformatics, of course :)) But I have no clue how to do that. Most things I see read the entire XML file at once. That isn't going to work here, of course. So I would like to parse an XML file one record at a time and then be a

Re: XML parsing per record

2005-04-16 Thread Ivan Voras
Irmen de Jong wrote: XML is not known for its efficiency. Surely you are blaspheming, sir! XML's the greatest thing since peanut butter! I'm just *waiting* for the day someone finds a use for it on rolls of toilet paper... oh, the glorious day...

Re: XML parsing per record

2005-04-16 Thread Irmen de Jong
Willem Ligtenberg wrote: I want to parse a very large (2.4 gig) XML file (bioinformatics, of course :)) But I have no clue how to do that. Most things I see read the entire XML file at once. That isn't going to work here, of course. So I would like to parse an XML file one record at a time and then be a

XML parsing per record

2005-04-16 Thread Willem Ligtenberg
I want to parse a very large (2.4 gig) XML file (bioinformatics, of course :)) But I have no clue how to do that. Most things I see read the entire XML file at once. That isn't going to work here, of course. So I would like to parse an XML file one record at a time and then be able to store the informa