In article ,
Stefan Behnel wrote:
>
>Try
>
> import xml.etree.cElementTree as etree
>
>instead. Note the leading "c", which hints at the C implementation of
>ElementTree. It's much faster and much more memory-friendly than the
>Python implementation.
Thanks! I updated our codebase this afternoon.
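For reference, the usual idiom from that era wraps the import in a fallback so
the same code still runs where the C accelerator is missing (the input file
name here is made up):

    # Prefer the C implementation of ElementTree; fall back to pure Python.
    try:
        import xml.etree.cElementTree as etree
    except ImportError:
        import xml.etree.ElementTree as etree

    tree = etree.parse("config.xml")  # hypothetical input file
    print(tree.getroot().tag)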
You should look into vtd-xml, available in C, C++, Java and C#.
On Dec 20, 11:34 am, spaceman-spiff wrote:
> Hi c.l.p folks
>
> This is a rather long post, but I wanted to include all the details &
> everything I have tried so far myself, so please bear with me & read the
> entire boringly long post.
Maybe you can try http://vtd-xml.sourceforge.net/
In article ,
"BartC" wrote:
> Still, that's 27 times as much as it need be. Readability is fine, but why
> does the full, expanded, human-readable textual format have to be stored on
> disk too, and for every single instance?
Well, I know the answer to that one. The particular XML feed I'm
working with
"BartC" writes:
>> Roy Smith, 28.12.2010 00:21:
>>> To go back to my earlier example of
>>>
>>> FALSE
>>>
>
> Isn't it possible for XML to define a shorter alias for these tags? Isn't
> there a shortcut available for the closing tag in simple examples like
> this? (I seem to remember something like this.)
"Stefan Behnel" wrote in message
news:mailman.335.1293516506.6505.python-l...@python.org...
Roy Smith, 28.12.2010 00:21:
To go back to my earlier example of
FALSE
using 432 bits to store 1 bit of information, stuff like that doesn't
happen in marked-up text documents. Most of the file is CDATA (do they
still use that term in XML, or was that an SGML-ism only?).
On Tue, 2010-12-28 at 07:08 +0100, Stefan Behnel wrote:
> Roy Smith, 28.12.2010 00:21:
> > To go back to my earlier example of
> > FALSE
> > using 432 bits to store 1 bit of information, stuff like that doesn't
> > happen in marked-up text documents. Most of the file is CDATA (do they
> > still use that term in XML, or was that an SGML-ism only?).
Alan Meyer, 28.12.2010 01:29:
On 12/27/2010 4:55 PM, Stefan Behnel wrote:
From my experience, SAX is only practical for very simple cases where
little state is involved when extracting information from the parse
events. A typical example is gathering statistics based on single tags -
not a very common use case.
Alan Meyer, 28.12.2010 03:18:
By the way Stefan, please don't take any of my comments as complaints.
I don't. After all, this discussion is more about the general data format
than the specific tools.
I use lxml more and more in my work. It's fast, functional and pretty elegant.
I've written a lot of code on a lot of projects in my 35 year career, but
I don't think I've written anything anywhere near as useful.
Roy Smith, 28.12.2010 00:21:
To go back to my earlier example of
FALSE
using 432 bits to store 1 bit of information, stuff like that doesn't
happen in marked-up text documents. Most of the file is CDATA (do they
still use that term in XML, or was that an SGML-ism only?). The markup
is
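For scale, assuming 8-bit characters: 432 bits / 8 = 54 characters, i.e. 54
bytes of tags and text spent to carry a single boolean that a binary format
could store in one bit.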
By the way Stefan, please don't take any of my comments as complaints.
I use lxml more and more in my work. It's fast, functional and pretty
elegant.
I've written a lot of code on a lot of projects in my 35 year career but
I don't think I've written anything anywhere near as useful to anywhere
near as many people.
On 12/27/2010 6:21 PM, Roy Smith wrote:
... In the old days, they used to say, "Nobody ever got
fired for buying IBM". Relational databases have pretty much gotten to
that point.
That's _exactly_ the comparison I had in mind too.
I once worked for a company that made a pitch to a big potential customer.
On 12/27/2010 4:55 PM, Stefan Behnel wrote:
...
From my experience, SAX is only practical for very simple cases where
little state is involved when extracting information from the parse
events. A typical example is gathering statistics based on single tags -
not a very common use case. Anything
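As a rough sketch of that "statistics based on single tags" case, here is a
minimal SAX handler that only counts element occurrences and keeps no other
state (the file name is invented):

    import xml.sax

    class TagCounter(xml.sax.ContentHandler):
        # The only state carried across parse events is a tag -> count map.
        def __init__(self):
            xml.sax.ContentHandler.__init__(self)
            self.counts = {}
        def startElement(self, name, attrs):
            self.counts[name] = self.counts.get(name, 0) + 1

    handler = TagCounter()
    xml.sax.parse("huge.xml", handler)  # streams the file; memory stays flat
    print(handler.counts)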
Alan Meyer wrote:
> On 12/26/2010 3:15 PM, Tim Harig wrote:
> I agree with you but, as you say, it has become a de facto standard. As
> a result, we often need to use it unless there is some strong reason to
> use something else.
This is certainly true. In the rarefied world of usenet, we can
On 2010-12-27, Alan Meyer wrote:
> On 12/26/2010 3:15 PM, Tim Harig wrote:
> ...
>> The problem is that XML has become such a de facto standard that it is
>> used automatically, without thought, even when there are much better
>> alternatives available.
>
> I agree with you but, as you say, it has become a de facto standard.
On Mon, 2010-12-27 at 22:55 +0100, Stefan Behnel wrote:
> Alan Meyer, 27.12.2010 21:40:
> > On 12/21/2010 3:16 AM, Stefan Behnel wrote:
> >> Adam Tauno Williams, 20.12.2010 20:49:
> > ...
> >>> You need to process the document as a stream of elements; aka SAX.
> >> IMHO, this is the worst advice you can give.
Alan Meyer, 27.12.2010 21:40:
On 12/21/2010 3:16 AM, Stefan Behnel wrote:
Adam Tauno Williams, 20.12.2010 20:49:
...
You need to process the document as a stream of elements; aka SAX.
IMHO, this is the worst advice you can give.
Why do you say that? I would have thought that using SAX in this
application is an excellent idea.
On 12/26/2010 3:15 PM, Tim Harig wrote:
...
The problem is that XML has become such a de facto standard that it is
used automatically, without thought, even when there are much better
alternatives available.
I agree with you but, as you say, it has become a de facto standard. As
a result, we often need to use it unless there is some strong reason to
use something else.
On 12/21/2010 3:16 AM, Stefan Behnel wrote:
Adam Tauno Williams, 20.12.2010 20:49:
...
You need to process the document as a stream of elements; aka SAX.
IMHO, this is the worst advice you can give.
Why do you say that? I would have thought that using SAX in this
application is an excellent idea.
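For contrast, the alternative argued for elsewhere in this thread is
iterparse, which streams like SAX but hands you complete elements; a common
sketch of the memory-safe pattern (the tag name is invented):

    try:
        import xml.etree.cElementTree as etree
    except ImportError:
        import xml.etree.ElementTree as etree

    count = 0
    for event, elem in etree.iterparse("huge.xml", events=("end",)):
        if elem.tag == "record":  # hypothetical element of interest
            count += 1            # real work would happen here
            elem.clear()          # drop the subtree so memory stays bounded
    print(count)

(Fully bounding memory also means clearing the root's accumulated children
now and then; this shows only the core of the idiom.)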
On 2010-12-26, Stefan Behnel wrote:
> Tim Harig, 26.12.2010 10:22:
>> On 2010-12-26, Stefan Behnel wrote:
>>> Tim Harig, 26.12.2010 02:05:
On 2010-12-25, Nobody wrote:
> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>> Of course, one advantage of XML is that with so much redundant text, it
>> compresses well.
Tim Harig, 26.12.2010 10:22:
On 2010-12-26, Stefan Behnel wrote:
Tim Harig, 26.12.2010 02:05:
On 2010-12-25, Nobody wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
Of course, one advantage of XML is that with so much redundant text, it
compresses well. We typically see gzip compression ratios of 20:1.
On 2010-12-26, Stefan Behnel wrote:
> Tim Harig, 26.12.2010 02:05:
>> On 2010-12-25, Nobody wrote:
>>> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
Of course, one advantage of XML is that with so much redundant text, it
compresses well. We typically see gzip compression ratios of 20:1.
On 2010-12-26, Nobody wrote:
> On Sun, 26 Dec 2010 01:05:53 +, Tim Harig wrote:
>
>>> XML is typically processed sequentially, so you don't need to create a
>>> decompressed copy of the file before you start processing it.
>>
>> Sometimes XML is processed sequentially. When the markup footprint is
>> large enough it must be.
Tim Harig, 26.12.2010 02:05:
On 2010-12-25, Nobody wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
Of course, one advantage of XML is that with so much redundant text, it
compresses well. We typically see gzip compression ratios of 20:1.
But, that just means you can archive them efficiently.
On Sun, 26 Dec 2010 01:05:53 +, Tim Harig wrote:
>> XML is typically processed sequentially, so you don't need to create a
>> decompressed copy of the file before you start processing it.
>
> Sometimes XML is processed sequentially. When the markup footprint is
> large enough it must be. Qu
On 2010-12-25, Adam Tauno Williams wrote:
> On Sat, 2010-12-25 at 22:34 +, Nobody wrote:
>> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>> XML is typically processed sequentially, so you don't need to create a
>> decompressed copy of the file before you start processing it.
>
> Yep.
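A sketch of that point: iterparse accepts any file-like object, so a gzipped
document can be decompressed on the fly with nothing ever written back to
disk (the file name is invented):

    import gzip
    try:
        import xml.etree.cElementTree as etree
    except ImportError:
        import xml.etree.ElementTree as etree

    with gzip.open("feed.xml.gz", "rb") as f:  # decompresses as it reads
        for event, elem in etree.iterparse(f, events=("end",)):
            elem.clear()  # handle the element here, then release it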
On 2010-12-25, Nobody wrote:
> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>>> XML works extremely well for large datasets.
> One advantage it has over many legacy formats is that there are no
> inherent 2^31/2^32 limitations. Many binary formats inherently cannot
> support files larger than 2 GiB or 4 GiB.
"Adam Tauno Williams" wrote in message
news:mailman.287.1293319780.6505.python-l...@python.org...
On Sat, 2010-12-25 at 22:34 +, Nobody wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>> XML works extremely well for large datasets.
One advantage it has over many legacy formats is that there are no
inherent 2^31/2^32 limitations.
On Sat, 2010-12-25 at 22:34 +, Nobody wrote:
> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
> >> XML works extremely well for large datasets.
> One advantage it has over many legacy formats is that there are no
> inherent 2^31/2^32 limitations. Many binary formats inherently cannot
> support files larger than 2 GiB or 4 GiB.
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>> XML works extremely well for large datasets.
One advantage it has over many legacy formats is that there are no
inherent 2^31/2^32 limitations. Many binary formats inherently cannot
support files larger than 2 GiB or 4 GiB due to the use of 32-bit offsets.
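(Worked out: a signed 32-bit offset tops out at 2^31 bytes = 2 GiB, and an
unsigned one at 2^32 bytes = 4 GiB, which is exactly where those familiar
limits come from.)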
On 25.12.2010 20:41, Roy Smith wrote:
In article,
Adam Tauno Williams wrote:
XML works extremely well for large datasets.
Barf. I'll agree that there are some nice points to XML. It is
portable. It is (to a certain extent) human readable, and in a pinch
you can use standard text tools to do ad-hoc queries.
In article ,
Adam Tauno Williams wrote:
> XML works extremely well for large datasets.
Barf. I'll agree that there are some nice points to XML. It is
portable. It is (to a certain extent) human readable, and in a pinch
you can use standard text tools to do ad-hoc queries (i.e. grep for a
particular record).
On 2010-12-25, Steve Holden wrote:
> On 12/23/2010 4:34 PM, Stefan Sonnenberg-Carstens wrote:
>> For large datasets I always have huge question marks if one says "xml".
>> But I don't want to start a flame war.
I would agree; but, you don't always have the choice over the data format
that you have to work with.
"Steve Holden" wrote:
>On 12/23/2010 4:34 PM, Stefan Sonnenberg-Carstens wrote:
>> For large datasets I always have huge question marks if one says "xml".
>> But I don't want to start a flame war.
>I agree people abuse the "spirit of XML" using it to transfer gigabytes
>of data, but what else are they to use?
How so? I think
Steve Holden, 25.12.2010 16:55:
On 12/23/2010 4:34 PM, Stefan Sonnenberg-Carstens wrote:
For large datasets I always have huge question marks if one says "xml".
But I don't want to start a flame war.
I agree people abuse the "spirit of XML" using it to transfer gigabytes
of data, but what else are they to use?
I keep reading
On 12/23/2010 4:34 PM, Stefan Sonnenberg-Carstens wrote:
> For large datasets I always have huge question marks if one says "xml".
> But I don't want to start a flame war.
I agree people abuse the "spirit of XML" using it to transfer gigabytes
of data, but what else are they to use?
regards
Steve
On 23.12.2010 21:27, Nobody wrote:
On Wed, 22 Dec 2010 23:54:34 +0100, Stefan Sonnenberg-Carstens wrote:
Normally (what is normal, anyway?) such files are auto-generated,
and are something that has an apparent similarity with a database query
result, encapsulated in xml.
Most of the time the structure is the same for every "row".
On Wed, 22 Dec 2010 23:54:34 +0100, Stefan Sonnenberg-Carstens wrote:
> Normally (what is normal, anyway?) such files are auto-generated,
> and are something that has an apparent similarity with a database query
> result, encapsulated in xml.
> Most of the time the structure is the same for every "row".
On 20.12.2010 20:34, spaceman-spiff wrote:
Hi c.l.p folks
This is a rather long post, but I wanted to include all the details &
everything I have tried so far myself, so please bear with me & read the
entire boringly long post.
I am trying to parse a ginormous (~1 GB) xml file.
0. I am a
On 12/20/2010 12:33 PM, Adam Tauno Williams wrote:
On Mon, 2010-12-20 at 12:29 -0800, spaceman-spiff wrote:
I need to detect them & then for each one, I need to copy all the
content b/w the element's start & end tags & create a smaller xml
file.
Yep, do that a lot; via iterparse.
1. Can you point me to some examples/samples of using SAX,
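A hedged sketch of that iterparse route for the original task, writing each
matching element out as its own small file (tag and file names are invented):

    try:
        import xml.etree.cElementTree as etree
    except ImportError:
        import xml.etree.ElementTree as etree

    WANTED = "target-element"  # hypothetical tag the OP is hunting for
    matches = 0
    for event, elem in etree.iterparse("huge.xml", events=("end",)):
        if elem.tag == WANTED:
            matches += 1
            # Serialize just this subtree to its own small XML file.
            etree.ElementTree(elem).write("part-%04d.xml" % matches)
            elem.clear()  # release the subtree before moving on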
spaceman-spiff, 20.12.2010 21:29:
I am sorry I left out what exactly I am trying to do.
0. Goal: I am looking for a specific element. There are several 10s/100s
of occurrences of that element in the 1 GB xml file.
The contents of the xml is just a dump of config parameters from a packet
switch (although imho, the contents of the xml don't matter)
Adam Tauno Williams, 20.12.2010 20:49:
On Mon, 2010-12-20 at 11:34 -0800, spaceman-spiff wrote:
This is a rather long post, but I wanted to include all the details &
everything I have tried so far myself, so please bear with me & read
the entire boringly long post.
I am trying to parse a ginormous (~1 GB) xml file.
On 2010-12-20, spaceman-spiff wrote:
> 0. Goal: I am looking for a specific element. There are several 10s/100s
> of occurrences of that element in the 1 GB xml file. The contents of the
> xml is just a dump of config parameters from a packet switch (although
> imho, the contents of the xml don't matter)
On 12/20/2010 2:49 PM, Adam Tauno Williams wrote:
Yes, this is a terrible technique; most examples are crap.
Yes, this is using DOM. DOM is evil and the enemy, full-stop.
You're still using DOM; DOM is evil.
For serial processing, DOM is superfluous superstructure.
For random access processing
On Mon, 2010-12-20 at 12:29 -0800, spaceman-spiff wrote:
> I need to detect them & then for each 1, i need to copy all the
> content b/w the element's start & end tags & create a smaller xml
> file.
Yep, do that a lot; via iterparse.
> 1. Can you point me to some examples/samples of using SAX,
>
Hi Usenet
First up, thanks for your prompt reply.
I will make sure I read RFC 1855 before posting again, but right now I'm
chasing a hard deadline :)
I am sorry I left out what exactly I am trying to do.
0. Goal: I am looking for a specific element. There are several 10s/100s
of occurrences of that element in the 1 GB xml file.
On Mon, 2010-12-20 at 11:34 -0800, spaceman-spiff wrote:
> Hi c.l.p folks
> This is a rather long post, but I wanted to include all the details &
> everything I have tried so far myself, so please bear with me & read
> the entire boringly long post.
> I am trying to parse a ginormous (~1 GB) xml file.
[Wrapped to meet RFC1855 Netiquette Guidelines]
On 2010-12-20, spaceman-spiff wrote:
> This is a rather long post, but I wanted to include all the details &
> everything I have tried so far myself, so please bear with me & read
> the entire boringly long post.
>
> I am trying to parse a ginormous (~1 GB) xml file.