On Wed, Mar 5, 2008 at 9:43 PM, Shachar Shemesh <[EMAIL PROTECTED]> wrote:
> Gilad Ben-Yossef wrote:
> > Amos Shapira wrote:
> >> What do people around here like to use for EFFICIENT XML parsing?
> >
> > Isn't Efficient XML an oxymoron?
> >
> > Seriously, and despite the flame bait way I've introduced the subject,
> > if you need to do XML parsing in a way which is more efficient than
> > Xerces, maybe it is an indication that XML is not a proper way to
> > encode your data.
> I'll bite.

Thanks to everyone for your answers. I'm replying to Shachar's reply because it is the closest to what I have to add, along with some more information about my question that I have learned since I sent it.

> Without knowing Xerces too deeply, I think you can do MUCH faster than
> it, by feeding the schema before hand. Theoretically (though, the last

Xerces is apparently "the Lincoln of XML parsers", i.e. it supports everything there is to support in the standard, but it comes with a huge weight attached to it. On my desktop it's the 9th largest library at almost 4 MB, ranking just ahead of libkhtml and at twice the size of libc.

But library size is not all I can say against it - it adheres to the standard approaches of DOM (tons of objects, lots of memory) or SAX (i.e. the code which uses SAX has to handle each event manually).

There are a few newer approaches to parsing XML files; there is a pretty good list at http://en.wikipedia.org/wiki/Xml_parser#Processing_XML_files

The one that appeals the most to me is "Data Binding" (http://en.wikipedia.org/wiki/Xml_parser#Data_binding), i.e., as Shachar describes, it's based on a program which reads the schema and generates code (in my case, C++ classes) which reads files of this specific schema. The resulting objects are strongly-typed in-memory representations of the data in the XML file and provide convenient accessors. Presumably, because these classes are schema-specific, they can cut a lot of checks for irrelevant execution paths.

If you ever wrote XDR/RPC stuff (I'm talking about the stuff that NFS and friends use for network-level representation) then it might be something similar - it used to have a program to convert a language-independent data representation to various language-specific implementations of classes to marshal and unmarshal data (only I forgot the name of the XDR compiler right now).
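
To make the data-binding idea described above a bit more concrete, here is a rough sketch of the kind of class such a schema compiler might emit. Everything in it is an assumption made for illustration: the tiny `<order>` schema, the `Order` class, and the `text_of` helper are hypothetical and are not the output or API of xmlbeansxx, CodeSynthesis XSD, or any other tool. The "parser" is a toy that only understands this one fixed layout.

```cpp
// Toy illustration of XML data binding. NOT a real XML parser and NOT the
// output of any actual binding tool; all names here are hypothetical.
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical schema assumed for this sketch:
//   <order><customer>xs:string</customer><quantity>xs:int</quantity></order>

// Return the text between <tag> and </tag>. Handles only plain,
// attribute-free, non-nested elements, which is all this schema needs.
static std::string text_of(const std::string& xml, const std::string& tag) {
    assert(!tag.empty());
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t start = xml.find(open);
    if (start == std::string::npos)
        throw std::runtime_error("missing element: " + tag);
    const std::size_t body = start + open.size();
    const std::size_t end = xml.find(close, body);
    if (end == std::string::npos)
        throw std::runtime_error("unterminated element: " + tag);
    return xml.substr(body, end - body);
}

// What a binding compiler might generate for the schema above: one class
// per element type, with typed members and accessors instead of a generic
// DOM tree or hand-written SAX callbacks.
class Order {
public:
    static Order parse(const std::string& xml) {
        const std::string body = text_of(xml, "order");
        Order o;
        o.customer_ = text_of(body, "customer");
        o.quantity_ = std::stoi(text_of(body, "quantity"));  // xs:int -> int
        return o;
    }
    const std::string& customer() const { return customer_; }
    int quantity() const { return quantity_; }

private:
    Order() : quantity_(0) {}
    std::string customer_;
    int quantity_;
};
```

Calling code would then do something like `Order o = Order::parse(xml);` and read `o.quantity()` as a plain `int`, without walking a node tree or converting strings itself. Because the class knows the schema in advance, it only looks for the two elements it expects, which is the sort of shortcut a generic parser cannot take.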

The snag with Data Binding is that all the implementations I have found so far are either for Java, or proprietary and cost a fortune (thousands of dollars per developer seat, where you have to buy a license for every developer who links his code with the output of the programs). Ah - and our final programs (the ones we ship to customers) have to support all sorts of UNIX variants, and Windows, not just Linux.

The only one which keeps our hopes alive is xmlbeansxx (http://xmlbeansxx.touk.pl/). I'm struggling with getting it to compile and run for now.

Another one is CodeSynthesis XSD (http://www.codesynthesis.com/products/xsd/), but it's GPL so we can't link it with our proprietary code.

Here is a pretty complete list of XML Data Binding resources; almost all the options for C/C++ are commercial: http://www.rpbourret.com/xml/XMLDataBinding.htm

Thanks again for everyone's input.

--Amos