On Nov 16, 2006, at 7:25 AM, Fredrik Lundh wrote:

>> If I'm wrong, just chalk it up to the fact that this is the first
>> time I've ever looked at the Infoset spec, and I'm simply confused.
>
> the Infoset spec *is* the essence of XML; if you don't realize that an
> XML document is just a serialization of a very simple data model, you're
> bound to be fighting with XML all the time.

The principle and the practice diverge significantly in our neck of the
woods. The current project involves consuming and making sense of
extraordinarily (and typically unnecessarily) complex XHTML. Of course, as
you say, those documents are still serializations of a simple data model,
but the types of manipulations we do happen to butt up very uncomfortably
against the way ET does things.

> but ET doesn't implement the Infoset spec as it is, of course: it uses a
> *simplified* model, carefully optimized for the large percentage of all
> XML formats that simply doesn't use mixed content. if you're doing
> document-style processing, you sometimes need to add an extra assignment
> or two, but unless you're doing *only* document-style processing, ET's
> API gives you a net win. (and even if you're doing only document-style
> processing, ET's speed and memory footprint gives you a net win over
> most competing technologies).

Yeah, documents are all we do -- XML just happens to be a pleasant
intermediate format, and something we need to consume. The notion of
nicely formatted XML is entirely foreign to the work that we do -- in
fact, our current focus is (in part) dragging decidedly unstructured data
out of those XHTML documents (among other source formats) and putting it
into a reasonable, useful structure.

I took some time last night to bang out some functions that squeezed ET's
model (via lxml) into doing what we need, and it ended up requiring a lot
more B&D than I like. At that point, I swung over to 4suite, which dropped
into place quite nicely.

*shrug* I guess we're just in the minority with regard to our API
requirements -- we happen to live in the corner cases. I'm certainly glad
to have made the detour on a different path for a bit though.

- Chas

--
http://mail.python.org/mailman/listinfo/python-list
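
For anyone following along, the "extra assignment or two" mentioned in the quoted text can be sketched roughly as below, using the standard-library xml.etree.ElementTree (lxml exposes the same text/tail attributes). This is only an illustration: the sample markup and the helper names flatten and remove_keep_tail are hypothetical, not code from this thread.

```python
import xml.etree.ElementTree as ET


def flatten(elem):
    """Return all character data under elem, in document order."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(flatten(child))
        # Text that follows a child's end tag lives on the child, not the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def remove_keep_tail(parent, child):
    """Remove child from parent without losing the text stored in child.tail."""
    children = list(parent)
    index = children.index(child)
    tail = child.tail or ""
    if index == 0:
        # No preceding sibling: the tail text joins the parent's leading text.
        parent.text = (parent.text or "") + tail
    else:
        # Otherwise it joins the tail of the preceding sibling.
        prev = children[index - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


# Made-up mixed-content fragment, in the spirit of document-style XHTML.
root = ET.fromstring("<p>Hello, <b>big </b>world<i>!</i></p>")
print(flatten(root))

remove_keep_tail(root, root.find("b"))
print(flatten(root))
```

A plain parent.remove(child) would silently drop "world" here, because that text hangs off the removed element as its tail; that kind of bookkeeping is what piles up when mixed content is the whole job.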