On Sep 4, 7:54 pm, alex23 <[EMAIL PROTECTED]> wrote:
> On Sep 4, 8:31 am, castironpi <[EMAIL PROTECTED]> wrote:
>
> > Any interest in pursuing/developing/working together on a mmaped-xml
> > class? Faster, not readable in text editor.
>
> XML is text-based, so it should -always- be readable in a text editor.
> It's part of the definition, I believe.
>
> However, an implementation of one of the alternative binary XML
> formats would probably be very welcome.
>
> Fast Infoset: http://www.itu.int/rec/T-REC-X.891-200505-I/en
> EXI: http://www.w3.org/TR/2007/WD-exi-20070716/
>
> I don't know enough about either format to say if it would be
> possible, but an implementation that conformed to the ElementTree API
> could be a big win.

I was thinking of something much less restrictive than the two links.
Since it's not text, I'm not sure it even counts as structured markup.
More generic, something like hierarchical 'tag-content-child' pairs.

Here's what the xml.etree.ElementTree API says:

Each element has a number of properties associated with it:

- a tag which is a string identifying what kind of data this element
  represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence

Since all of these would be buffer-based representations, the attribute
list would merely implement the mapping-object protocol, not be stored in
a true dictionary. The strings would be stored as offsets to
length-prefixed buffer segments. Each node would look roughly like:

tag_offset, first_attr, text_offset, tail_offset, first_child,
prev_sibling, next_sibling, parent

Attributes would look like:

key_offset, value_offset, prev_attr, next_attr, node

These are all integers representing offsets elsewhere into the map.

A short observation:

>>> import xml.etree.ElementTree as e
>>> a= e.XML( '<a><b>abc</b></a>' )
>>> a.getchildren()[0].text
'abc'
>>> a.getchildren()[0].text= 'ab<'
>>> e.tostring(a)
'<a><b>ab&lt;</b></a>'
>>> e.XML(_)
<Element a at c2c3f0>
>>> _.getchildren()[0].text
'ab<'

The current implementation supports round trips between the special
character '<' and the markup '&lt;', which I propose to support as well.

Of course, you'd have to garbage-collect removed nodes by hand on any
deletion.

Also, possibly change the subject to: ElementTree + mmap cross.
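
To make the node layout above concrete, here is a rough sketch of how it might look, not a worked-out design. Everything beyond the field names is an assumption on my part: 32-bit little-endian offsets, offset 0 reserved to mean "none", UTF-8 strings with a 4-byte length prefix, an anonymous mmap instead of a file, and a hypothetical append-only `Arena` helper with no support for deletion.

```python
import mmap
import struct

# Assumed layout: eight unsigned 32-bit little-endian offsets per node,
# in the field order given in the post. Offset 0 is reserved for "none".
NODE = struct.Struct('<8I')
NODE_FIELDS = ('tag_offset', 'first_attr', 'text_offset', 'tail_offset',
               'first_child', 'prev_sibling', 'next_sibling', 'parent')
LENGTH = struct.Struct('<I')  # assumed 4-byte length prefix for strings


class Arena:
    """Hypothetical append-only allocator over an anonymous mmap."""

    def __init__(self, size=4096):
        self.buf = mmap.mmap(-1, size)
        self.top = 4  # skip offset 0 so it can act as the null offset

    def _alloc(self, nbytes):
        offset = self.top
        if offset + nbytes > len(self.buf):
            raise MemoryError('arena full')
        self.top += nbytes
        return offset

    def put_string(self, text):
        # Store a length-prefixed UTF-8 segment and return its offset.
        data = text.encode('utf-8')
        offset = self._alloc(LENGTH.size + len(data))
        LENGTH.pack_into(self.buf, offset, len(data))
        start = offset + LENGTH.size
        self.buf[start:start + len(data)] = data
        return offset

    def get_string(self, offset):
        if offset == 0:
            return None
        (length,) = LENGTH.unpack_from(self.buf, offset)
        start = offset + LENGTH.size
        return self.buf[start:start + length].decode('utf-8')

    def put_node(self, **fields):
        # Fields that are not supplied default to the null offset.
        offset = self._alloc(NODE.size)
        values = [fields.get(name, 0) for name in NODE_FIELDS]
        NODE.pack_into(self.buf, offset, *values)
        return offset

    def get_node(self, offset):
        return dict(zip(NODE_FIELDS, NODE.unpack_from(self.buf, offset)))

    def set_field(self, offset, name, value):
        # Overwrite one 4-byte field of an existing node in place.
        position = offset + LENGTH.size * NODE_FIELDS.index(name)
        LENGTH.pack_into(self.buf, position, value)


# Build the equivalent of <a><b>ab&lt;</b></a>.
arena = Arena()
a = arena.put_node(tag_offset=arena.put_string('a'))
b = arena.put_node(tag_offset=arena.put_string('b'),
                   text_offset=arena.put_string('ab<'),
                   parent=a)
arena.set_field(a, 'first_child', b)

child = arena.get_node(arena.get_node(a)['first_child'])
print(arena.get_string(child['tag_offset']),
      arena.get_string(child['text_offset']))
```

Because the strings are length-prefixed rather than delimited, the '<' sits in the map unescaped; escaping would only come into play when serializing back to text XML. Attribute records could be packed the same way with a five-field struct.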