I like it. It's a good proof of concept. Since I've worked heavily with XML in the past and am presently unemployed (and therefore in need of a project to keep my skills sharp), I'd love to lend a hand with this, Lars.
My first suggestion would be to figure out the DTD. At my last job, I worked with an XML format that had been designed bass-ackwards: they implemented the code first, then tried to create a DTD from what the code generated. The result was a bug-ridden, untestable nightmare. Once the DTD exists, you've done the hard work: you've specified what the LyX XML-based file format must be capable of. Implementation then follows logically and easily.

My second suggestion involves judicious use of both character entities and XML namespaces. Since I don't know how LyX works internally, I can't make any specific suggestions yet. However, I'd go with the following rules of thumb:

- If the LyX kernel treats something in a character-like fashion, go with entities.
  Example: Say that the LyX file "command" for a non-breaking space is translated into a character, and that this character is then translated into something else when the buffer exporter code encounters it. Using the '&nbsp;' XML entity reference does this for you neatly, and it works with any special character.

- If the LyX kernel treats a command as an atomic token, better to define it as such using XML namespaces instead of attributes.
  Example: Suppose a non-breaking space (and every other "special char") is treated as a command, not a character. Semantically, it makes no sense to have a "<special_char />" tag and pass the name of the char as an attribute. It's also harder to deal with in the DOM. Better to create a "special_char" namespace and define one tag per char: "<special_char:nonbreaking_space/>", "<special_char:ellipses/>", etc. The XML parser will create a separate token for each of these, which is what you're doing under the hood anyhow. (Well, at least the way I've constructed the example, that's what you're doing.)

- Reserve attributes for mutable aspects of the XML tags.
  Example #1: '<paragraph style="Part">'
  We can't create a tag named 'part' in a 'paragraph' namespace, because that tag would only be valid in certain types of document.
  Example #2: '<font:em>' '<font:other weight="bold">'
  Here I've defined a 'font' namespace, which contains tags for standard fonts (like 'em') and one additional tag, 'other', which specifies the font via attributes.
  Example #3: '<minipage width="20em">'
  Here, the command always parses to a single, known token. The attributes serve to "pass parameters" to the command.

Note that these are just guidelines, not hard-and-fast rules. I say that we write the DTD first, figuring out what we need. Then, above certain definitions, we put comments labelled "REVIEW:" followed by "maybe use entities?" or "maybe put in namespace?" etc. We can then review the completed DTD and figure out the cleanest format.

-- John Weiss
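
P.S. To make the entity and namespace rules of thumb a bit more concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. Everything in it is an assumption made for illustration: the namespace URI, the element and attribute names, and the helper functions are hypothetical, not an actual LyX format or API. It only shows how a parser hands back the three variants discussed above.

```python
import xml.etree.ElementTree as ET

# Everything below is a hypothetical illustration: the namespace URI and the
# element/attribute names are made up and are not an actual LyX file format.
SPECIAL_CHAR_NS = "http://example.org/lyx/special_char"

# Character-like item as an entity. XML predefines only five entities, so
# "nbsp" has to be declared; here that is done in an internal DTD subset.
ENTITY_STYLE = (
    '<!DOCTYPE paragraph [<!ENTITY nbsp "&#160;">]>'
    "<paragraph>Figure&nbsp;1</paragraph>"
)

# Command as one generic tag, with the char name passed as an attribute.
ATTRIBUTE_STYLE = (
    "<paragraph>one"
    '<special_char name="nonbreaking_space"/>two'
    '<special_char name="ellipses"/>'
    "</paragraph>"
)

# Command as one tag per char inside a "special_char" namespace.
NAMESPACE_STYLE = (
    '<paragraph xmlns:special_char="' + SPECIAL_CHAR_NS + '">one'
    "<special_char:nonbreaking_space/>two"
    "<special_char:ellipses/>"
    "</paragraph>"
)


def entity_text(xml_text):
    """Return the root element's text; the parser has expanded the entity."""
    return ET.fromstring(xml_text).text


def tokens_from_attributes(xml_text):
    """Collect char names by reading an attribute off each generic tag."""
    root = ET.fromstring(xml_text)
    return [el.get("name") for el in root.iter("special_char")]


def tokens_from_namespace(xml_text):
    """Collect char names directly from the namespaced tag names."""
    root = ET.fromstring(xml_text)
    prefix = "{" + SPECIAL_CHAR_NS + "}"  # ElementTree spells tags "{uri}local"
    return [el.tag[len(prefix):] for el in root.iter()
            if el.tag.startswith(prefix)]


if __name__ == "__main__":
    print(repr(entity_text(ENTITY_STYLE)))
    print(tokens_from_attributes(ATTRIBUTE_STYLE))
    print(tokens_from_namespace(NAMESPACE_STYLE))
```

In the entity variant the application never sees a "command" at all, only the character. In the namespace variant each special char arrives as its own tag name, whereas the attribute variant needs an extra lookup on every element to find out which char it is.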