On Fri, Oct 08, 2004 at 03:01:18PM +0200, Lars Gullik Bjønnes wrote:
>
> Note that we already have an internal structure, that implicitly
> defines much of the DTD.

...making the definition of the DTD very easy.  After that, it's just a
matter of determining how to fiddle with things...

> | My second suggestion involves judicious use of both character entities
> | and XML namespaces.
:
:
> | - If the LyX kernel treats something in a character-like fashion, go
> | with entities.
>
> But why is entities better than f.ex. <nbsp/>?

Because it's part of the text, and is treated as such by your XML
parser.

Remember: XML tags break up the flow of text.  So:

    "<paragraph>In the following example, there are <em>two</em>
    objects, an 'em' object contained in a 'paragraph'
    object.</paragraph>"

Every XML parser I know of will create two objects for that example
sentence.

When you use character entities, e.g. "&nbsp;", the entities are
treated no differently than any other character.  I.e. the XML parser
just puts them into the text contained by the paragraph/font
style/inset/whatever object.

The point of the suggestion is this:  Treat XML tags as separate
objects.  When something is semantically a character, not a separate
type of object, in the LyX core, favor character entities.

Was I clearer this time?

> | - If the LyX kernel treats a command as an atomic token, better to
> | define it as such using XML namespaces instead of attributes.
>
> The only problem I see is that we would like to change the DTD as
> seldom as possible. So having separate tags for the special-chars
> might be bad. So a <special-char type=""> might be best after all?

Okay ... you're thinking like a C++ programmer here, Lars.  Which is
good ... when writing C++.

When I learned SQL, my first major hurdle was to stop thinking like a
C++ developer.  It inhibited thinking like a SQL developer.  Same goes
here for XML.

While the DTD/XSchema for an XML format is analogous to a C++ header,
the analogy breaks down when it comes to "adding things."  See
explanation below.

> | - Reserve attributes for mutable aspects of the XML tags.
:
:
> I am not sure about the font stuff...
> almost better to have just a
> font tags with lots of attributes.

You got lost in the example and missed the point.

In XML, attributes are analogous to public data members in a C++ class.
The XML tags are akin to ... well, I *was* going to say a C++ class, but
that's not really true.  The analogy breaks down, for XML tags are
semantic critters, governed by context.  In contrast, a C++ class
behaves the same, regardless where or when you use it.

Now, in XML, an attribute is to a tag what, in C++, a data member is to
an instance of a class.  Consequently all XML parsers treat attributes
as "just some extra text."  All of the action takes place on the tags.
XML parsers are designed to "do something" when they see tags, with
different actions for different tags.

Making everything an attribute will actually make the parsing HARDER.
It's like writing a class hierarchy with a "typecode enum" and using
switch-statements all over the place instead of using virtual member
functions.

Here's another analogy:

    XML attribute  <==>  double
    XML tag        <==>  enum (or, more like the illicit love-child
                         of "enum" and "class")

An XML tag has a fixed value and fixed semantics.  An XML attribute has
an infinite-sized set of values to draw from.  The value of an XML
attribute IS IRRELEVANT to the document structure.

This brings me, at last, to how this analogy to C++ breaks down:
DTD/XSchema "maintenance".

Obviously any DTD/XSchema maintenance requiring *deletion* or
*modification* is a problem, and breaks backward-compatibility.  But
addition ... addition to a DTD/XSchema is not only backward-compatible,
but often utterly *irrelevant* to all of your existing documents.

By creating the core of your DTD/XSchema, you've defined all of your
required tags, by definition.  Anything else is an optional tag (it
doesn't need to be there).

Let's say that we need to add a new special character, "wynn".  Using
the "everything's an attribute" model, you'd need no change to the DTD,
true.
You'd need, instead, to add extra code to parse the new possible value
of the "<special-char type='...'>" attribute.  And, of course, you'd
need to add the code to handle the "wynn" when you see one.

Consider, instead, the case where there's a separate tag in a
"special-char" XML namespace.  You still need to add the code to handle
the "wynn", of course.  However, your parse changes become trivial: the
tag, "<special-char:wynn/>", needs no new code to parse it (since the
DTD tells the XML parser how to do that).  You just use your
wynn-handling-code as your "action to take" for the
<special-char:wynn/> tag.

Finally, you need to add "<special-char:wynn/>" as an optional tag in
your DTD/XSchema.  This has no impact on existing docs.

Let me repeat that: it has NO IMPACT on any existing LyX docs.

Why?  Consider: it's an optional tag.  Therefore, all pre-existing LyX
docs are equivalent to a document that uses the new DTD but contains no
"<special-char:wynn/>" tags.

Granted, older versions of LyX won't read docs from the new DTD, but
that's to be expected.  The older versions of LyX, after all, have no
idea what a "wynn" is.

In summary:
- You break nothing in older documents.
- You require no code modifications that you weren't already going to
  make.
- You use all of XML's native parsing facilities to do the
  heavy-lifting for you.

--
John Weiss
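
P.S.  A minimal sketch of both points, in Python rather than C++ only
because its standard library ships an XML parser.  Everything named
here is a placeholder of mine, not anything from the LyX sources: the
namespace URI, the handler names, the "*...*" rendering of <em>, and
the mapping of "wynn" to U+01BF.  It tries to show the entity landing
inside the paragraph's text while the tags become separate objects, and
a new tag being handled by one new entry in a dispatch table.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "special-char" prefix (an assumption).
SC = "urn:example:special-char"

# The internal DTD subset declares &nbsp; so the parser can expand it.
DOC = (
    '<!DOCTYPE paragraph [<!ENTITY nbsp "&#160;">]>'
    f'<paragraph xmlns:special-char="{SC}">'
    'A&nbsp;B <em>two</em> <special-char:wynn/> end'
    '</paragraph>'
)


def handle_em(elem):
    # Placeholder rendering for emphasis.
    return f"*{elem.text}*"


def handle_wynn(elem):
    # Assumed mapping: LATIN LETTER WYNN, U+01BF.
    return "\u01bf"


# One action per tag.  Adding a new special character means adding one
# entry here; there is no switch over attribute values to extend.
HANDLERS = {
    "em": handle_em,
    f"{{{SC}}}wynn": handle_wynn,
}


def render(root):
    # Text before the first child, then each child plus its trailing text.
    out = [root.text or ""]
    for child in root:
        out.append(HANDLERS[child.tag](child))
        out.append(child.tail or "")
    return "".join(out)


root = ET.fromstring(DOC)
print(repr(root.text))            # the entity is just a character in the text
print([c.tag for c in root])      # the tags are separate child objects
print(render(root))
```

The "type" attribute variant would need the same wynn handler plus a
branch on the attribute's value inside a single <special-char> handler,
which is the "typecode enum" pattern again.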