On Fri, Oct 08, 2004 at 03:01:18PM +0200, Lars Gullik Bjønnes wrote:
> 
> Note that we already have an internal structure, that implicitly
> defines much of the DTD.

...making the definition of the DTD very easy.  After that, it's just
a matter of determining how to fiddle with things...


> | My second suggestion involves judicious use of both character entities
> | and XML namespaces. 
    :
    :
> | - If the LyX kernel treats something in a character-like fashion, go
> |   with entities.
> 
> But why is entities better than f.ex. <nbsp/>?

Because it's part of the text, and is treated as such by your XML
parser.

Remember:  XML tags break up the flow of text.  So: "<paragraph>In the
following example, there are <em>two</em> objects: an 'em' object
contained in a 'paragraph' object.</paragraph>"  Every XML parser I
know of will create two objects for that example sentence.

When you use character entities, e.g. "&nbsp;", the entities are
treated no differently than any other character.  I.e. the XML parser
just puts them into the text contained by the paragraph/font
style/inset/whatever object.
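
A minimal sketch of that difference, using Python's standard
xml.etree.ElementTree module as a stand-in for whatever parser LyX
would use.  The sample markup is made up for illustration.  One
assumption to flag: XML predefines only lt, gt, amp, apos and quot, so
the sketch declares nbsp itself in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.  The internal DTD subset declares nbsp,
# because XML itself predefines only lt, gt, amp, apos and quot.
DOC = (
    '<!DOCTYPE paragraph [<!ENTITY nbsp "&#160;">]>'
    "<paragraph>There are <em>two</em>&nbsp;objects here.</paragraph>"
)

root = ET.fromstring(DOC)

# Tags become separate objects: the paragraph and the em nested in it.
element_tags = [el.tag for el in root.iter()]

# The entity does not: the parser expands it into ordinary character
# data, here the text that follows the em element.
text_after_em = root.find("em").tail

print(element_tags)
print(repr(text_after_em))
```

The tag shows up as its own node in the tree, while the entity is just
one more character inside the surrounding text.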

The point of the suggestion is this:  Treat XML tags as separate
objects.  When something is semantically a character, not a separate
type of object, in the LyX core, favor character entities.

Was I clearer this time?

> | - If the LyX kernel treats a command as an atomic token, better to
> |   define it as such using XML namespaces instead of attributes.
> 
> The only problem I see is that we would like to change the DTD as
> seldom as possible. So having separate tags for the special-chars
> might be bad. So a <special-char type=""> might be best after all?

Okay ... you're thinking like a C++ programmer here, Lars.  Which is
good ... when writing C++.

When I learned SQL, my first major hurdle was to stop thinking like a
C++ developer.  It inhibited thinking like a SQL developer.  Same goes
here for XML.


While the DTD/XSchema for an XML format is analogous to a C++ header,
the analogy breaks down when it comes to "adding things."  See
explanation below.


> | - Reserve attributes for mutable aspects of the XML tags.
    :
    :
> I am not sure about the font stuff... almost better to have just a
> font tags with lots of attributes.

You got lost in the example and missed the point.

In XML, attributes are analogous to public data members in a C++
class.

The XML tags are akin to ... well, I *was* going to say a C++ class,
but that's not really true.  The analogy breaks down, for XML tags are
semantic critters, governed by context.  In contrast, a C++ class
behaves the same, regardless where or when you use it.

Now, in XML, an attribute is to a tag what, in C++, a data
member is to an instance of a class.  Consequently all XML parsers
treat attributes as "just some extra text."  All of the action takes
place on the tags.  XML parsers are designed to "do something" when
they see tags, with different actions for different tags.

Making everything an attribute will actually make the parsing HARDER.
It's like writing a class hierarchy with a "typecode enum" and using
switch-statements all over the place instead of using virtual member
functions.
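
The contrast can be sketched roughly as follows, in Python rather than
C++ to keep it short.  The tag names, the "type" values and both
render functions are hypothetical, not the LyX format.  The first
function is the "typecode" style: one tag, then a hand-written switch
on its attribute.  The second lets the tag name select the action, the
way a virtual call would.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, for illustration only; not the LyX format.
BY_ATTRIBUTE = '<p><special-char type="nbsp"/><special-char type="hyphen"/></p>'
BY_TAG = "<p><nbsp/><hyphen/></p>"


def render_by_attribute(xml_text):
    """Typecode style: find the tag, then switch on its attribute value."""
    out = []
    for el in ET.fromstring(xml_text):
        if el.tag == "special-char":
            kind = el.get("type")
            if kind == "nbsp":
                out.append("\u00a0")
            elif kind == "hyphen":
                out.append("-")
            else:
                raise ValueError(f"unknown special-char type: {kind!r}")
    return "".join(out)


# Virtual-function style: the tag name itself selects the action.
HANDLERS = {
    "nbsp": lambda el: "\u00a0",
    "hyphen": lambda el: "-",
}


def render_by_tag(xml_text):
    """Dispatch each child element to the handler registered for its tag."""
    return "".join(HANDLERS[el.tag](el) for el in ET.fromstring(xml_text))


print(repr(render_by_attribute(BY_ATTRIBUTE)))
print(repr(render_by_tag(BY_TAG)))
```

Both produce the same text, but only the first one carries a second
layer of parsing that you have to write and maintain by hand.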

Here's another analogy:
    XML attribute <==> double
    XML tag       <==> enum
                       (or, more like the illicit love-child of "enum"
                        and "class")
An XML tag has a fixed value and fixed semantics.  An XML attribute has
an infinite-sized set of values to draw from.

The value of an XML attribute IS IRRELEVANT to the document structure.


This brings me, at last, to how this analogy to C++ breaks down:
DTD/XSchema "maintenance".  Obviously any DTD/XSchema maintenance
requiring *deletion* or *modification* is a problem, and breaks
backward-compatibility.  But addition ... addition to a DTD/XSchema is
not only backward-compatible, but often utterly *irrelevant* to all of
your existing documents.

By creating the core of your DTD/XSchema, you've defined all of your
required tags, by definition.  Anything else is an optional tag (it
doesn't need to be there).  Let's say that we need to add a new
special character, "wynn".  Using the "everything's an attribute"
model, you'd need no change to the DTD, true.  You'd need, instead, to
add extra code to parse the new possible value of the "<special-char
type='...'>" attribute.  And, of course, you'd need to add the code to
handle the "wynn" when you see one.

Consider, instead, the case where there's a separate tag in a
"special-char" XML namespace.  You still need to add the code to
handle the "wynn", of course.  However, your parse changes become
trivial:  the tag, "<special-char:wynn/>" needs no new code to parse
it (since the DTD tells the XML parser how to do that).  You just use
your wynn-handling-code as your "action to take" for the
<special-char:wynn/> tag.  Finally, you need to add
"<special-char:wynn/>" as an optional tag in your DTD/XSchema.  This
has no impact on existing docs.
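
Here is a rough sketch of the namespace variant, again with Python's
xml.etree.ElementTree.  Everything named here is an assumption made
for illustration: the namespace URI, the handler table, the render
function, and the choice of U+01BF (LATIN LETTER WYNN) as what a
"wynn" renders to.  ElementTree does not validate against a DTD, so
the sketch covers only the parsing and dispatch side.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the special-char tags.
SC = "urn:example:special-char"


def qname(local):
    """Build the '{uri}local' form ElementTree uses for namespaced tags."""
    return f"{{{SC}}}{local}"


# Handlers that already existed before the new character was added.
HANDLERS = {
    qname("nbsp"): lambda el: "\u00a0",
}

# The only change for the new character: register its handler.
# U+01BF is an assumed rendering of "wynn".
HANDLERS[qname("wynn")] = lambda el: "\u01bf"


def render(xml_text):
    """Concatenate the text, dispatching each child tag to its handler."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for el in root:
        parts.append(HANDLERS[el.tag](el))
        parts.append(el.tail or "")
    return "".join(parts)


# A document written before wynn existed, and one that uses it.
OLD_DOC = f'<p xmlns:special-char="{SC}">a<special-char:nbsp/>b</p>'
NEW_DOC = f'<p xmlns:special-char="{SC}"><special-char:wynn/>ynn</p>'

print(repr(render(OLD_DOC)))
print(repr(render(NEW_DOC)))
```

The old document goes through the same code path untouched; the new
tag costs one table entry and no new parsing logic.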

Let me repeat that:  it has NO IMPACT on any existing LyX docs.

Why?  Consider:  it's an optional tag.  Therefore, all pre-existing
LyX docs are equivalent to a document that uses the new DTD but
contains no "<special-char:wynn/>" tags.  Granted, older versions of
LyX won't read docs from the new DTD, but that's to be expected.  The
older versions of LyX, after all, have no idea what a "wynn" is.


In summary:

- You break nothing in older documents.
- You require no code modifications that you weren't already going to
  make.
- You use all of XML's native parsing facilities to do the
  heavy-lifting for you.

-- 
John Weiss
