Re: Fun file

John Weiss Thu, 07 Oct 2004 21:01:31 -0700

I like.  It's a good proof-of-concept.

As I've worked heavily with XML in the past, and am presently
unemployed (and therefore in need of a project to keep my skills
sharp), I'd love to lend a hand with this, Lars.



My first suggestion would be to figure out the DTD.  At my last job, I
worked with an XML format that had been designed bass-ackwards.  They
implemented the code first, then tried to create a DTD from what the
code generated.  The result was a bug-ridden, untestable nightmare.

Once the DTD exists, you've done the hard work:  you've specified what
the LyX XML-based file format must be capable of.  Implementation then
follows suit logically and easily.

My second suggestion involves judicious use of both character entities
and XML namespaces.  Now, as I don't know how LyX works internally, I
can't make any useful suggestions yet.  However, I'd go with the
following rule of thumb:

- If the LyX kernel treats something in a character-like fashion, go
  with entities.

  Example:  Say that the LyX file "command" for a non-breaking space
  is translated into a character, and that said character is then
  translated to something else when the buffer exporter code
  encounters that character.  Using an '&nbsp;' XML entity reference
  (declared in the DTD, since XML doesn't predefine it) does this for
  you neatly.  And it works with any special character.

- If the LyX kernel treats a command as an atomic token, better to
  define it as such using XML namespaces instead of attributes.

  Example:  Suppose a non-breaking space ... and all of the other
  "special chars" ... is treated as a command, not a character.
  Semantically, it makes no sense to have a "<special_char />" tag and
  pass the name of the char as an attribute.  It's also harder to deal
  with in the DOM.  Better to create a "special_char" namespace and
  define one tag per char:  "<special_char:nonbreaking_space/>",
  "<special_char:ellipses/>", etc.  The XML parser will create a
  separate token for each of these, which is what you're doing under
  the hood, anyhow.  (Well, at least the way I've constructed the
  example, that's what you're doing.)

- Reserve attributes for mutable aspects of the XML tags.

  Example #1:  '<paragraph style="Part">'
  We can't create a tag named 'part' in a 'paragraph' namespace
  because that tag is only valid in certain types of document.

  Example #2:  '<font:em>'  '<font:other weight="bold">'
  In this example, I've defined a 'font' namespace, which contains
  tags for standard fonts (like 'em'), and one additional tag,
  'other', which specifies the font via attributes.

  Example #3:  '<minipage width="20em">'
  Here, the command always parses to a single, known token.  The
  attributes serve to "pass parameters" to the command.
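
To make the three cases above concrete, here is a rough sketch of how
a stock XML parser hands each of them back.  It uses Python's standard
xml.etree.ElementTree only because that is convenient for a quick
test.  The namespace URI, the 'lyx' root tag, and the sample text are
all made up for illustration and are not part of any actual LyX
format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root tag, namespace URI and content are
# invented for this sketch; only the three techniques matter.
DOC = """<!DOCTYPE lyx [
  <!ENTITY nbsp "&#160;">
]>
<lyx xmlns:special_char="urn:example:lyx:special_char">
  <paragraph style="Part">Volume&nbsp;1<special_char:ellipses/></paragraph>
</lyx>"""

root = ET.fromstring(DOC)
para = root.find("paragraph")

# Entity: the parser expands it into an ordinary character in the text.
text = para.text

# Namespaced tag: each special char arrives as its own element (token),
# named "{namespace-uri}localname" by ElementTree.
tags = [child.tag for child in para]

# Attribute: a parameter attached to an otherwise fixed element.
style = para.get("style")

print(repr(text), tags, style)
```

The entity vanishes into the character data, the namespaced tag
survives as a distinct node, and the attribute stays a plain key/value
pair on its element.  Which of those matches what the LyX kernel does
internally is exactly the question the rule of thumb is meant to
answer.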



Note that these are just guidelines, not hard-and-fast rules.  I say that
we write the DTD first, figuring out what we need.  Then, above
certain definitions, put comments labelled "REVIEW:" followed by
"maybe use entities?" or "maybe put in namespace?" etc.  We can then
review the completed DTD and figure out the cleanest format.


-- 
John Weiss
