John Weiss <[EMAIL PROTECTED]> writes:

| I like.  It's a good proof-of-concept.
>
| As I've worked heavily with XML in the past, and am presently
| unemployed (and therefore in need of a project to keep my skills
| sharp), I'd love to lend a hand with this, Lars.

Help would certainly be welcome.

Note that the work so far has been utterly trivial (in almost all
cases). The hard part is parsing the XML and building up the internal
structure again.

I'd really like to use boost::spirit for this work, and if that fails,
use libxml (which I have so far only used from PHP, but it should be
quite easy to make it do the job).

As to the DTD, I don't want to create that right away. Currently we
have, as you say, a proof-of-concept. I'd like to fiddle with this a
bit and try to make the XML look the way we want it: best practice and
so forth.

When we have that, I'd like us to write down the DTD.

So, already now, I'd like concrete suggestions for changes to the
generated XML: how to improve it, make it more flexible, etc.

| My first suggestion would be to figure out the DTD.  At my last job, I
| worked with an XML format that had been designed bass-ackwards.  They
| implemented the code first, then tried to create a DTD from what the
| code generated.  The result was a bug-ridden, untestable nightmare.
>
| Once the DTD exists, you've done the hard work:  you've specified what
| the LyX XML-based file format must be capable of.  Implementation then
| follows suit logically and easily.

Note that we already have an internal structure that implicitly
defines much of the DTD. And as I said, I'd like to fiddle with the
current XML and improve it before creating the DTD.

| My second suggestion involves judicious use of both character entities
| and XML namespaces.  Now, as I don't know how LyX works internally, I
| can't make any useful suggestions yet.  However, I'd go with the
| following rule of thumb:
>
| - If the LyX kernel treats something in a character-like fashion, go
|   with entities.
>
|   Example:  Say that the LyX file "command" for a non-breaking space
|   is translated into a character, and that said character is then
|   translated to something else when the buffer exporter code
|   encounters that character.  Using the '&nbsp;' XML entity-reference
|   does this for you neatly.  And, it works with any special character.

But why are entities better than, e.g., <nbsp/>?

| - If the LyX kernel treats a command as an atomic token, better to
|   define it as such using XML namespaces instead of attributes.
>
|   Example:  Suppose a non-breaking space ... and all of the other
|   "special chars" ... is treated as a command, not a character.
|   Semantically, it makes no sense to have a "<special_char />" tag and
|   pass the name of the char as an attribute.  It's also harder to deal
|   with in the DOM.  Better to create a "special_char" namespace and
|   define one tag per char:  "<special_char:nonbreaking_space/>",
|   "<special_char:ellipses/>", etc.  The XML parser will create a
|   separate token for each of these, which is what you're doing under
|   the hood, anyhow.  (Well, at least the way I've constructed the
|   example, that's what you're doing.)

The only problem I see is that we would like to change the DTD as
seldom as possible, and separate tags for the special chars would mean
a DTD change for every new char. So perhaps a <special-char type=""/>
is best after all?
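To make the DTD argument concrete, here is a minimal C++17 sketch of the
<special-char type=""/> approach, assuming the DTD simply declares
`type` as a CDATA attribute. Adding a new special char then means adding
one row to a table in the code, not touching the DTD. The enum, the
type names ("nbsp", "ellipsis", "hyphenation") and both helper functions
are hypothetical, not existing LyX code.

```cpp
#include <optional>
#include <string>

// Hypothetical internal representation of the special chars.
enum class SpecialChar { NonBreakingSpace, Ellipsis, Hyphenation };

// Single table shared by reader and writer (names are illustrative only).
// A new special char is one more row here; the DTD stays unchanged.
struct Entry {
    const char* name;
    SpecialChar value;
};
constexpr Entry kSpecialChars[] = {
    {"nbsp", SpecialChar::NonBreakingSpace},
    {"ellipsis", SpecialChar::Ellipsis},
    {"hyphenation", SpecialChar::Hyphenation},
};

// Reader side: map the value of the `type` attribute to the enum.
// Unknown values yield std::nullopt so the parser can report an error.
std::optional<SpecialChar> specialCharFromType(const std::string& type) {
    for (const auto& e : kSpecialChars)
        if (type == e.name) return e.value;
    return std::nullopt;
}

// Writer side: emit the element for a given special char.
std::string specialCharToXml(SpecialChar c) {
    for (const auto& e : kSpecialChars)
        if (e.value == c)
            return std::string("<special-char type=\"") + e.name + "\"/>";
    return "";  // not reached for values listed in the table
}
```

The trade-off is that validation of the `type` value moves from the DTD
into the code (the std::nullopt case), which is the price of a stable DTD.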

| - Reserve attributes for mutable aspects of the XML tags.
>
|   Example #1:  '<paragraph style="Part">'
|   We can't create a tag named 'part' in a 'paragraph' namespace
|   because that tag is only valid in certain types of document.
>
|   Example #2:  '<font:em>'  '<font:other weight="bold">'
|   In this example, I've defined a 'font' namespace, which contains
|   tags for standard fonts (like 'em'), and one additional tag,
|   'other', which specifies the font via attributes.

I am not sure about the font stuff... It is almost better to have
just a single font tag with lots of attributes.
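A single <font> tag with lots of attributes could be written out roughly
like this C++17 sketch. It assumes attribute names loosely modeled on
LyX's font properties (family, series, shape, size); the struct, the
helpers and the exact attribute names are hypothetical. Only the
properties that are actually set become attributes, so the common case
stays short.

```cpp
#include <optional>
#include <string>

// Hypothetical font change; an empty optional means "inherit".
struct FontChange {
    std::optional<std::string> family;  // e.g. "sans"
    std::optional<std::string> series;  // e.g. "bold"
    std::optional<std::string> shape;   // e.g. "italic"
    std::optional<std::string> size;    // e.g. "large"
};

// Escape characters that are special in XML text and attribute values.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Wrap `text` in one <font> element carrying only the properties that
// are set; with no properties set, the text is emitted without a tag.
std::string fontToXml(const FontChange& f, const std::string& text) {
    std::string attrs;
    auto add = [&attrs](const char* name, const std::optional<std::string>& v) {
        if (v) attrs += std::string(" ") + name + "=\"" + xmlEscape(*v) + "\"";
    };
    add("family", f.family);
    add("series", f.series);
    add("shape", f.shape);
    add("size", f.size);
    if (attrs.empty()) return xmlEscape(text);
    return "<font" + attrs + ">" + xmlEscape(text) + "</font>";
}
```

For example, a bold run becomes <font series="bold">...</font>, and a new
font property would need one new attribute in the DTD rather than a new
tag per value.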

Concrete suggestions for the Fun File are welcome
(but the suggestions must be possible to implement as well...).

I'll post the patch shortly.

-- 
        Lgb
