It is important to understand that an XML DOM does not capture all of the
constraints and referential requirements within an ODF document. In
particular, content.xml is not self-contained: it refers to other files in the
package using XLink (relative hrefs) and also special identifiers (not
IDREFs), whether to binary attachments or to other defined parts (styles.xml
and meta.xml, to name two).
There is also considerable internal structuring that is off-hierarchy. Some of
the connections are via fragment IDs (xml:id) and IDREFs; others are by
identifiers (not IDs and IDREFs) that are introduced in the ODF specification
but are not modelled in the RELAX NG schema (beyond saying that they have
string values, for example).
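As a rough illustration of the cross-file references described above, the following Python sketch walks an XML part and lists the xlink:href values that point at other files in the package. The function name, the sample markup, and the file names in it are assumptions made for illustration only; the sample is not a schema-valid content.xml, and the sketch does nothing about the non-XLink identifiers, which are exactly what a plain DOM walk cannot find.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# ElementTree spells a namespaced attribute as "{namespace-uri}localname".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def package_relative_hrefs(xml_bytes):
    """Return xlink:href values that look like references to other package files.

    Hypothetical helper: absolute URLs and same-document fragments are skipped,
    everything else is assumed to be a package-relative path.
    """
    root = ET.fromstring(xml_bytes)
    refs = []
    for elem in root.iter():
        href = elem.get(XLINK_HREF)
        if href is None:
            continue
        # Skip absolute URLs (http:, mailto:, ...) and "#fragment" links.
        if urlparse(href).scheme or href.startswith("#"):
            continue
        refs.append(href)
    return refs


# Illustrative fragment only; the referenced file names are made up.
SAMPLE = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <draw:image xlink:href="Pictures/logo.png"/>
  <draw:object xlink:href="./Object 1"/>
  <text:a xlink:href="http://example.org/">link</text:a>
  <text:a xlink:href="#Section1">jump</text:a>
</office:document-content>"""

if __name__ == "__main__":
    for ref in package_relative_hrefs(SAMPLE):
        print(ref)
```

Nothing in the parsed tree says whether those targets exist in the package or what kind of part they are; that knowledge has to come from outside the DOM.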
This sort of thing also happens rather heavily in OOXML, where communication
among parts uses a unique cross-part relationship model. There are also many
cross references to named components by means other than XML IDs and IDREFs,
whether or not the components and the references occur in the same part of
the OPC package.
One could continue the kind of hack that plants that information as benign
markers into an internal form of the XML parts (even as a single XML document,
although that is tricky when ODF documents are nested as subdocuments of
another), so long as they are replaced when the XML document is committed to a
saved ODF document file format.
In terms of having a DOM that maps to the external file form and a different
internal model, the only time that the internal model needs to update the
externally-oriented DOM is as part of a Save operation. There might be more
coupling, but performance and storage issues will doubtless impact the
engineering outcome, especially for handling large documents with alacrity.
Copy and paste and undo management will also be factors, along with maintaining
pagination, word counts, and such.
On the other hand, it is convenient (practically necessary) to specify the
semantics of ODF, or some profile of ODF, as if operations are on the format
itself, since it is only the format that is more-or-less well-specified. It
would be interesting to know how much this could be taken literally in an
application. I think there might be forensic tools for ODF documents that could
operate that way. I'm not at all certain about production WYSIWYG
consumers and producers, especially ones implemented to harmonize between
OOXML, ODF and other interesting formats (EPUB coming to mind).
I will watch Peter Kelly's efforts with great interest to see how much the
boundaries can be moved in this area.
-- Dennis E. Hamilton
[email protected] +1-206-779-9430
https://keybase.io/orcmid PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail
----- Original Message ---
From: Peter Kelly [mailto:[email protected]]
Sent: Monday, August 4, 2014 01:27
To: [email protected]
Subject: Re: OOXML
On 4 Aug 2014, at 12:16 am, jan i <[email protected]> wrote:
[ ... ]
It's possible in theory, though I'm not familiar enough with the OO codebase to
say whether it would work in practice.
The key idea is to maintain two separate data structures - one which is the ODF
XML trees, and another which is the internal representation. Any time a change
gets made to the former, the implementation must update the latter to reflect
the change. Modification operations on the latter would need to go in the other
direction.
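A minimal sketch of that two-structure idea, in Python, might look like the following. It is not taken from any real codebase: the class and method names are hypothetical, the "internal representation" is reduced to a list of paragraph strings, and plain un-namespaced <p> elements stand in for real ODF markup.

```python
import xml.etree.ElementTree as ET


class SyncedDocument:
    """Hypothetical pairing of an XML tree with a simplified internal model."""

    def __init__(self, xml_text):
        self.tree = ET.fromstring(xml_text)
        # Internal representation: one string per paragraph, in document order.
        self.paragraphs = [p.text or "" for p in self.tree.iter("p")]

    def _xml_paragraph(self, index):
        return list(self.tree.iter("p"))[index]

    def set_xml_paragraph(self, index, text):
        """Change made on the XML side, then mirrored into the internal model."""
        self._xml_paragraph(index).text = text
        self.paragraphs[index] = text

    def set_internal_paragraph(self, index, text):
        """Change made on the internal side, then mirrored into the XML tree."""
        self.paragraphs[index] = text
        self._xml_paragraph(index).text = text


if __name__ == "__main__":
    doc = SyncedDocument("<body><p>one</p><p>two</p></body>")
    doc.set_xml_paragraph(0, "uno")
    doc.set_internal_paragraph(1, "dos")
    print(doc.paragraphs)
    print(ET.tostring(doc.tree, encoding="unicode"))
```

Every mutator has to touch both structures, which is where the cost and the risk of drift come from once the internal model is richer than a list of strings.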
[ ... ]
In the case of UX Write, there are a few instances where I've used custom
extensions to handle certain things. The main ones are:
1. Table of contents/list of tables/list of figures.
When you insert one of these into your document, it inserts a <nav> element
with a CSS class name of "tableofcontents", "listoffigures", or "listoftables",
which were chosen as these are the same keywords that LaTeX uses for these
features. UX Write treats these as having special meaning, in the sense that
when opening a document (and when the document is modified), it updates the
content of these <nav> elements based on the set of all heading, figure, or
table elements in the document (including numbering/captions).
2. OOXML-specific features.
When converting from .docx to .html during the process of opening a document,
it assigns certain pre-defined CSS class names to particular types of HTML
elements to indicate their purpose. For example, a cross-reference whose
display format is supposed to include both the label and caption of a figure
will be translated as:
[ ... ]
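Returning to the first extension in the list above (the table-of-contents <nav> element), the update step can be sketched roughly as follows in Python. This is not UX Write's implementation: the function name, the choice of <p> for each entry, and the un-namespaced sample markup are all assumptions, and numbering and captions are left out entirely.

```python
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def update_toc(root):
    """Regenerate every <nav class="tableofcontents"> from the document headings."""
    # Snapshot both lists first so the tree is not modified while being walked.
    headings = [el for el in root.iter() if el.tag in HEADING_TAGS]
    navs = [nav for nav in root.iter("nav")
            if nav.get("class") == "tableofcontents"]
    for nav in navs:
        # Drop stale entries but keep the element and its attributes.
        for child in list(nav):
            nav.remove(child)
        nav.text = None
        for heading in headings:
            entry = ET.SubElement(nav, "p")  # entry markup is an assumption
            entry.text = "".join(heading.itertext())


if __name__ == "__main__":
    page = ET.fromstring(
        '<html><body><nav class="tableofcontents"><p>stale</p></nav>'
        "<h1>Intro</h1><p>text</p><h2>Methods</h2></body></html>"
    )
    update_toc(page)
    print(ET.tostring(page, encoding="unicode"))
```

The same pattern would apply to the "listoffigures" and "listoftables" class names, with figure or table elements collected instead of headings.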