Change tracking & versioning (was Re: OOXML)

Peter Kelly Sun, 03 Aug 2014 04:57:08 -0700

On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton <dennis.hamil...@acm.org> wrote:


> In line with the sketch that Peter Kelly provides below, I am personally 
> very sympathetic to the idea of having an internal model that can tolerate 
> differences in format between input and output while preserving in the output 
> everything from the input format it can, even by leaving markers that will be 
> useful on future input of the produced form.  (There is a well-known case of 
> Microsoft Office doing this for HTML it exports, although the added 
> information for recovery of the MSO rendition led to many complaints about 
> document bloat.)
> 
> There are some conflicts between the desire to do this and the fact that some 
> alterations have non-local consequences and may have other effects.  I still 
> support the idea, but there are some tricky cases, including
> 
> - Changes that overlap/conflict with tracked changes but tracked changes are 
> not updated/preserved properly

I'm probably getting a bit off-topic here, but this issue is one of the reasons 
I advocate an approach that keeps change tracking information separate from the 
content itself, rather than part of it. In my mind, Git provides the perfect 
model for this, although integrating it (or something else based on a similar 
model) into a word processor or office suite remains, shall we say, a rather 
significant problem to solve, both in terms of the theoretical model and in 
how it would be exposed in a user interface.

By itself, keeping the change information separate wouldn't solve the problem 
of inconsistency when the file is modified by an implementation with no 
knowledge of change tracking information. However, this could be addressed by 
a data model based on that of a version control system: one that can access 
the previous version of the file as well as the current one, find the 
differences between the two, and allow the user to apply those differences.
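
To make the "find the differences and let the user apply them" step concrete, here is a minimal sketch using Python's standard difflib. It assumes the two versions are compared as lists of lines of serialized XML, which is a simplification (a real implementation would more likely compare the document tree), and list_changes/apply_changes are hypothetical helper names used only for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Set, Tuple

Opcode = Tuple[str, int, int, int, int]


def list_changes(old: List[str], new: List[str]) -> List[Opcode]:
    """Return the non-equal opcodes (replace/delete/insert) between two versions."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def apply_changes(old: List[str], new: List[str], accepted: Set[int]) -> List[str]:
    """Rebuild the document from `old`, applying only the accepted changes.

    Changes are numbered in the order list_changes() returns them; any change
    whose index is not in `accepted` keeps the old text.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    result: List[str] = []
    change_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(old[i1:i2])  # unchanged region, copy as-is
            continue
        if change_index in accepted:
            result.extend(new[j1:j2])  # user accepted this change
        else:
            result.extend(old[i1:i2])  # user rejected it, keep the old text
        change_index += 1
    return result


previous = ["<text:p>Hello</text:p>", "<text:p>World</text:p>", "<text:p>Bye</text:p>"]
current = ["<text:p>Hello there</text:p>", "<text:p>World</text:p>", "<text:p>Goodbye</text:p>"]
# Accept the first detected change, reject the second.
print(apply_changes(previous, current, accepted={0}))
```

Treating each difference as an individually accept/reject-able unit is what lets changes made by a tool with no change-tracking support be presented to the user much like ordinary tracked changes.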

Let's say, just as a mental exercise, that we were to embed a git repository 
directly within an ODF file. That is, the .odt file is a zip archive containing 
the usual content.xml, styles.xml, etc., and also has a .git directory inside it, 
which contains the complete revision history of all these separate files. When 
you save the document in an implementation that does not support any change 
tracking/versioning, it would just overwrite the XML files in the same way as a 
text editor writes a file to disk. When you save the document in an 
implementation that *does* support this, however, it overwrites the files and 
*then* does a git commit.
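
The two kinds of save can be modelled with Python's zipfile module. This is a toy sketch under several assumptions: the "repository" is a hypothetical JSON history file plus blobs under .git/ rather than git's real on-disk format (only blob_id mirrors how git actually names blob objects), ODF's mimetype entry and META-INF/manifest.xml are omitted, and a versioning-unaware implementation is assumed to copy unknown .git/ entries through rather than dropping them.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict, Optional

GIT_DIR = ".git/"
# Hypothetical toy history: a JSON list of {path: blob_id} snapshots, oldest
# first. It stands in for git's commit and tree objects; it is NOT git's format.
MANIFEST = GIT_DIR + "toy-history.json"


def blob_id(data: bytes) -> str:
    # The object ID git assigns to a blob: SHA-1 over b"blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def save(old_archive: Optional[bytes], files: Dict[str, bytes], versioned: bool) -> bytes:
    """Write `files` as the working copy of a new .odt-like zip archive.

    Existing .git/ entries are always copied through unchanged. With
    versioned=True the save is followed by a toy "commit" of the working copy.
    """
    git_entries: Dict[str, bytes] = {}
    if old_archive is not None:
        with zipfile.ZipFile(io.BytesIO(old_archive)) as zin:
            for name in zin.namelist():
                if name.startswith(GIT_DIR):
                    git_entries[name] = zin.read(name)

    if versioned:
        snapshot: Dict[str, str] = {}
        for path, data in files.items():
            oid = blob_id(data)
            git_entries[GIT_DIR + "objects/" + oid] = data  # content-addressed blob
            snapshot[path] = oid
        history = json.loads(git_entries.get(MANIFEST, b"[]"))
        history.append(snapshot)
        git_entries[MANIFEST] = json.dumps(history).encode()

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for path, data in files.items():  # the "working copy"
            zout.writestr(path, data)
        for name, data in git_entries.items():  # history, preserved as-is
            zout.writestr(name, data)
    return out.getvalue()


# A versioning-aware save: write the XML, then commit.
v1 = save(None, {"content.xml": b"<p>Hello</p>"}, versioned=True)
# A versioning-unaware save: overwrite the XML, leave .git/ alone.
v2 = save(v1, {"content.xml": b"<p>Hello, world</p>"}, versioned=False)
```

After the second call, content.xml holds the new text while the history still records only the first snapshot, so the working copy and the last commit now disagree.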

With this approach, if you were to first create a file in implementation A, 
which supports this versioning, you'd have a zip file with a git repository and 
one or more commits, and the "working copy" (that is, all the files within the 
zip archive outside of the .git directory) would be "clean" (up to date). If 
you then open and save it in implementation B, which does not support 
versioning, it would leave the .git directory in the zip file untouched and 
simply save over the XML files. Then you open it in implementation A again, 
and you can see that the working copy is not clean and there are outstanding 
changes. These could then be displayed in the editor in the same way tracked 
changes are displayed currently, without the user noticing any difference. And 
you'd have the benefit of knowing the derivation relationships between 
versions, so if you get back two different versions of a document that have 
the same ancestor, you could do a merge.
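
Both operations in this paragraph, noticing that the working copy is no longer clean and merging two versions that share an ancestor, can be sketched at whole-file granularity. In this minimal Python sketch a snapshot is assumed to be a plain dict from archive path to file contents (for example, the files of the last commit versus the files currently in the zip), and changed_paths/merge3 are hypothetical helpers. At this granularity two people editing content.xml always conflict, so a practical implementation would have to merge within the XML itself.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, bytes]  # path inside the archive -> file contents


def changed_paths(committed: Snapshot, working: Snapshot) -> Dict[str, str]:
    """Classify paths that differ between the last commit and the working copy.

    An empty result means the working copy is "clean".
    """
    status: Dict[str, str] = {}
    for path in committed.keys() | working.keys():
        if path not in working:
            status[path] = "deleted"
        elif path not in committed:
            status[path] = "added"
        elif committed[path] != working[path]:
            status[path] = "modified"
    return status


def merge3(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """File-level three-way merge of two versions against their common ancestor."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (possibly both deleted)
        elif o == b:
            result = t  # only "theirs" changed this file
        elif t == b:
            result = o  # only "ours" changed this file
        else:
            conflicts.append(path)
            result = o  # keep our version; the user must resolve the conflict
        if result is not None:
            merged[path] = result
    return merged, sorted(conflicts)


base = {"content.xml": b"<p>A</p>", "styles.xml": b"<s/>"}
ours = {"content.xml": b"<p>A</p>", "styles.xml": b"<s bold='1'/>"}
theirs = {"content.xml": b"<p>B</p>", "styles.xml": b"<s/>"}
print(changed_paths(base, theirs))
print(merge3(base, ours, theirs))
```

The common ancestor is what makes the merge decidable: without it, there is no way to tell which side changed a file and which side left it alone.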

Now I'm not suggesting that actually storing a git repository inside a .odt 
archive would be a good way to go, partly for efficiency reasons (duplication 
of the document's entire history in every copy), and partly because git's 
storage format is binary and vastly different from everything else in ODF. 
Nonetheless, at a theoretical level, the core idea (storing a version history 
separate from the content, from which changes can be detected automatically 
without requiring any extensions to the core part of the standard itself) 
would, I think, be worth exploring.

I know this is quite a different approach to what you've previously been 
considering; what are your thoughts?

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
