On Mon, Mar 12, 2012 at 1:26 PM, Eike Rathke <er...@redhat.com> wrote:
> Hi Riccardo,
>
> On Thursday, 2012-03-08 10:39:47 +0000, Michael Meeks wrote:
>
>> > For several reasons (let me skip them), at my University we are
>> > thinking about starting a project involving ODF and we would like to
>> > know if there is already something similar to what we would like to
>> > produce and if the LibreOffice community could have some interest in
>> > it.
>>
>> Wonderful - there is already some similar work underway that Eike is
>> looking into, it would be great to have you work with him.
>
> Right, let's coordinate things a bit. Seems this topic comes up more
> frequently recently, we should really avoid conflicting approaches.
>
>
>> > In our vision, each user has control about a "section" of the text
>> > (for simplicity, we are aiming mainly to text documents) and all
>> > changes made to the document are propagated to all the users currently
>> > on-line (with an experience similar to "Google Docs").
>>
>> Right,
>>
>> > The difference with Google Docs is that the document is not in some
>> > fuzzy "cloud," but on the user's disk and the user can edit it while
>> > off-line, encrypt it, ... If the user does some changes while off-line,
>> > the other copies will receive the updates as soon as the user returns
>> > on-line. Different copies will exchange updates in a peer-to-peer
>> > fashion, without the need of a centralized repository (the bazaar/git
>> > flavor).
>>
>> Ok - so, (I hope) our focus first would be the on-line co-editing, and
>> then use/fall-back to (and improve) the document merging / comparison
>> functionality to do on-line/off-line merges.
>
> That's what I had in mind as well.
> My approach would be to use
> a change-tracking enabled document, because (besides that it gives you
> the benefit of being able to display who changed what when) then
> actually only trackable (content, not attributes) features are enabled,
> and the current file based collaboration (aka shared document) uses this
> mode as well, as does the compare/merge document feature. It also
> provides functionality to accept/reject changes. Note that I'm Calc
> biased here, Writer doesn't have the shared document feature yet, though
> it does have change-tracking.
>
>
>> > 1. Are you aware if this type of capability is already available (I
>> > do not think so) or currently developed?
>>
>> There is work underway, to bootstrap this via instant messaging, and
>> particularly the Telepathy framework - it would be great to make that
>> more public / visible and get more hands onto playing with it. Eike - do
>> you have something that could go into a feature branch ? :-)
>
> I can commit the remainders of what I have (threw away the first
> unpromising approach) to a feature branch this week.
>
>
>> > 3. Do you have some general suggestions for us? Especially about
>> > interfacing the rest of the developers.
>>
>> So - first, talk to Eike (preferably CC'ing the list here). Second -
>> here is what I was trying to persuade Eike was a sensible way of doing
>> it (which he's prolly detected as insane already ;-).
>
> Actually not that much ;-)
>
>> Please bear in
>> mind we're starting with calc here ...
>>
>> Here are my thoughts:
>>
>>   * It doesn't matter what you do to the document, as long as
>>     everyone's document does the same thing.
>>
>>   * Thus - whatever protocol you use, it needs to enforce hard
>>     ordering, such that edits 'A1', 'B1', 'A2' 'C1' end up in
>>     the same order for A, B, and C regardless of latency /
>>     topology etc.
>
> This is absolutely a must, especially when it comes to edits that move
> things in the document, such as inserting/deleting rows/columns or
> moving cells.
>
>>   * Jabber provides this guarantee :-) and a beautiful way of
>>     bootstrapping communication from an existing communication
>>     tool: telepathy/empathy/IM
>
> Yup, and a hard one to deal with..
>
>>   * Those edits need to do -exactly- the same thing, ie. we'd want
>>     the same major version of LibreOffice at each end.
>
> I'd rather version the collaboration feature, so each end can
> announce/handshake on the minimum collaboration version required,
> instead of tying it to the LibO version.
>
>> ** But ** - and here is where the work starts
>>
>>   * We need to ensure that all edits to the document are not
>>     applied immediately, but described and dispatched to the
>>     Jabber server, and only the events returned are applied.
>>
>>   * This means we need a -clean- Controller <-> Model split
>>     which we currently don't have ;-) -although- some things
>>     are really quite pleasant, eg. dialogs often tend not to be
>>     instant apply, and to collect up their changes into
>>     abstract SfxItemSets (PropertyBags to you and me) so with
>>     work we can tease out the controller perhaps.
>
> That would be a long run, but yes, at the end that's probably what we
> want.
>
>>   * And of course, some thinking of good ways of managing
>>     cursor locations, and transmitting other people's
>>     movement around documents to maintain sensible editing state
>>     is necessary.
>
> I don't think tracking cursor locations is needed. An edit action would
> be transmitted as "at position (or range) so and so do this and that".
>
> Maybe locking a region to announce "I'm going to edit here" would come
> in handy to prevent clashes.
>
> In Calc, the ScDocFunc provides almost what's needed and is already used
> by UI and API (not consistently, but to a great amount), feeding it from
> edit actions as an intermediate layer should be possible.
> This again
> made me think of reusing the existing API and serialize it through
> online editing, not sure how far we could go there, but once the basics
> were implemented we'd cover a great deal of functionality almost at no
> cost.
>
> Eike
>
> --
> LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
> GnuPG key 0x293C05FD : 997A 4C60 CE41 0149 0DB3 9E96 2F1A D073 293C 05FD

Michael, Eike, sorry for the long silence. I wanted to write a
thoughtful reply, and I finally got a timeslot long enough (I am on an
8-hour train trip; that should be plenty).

Reading your mails, I understood that maybe you and I have different
models in mind. Let me describe mine. Take a comfortable chair... :-)

A first difference between my model and yours is that yours has a
"Google Docs" flavor where everyone can edit everything (although Eike
suggests region locking), while in my model different document regions
are assigned to different editors and each editor can modify only his
own region. Although this could seem less "elegant" than the Google
approach, my personal experience (I often write documents [project
proposals, papers, ...] in collaboration with others) is that usually
you resort to some form of "informal locking," saying, for example, that
you will take care of the introduction, Alice of the state of the art,
and Bob will draw the Gantt chart. So maybe it is more convenient to
transform that informal locking into a true one, enforced by the editing
software. This would also solve the problem of serializing the changes.

In the locking model that I have in mind, the person who creates the
document becomes the "owner" of the document. Initially the whole
document is covered by a single region, and the editor of that region is
the owner himself. Portions of regions can be "given" to other editors
by the editor in charge of the region, but in an emergency (say, the
editor is sick) a region can be "taken away" by the document owner
(i.e., the person who initially created the document). Note that this
model semi-centralizes the changes to the region layout, and this could
make synchronization simpler.

What if someone sees a mistake in a section different from his? A
solution (very, very simple) could be to write an e-mail to the editor
in charge... If we want something more sophisticated, we can allow
anyone to add "proposed changes" to a non-owned section.
I'm thinking of something like a comment added to a section. Since
different comments do not interact with each other, the problem of
serializing the changes becomes much simpler. If we can make this a
special type of comment, we can allow the section owner to accept the
suggested changes by just clicking a button. (Please note that I do not
know anything about the internals of LO or ODF, so what I am suggesting
could be almost impossible...)

Maybe another important difference is that you are thinking about
showing each editor, "in real time," the changes made by the other
editors. Instead, I am thinking about transferring the changes in a
"batch" mode, sending not the "editing commands" but updates to the
actual structure of the document. A rough description of what would
happen is the following:

* Each section has a version number that is increased as the editor
  makes changes. (When is a new version number created? I am not sure
  yet: maybe after a given number of changes, maybe after a given amount
  of time, maybe when the user saves the document, or when the user
  explicitly requests a new version.)

* The editors' PCs form a network of nodes that communicate with
  something "multicast-like," in the sense that everything a node puts
  on the network is received by every other node. I use
  "multicast-like" in quotes to emphasize that the communication need
  not happen over a "true multicast" protocol; it can use many other
  solutions, such as multi-unicast, a centralized actor (Jabber?), or an
  overlay-multicast protocol.

* When the user goes on-line, the PC tries to join the network of
  editors. If it succeeds, it sends over the network a description of
  the version numbers of the sections (e.g., "I have version 12 of Bob's
  section, version 42 of Alice's section, ...").

* Suppose a node (say, Alice's PC) receives a section description with
  version number N_remote. Let N_local be the version number of the
  copy on Alice's PC.
  Alice's editor acts as follows:

  (a) If N_local == N_remote, the node does nothing.
  (b) If N_local < N_remote (i.e., the local version is older), the
      node sends N_local over the network.
  (c) If N_local > N_remote (i.e., the local version is newer), the
      node sends "update data" over the network.

Note that if Alice made some changes off-line, when she comes back
on-line the other editors will have an older version. So, when Alice
broadcasts her version number, the other nodes, having a smaller version
number, will execute (b) and send their own version numbers. When Alice
receives the replies of the other nodes, she will be in case (c) and
will begin sending updates to the group.

An interesting consequence of this approach is the following:

* Suppose Alice worked off-line and now her version has number 42,
  while Bob and Charlie still have version 40 of Alice's section.
* Now Alice goes on-line and finds Bob (but not Charlie). By the above
  protocol, Alice sends update data to Bob until Bob has version 42 too.
* Now Alice leaves and Charlie arrives. By the same protocol, Bob is
  now able to update Charlie to version 42.

In other words, every copy of the document is able to update every other
copy. In this sense I say that this approach has a "git" (or "bazaar")
flavor.

Things can get a bit more complex if more than one old version is
present. Consider the following case:

* Suppose Alice's section currently has version number 38 and Bob and
  Charlie are up to date.
* Suppose Charlie goes off-line, keeping version 38 of Alice's section.
* Alice and Bob continue editing on-line. When Alice leaves, her
  section (shared with Bob) has version 40.
* Alice does some editing off-line and reaches version 42.
* Alice goes on-line again, Bob joins her, and Alice begins sending
  update data to go from version 40 to version 42.
* Now Charlie arrives. Alice must now send updates both to Bob (40-42)
  and to Charlie (38-42).
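
The version exchange described above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions, not
a proposal for the actual implementation: the names `Node` and
`announce` are hypothetical, a single section is modelled, the network
is an in-process queue, and "update data" is simply the whole section
content rather than an incremental delta.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """One editor's copy of a single section (hypothetical model)."""
    name: str
    version: int = 0
    content: str = ""

    def on_version(self, n_remote):
        """Apply rules (a)-(c) to a version number heard on the network."""
        if self.version == n_remote:
            return None                                  # (a) do nothing
        if self.version < n_remote:
            return ("version", self.version)             # (b) send N_local
        return ("update", self.version, self.content)    # (c) send update data

    def on_update(self, version, content):
        """Accept update data only if it is newer than the local copy."""
        if version > self.version:
            self.version, self.content = version, content


def announce(newcomer, group):
    """The newcomer broadcasts its version; run the exchange until quiet.

    Assumption: every message reaches every other node in the group
    (the "multicast-like" behaviour), simulated here with a FIFO queue.
    """
    queue = deque([(newcomer, ("version", newcomer.version))])
    while queue:
        sender, message = queue.popleft()
        for node in group:
            if node is sender:
                continue
            if message[0] == "version":
                reply = node.on_version(message[1])
                if reply is not None:
                    queue.append((node, reply))
            else:
                node.on_update(message[1], message[2])


if __name__ == "__main__":
    alice = Node("Alice", 42, "text at v42")
    bob = Node("Bob", 40, "text at v40")
    charlie = Node("Charlie", 40, "text at v40")
    announce(alice, [alice, bob])      # Alice finds Bob, but not Charlie
    announce(charlie, [bob, charlie])  # Alice has left; Bob updates Charlie
    print(bob.version, charlie.version)
```

The exchange terminates because a "version" reply is only ever sent by a
node whose version is strictly lower than the one it heard, and update
messages never trigger replies.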
Note that Charlie cannot use the data sent to Bob until he reaches
version 40 too. Also, since Charlie arrived late, he missed some of the
data sent to Bob, and Alice will need to resend it.

An alternative solution that can make things simpler in the presence of
different old versions is the following, based on an idea similar to
fountain codes:

* Alice computes a hash of the content of her section (MD5 would be
  fine, since we do not use it for security).
* Alice computes "linear combinations" of the content and distributes
  them to the other nodes.
* It is possible to show that as soon as a node receives "enough data"
  it can recover the current version of the section. "Enough data" here
  means, roughly, the minimum required amount of data plus a small
  overhead.

It is also possible to show that both Bob and Charlie can use the same
data; Charlie, having the oldest version, will just need to listen for
data a bit longer. The advantage of this approach is that the same
update data can be used by any node, independently of the version it
holds. Things get complex here, and for now I will skip over all the
(quite gory :-) details.

Finally, a few words about the underlying structure, as I imagine it
(without knowing if it is compatible with LO's internal structure). Let
me do some ASCII art:

       ----------
      /          \
     |  Document  |
      \__________/
           ^
           |
           v
     +------------+    User
     |  Editing   |  commands   +---------+
     |  "engine"  |<------------|   GUI   |
     +------------+             +---------+
        ^      ^
        |      |
        |      +-------------+
        |                    |
        v                    v
   +----------+         +----------+
   | Section  |         |  Update  |
   | Locker   |         | Manager  |
   +----------+         +----------+
        ^                    ^
        |                    |
        |   +------------+   |
        +-->|    Para    |<--+
            | Multicast  |
            +------------+
                  ^
                  |
                  v
            To the network

A brief explanation of the blocks:

* The "Editing engine" is the software part that actually acts on the
  document, making changes to its content.
  To emphasize this, the diagram shows the "editing engine" taking
  requests from the user via the GUI.

* The editing engine interfaces with two new blocks:

  + The "section locker" takes care of the protocol used by the nodes
    to distribute changes to the section structure (for example, when
    part of a section is assigned to a different editor). The section
    locker receives "section change requests" both from the user (and
    distributes them to the other nodes) and from the network (and
    passes them to the engine).

  + The "update manager" takes care of the update protocol outlined
    above. It receives from the editing engine a "description" (more
    about this later) of the current content of the section and, if
    necessary, distributes updates to the other nodes. Conversely,
    updates received from the network are passed to the engine, which
    updates the document accordingly.

* Finally, the "para multicast" block presents a multicast-like API to
  the other two blocks, hiding from them how that multicast is achieved
  (e.g., multi-unicast, overlay multicast, true multicast, ...).

Note that the engine decouples the document structure from the section
locker and the update manager. The data format used by the engine to
communicate contents to the update manager can be different from the
actual document format. This could solve (or simplify) the issue with
different software versions, since the only block that needs to
understand the document format is the engine.

I am aware that the definition of this "intermediate format" is not a
trivial issue, but it could prove an important tool. Note that all the
nodes in the network must agree on this intermediate format, since the
exchanged updates will be expressed in it. Because of this, it is
important that the format is general enough not to require frequent
updates. Compactness would also be a plus. (BTW, nothing prevents us
from using ODF, if it suits us.)
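
To make the role of the "para multicast" block a bit more concrete, here
is a minimal Python sketch of what such a multicast-like API could look
like. Everything in it is an assumption made for illustration: the class
names `ParaMulticast` and `InMemoryMultiUnicast`, the `join`/`leave`/
`send` methods, and the use of opaque `bytes` payloads are hypothetical
and not taken from LO, Telepathy, or Jabber. The in-memory class stands
in for the "multi-unicast" option; a Jabber-based or overlay-multicast
transport would implement the same interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

# A handler receives the sender's id and an opaque payload.
Handler = Callable[[str, bytes], None]


class ParaMulticast(ABC):
    """Multicast-like API seen by the section locker and update manager."""

    @abstractmethod
    def join(self, node_id: str, handler: Handler) -> None:
        """Register a node; `handler` is called for every incoming message."""

    @abstractmethod
    def leave(self, node_id: str) -> None:
        """Remove a node from the group (e.g. the user goes off-line)."""

    @abstractmethod
    def send(self, sender_id: str, payload: bytes) -> None:
        """Deliver `payload` to every other node currently in the group."""


class InMemoryMultiUnicast(ParaMulticast):
    """Toy transport: one direct delivery per receiver (multi-unicast)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def join(self, node_id: str, handler: Handler) -> None:
        self._handlers[node_id] = handler

    def leave(self, node_id: str) -> None:
        self._handlers.pop(node_id, None)

    def send(self, sender_id: str, payload: bytes) -> None:
        # Copy the items so a handler may join/leave during delivery.
        for node_id, handler in list(self._handlers.items()):
            if node_id != sender_id:
                handler(sender_id, payload)
```

The point of the abstract class is that the section locker and the
update manager would only ever call `send` and register a handler, so
the transport could be swapped without touching them.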

OK, I hope you are still with me and did not fall (asleep) from your
chair... ;-)

This is the model that I had in mind, a model that I thought up without
knowing anything about the structure of LO and just a very tiny bit
about the ODF format, so maybe there are some deep incompatibilities. I
like it because of its simple structure and the "symmetry" between the
nodes (the "git" flavor), but, of course, there is nothing sacred about
it.

Riccardo
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice