Re: Future of Apache wave [Was: Re: Advantages of P2P messaging?]

Sam Nelson Thu, 13 Jun 2013 13:08:10 -0700

Hi Michael,

I'm trying to wrap my head around this too.
Say you have some JSON object:
{
  "i" : 5,
  "s" : "string",
  "c" : { "i" : 2 },
  "a" : [ { "i" : 3 } ]
}

What would the parameters be to delete "s"? A path is really required rather than an index, isn't it? (i.e. parameters are specific to the type they operate on.) And further, what would a delete operation do in this case: remove the "s" member of the object, or just set its value to null? That decision could be application-implementation specific, sure, but if the application needed both concepts, how could you then define two abstract delete operations, so that the application can implement each of them?
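One way to make the remove-versus-null distinction concrete is to give a path-addressed delete an explicit mode. This is a minimal Python sketch, not the ShareJS or Wave API: the function name `apply_delete` and the `"remove"` / `"null"` modes are hypothetical, paths are lists of object keys and list indices, and only the apply step is shown (no transform).

```python
import copy
from typing import Any, List, Union

Path = List[Union[str, int]]  # object keys and list indices, root first

def apply_delete(doc: Any, path: Path, mode: str) -> Any:
    """Return a copy of `doc` with the value at `path` deleted.

    mode "remove": drop the object member (or list element) entirely.
    mode "null":   keep the member but set its value to None (JSON null).
    """
    if not path:
        raise ValueError("cannot delete the document root")
    new_doc = copy.deepcopy(doc)      # never mutate the caller's document
    parent = new_doc
    for key in path[:-1]:             # walk to the container of the target
        parent = parent[key]
    last = path[-1]
    if mode == "remove":
        del parent[last]              # works for dict keys and list indices
    elif mode == "null":
        parent[last] = None
    else:
        raise ValueError(f"unknown delete mode: {mode!r}")
    return new_doc

doc = {"i": 5, "s": "string", "c": {"i": 2}, "a": [{"i": 3}]}
without_s = apply_delete(doc, ["s"], "remove")   # the "s" member is gone
s_is_null = apply_delete(doc, ["s"], "null")     # "s" is still present, as null
```

Because the two modes leave different documents behind, they would presumably also need different transform rules against a concurrent edit at the same path, which is one argument for modelling them as two distinct operation kinds.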


-Sam



On 14/06/2013 07:45, Michael MacFadden wrote:

Joseph,

We are almost in sync now.  Let's go one step further.  Let's say you were
designing an application to be a rich text editor.  Forget OT, you're just
making an editor.  I assume your editor has to have some sort of model,
right?  Let's temporarily forget the persistence format.  You may save the
rich text to XML, or RTF, or whatever, but I am not worried about that.  I
am asking: what is the in-memory model that your editor uses to interact
with the document?  Build that.  Build it any way you like.

Ok so now you have a rich text object model.  Your editor is going to
interact with that through some sort of object model API.  When the user
selects some text and presses the bold button, the editor makes some API
call to the model and says, make this part bold.  For the sake of
conversation, I don't care how that internally happens in the object data
model.

OK.  So now, if we have a sufficiently powerful OT operation set that can
describe manipulating objects, we can manipulate the object model with OT.
Really, what OT services are is robust message buses that describe how
one user is changing the objects to another user, accounting for context
transformations along the way.  So if you can build an abstract OT
operation set that lets you mess with objects and object structures, then
you have a shot at adapting that operation set to a whole slew of
applications.

This is actually an ongoing area of research, that I presented a paper on
to the collaborative editing workshop at the ACM CSCW conference last year.

~Michael

On 6/13/13 8:34 PM, "Joseph Gentle" <jose...@gmail.com> wrote:

Interesting...

The abstraction I use is to have a bunch of data types. Each data type
defines what documents look like, what operations look like and they
define a set of OT functions (transform, compose, apply, etc). Eg,
Text documents are strings and their operations are lists of {skip:5},
{insert:'hi'}, {delete:10}, etc. JSON documents are JSON and their
operations are lists of path+what to do there. Eg, [{path: ['hi'],
delete list element 5}, ...]
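The text operation format described above can be applied with a single cursor walk over the original string. Below is a minimal Python sketch; the dictionary encoding mirrors the examples in the message, while the function name and the rule that text after the last component is kept unchanged (an implicit trailing skip) are assumptions, not necessarily how ShareJS's text type behaves.

```python
from typing import Dict, List, Union

TextComponent = Dict[str, Union[int, str]]

def apply_text_op(doc: str, op: List[TextComponent]) -> str:
    """Apply a list of {skip: n} / {insert: s} / {delete: n} components.

    A cursor walks the original document: skipped characters are copied,
    inserted strings are emitted, deleted characters are dropped. Anything
    after the last component is kept unchanged.
    """
    out = []
    pos = 0
    for comp in op:
        if "skip" in comp:
            n = comp["skip"]
            if pos + n > len(doc):
                raise ValueError("skip past end of document")
            out.append(doc[pos:pos + n])
            pos += n
        elif "insert" in comp:
            out.append(comp["insert"])
        elif "delete" in comp:
            n = comp["delete"]
            if pos + n > len(doc):
                raise ValueError("delete past end of document")
            pos += n                  # consume without copying
        else:
            raise ValueError(f"unknown component: {comp!r}")
    out.append(doc[pos:])             # implicit trailing skip
    return "".join(out)
```

For example, `[{"skip": 6}, {"delete": 5}, {"insert": "there"}]` turns `"hello world"` into `"hello there"`.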

It sounds like you're saying we should abstract over the ideas of
ot-for-lists, ot-for-sets and so on. Is that right?

... But rich text isn't quite a list or a set. You can make annotation
markers or something, but then they take up space. Maybe it's possible
to ignore the final document space that an annotation takes up for the
purpose of transformation?

Another architecture I've thought about using is making all documents
use the JSON OT code. Specialized types like rich text can exist as
leaves in the JSON structure - and let you embed a rich text operation
inside a JSON operation.
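A rough sketch of embedding a specialized operation inside a JSON operation: the JSON layer only walks the path, then hands the inner operation, untouched, to the leaf type registered under that name. Everything here is hypothetical (the `{path, type, op}` component shape, the `SUBTYPES` registry, and the toy insert-only `text` leaf type), and only the apply side is shown, not transform.

```python
import copy
from typing import Any, Callable, Dict, List, Union

Path = List[Union[str, int]]

def text_insert_apply(value: str, op: Dict[str, Any]) -> str:
    """Toy leaf type (hypothetical): insert op["text"] at op["pos"]."""
    return value[:op["pos"]] + op["text"] + value[op["pos"]:]

# Registry of leaf types that may be embedded inside the JSON tree.
SUBTYPES: Dict[str, Callable[[Any, Any], Any]] = {"text": text_insert_apply}

def apply_json_component(doc: Any, comp: Dict[str, Any]) -> Any:
    """Apply one component of the form {"path": [...], "type": name, "op": ...}.

    The JSON layer never interprets comp["op"]; it delegates to the leaf type.
    """
    path: Path = comp["path"]
    if not path:
        raise ValueError("embedded ops must target a child, not the root")
    new_doc = copy.deepcopy(doc)
    parent = new_doc
    for key in path[:-1]:             # walk to the container of the leaf
        parent = parent[key]
    leaf_apply = SUBTYPES[comp["type"]]
    parent[path[-1]] = leaf_apply(parent[path[-1]], comp["op"])
    return new_doc
```

A transform function could follow the same pattern: when two concurrent components target the same path, the JSON layer would hand both inner operations to the leaf type's own transform.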

-J


On Thu, Jun 13, 2013 at 12:05 PM, Michael MacFadden
<michael.macfad...@gmail.com> wrote:

As a follow up.  The reason you are struggling with the concept is that
you have tied the operation language directly to a specific data model,
in much the way Wave did.  They created a conversation model and a
specific set of operations that act on that model.  When you do that,
your operations are making assumptions about how the object model works.
This coupling is not a good idea.  Much of the OT community strongly
recommends avoiding it.

Rather, create a generic set of operations that manipulate things in an
abstract way, and then let the application sort out what to do with the
operations when it receives them.  The OT stack only needs to understand
how the parameters of the operations interact, such as positional
arguments for insert and delete style operations.  The OT stack doesn't
need to know that the thing you are inserting is a character, a contact
card, a database record, or an object in a list.  It doesn't care.  It
just knows that if one insert happens before another, it has to
increment the index of the second operation.
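To illustrate the decoupling, here is a sketch of the insert-versus-insert rule described above, where the transform reads only indices and never the payload. The `site` field used to order two inserts at the same index is a hypothetical addition: some deterministic tie-break is needed, or the two clients would diverge.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass(frozen=True)
class Insert:
    index: int
    item: Any    # opaque payload: a character, a contact card, a record...
    site: int    # hypothetical client id, used only to break ties

def transform_insert(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `other`.

    Only `index` and `site` are read; the payload is never inspected.
    """
    if other.index < op.index or (other.index == op.index and other.site < op.site):
        return Insert(op.index + 1, op.item, op.site)   # shift past other's item
    return op

def apply_insert(items: List[Any], op: Insert) -> List[Any]:
    result = list(items)
    result.insert(op.index, op.item)
    return result
```

Two clients that each apply their own insert first and then the other's transformed insert end up with the same list, whatever the payloads are.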

If things are decoupled in this way, the whole OT stack becomes much more
flexible.  As one of the founders of OT says almost every time I see him,
"Let OT focus on what it is good at, and let it ignore everything else".

~Michael

On 6/13/13 7:54 PM, "Joseph Gentle" <jose...@gmail.com> wrote:

So you're imagining storing rich text like this?

{doc: 'hi there!', annotations: [{from:0, to:2, bold:true}]} or
something?

Every change to the document is going to need to manually update every
single annotation which has start / end points after the edit. But it
wouldn't work - if you insert some text and I edit an annotation later
in the document, my annotation will float forwards / backwards when I
get your op because I don't know how I should change it.
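To make the problem concrete, here is a hypothetical Python model of the `{doc, annotations}` shape above. Local inserts shift annotation boundaries, but an annotation edit that carries absolute positions and is applied without transformation lands on the wrong text at one of the two clients. The helper names and the boundary rule (text typed at an annotation's end joins it) are assumptions made for this sketch.

```python
from typing import Any, Dict

Doc = Dict[str, Any]  # {"doc": str, "annotations": [{"from": int, "to": int, ...}]}

def insert_text(state: Doc, pos: int, text: str) -> Doc:
    """Insert `text` at `pos`, shifting every annotation boundary at or after it.

    Using >= for both ends means text typed exactly at an annotation's end
    is absorbed into it, while text typed at its start is not.
    """
    n = len(text)
    annotations = []
    for ann in state["annotations"]:
        ann = dict(ann)
        if ann["from"] >= pos:
            ann["from"] += n
        if ann["to"] >= pos:
            ann["to"] += n
        annotations.append(ann)
    return {"doc": state["doc"][:pos] + text + state["doc"][pos:],
            "annotations": annotations}

def add_annotation(state: Doc, ann: Dict[str, Any]) -> Doc:
    """Add an annotation exactly as given, with absolute positions."""
    return {"doc": state["doc"], "annotations": state["annotations"] + [dict(ann)]}

base = {"doc": "hi there!", "annotations": [{"from": 0, "to": 2, "bold": True}]}
insert_op = (0, "oh, ")                          # client A's concurrent edit
italic = {"from": 3, "to": 8, "italic": True}    # client B italicizes "there"

# Each client applies its own edit, then the other's, with no transformation.
at_a = add_annotation(insert_text(base, *insert_op), italic)
at_b = insert_text(add_annotation(base, italic), *insert_op)
```

Both clients end up with the same text but different annotations (at client A the italic range covers " hi t"); closing that gap requires transforming annotation operations against text operations, which is the logic a dedicated rich text type or a matching set of annotation transforms would have to supply.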

This idea comes up about every 6 months on the sharejs mailing list.
Several solutions have been proposed, but none of them work correctly.
I think we just need a separate set of transform / apply / ...
functions for rich text.

-J


On Thu, Jun 13, 2013 at 1:19 AM, Michael MacFadden
<michael.macfad...@gmail.com> wrote:

Joseph,

I disagree.  The annotations themselves are just another data structure.
You add them, remove them and modify them like anything else.  You can
manage annotations as another structure within the blip model.  There is
no reason why you can't interface with them through a JSON-style
operations structure.

~Michael

On 6/13/13 12:11 AM, "Joseph Gentle" <jose...@gmail.com> wrote:

The conversation *model* yes, but not the rich text documents
themselves. You can't really make text annotations work properly on
top of JSON operations. We should keep something like the current
system for actual blips.

-J


On Wed, Jun 12, 2013 at 4:06 PM, Michael MacFadden
<michael.macfad...@gmail.com> wrote:

Actually I just went and took a look at your operations.  The JSON OT
type is probably the closest to what I would suggest we use.  JSON
objects are not just for JavaScript.  They define arbitrary object
structures.  We don't need a specific Wave XML type; we could use the
JSON operations to modify the conversation model.

Potentially.


On 6/12/13 10:55 PM, "Joseph Gentle" <jose...@gmail.com> wrote:

Really?

My method for ShareJS was to simply have a JSON OT type and a
plaintext OT type. I'd like to add a rich text OT type as well. Then
people can just pick which one based on what kind of data they have.

For Wave I'd like to be able to do something similar - JSON is
obviously useful for storing application data. It'd be nice to have
some sort of hybrid for wavelets where we can put multiple different
kinds of data inside a wavelet. One option is to use a JSON OT type as
the root of all wavelets and support subdocuments at arbitrary paths.
So the object could be:
{projectName:"ruby on rails", files:[{name:'foo/bar.rb', ...}],
documentation:{_type:'richtext', _data:"<Rich text data>"}}

Or wavelets could simply each have a type (defaulting to the current
wavey XML type).

-J


On Wed, Jun 12, 2013 at 2:41 PM, Michael MacFadden
<michael.macfad...@gmail.com> wrote:

You have stumbled upon one of the weaknesses of Wave OT.  Best practices
would say to NOT bind your OT directly to the data type, because then you
don't have an extendable model.  For example, if you have all of your
operations figured out and validated, and then you need to change your
data model, you have to go back and mess with your transformation
functions.  Not good.  Or you have to try to bend new data models into
the existing one, also not good.

Best practice is to create a generic OT model and operate on that.  There
is debate as to what the model should be, but most agree on the concept.

For example, in Wave they tried to create a map-like collection that OT
could operate on.  Essentially, though, they had to implement the map as
if its underlying model was a bunch of XML-ish tags.  This was very
convoluted.

~Michael

On 6/12/13 10:26 PM, "Joseph Gentle" <jose...@gmail.com> wrote:

Yeah exactly. The Google Wave OT code uses special operations that can
understand the XML structure. It doesn't just edit the plaintext.
Formatting annotations are stored in a special way - operations can
say something like "At position 10 add bold. At position 20 stop
adding bold".
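A sketch of why start/stop markers are convenient: the formatting state can be recovered by replaying a list of boundary events, and each event is just a position. The `(position, key, value)` tuples, with `None` meaning "stop", and the function name are hypothetical and only loosely modeled on the description above; this is not the actual Wave operation encoding.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical boundary event: (position, key, value). A non-None value starts
# (or changes) annotation `key` at `position`; None stops it there.
Boundary = Tuple[int, str, Optional[object]]
Range = Tuple[int, int, str, object]   # (start, end, key, value)

def boundaries_to_ranges(boundaries: List[Boundary], end: int) -> List[Range]:
    """Replay start/stop boundary events into explicit annotation ranges.

    Any annotation still open after the last event runs to `end`
    (normally the document length).
    """
    open_at: Dict[str, Tuple[int, object]] = {}
    ranges: List[Range] = []
    for pos, key, value in sorted(boundaries, key=lambda b: b[0]):
        if key in open_at:                 # close the current run of this key
            start, old = open_at.pop(key)
            if pos > start:
                ranges.append((start, pos, key, old))
        if value is not None:              # start a new run of this key
            open_at[key] = (pos, value)
    for key, (start, value) in open_at.items():
        if end > start:
            ranges.append((start, end, key, value))
    return ranges
```

Inserting text before position 10 would simply add its length to both boundary positions, so boundaries can be moved with the same position arithmetic as the text itself.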

-J

On Wed, Jun 12, 2013 at 1:56 PM, Bruno Gonzalez (aka stenyak)
<sten...@gmail.com> wrote:

I suspected something like that. I assume it also correctly handles
variable-length UTF-8 characters, so it's not necessarily 1-byte patches?

This starts to make sense. OT can only compute conflict-free merges using
the "character" primitive (because that's how Wave was originally
designed). As an unfortunate consequence, you can then only OT-operate on
plain text. Otherwise you could get conflict-free XML text that
<loo<ks li<>ke>this>, and that of course isn't legal XML.
But we still want rich text in Google Wave, therefore all the formatting
stuff is stored some place else, specifically in the blip annotations. The
modifications to annotations are (sometimes) simply derived from the
transformations that the plain text suffers after merges?

I suppose there could be other OT algorithms that don't use a "character"
primitive, but rather an "xml tag" primitive, a JSON item, a "pixel", or
anything else, right?

(sorry for only contributing with questions... :-)


On Wed, Jun 12, 2013 at 10:27 PM, Joseph Gentle
<jose...@gmail.com>
wrote:

On Wed, Jun 12, 2013 at 12:13 PM, Bruno Gonzalez (aka stenyak)
<sten...@gmail.com> wrote:

My assumption was that conflicts were simply mathematically inevitable
in DVCSs; that's why your mention of the lack of conflict markers sparked
my interest... you mention conflicts like they can be optional? If so,
are conflicts "eliminated" by choosing an arbitrary merging strategy
when conflicts *do* happen (e.g. "choose the last timestamped patch and
lose information on the way, we don't care"), or can they be prevented
from ever happening in the first place?

They're inevitable in patch based systems because patches usually have
a line level granularity. OT usually uses individual character
positions. In OT, if two operations both delete the same character,
the character gets deleted once. If two clients insert a character at
the same position, one of the characters will be first in the
resultant document and one will be second. Conflict markers just
aren't necessary.
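The two cases just described can be checked with a tiny character-level OT. This is a minimal sketch, not Wave's or ShareJS's implementation: single-character `Ins`/`Del` operations, a hypothetical `site` id that orders concurrent inserts at the same position, and a `converge` helper that applies both operations in both orders and asserts that the results match.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Ins:
    pos: int
    ch: str
    site: int   # hypothetical client id, used only for tie-breaking

@dataclass(frozen=True)
class Del:
    pos: int

Op = Union[Ins, Del]

def transform(op: Op, other: Op) -> Optional[Op]:
    """Adjust `op` to apply after the concurrent `other`; None means no-op."""
    if isinstance(other, Ins):
        before = other.pos < op.pos or (
            other.pos == op.pos and (isinstance(op, Del) or other.site < op.site))
        if before:   # other's character now sits in front of op's target
            return Ins(op.pos + 1, op.ch, op.site) if isinstance(op, Ins) else Del(op.pos + 1)
        return op
    # other is a Del
    if other.pos < op.pos:
        return Ins(op.pos - 1, op.ch, op.site) if isinstance(op, Ins) else Del(op.pos - 1)
    if isinstance(op, Del) and other.pos == op.pos:
        return None  # the character is already gone: it is deleted only once
    return op

def apply(text: str, op: Optional[Op]) -> str:
    if op is None:
        return text
    if isinstance(op, Ins):
        return text[:op.pos] + op.ch + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]

def converge(text: str, a: Op, b: Op) -> str:
    """Apply a-then-b' and b-then-a'; both orders must give the same text."""
    left = apply(apply(text, a), transform(b, a))
    right = apply(apply(text, b), transform(a, b))
    assert left == right, (left, right)
    return left
```

For instance, two concurrent `Del(1)` operations on `"cat"` both yield `"ct"`, and concurrent inserts of `"x"` (site 1) and `"y"` (site 2) at position 1 both yield `"cxyat"`, with no conflict marker involved.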

-J

--
Saludos,
      Bruno González

_______________________________________________
Jabber: stenyak AT gmail.com
http://www.stenyak.com



