Joseph,

Notes inline below.

~Michael

On 6/19/13 11:22 AM, "Joseph Gentle" <jose...@gmail.com> wrote:

>I've given half a dozen talks about ShareJS over the last 3 years, and
>almost every time I give a talk, someone asks me whether you can use
>ShareJS in a peer-to-peer way instead of just through a single server.
>
>"You say it works like subversion. Can it work like Git?"
>"Can you have a document shared between multiple servers?"
>
>Sigh no. ShareJS & Wave's algorithms were invented in 1995. Back then,
>it was news when someone put up a new website.
>
>"What about Wave Federation?" Appropriately, it works like IRC, but
>using XML. It's complicated, it's vulnerable to netsplits and it's buggy.
>I guess it's like IRC except it doesn't actually work.
>
>So let's fix that! Let's modernize wave and make it federate properly.
>On the way we have a great opportunity to make it simpler and cleaner.

Agree 100%

>
>
>To start, I want to build a generic P2P OT container. This is a simple
>wrapper that contains a set of OT documents and defines a network
>protocol for keeping them in sync. The container needs to be able to
>talk to another instance of itself running somewhere else and
>synchronize documents between the two instances.
>
>That's all I want this container to do - it should be as lightweight as
>possible, so we can port it to lots of different languages and
>environments. I want that code running in websites, in giant server
>farms, in vim, and everywhere in between. It won't have any database
>code, network code, users or a user interface (though it'll need APIs
>to support all of that stuff). At its core it just does OT + protocol
>work to synchronize documents.

I strongly suggest we separate the OT Algorithms, the application level
protocol, and the network transport layer.
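
As a rough illustration of that separation (a sketch only; the toy insert-only
text type, the JSON message shape, and every name below are made up for the
example, not taken from ShareJS or Wave):

```python
import json

# Layer 1: OT algorithm. Pure functions, no I/O.
# Hypothetical toy type: an op inserts `text` at index `pos`.
def apply_op(doc, op):
    return doc[:op["pos"]] + op["text"] + doc[op["pos"]:]

# Layer 2: application protocol. Turns ops into messages and back.
def encode(doc_id, op):
    return json.dumps({"doc": doc_id, "op": op}, sort_keys=True).encode("utf-8")

def decode(raw):
    msg = json.loads(raw.decode("utf-8"))
    return msg["doc"], msg["op"]

# Layer 3: transport. Moves opaque bytes and knows nothing about OT.
class LoopbackTransport:
    def __init__(self):
        self.handlers = []

    def on_receive(self, handler):
        self.handlers.append(handler)

    def send(self, raw):
        for handler in self.handlers:
            handler(raw)

# Wiring the three layers together on the receiving side.
docs = {"notes": "helo"}
transport = LoopbackTransport()

def handle(raw):
    doc_id, op = decode(raw)
    docs[doc_id] = apply_op(docs[doc_id], op)

transport.on_receive(handle)
transport.send(encode("notes", {"pos": 3, "text": "l"}))
```

The point is only that each layer could be swapped (a different OT type, a
binary wire format, a WebSocket transport) without touching the other two.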

>
>What are the documents? Well, like ShareJS, I'd like to support
>multiple different kinds of data. We'll need to be able to support
>wave's conversation model, but I'd also like to support arbitrary
>JSON. Doing OT over arbitrary JSON structures would allow other
>applications to be built on top of wave, using wave as a data platform
>("Glorious messaging bus in the sky"). It'd also be super useful for
>gadgets and user data.
>
>There are three models I can imagine for what wavelets could look like:
>
>Option 1: All documents in the container have a unique name and a
>type. This is how ShareJS works. We could have a JSON type and a
>wavelet type. This is simple, but not particularly extensible (it
>makes it hard to embed JSON inside a conversation, and vice versa).
>
>Option 2: At the root of every document is a JSON object. Leaves in
>the JSON structure can be subdocuments, which could be rich text for
>blips, or any other type we think of down the road.
>
>Option 3: We make another layer, which can contain a set of documents.
>So, a wavelet could contain a JSON document describing the
>conversation structure, some rich text documents for blips and another
>JSON document containing gadget data or something. Access control
>rules are at the container level. This is (sort of) how wavelets work
>today.

If I had to choose, I would choose option #2.  However, I think there are
some design choices that we need to make before we answer this question
categorically.  I thought I would put that in a separate email since it
would be buried here.
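
To make option #2 concrete, here is a minimal sketch of a JSON root whose
leaves can be typed subdocuments. The "$type" marker, the "richtext" type name
and the field names are assumptions for illustration only, not an existing
wave or ShareJS format:

```python
def is_subdoc(node):
    # Hypothetical convention: a dict carrying "$type" is a subdocument leaf.
    return isinstance(node, dict) and "$type" in node

def find_subdocs(node, path=()):
    """Yield (path, type) for every subdocument leaf under a JSON root."""
    if is_subdoc(node):
        yield path, node["$type"]
    elif isinstance(node, dict):
        for key, child in node.items():
            yield from find_subdocs(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_subdocs(child, path + (index,))

wavelet = {
    "title": "Design notes",
    "blips": [
        {"author": "joseph", "body": {"$type": "richtext", "value": "Let's fix federation."}},
        {"author": "michael", "body": {"$type": "richtext", "value": "Agree 100%"}},
    ],
    "gadget": {"votes": {"yes": 2, "no": 0}},
}

subdocs = list(find_subdocs(wavelet))
```

Plain JSON values (the title, the gadget data) would be handled by the JSON OT
type, while each marked leaf would be handed to the OT type named by its marker.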

>
>The OT itself I imagine building along the lines of Torben Weis's P2P
>OT theory that he made in Lightwave:
>https://code.google.com/p/lightwave/ . Briefly, every operation gets a
>hash (like git). We add tombstones to wave's OT type and remove
>invertibility, so the transform function supports TP2. We also add a
>prune function (inverse transform) which allows the history list to be
>reordered (so you don't have to transform out on every site). The hard
>part is figuring out which operations to sync, and which operations
>need to be reordered. I'd like to go over the details with Michael
>MacFadden and anyone else who's interested - there may well be a
>better system that we should use instead. If there is, I'd like to
>know about it now.

I think this is another interesting area.  One thing to consider is that I
am not sure if the linear model that wave used is even the right option.
If we start manipulating things like JSON Objects, Java Objects, XML, or
nested documents, a hierarchical path mechanism may be best.  My other
instinct here is, we should make sure that we base the OT on something
that has been proven to be correct.  There has been some work to evaluate
OT systems and prove that they are correct and solve the proper OT Puzzles
and support the required properties.
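
By a hierarchical path mechanism I mean something along these lines. This is a
sketch under my own assumptions (an op is just a path plus a replacement value;
the names are hypothetical), not how wave or lightwave address content:

```python
import copy

def set_at_path(doc, path, value):
    """Return a copy of doc with the node at `path` replaced by `value`.

    `path` is a non-empty tuple of dict keys and list indices.
    """
    result = copy.deepcopy(doc)
    target = result
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = value
    return result

doc = {"blips": [{"body": "hi"}, {"body": "yo"}], "meta": {"title": "old"}}

# Two concurrent ops that touch disjoint subtrees.
op_a = (("blips", 0, "body"), "hello")
op_b = (("meta", "title"), "new")

ab = set_at_path(set_at_path(doc, *op_a), *op_b)
ba = set_at_path(set_at_path(doc, *op_b), *op_a)
```

Here ab and ba come out equal: ops on disjoint paths commute without any
transformation. Transformation is only needed when one path is a prefix of the
other or when the ops restructure a shared parent.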

Two other things that we need to consider.  Beyond TP1 and TP2, there are
also IP1, IP2, and IP3 that you need to think about if you are going to
support group undo, which in my opinion wave needs to support.  If you
can't undo things in collaboration, then you have problems.
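
For reference, TP1 says that for two concurrent ops a and b on the same state,
applying a then transform(b, a) must give the same result as applying b then
transform(a, b). A minimal brute-force check for a toy insert-only text type
follows. The (pos, text, site) tuple format and the site-id tie-break are
assumptions of this sketch, and it says nothing about TP2 or the IP properties:

```python
def apply_op(doc, op):
    pos, text, _site = op
    return doc[:pos] + text + doc[pos:]

def transform(op, against):
    """Shift `op` so it can be applied after `against` has been applied."""
    pos, text, site = op
    a_pos, a_text, a_site = against
    # Ties at the same position are broken by site id (lower site goes first).
    if a_pos < pos or (a_pos == pos and a_site < site):
        return (pos + len(a_text), text, site)
    return op

def tp1_holds(doc, a, b):
    left = apply_op(apply_op(doc, a), transform(b, a))
    right = apply_op(apply_op(doc, b), transform(a, b))
    return left == right

doc = "abc"
all_ok = all(
    tp1_holds(doc, (i, "X", 1), (j, "YY", 2))
    for i in range(len(doc) + 1)
    for j in range(len(doc) + 1)
)
```

Exhaustive checks like this only cover small cases, which is exactly why I
would rather lean on an algorithm with a published proof.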

Basically, I am not confident that we know that wave's OT or what is in
lightwave is really what we want.  I am not saying that they are not, I am
just saying that I don't think we really have stated what we NEED from the
OT and then proved that a particular approach satisfies those needs.
Simply having a demo of something that works in a toy environment is
likely to have holes that we don't see until much, much later in the
development cycle.

My recommendation here would be for us to form a small committee to do a
literature review on the topic and to foster some technical discussion.
The result should be something in the wiki that lays out a plan that the
community can comment on.

>
>
>Once that's built, we can start integrating it into WIAB. The simplest
>way to do the client-server protocol and federation will be to simply
>reuse the container's protocol (obviously wrapped for access control).
>We could also strip it down for pure client-server interaction if we
>want, to make it less chatty. (If we decide that's worthwhile.)
>
>I'm also thinking about full end-to-end encryption. Especially in the
>wake of the PRISM stuff, I'd quite like to make something secure.
>Snowden: "Encryption works. Properly implemented strong crypto systems
>are one of the few things that you can rely on. Unfortunately,
>endpoint security is so terrifically weak that NSA can frequently find
>ways around it." --
>http://www.guardian.co.uk/world/2013/jun/17/edward-snowden-nsa-files-whistleblower

I like this idea.  There is a paper from France that talks about nested
hashing, digital signing, etc. to prove the authenticity of operations
that might also be worth looking at.  I will try to dig up the reference.
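
Until I find it, here is the general flavor of hashing operations so that each
one commits to its history, in the git-like spirit Joseph describes. This is
not the scheme from that paper and it does no signing; the record layout and
function names are assumptions of the sketch:

```python
import hashlib
import json

def op_hash(op, parents):
    """Hash an op together with the hashes of the ops it was built on."""
    payload = json.dumps({"op": op, "parents": sorted(parents)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def verify(history):
    """history is a list of (hash, op, parents) in causal order.

    Returns False if any hash does not match its content, or if an op
    names a parent that has not been seen earlier in the list.
    """
    seen = set()
    for digest, op, parents in history:
        if op_hash(op, parents) != digest:
            return False
        if not set(parents) <= seen:
            return False
        seen.add(digest)
    return True

op1 = {"pos": 0, "text": "hello"}
op2 = {"pos": 5, "text": " world"}
h1 = op_hash(op1, [])
h2 = op_hash(op2, [h1])

history = [(h1, op1, []), (h2, op2, [h1])]
tampered = [(h1, {"pos": 0, "text": "HELLO"}, []), (h2, op2, [h1])]
```

verify(history) passes and verify(tampered) fails, because changing an early
op changes its hash and breaks every later op that names it as a parent.
Authenticity (who produced an op) would still need signatures on top of this.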

>
>
>All of this should happen in the experiments branch (with a mirror on
>github).
>
>The design decisions that we make here will be really hard to change
>later, so I'd like to get this right. I'd like as much feedback as
>possible. But please restrain yourself from complaining that it's too
>much work. You're not the boss of me :D
>
>Also, I expect the core OT piece to be no longer than a few thousand
>lines. We can definitely pull that off - it's just figuring out what
>those lines are that's the tricky part.
>
>Cheers
>Joseph

