I've given half a dozen talks about ShareJS over the last 3 years, and
almost every time I give a talk, someone asks me whether you can use
ShareJS in a peer-to-peer way instead of just through a single server.

"You say it works like subversion. Can it work like Git?"
"Can you have a document shared between multiple servers?"

Sigh, no. The algorithms behind ShareJS & Wave were invented in 1995.
Back then, it was news when someone put up a new website.

"What about Wave Federation?" Appropriately, it works like IRC, but
using XML. It's complicated, it's vulnerable to netsplits and it's buggy.
I guess it's like IRC except it doesn't actually work.

So let's fix that! Let's modernize wave and make it federate properly.
On the way we have a great opportunity to make it simpler and cleaner.


To start, I want to build a generic P2P OT container. This is a simple
wrapper that contains a set of OT documents and defines a network
protocol for keeping them in sync. The container needs to be able to
talk to another instance of itself running somewhere else and
synchronize documents between the two instances.

That's all I want this container to do - it should be as lightweight as
possible, so we can port it to lots of different languages and
environments. I want that code running in websites, in giant server
farms, in vim, and everywhere in between. It won't have any database
code, network code, users or a user interface (though it'll need APIs
to support all of that stuff). At its core it just does OT + protocol
work to synchronize documents.
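
To make the shape of that concrete, here is a rough sketch of a
container that owns no network code and just hands outgoing messages
to whatever transport the host gives it. Everything in it (the
Container class, the send callback, the message layout) is made up for
illustration, and it skips the actual OT transform step entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Container:
    """Hypothetical container: per-document op logs plus an injected transport."""

    send: Callable[[dict], None]  # supplied by the host; the container has no network code
    docs: Dict[str, List[dict]] = field(default_factory=dict)  # document name -> op log

    def submit(self, name: str, op: dict) -> None:
        """Record a local operation and hand it to the transport."""
        self.docs.setdefault(name, []).append(op)
        self.send({"doc": name, "op": op})

    def receive(self, message: dict) -> None:
        """Record an operation that arrived from another container."""
        # A real container would transform the op against concurrent ops here.
        self.docs.setdefault(message["doc"], []).append(message["op"])


# Wire two containers together with a plain list standing in for the network.
outbox: List[dict] = []
a = Container(send=outbox.append)
b = Container(send=lambda message: None)

a.submit("notes", {"insert": "hello", "at": 0})
for message in outbox:
    b.receive(message)
```

The point is only that the host decides how messages travel (websocket,
pipe, whatever); the container just produces and consumes them.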

What are the documents? Well, like ShareJS, I'd like to support
multiple different kinds of data. We'll need to be able to support
wave's conversation model, but I'd also like to support arbitrary
JSON. Doing OT over arbitrary JSON structures would allow other
applications to be built on top of wave, using wave as a data platform
("Glorious messaging bus in the sky"). It'd also be super useful for
gadgets and user data.

There are three models I can imagine for what wavelets could look like:

Option 1: All documents in the container have a unique name and a
type. This is how ShareJS works. We could have a JSON type and a
wavelet type. This is simple, but not particularly extensible (it
makes it hard to embed JSON inside a conversation, and vice versa).

Option 2: At the root of every document is a JSON object. Leaves in
the JSON structure can be subdocuments, which could be rich text for
blips, or any other type we think of down the road.

Option 3: We make another layer, which can contain a set of documents.
So, a wavelet could contain a JSON document describing the
conversation structure, some rich text documents for blips and another
JSON document containing gadget data or something. Access control
rules are at the container level. This is (sort of) how wavelets work
today.
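
As a rough illustration of option 2, here is what a JSON root with
typed subdocuments at the leaves might look like. The SubDoc class, the
"rich-text" type name and the field names are all invented for this
sketch; nothing here is an existing wave or ShareJS structure.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Tuple


@dataclass
class SubDoc:
    """Hypothetical leaf: an embedded document with its own OT type."""

    type: str      # e.g. "rich-text"; the type names are assumptions
    snapshot: Any  # current contents of the embedded document


def find_subdocs(node: Any, path: tuple = ()) -> Iterator[Tuple[tuple, SubDoc]]:
    """Walk a JSON-like tree and yield (path, subdocument) for every SubDoc leaf."""
    if isinstance(node, SubDoc):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_subdocs(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_subdocs(value, path + (index,))


# A wavelet whose root is plain JSON, with rich text blips as leaves.
wavelet = {
    "conversation": [
        {"author": "alice", "blip": SubDoc("rich-text", "First blip")},
        {"author": "bob", "blip": SubDoc("rich-text", "A reply")},
    ],
    "gadget": {"votes": 2},
}

paths = [path for path, _ in find_subdocs(wavelet)]
```

An operation on a blip would then be addressed by its path in the JSON
tree, and the JSON type would hand it off to the subdocument's own type.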

The OT itself I imagine building along the lines of the P2P OT theory
that Torben Weis developed for Lightwave:
https://code.google.com/p/lightwave/ . Briefly, every operation gets a
hash (like git). We add tombstones to wave's OT type and remove
invertibility, so the transform function supports TP2. We also add a
prune function (inverse transform) which allows the history list to be
reordered (so you don't have to transform out on every site). The hard
part is figuring out which operations to sync, and which operations
need to be reordered. I'd like to go over the details with Michael
MacFadden and anyone else who's interested - there may well be a
better system that we should use instead. If there is, I'd like to
know about it now.
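
To give a feel for the "every operation gets a hash" part, here is a
small sketch of content-addressed operations and the naive way to work
out what another site is missing. This is my own illustration and not
Lightwave's actual scheme: the op_id and missing_ops functions, the
payload layout and the use of SHA-256 (git itself uses SHA-1) are all
assumptions. It does not do any transforming, pruning or reordering.

```python
import hashlib
import json
from typing import Dict, Iterable, Set


def op_id(parents: Iterable[str], payload: dict) -> str:
    """Derive an operation's id from its parent ids and its content."""
    # Sorting the parents and the keys makes the encoding deterministic.
    body = json.dumps({"parents": sorted(parents), "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def missing_ops(mine: Dict[str, dict], theirs: Set[str]) -> Dict[str, dict]:
    """Return the operations we hold whose ids the other site has not seen."""
    return {key: op for key, op in mine.items() if key not in theirs}


# Two operations, the second one building on the first.
first = {"insert": "a", "at": 0}
second = {"insert": "b", "at": 1}
root = op_id([], first)
child = op_id([root], second)

site_a = {root: first, child: second}
site_b = {root: first}

to_send = missing_ops(site_a, set(site_b))
```

Because the id covers the parents, two sites that produce the same edit
on top of different histories get different ids, which is what lets the
history be treated as a graph rather than a single version counter.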


Once that's built, we can start integrating it into WIAB. The simplest
way to do the client-server protocol and federation will be to simply
reuse the container's protocol (obviously wrapped for access control).
We could also strip it down for pure client-server interaction if we
want, to make it less chatty. (If we decide that's worthwhile.)

I'm also thinking about full end-to-end encryption. Especially in the
wake of the PRISM stuff, I'd quite like to make something secure.
Snowden: "Encryption works. Properly implemented strong crypto systems
are one of the few things that you can rely on. Unfortunately,
endpoint security is so terrifically weak that NSA can frequently find
ways around it." --
http://www.guardian.co.uk/world/2013/jun/17/edward-snowden-nsa-files-whistleblower


All of this should happen in the experiments branch (with a mirror on github).

The design decisions that we make here will be really hard to change
later, so I'd like to get this right. I'd like as much feedback as
possible. But please restrain yourself from complaining that it's too
much work. You're not the boss of me :D

Also, I expect the core OT piece to be no longer than a few thousand
lines. We can definitely pull that off - it's just figuring out what
those lines are that's the tricky part.

Cheers
Joseph
