Re: A Very Wavey Plan (P2P!)

Pratik Paranjape Wed, 19 Jun 2013 16:40:20 -0700

Michael.

Thanks for putting that out clearly, and yes, you are right. We will
eventually have to solve the problem or find compromises depending on the
target network, if we are aiming for something like Wave or better. But
we can forget about transport for the time being and focus on the OT aspect,
assuming the connection just exists.


For the topic:

> If we start manipulating things like JSON Objects, Java Object, XML, or
> nested documents, a hierarchical path mechanism may be best...


In my opinion, this is a particularly important point. I am not versed in OT
literature beyond Wave's, ShareJS's, or a couple of other approaches, but I
think we should try to find a way to make it work without making too many
assumptions about the document structure.

Perhaps we can have an intermediate stage where the document is divided into n
parts (and synthesized back at the other end): text, metadata, and so on.
Implementing this stage is the application owner's responsibility. I am
not yet sure how the relationships between the different parts will be
maintained; more thinking is needed, but I would go along the lines of some
mapping functions, again provided by the API user. This will allow us to treat
input streams uniformly and support as many document structures
as people care to implement.
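
To make the idea concrete, here is a minimal sketch of what such a
split/synthesize stage might look like. Everything in it is hypothetical:
the names DocumentAdapter, split, join, apply_insert, and note_adapter are
made up for illustration, parts are assumed to be plain strings, and the
question of how parts relate to each other is left open, as above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

Doc = TypeVar("Doc")


@dataclass
class DocumentAdapter(Generic[Doc]):
    """Hypothetical adapter supplied by the application owner (API user)."""
    split: Callable[[Doc], Dict[str, str]]  # document -> named parts
    join: Callable[[Dict[str, str]], Doc]   # named parts -> document


def apply_insert(parts: Dict[str, str], part: str, pos: int, text: str) -> Dict[str, str]:
    """Apply a plain-text insert to one named part, leaving the others untouched."""
    updated = dict(parts)
    updated[part] = updated[part][:pos] + text + updated[part][pos:]
    return updated


# Example adapter for a toy document type: a (title, body) tuple.
note_adapter = DocumentAdapter(
    split=lambda doc: {"metadata": doc[0], "text": doc[1]},
    join=lambda parts: (parts["metadata"], parts["text"]),
)

# The OT layer only ever sees named string parts, never the tuple itself.
parts = note_adapter.split(("Plan", "Hello wave"))
parts = apply_insert(parts, "text", 5, ",")
result = note_adapter.join(parts)
```

The point of the sketch is only that the OT layer can work on uniform
parts while the application keeps ownership of its own document type.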

> Two other things that we need to consider.  Beyond TP1 and TP2, there are
> also IP1, IP2, and IP3 that you need to think about if you are going to
> support group undo, which in my opinion wave needs to support.  If you
> can't undo things in collaboration, then you have problems.


I am not familiar enough with OT literature beyond Wave, ShareJS, or a couple
of other implementations to add ideas here, but it sounds like these details
will affect the rest of the architecture. Maybe we should start with
fundamental assumptions about which features we want to support and go
ahead from there.
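
As a starting point for that discussion, here is a small sketch of what
checking TP1 means for the simplest possible case, concurrent text inserts.
TP1 says that for two operations a and b made on the same state, applying a
and then b transformed against a gives the same result as applying b and then
a transformed against b. The Insert type, the site-id tie-break, and the
function names are assumptions made for illustration; this is not Wave's or
ShareJS's actual code, and it says nothing about TP2 or the undo properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str   # text to insert
    site: int   # hypothetical site id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert to a plain string document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    earlier = against.pos < op.pos
    tie_lost = against.pos == op.pos and against.site < op.site
    if earlier or tie_lost:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def satisfies_tp1(doc: str, a: Insert, b: Insert) -> bool:
    """Check that both application orders converge to the same document."""
    left = apply(apply(doc, a), transform(b, a))
    right = apply(apply(doc, b), transform(a, b))
    return left == right
```

For example, satisfies_tp1("wave", Insert(0, "A", 1), Insert(0, "B", 2))
holds because the site id decides which insert goes first when both target
the same position.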


On Thu, Jun 20, 2013 at 4:18 AM, Michael MacFadden <
michael.macfad...@gmail.com> wrote:

> Pratik,
>
> The issue here is we are talking about a multi-tiered overlay network.
>
> Base Network
> -------------
> Layer 1 is the Internet.  That maintains IP connectivity between hosts.
>
>
> Message Transport Overlay
> --------------------------
> Then there is the collaboration messaging overlay network. Here we need a
> mechanism to get operations from one client to other clients.  We also
> need a mechanism to get clients in and out of this network.  This could be
> a hub and spoke architecture with servers where the clients log in.  This
> could be a hybrid P2P mesh network with a central tracker.  This could be
> a full P2P network using broadcast to join and leave.  There are many
> options here.
>
> OT Concurrency Control
> -----------------------
> Here we have the topology of OT itself.  Is OT primarily client-server
> (e.g. Jupiter, Wave, etc.) or is it P2P (COT, GOTO, etc.)?  Here, regardless
> of whether messages go to a server for redistribution to other clients, the
> OT could still operate in a P2P mode, meaning the transformation is happening
> between clients without the server mediating the OT.
>
>
> So we have three things to consider (well, two really, since the Internet is
> a given): how do messages get between clients, and where does the OT
> happen?
>
> These can be decided partially independent of one another.
>
> ~Michael
>
> On 6/19/13 2:04 PM, "Pratik Paranjape" <pratikparanj...@gmail.com> wrote:
>
> >( I think this was sent to Joseph only, error!)
> >
> >If the OT system actually acts as a layer between application sub-components
> >and transport, where sub-components may have their own data models, we can
> >have a canonical structure as in Option 2, managing raw incoming deltas from
> >various subscribers at various leaves. When subscribing to the OT system,
> >the particular sub-application will specify its data type. We don't own the
> >documents or know their structure; our inputs are changes coming from the
> >higher layer.
> >
> >Option 3 probably suggests tight integration with sub-documents.
> >
> >I am still not sure how you are thinking of managing a persistent
> >connection between two nodes on the internet though.
> >Without the presence of at least one tracker to keep a registry of peers,
> >the model is very fragile. Consider two vim instances
> >on two different machines. How will they initially pair up without
> >knowledge of each other's IPs? If they somehow do, what happens
> >when DHCP abruptly changes the IP address of one of the nodes?
> >
> >
> >On Thu, Jun 20, 2013 at 1:11 AM, Joseph Gentle <jose...@gmail.com> wrote:
> >
> >> Multiple nodes. We'll have an API that lets you connect to a remote
> >> peer and sync documents.
> >>
> >> And Yuri - I wasn't really talking about svn or git ;)
> >>
> >> What are your thoughts on the wavelet data model decision?
> >>
> >> On Wed, Jun 19, 2013 at 12:09 PM, Pratik Paranjape
> >> <pratikparanj...@gmail.com> wrote:
> >> > Very exciting! Are you thinking about sync between multiple nodes or
> >> > just 2?
> >> >
> >> >
> >> > On Thu, Jun 20, 2013 at 12:02 AM, Yuri Z <vega...@gmail.com> wrote:
> >> >
> >> >> Sounds fantastic, especially when it comes from you Joseph.
> >> >> Just a side note regarding SVN-Git issue - it is possible to combine
> >> >> both by using git-svn - it works fine for me.
> >> >>
> >> >>
> >> >> On Wed, Jun 19, 2013 at 9:22 PM, Joseph Gentle <jose...@gmail.com> wrote:
> >> >>
> >> >> > I've given half a dozen talks about ShareJS over the last 3 years,
> >> >> > and almost every time I give a talk, someone asks me whether you
> >> >> > can use ShareJS in a peer-to-peer way instead of just through a
> >> >> > single server.
> >> >> >
> >> >> > "You say it works like subversion. Can it work like Git?"
> >> >> > "Can you have a document shared between multiple servers?"
> >> >> >
> >> >> > Sigh no. ShareJS & Wave's algorithms were invented in 1995. Back
> >> >> > then, it was news when someone put up a new website.
> >> >> >
> >> >> > "What about Wave Federation?" Appropriately, it works like IRC, but
> >> >> > using XML. It's complicated, it's vulnerable to netsplits and it's
> >> >> > buggy. I guess it's like IRC except it doesn't actually work.
> >> >> >
> >> >> > So let's fix that! Let's modernize wave and make it federate
> >> >> > properly. On the way we have a great opportunity to make it simpler
> >> >> > and cleaner.
> >> >> >
> >> >> >
> >> >> > To start, I want to build a generic P2P OT container. This is a
> >> >> > simple wrapper that contains a set of OT documents and defines a
> >> >> > network protocol for keeping them in sync. The container needs to
> >> >> > be able to talk to another instance of itself running somewhere
> >> >> > else and synchronize documents between the two instances.
> >> >> >
> >> >> > That's all I want this container to do - it should be as
> >> >> > lightweight as possible, so we can port it to lots of different
> >> >> > languages and environments. I want that code running in websites,
> >> >> > in giant server farms, in vim, and everywhere in between. It won't
> >> >> > have any database code, network code, users or a user interface
> >> >> > (though it'll need APIs to support all of that stuff). At its core
> >> >> > it just does OT + protocol work to synchronize documents.
> >> >> >
> >> >> > What are the documents? Well, like ShareJS, I'd like to support
> >> >> > multiple different kinds of data. We'll need to be able to support
> >> >> > wave's conversation model, but I'd also like to support arbitrary
> >> >> > JSON. Doing OT over arbitrary JSON structures would allow other
> >> >> > applications to be built on top of wave, using wave as a data
> >> >> > platform ("Glorious messaging bus in the sky"). It'd also be super
> >> >> > useful for gadgets and user data.
> >> >> >
> >> >> > There are three models I can imagine for what wavelets could look
> >> >> > like:
> >> >> >
> >> >> > Option 1: All documents in the container have a unique name and a
> >> >> > type. This is how ShareJS works. We could have a JSON type and a
> >> >> > wavelet type. This is simple, but not particularly extensible (it
> >> >> > makes it hard to embed JSON inside a conversation, and vice versa).
> >> >> >
> >> >> > Option 2: At the root of every document is a JSON object. Leaves in
> >> >> > the JSON structure can be subdocuments, which could be rich text
> >> >> > for blips, or any other type we think of down the road.
> >> >> >
> >> >> > Option 3: We make another layer, which can contain a set of
> >> >> > documents. So, a wavelet could contain a JSON document describing
> >> >> > the conversation structure, some rich text documents for blips and
> >> >> > another JSON document containing gadget data or something. Access
> >> >> > control rules are at the container level. This is (sort of) how
> >> >> > wavelets work today.
> >> >> >
> >> >> > The OT itself I imagine building along the lines of Torben Weis's
> >> >> > P2P OT theory that he made in Lightwave:
> >> >> > https://code.google.com/p/lightwave/ . Briefly, every operation
> >> >> > gets a hash (like git). We add tombstones to wave's OT type and
> >> >> > remove invertibility, so the transform function supports TP2. We
> >> >> > also add a prune function (inverse transform) which allows the
> >> >> > history list to be reordered (so you don't have to transform out on
> >> >> > every site). The hard part is figuring out which operations to
> >> >> > sync, and which operations need to be reordered. I'd like to go
> >> >> > over the details with Michael MacFadden and anyone else who's
> >> >> > interested - there may well be a better system that we should use
> >> >> > instead. If there is, I'd like to know about it now.
> >> >> >
> >> >> >
> >> >> > Once that's built, we can start integrating it into WIAB. The
> >> >> > simplest way to do the client-server protocol and federation will
> >> >> > be to simply reuse the container's protocol (obviously wrapped for
> >> >> > access control). We could also strip it down for pure client-server
> >> >> > interaction if we want, to make it less chatty. (If we decide
> >> >> > that's worthwhile.)
> >> >> >
> >> >> > I'm also thinking about full end-to-end encryption. Especially in
> >> >> > the wake of the PRISM stuff, I'd quite like to make something
> >> >> > secure. Snowden: "Encryption works. Properly implemented strong
> >> >> > crypto systems are one of the few things that you can rely on.
> >> >> > Unfortunately, endpoint security is so terrifically weak that NSA
> >> >> > can frequently find ways around it." --
> >> >> > http://www.guardian.co.uk/world/2013/jun/17/edward-snowden-nsa-files-whistleblower
> >> >> >
> >> >> >
> >> >> > All of this should happen in the experiments branch (with a mirror
> >> >> > on github).
> >> >> >
> >> >> > The design decisions that we make here will be really hard to
> >> >> > change later, so I'd like to get this right. I'd like as much
> >> >> > feedback as possible. But please restrain yourself from complaining
> >> >> > that it's too much work. You're not the boss of me :D
> >> >> >
> >> >> > Also, I expect the core OT piece to be no longer than a few
> >> >> > thousand lines. We can definitely pull that off - it's just
> >> >> > figuring out what those lines are that's the tricky part.
> >> >> >
> >> >> > Cheers
> >> >> > Joseph
> >> >> >
> >> >>
> >>
>
>
>
