Re: P2P wave progress

Joseph Gentle Thu, 02 Jan 2014 18:00:29 -0800

Yeah that was a longer email than I was expecting to write. Its fine - less
bike shedding this way.


The tech tree forks out here - there's (at least) 4 things to do next:

- Get the net protocol working. Current version is local only.
- I'd like to see it fast. Either rewrite the text-tp2 code to get
comparable speed to the C version or port the concurrency control stuff to
C.
- Get some real operation data so we can test performance in realish
situations
- Design the data structure we want.
On Jan 3, 2014 12:00 PM, "Thomas Wrobel" <darkfl...@gmail.com> wrote:

> Thanks for tacking the time to explain that, that's really helpful to me.
> Sorry to have interrupted your original query.
>
> It does sound a distinct improvement and looks like you have put a lot of
> work into implementing it.
> Whats the next steps from here?
>
>
> ~~~
> Thomas & Bertines online review show:
> http://randomreviewshow.com/index.html
> Try it! You might even feel ambivalent about it :)
>
>
> On 2 January 2014 22:32, Joseph Gentle <jose...@gmail.com> wrote:
>
> > Peer-to-peer is a super overloaded term. It refers to somewhat
> > symmetric peers communicating with each other directly instead of
> > clients all talking to a central server. It gets overloaded to refer
> > to anything that doesn't fit the classic client-server model.
> >
> > P2P systems usually have mechanisms for service discovery (mechanism
> > for peers to find one another). But thats not required. And its
> > separate from the algorithms they run.
> > - Kazaa automatically chooses peers with good internet connections to
> > become 'supernodes', effectively making some users into servers.
> > - Git has no service discovery mechanism at all - you have to manually
> > add 'remotes' to interact with.
> > - Bittorrent usually uses trackers for service discovery, which is a
> > classic server-client protocol.
> > These systems are all considered canonical examples of p2p protocols.
> >
> > In comparison even ignoring service discovery, HTTP is not a p2p
> > protocol because I always have to match a client process with a server
> > process (they're not symmetrical). The client always has to initiate a
> > request, the server is the only one with URLs and the headers they
> > send to each other are different.
> >
> > The OT algorithm that wave (&ShareJS) uses is not a P2P algorithm
> > because each wave requires one computer (a server) to be the ultimate
> > arbiter of truth when it comes to deciding on the order that
> > operations are put in the operation log. Operations are then numbered
> > based on their position in the oplog (using incrementing version
> > numbers). Even in federation one server must act as the root, making
> > the system both more complicated and less stable - its vulnerable to
> > IRC style netsplits.
> >
> > In comparison, in the system I'm working on at the moment the order
> > that operations are put in the oplog doesn't matter*, and as a result
> > you don't need an ultimate arbiter of truth in your network to order
> > everything. Thats what I mean when I say the algorithm is p2p - its
> > capable of working without a centralized server. You can (and probably
> > will) still set things up using servers & clients, but the algorithm
> > doesn't require it. Even if we don't attach clients using exciting
> > network topologies, federation will be much simpler & more robust as a
> > result and it will be easier to scale wave servers. (In both cases we
> > won't have a single point of failure anymore).
> >
> > To answer your last point, browsers can (finally) connect directly to
> > one another now using WebRTC. (Its not just for video!). I'm not sure
> > if that'll actually be useful for us - the main reason I'm prototyping
> > this stuff out using coffeescript is because I work faster in
> > coffeescript than other languages, so its a good space for me to
> > iterate. I want to start rewriting bits in C before too long once we
> > have something working. I'm most excited about is integrating my
> > development tools - and they pretty much all need C hooks.
> >
> > -Joseph
> >
> > [*] Order *mostly* doesn't matter - you still have to maintain the
> > partial order of dependancy relationships. If I create operation Z in
> > a document which (locally) contains operations X and Y, Z must always
> > follow X and Y.
> >
> >
> > On Fri, Jan 3, 2014 at 12:17 AM, Thomas Wrobel <darkfl...@gmail.com>
> > wrote:
> > > Excuse my ignorance in this,  but can you briefly explain what "p2p"
> > means
> > > in this context.
> > > I have only heard it used in the likes of torrent systems and such -
> the
> > > idea of a pure client system with no servers. Peers finding eachother
> > with
> > > no server telling them who's where. A desktop wave application like
> that
> > > would be interesting....but I am not sure that's what you mean, as you
> > > mention browsers too.
> > > To my knowledge, there's is absolutely no way for a browser to connect
> to
> > > another browser with no server inbetween - even if they know eachothers
> > > IP's (and assuming everyones static), data just cant be sent directly
> > > between browser based clients like that can they :? Am I wrong in that
> > > regard?
> > >
> > >
> > >
> > > ~~~
> > > Thomas & Bertines online review show:
> > > http://randomreviewshow.com/index.html
> > > Try it! You might even feel ambivalent about it :)
> > >
> > >
> > > On 2 January 2014 05:22, Joseph Gentle <jose...@gmail.com> wrote:
> > >
> > >> Over new years I've been working on my p2p wave prototype. I've been
> > >> iterating on the part that I was most worried about, which is the
> > >> concurrency control algorithms (the part which manages OT).
> > >>
> > >> I think I've just about cracked it.
> > >>
> > >> I now have two working prototypes which simulate local clients syncing
> > >> operations in a partially-disconnected state (details below). I still
> > >> don't know how either algorithm will operate in real world conditions
> > >> - I dropped the ball near the start of the year in getting volunteers
> > >> to help me gather real world data. I still want to run that experiment
> > >> at some point.
> > >>
> > >> The difference in the two approaches is that Method 1 lets clients
> > >> cryptographically sign their operations before sending them out.
> > >> Method 2 does not, and in exchange it performs faster. (see below).
> > >> I'm no longer convinced that signing each operation is the right
> > >> approach. Modern OTR (Off the record) messaging protocols recommend
> > >> against signing and instead explicitly allow anyone who can verify a
> > >> message to also forge a message. This gives you deniability over
> > >> something you said electronically just like something you said in
> > >> person. If we pursue OTR-style security there's more work to do, but I
> > >> like the idea (and I like how much faster our nodes can run).
> > >>
> > >> Another benefit of method 2 is that if you connect peers in a tree
> > >> (like the wave federation protocol), it behaves almost exactly the
> > >> same as the current algorithm. There's just a little bit more
> > >> bookkeeping.
> > >>
> > >>
> > >> A quick note about benchmark times:
> > >>
> > >> These tests were done using my javascript text-tp2 implementation.
> > >> This implementation is EXTREMELY slow (33 k xf/sec). My optimized C
> > >> implementation (no tp2) does 75 M xf/sec, which is 2200 times faster.
> > >> This difference almost entirely comes down to microoptimizations.
> > >> Transform calls make up the majority of CPU time, and I expect I can
> > >> make them about 1000 times faster.
> > >>
> > >> Also, we could probably use asmjs for the browser once we have a
> > >> kickass C version.
> > >>
> > >>
> > >> The test:
> > >>
> > >> Given 3 peers:
> > >>
> > >> 2000 times:
> > >>   10 times:
> > >>     Pick a random peer in the network. The peer performs an operation
> > >>   2 times:
> > >>     Pick 2 peers A and B. A gives B all the operations that B is
> > missing.
> > >>
> > >> Total applied operations: 20 000
> > >>
> > >>
> > >> Method 1:
> > >> Peers only exchange primitive (original) operations.
> > >>
> > >> Transforms: 4 099 032
> > >> Total time: 87.4 seconds
> > >> (Final document: [ 9444, 'He ', 52935, 'flame ', 5744, 'O ', 111,
> > 'through
> > >> ' ])
> > >>
> > >> Method 2:
> > >> Peers exchange transformed operations. Operations are ordered
> > >> consistently on all peers to prevent thrashing. Operations are
> > >> composed during pull as a microoptimization.
> > >>
> > >> Transforms: 680 094, Composes: 48 000
> > >> Total time: 22.3 seconds
> > >> (Final document: [ 54846, 'in ', 13302, 'One ', 17, 'burbled He ' ])
> > >>
> > >>
> > >> Neither method was particularly overwhelming in terms of speed, but
> > >> 1ms of aggregate CPU time per operation across 3 peers is totally
> > >> usable. If I can boost the performance of my javascript TP2 OT
> > >> implementation by a couple orders of magnitude it'd be fun to see how
> > >> much these benchmarks improve. (And I have every reason to believe I
> > >> can!)
> > >>
> > >> It would still be good to get some real world performance data, but
> > >> I'm not worried anymore. This is usable.
> > >>
> > >> -J
> > >>
> > >>
> > >> Code:
> > >> Method 1:
> > >>
> >
> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node2.coffee
> > >> Method 2:
> > >>
> >
> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node3_bubble.coffee
> > >>
> > >> Benchmark ran on node 0.10.21 on a 2012 2ghz MBA.
> > >> OT transform benchmark code:
> > >>
> > >>
> >
> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/tp2bench.coffee
> > >>
> >
>

Re: P2P wave progress

Reply via email to