Re: P2P wave progress

Joseph Gentle Thu, 02 Jan 2014 13:34:20 -0800

Peer-to-peer is a super overloaded term. It refers to somewhat
symmetric peers communicating with each other directly instead of
clients all talking to a central server. It gets overloaded to refer
to anything that doesn't fit the classic client-server model.


P2P systems usually have mechanisms for service discovery (mechanism
for peers to find one another). But thats not required. And its
separate from the algorithms they run.
- Kazaa automatically chooses peers with good internet connections to
become 'supernodes', effectively making some users into servers.
- Git has no service discovery mechanism at all - you have to manually
add 'remotes' to interact with.
- Bittorrent usually uses trackers for service discovery, which is a
classic server-client protocol.
These systems are all considered canonical examples of p2p protocols.

In comparison even ignoring service discovery, HTTP is not a p2p
protocol because I always have to match a client process with a server
process (they're not symmetrical). The client always has to initiate a
request, the server is the only one with URLs and the headers they
send to each other are different.

The OT algorithm that wave (&ShareJS) uses is not a P2P algorithm
because each wave requires one computer (a server) to be the ultimate
arbiter of truth when it comes to deciding on the order that
operations are put in the operation log. Operations are then numbered
based on their position in the oplog (using incrementing version
numbers). Even in federation one server must act as the root, making
the system both more complicated and less stable - its vulnerable to
IRC style netsplits.

In comparison, in the system I'm working on at the moment the order
that operations are put in the oplog doesn't matter*, and as a result
you don't need an ultimate arbiter of truth in your network to order
everything. Thats what I mean when I say the algorithm is p2p - its
capable of working without a centralized server. You can (and probably
will) still set things up using servers & clients, but the algorithm
doesn't require it. Even if we don't attach clients using exciting
network topologies, federation will be much simpler & more robust as a
result and it will be easier to scale wave servers. (In both cases we
won't have a single point of failure anymore).

To answer your last point, browsers can (finally) connect directly to
one another now using WebRTC. (Its not just for video!). I'm not sure
if that'll actually be useful for us - the main reason I'm prototyping
this stuff out using coffeescript is because I work faster in
coffeescript than other languages, so its a good space for me to
iterate. I want to start rewriting bits in C before too long once we
have something working. I'm most excited about is integrating my
development tools - and they pretty much all need C hooks.

-Joseph

[*] Order *mostly* doesn't matter - you still have to maintain the
partial order of dependancy relationships. If I create operation Z in
a document which (locally) contains operations X and Y, Z must always
follow X and Y.


On Fri, Jan 3, 2014 at 12:17 AM, Thomas Wrobel <darkfl...@gmail.com> wrote:
> Excuse my ignorance in this,  but can you briefly explain what "p2p" means
> in this context.
> I have only heard it used in the likes of torrent systems and such - the
> idea of a pure client system with no servers. Peers finding eachother with
> no server telling them who's where. A desktop wave application like that
> would be interesting....but I am not sure that's what you mean, as you
> mention browsers too.
> To my knowledge, there's is absolutely no way for a browser to connect to
> another browser with no server inbetween - even if they know eachothers
> IP's (and assuming everyones static), data just cant be sent directly
> between browser based clients like that can they :? Am I wrong in that
> regard?
>
>
>
> ~~~
> Thomas & Bertines online review show:
> http://randomreviewshow.com/index.html
> Try it! You might even feel ambivalent about it :)
>
>
> On 2 January 2014 05:22, Joseph Gentle <jose...@gmail.com> wrote:
>
>> Over new years I've been working on my p2p wave prototype. I've been
>> iterating on the part that I was most worried about, which is the
>> concurrency control algorithms (the part which manages OT).
>>
>> I think I've just about cracked it.
>>
>> I now have two working prototypes which simulate local clients syncing
>> operations in a partially-disconnected state (details below). I still
>> don't know how either algorithm will operate in real world conditions
>> - I dropped the ball near the start of the year in getting volunteers
>> to help me gather real world data. I still want to run that experiment
>> at some point.
>>
>> The difference in the two approaches is that Method 1 lets clients
>> cryptographically sign their operations before sending them out.
>> Method 2 does not, and in exchange it performs faster. (see below).
>> I'm no longer convinced that signing each operation is the right
>> approach. Modern OTR (Off the record) messaging protocols recommend
>> against signing and instead explicitly allow anyone who can verify a
>> message to also forge a message. This gives you deniability over
>> something you said electronically just like something you said in
>> person. If we pursue OTR-style security there's more work to do, but I
>> like the idea (and I like how much faster our nodes can run).
>>
>> Another benefit of method 2 is that if you connect peers in a tree
>> (like the wave federation protocol), it behaves almost exactly the
>> same as the current algorithm. There's just a little bit more
>> bookkeeping.
>>
>>
>> A quick note about benchmark times:
>>
>> These tests were done using my javascript text-tp2 implementation.
>> This implementation is EXTREMELY slow (33 k xf/sec). My optimized C
>> implementation (no tp2) does 75 M xf/sec, which is 2200 times faster.
>> This difference almost entirely comes down to microoptimizations.
>> Transform calls make up the majority of CPU time, and I expect I can
>> make them about 1000 times faster.
>>
>> Also, we could probably use asmjs for the browser once we have a
>> kickass C version.
>>
>>
>> The test:
>>
>> Given 3 peers:
>>
>> 2000 times:
>>   10 times:
>>     Pick a random peer in the network. The peer performs an operation
>>   2 times:
>>     Pick 2 peers A and B. A gives B all the operations that B is missing.
>>
>> Total applied operations: 20 000
>>
>>
>> Method 1:
>> Peers only exchange primitive (original) operations.
>>
>> Transforms: 4 099 032
>> Total time: 87.4 seconds
>> (Final document: [ 9444, 'He ', 52935, 'flame ', 5744, 'O ', 111, 'through
>> ' ])
>>
>> Method 2:
>> Peers exchange transformed operations. Operations are ordered
>> consistently on all peers to prevent thrashing. Operations are
>> composed during pull as a microoptimization.
>>
>> Transforms: 680 094, Composes: 48 000
>> Total time: 22.3 seconds
>> (Final document: [ 54846, 'in ', 13302, 'One ', 17, 'burbled He ' ])
>>
>>
>> Neither method was particularly overwhelming in terms of speed, but
>> 1ms of aggregate CPU time per operation across 3 peers is totally
>> usable. If I can boost the performance of my javascript TP2 OT
>> implementation by a couple orders of magnitude it'd be fun to see how
>> much these benchmarks improve. (And I have every reason to believe I
>> can!)
>>
>> It would still be good to get some real world performance data, but
>> I'm not worried anymore. This is usable.
>>
>> -J
>>
>>
>> Code:
>> Method 1:
>> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node2.coffee
>> Method 2:
>> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node3_bubble.coffee
>>
>> Benchmark ran on node 0.10.21 on a 2012 2ghz MBA.
>> OT transform benchmark code:
>>
>> https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/tp2bench.coffee
>>

Re: P2P wave progress

Reply via email to