Yeah that was a longer email than I was expecting to write. Its fine - less bike shedding this way.
The tech tree forks out here - there's (at least) 4 things to do next: - Get the net protocol working. Current version is local only. - I'd like to see it fast. Either rewrite the text-tp2 code to get comparable speed to the C version or port the concurrency control stuff to C. - Get some real operation data so we can test performance in realish situations - Design the data structure we want. On Jan 3, 2014 12:00 PM, "Thomas Wrobel" <darkfl...@gmail.com> wrote: > Thanks for tacking the time to explain that, that's really helpful to me. > Sorry to have interrupted your original query. > > It does sound a distinct improvement and looks like you have put a lot of > work into implementing it. > Whats the next steps from here? > > > ~~~ > Thomas & Bertines online review show: > http://randomreviewshow.com/index.html > Try it! You might even feel ambivalent about it :) > > > On 2 January 2014 22:32, Joseph Gentle <jose...@gmail.com> wrote: > > > Peer-to-peer is a super overloaded term. It refers to somewhat > > symmetric peers communicating with each other directly instead of > > clients all talking to a central server. It gets overloaded to refer > > to anything that doesn't fit the classic client-server model. > > > > P2P systems usually have mechanisms for service discovery (mechanism > > for peers to find one another). But thats not required. And its > > separate from the algorithms they run. > > - Kazaa automatically chooses peers with good internet connections to > > become 'supernodes', effectively making some users into servers. > > - Git has no service discovery mechanism at all - you have to manually > > add 'remotes' to interact with. > > - Bittorrent usually uses trackers for service discovery, which is a > > classic server-client protocol. > > These systems are all considered canonical examples of p2p protocols. > > > > In comparison even ignoring service discovery, HTTP is not a p2p > > protocol because I always have to match a client process with a server > > process (they're not symmetrical). The client always has to initiate a > > request, the server is the only one with URLs and the headers they > > send to each other are different. > > > > The OT algorithm that wave (&ShareJS) uses is not a P2P algorithm > > because each wave requires one computer (a server) to be the ultimate > > arbiter of truth when it comes to deciding on the order that > > operations are put in the operation log. Operations are then numbered > > based on their position in the oplog (using incrementing version > > numbers). Even in federation one server must act as the root, making > > the system both more complicated and less stable - its vulnerable to > > IRC style netsplits. > > > > In comparison, in the system I'm working on at the moment the order > > that operations are put in the oplog doesn't matter*, and as a result > > you don't need an ultimate arbiter of truth in your network to order > > everything. Thats what I mean when I say the algorithm is p2p - its > > capable of working without a centralized server. You can (and probably > > will) still set things up using servers & clients, but the algorithm > > doesn't require it. Even if we don't attach clients using exciting > > network topologies, federation will be much simpler & more robust as a > > result and it will be easier to scale wave servers. (In both cases we > > won't have a single point of failure anymore). > > > > To answer your last point, browsers can (finally) connect directly to > > one another now using WebRTC. (Its not just for video!). I'm not sure > > if that'll actually be useful for us - the main reason I'm prototyping > > this stuff out using coffeescript is because I work faster in > > coffeescript than other languages, so its a good space for me to > > iterate. I want to start rewriting bits in C before too long once we > > have something working. I'm most excited about is integrating my > > development tools - and they pretty much all need C hooks. > > > > -Joseph > > > > [*] Order *mostly* doesn't matter - you still have to maintain the > > partial order of dependancy relationships. If I create operation Z in > > a document which (locally) contains operations X and Y, Z must always > > follow X and Y. > > > > > > On Fri, Jan 3, 2014 at 12:17 AM, Thomas Wrobel <darkfl...@gmail.com> > > wrote: > > > Excuse my ignorance in this, but can you briefly explain what "p2p" > > means > > > in this context. > > > I have only heard it used in the likes of torrent systems and such - > the > > > idea of a pure client system with no servers. Peers finding eachother > > with > > > no server telling them who's where. A desktop wave application like > that > > > would be interesting....but I am not sure that's what you mean, as you > > > mention browsers too. > > > To my knowledge, there's is absolutely no way for a browser to connect > to > > > another browser with no server inbetween - even if they know eachothers > > > IP's (and assuming everyones static), data just cant be sent directly > > > between browser based clients like that can they :? Am I wrong in that > > > regard? > > > > > > > > > > > > ~~~ > > > Thomas & Bertines online review show: > > > http://randomreviewshow.com/index.html > > > Try it! You might even feel ambivalent about it :) > > > > > > > > > On 2 January 2014 05:22, Joseph Gentle <jose...@gmail.com> wrote: > > > > > >> Over new years I've been working on my p2p wave prototype. I've been > > >> iterating on the part that I was most worried about, which is the > > >> concurrency control algorithms (the part which manages OT). > > >> > > >> I think I've just about cracked it. > > >> > > >> I now have two working prototypes which simulate local clients syncing > > >> operations in a partially-disconnected state (details below). I still > > >> don't know how either algorithm will operate in real world conditions > > >> - I dropped the ball near the start of the year in getting volunteers > > >> to help me gather real world data. I still want to run that experiment > > >> at some point. > > >> > > >> The difference in the two approaches is that Method 1 lets clients > > >> cryptographically sign their operations before sending them out. > > >> Method 2 does not, and in exchange it performs faster. (see below). > > >> I'm no longer convinced that signing each operation is the right > > >> approach. Modern OTR (Off the record) messaging protocols recommend > > >> against signing and instead explicitly allow anyone who can verify a > > >> message to also forge a message. This gives you deniability over > > >> something you said electronically just like something you said in > > >> person. If we pursue OTR-style security there's more work to do, but I > > >> like the idea (and I like how much faster our nodes can run). > > >> > > >> Another benefit of method 2 is that if you connect peers in a tree > > >> (like the wave federation protocol), it behaves almost exactly the > > >> same as the current algorithm. There's just a little bit more > > >> bookkeeping. > > >> > > >> > > >> A quick note about benchmark times: > > >> > > >> These tests were done using my javascript text-tp2 implementation. > > >> This implementation is EXTREMELY slow (33 k xf/sec). My optimized C > > >> implementation (no tp2) does 75 M xf/sec, which is 2200 times faster. > > >> This difference almost entirely comes down to microoptimizations. > > >> Transform calls make up the majority of CPU time, and I expect I can > > >> make them about 1000 times faster. > > >> > > >> Also, we could probably use asmjs for the browser once we have a > > >> kickass C version. > > >> > > >> > > >> The test: > > >> > > >> Given 3 peers: > > >> > > >> 2000 times: > > >> 10 times: > > >> Pick a random peer in the network. The peer performs an operation > > >> 2 times: > > >> Pick 2 peers A and B. A gives B all the operations that B is > > missing. > > >> > > >> Total applied operations: 20 000 > > >> > > >> > > >> Method 1: > > >> Peers only exchange primitive (original) operations. > > >> > > >> Transforms: 4 099 032 > > >> Total time: 87.4 seconds > > >> (Final document: [ 9444, 'He ', 52935, 'flame ', 5744, 'O ', 111, > > 'through > > >> ' ]) > > >> > > >> Method 2: > > >> Peers exchange transformed operations. Operations are ordered > > >> consistently on all peers to prevent thrashing. Operations are > > >> composed during pull as a microoptimization. > > >> > > >> Transforms: 680 094, Composes: 48 000 > > >> Total time: 22.3 seconds > > >> (Final document: [ 54846, 'in ', 13302, 'One ', 17, 'burbled He ' ]) > > >> > > >> > > >> Neither method was particularly overwhelming in terms of speed, but > > >> 1ms of aggregate CPU time per operation across 3 peers is totally > > >> usable. If I can boost the performance of my javascript TP2 OT > > >> implementation by a couple orders of magnitude it'd be fun to see how > > >> much these benchmarks improve. (And I have every reason to believe I > > >> can!) > > >> > > >> It would still be good to get some real world performance data, but > > >> I'm not worried anymore. This is usable. > > >> > > >> -J > > >> > > >> > > >> Code: > > >> Method 1: > > >> > > > https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node2.coffee > > >> Method 2: > > >> > > > https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node3_bubble.coffee > > >> > > >> Benchmark ran on node 0.10.21 on a 2012 2ghz MBA. > > >> OT transform benchmark code: > > >> > > >> > > > https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/tp2bench.coffee > > >> > > >