Over new years I've been working on my p2p wave prototype. I've been iterating on the part that I was most worried about, which is the concurrency control algorithms (the part which manages OT).
I think I've just about cracked it. I now have two working prototypes which simulate local clients syncing operations in a partially-disconnected state (details below). I still don't know how either algorithm will operate in real world conditions - I dropped the ball near the start of the year in getting volunteers to help me gather real world data. I still want to run that experiment at some point. The difference in the two approaches is that Method 1 lets clients cryptographically sign their operations before sending them out. Method 2 does not, and in exchange it performs faster. (see below). I'm no longer convinced that signing each operation is the right approach. Modern OTR (Off the record) messaging protocols recommend against signing and instead explicitly allow anyone who can verify a message to also forge a message. This gives you deniability over something you said electronically just like something you said in person. If we pursue OTR-style security there's more work to do, but I like the idea (and I like how much faster our nodes can run). Another benefit of method 2 is that if you connect peers in a tree (like the wave federation protocol), it behaves almost exactly the same as the current algorithm. There's just a little bit more bookkeeping. A quick note about benchmark times: These tests were done using my javascript text-tp2 implementation. This implementation is EXTREMELY slow (33 k xf/sec). My optimized C implementation (no tp2) does 75 M xf/sec, which is 2200 times faster. This difference almost entirely comes down to microoptimizations. Transform calls make up the majority of CPU time, and I expect I can make them about 1000 times faster. Also, we could probably use asmjs for the browser once we have a kickass C version. The test: Given 3 peers: 2000 times: 10 times: Pick a random peer in the network. The peer performs an operation 2 times: Pick 2 peers A and B. A gives B all the operations that B is missing. Total applied operations: 20 000 Method 1: Peers only exchange primitive (original) operations. Transforms: 4 099 032 Total time: 87.4 seconds (Final document: [ 9444, 'He ', 52935, 'flame ', 5744, 'O ', 111, 'through ' ]) Method 2: Peers exchange transformed operations. Operations are ordered consistently on all peers to prevent thrashing. Operations are composed during pull as a microoptimization. Transforms: 680 094, Composes: 48 000 Total time: 22.3 seconds (Final document: [ 54846, 'in ', 13302, 'One ', 17, 'burbled He ' ]) Neither method was particularly overwhelming in terms of speed, but 1ms of aggregate CPU time per operation across 3 peers is totally usable. If I can boost the performance of my javascript TP2 OT implementation by a couple orders of magnitude it'd be fun to see how much these benchmarks improve. (And I have every reason to believe I can!) It would still be good to get some real world performance data, but I'm not worried anymore. This is usable. -J Code: Method 1: https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node2.coffee Method 2: https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/node3_bubble.coffee Benchmark ran on node 0.10.21 on a 2012 2ghz MBA. OT transform benchmark code: https://github.com/josephg/tp2stuff/blob/288c844593f4b0205869402925cbbe1a156c9a5e/tp2bench.coffee