One small comment: yes, UNDO must be implemented in OT if you have multiple participants. A simple example involves three users. Let's say an operation to insert a character goes from user A to user B, but it has not yet made it to user C. User B then decides to undo. At that point the undo operation goes from user B to user A and user C. The problem is that the undo from B may arrive at C before the original operation from A arrives at C.
This would cause a problem without OT.

~Michael

On Jun 21, 2013, at 6:49 AM, Pratik Paranjape <pratikparanj...@gmail.com> wrote:

> Awesome Bruno! Thanks for this.
>
> Good to see we are not totally ignoring networking; that would have been naive, it would be the elephant in the room.
>
> Good discussion there, will have to think over all points in detail, but on first pass:
>
>>> *Summary of discussions:*
>>> *====================*
>>> *1) Underlying protocol for P2P federation*
>>> Currently XMPP is used. HTTP and raw TCP are two suggested candidates (HTTP allowing to much more easily reach restricted networks).
>
> 1) Going by our decoupling requirement, shouldn't we think about both HTTP and sockets, treating them as modes of transport under a common interface?
>
> My personal preference is sockets; IMHO the speed and simplicity are required. HTTP can be used for authentication and non-real-time interactions. But when sockets aren't an option, HTTP as fallback will have to do.
>
>>> *3) Routing p2p messages/events in a server-aided network*
>>> One option is to somehow detect server clusters, send data to one of them, and let the rest of the cluster servers synchronize to it (locally).
>>> Alternatively, the originator server can naively send stuff to all possible destination servers, regardless of the cost.
>>>
>>> *4) Routing p2p messages/events in a pure P2P system (5 parts)*
>>> How to manage to route all wave-stuff if we want to get rid of servers completely, and only use peers.
>>> The closest way would be to use a DHT, but huge latency is an unsolved problem, and makes it impossible to use for real-time waving.
>>> No other solution has been proposed.
>
> 3, 4) It's possible to start with the simplest possible configuration: two nodes who have access to each other. Over time we will get more insights about this.
> I don't think connectivity can be done without a set of trackers for ALL possible cases, especially between two workstation nodes or two mobiles. How about letting people run wave tracker services in the public domain? Something like OAuth providers currently. It's up to you if you trust the provider and want to "whitelist" it or register on it. Trackers will only be used initially to make a connection, or again when it's dropped. Most other models will involve a lot of jumping through hoops. We can do KISS initially.
>
>>> *5) Implementing "undo": invertibility, tombstones, edge cases, TP2*
>>> No server means no canonical order of commits, which means that undo is hard to do correctly.
>>> (uhm... not sure if that's a good summary, some stuff went over my head :-D, please read the log instead)
>
> 5) I may be wrong about this, but is it necessary to do UNDO at the OT level? Can we not let the application handle it depending on how the data is interpreted? I can see how it can be done easily if the application using the OT layer is a code editor. On the other hand, is it safe to even do UNDO in the OT layer when we are not fully sure about the data model and domain constraints?
>
>>> *7) Comparison with BitTorrent and P2P-TV technologies*
>>> Both technologies are much less restricted than wave with regards to real-time responsiveness. So none are really a good reference for our purposes.
>
> 7) Agreed. P2P file sharing is too casual a system to borrow from for our use case.
>
>>> *8) Identifying participants (3 parts)*
>>> Pure p2p means many peers don't have a n...@centralized-server.com user handle, so an alternative has to be used.
>>> However, it's easy to provide a traditional friendly handle, if the user prefers the tradeoff of having to often rely on a permanent server. This tradeoff can be mitigated by using a sort of userhandle cache.
>
> 8) Please see 3, 4.
>
> On Fri, Jun 21, 2013 at 6:45 PM, Michael MacFadden <michael.macfad...@gmail.com> wrote:
>
>> We should create a section for these topics in the wiki in some sort of design space.
>>
>> ~Michael
>>
>> On Jun 21, 2013, at 6:06 AM, John Blossom <jblos...@gmail.com> wrote:
>>
>>> Bruno,
>>>
>>> Thanks, this is an excellent summary. It helps me to get the gist of things more clearly.
>>>
>>> On the P2P latency, I don't think that it would be unacceptable to draw a line and say that P2P provides limited, non-guaranteed realtime OT or that it's not realtime OT and more of a syncing mode than a conversation mode. That would probably be sufficient for what needs to be done, especially since in some instances P2P-enabled Wave sessions may be using MESH networks for transport - a key factor in how a lot of experimental communications services are being deployed in developing nations (not just the Project Loon concept). In the MESH model, you're likely to have one node within range of another temporarily, which may sync with it, and then pass along data to another node when it comes in range of it. That's the most probable scenario for P2P in many instances, I would think. The other potential scenario: two people in a remote location, for the sake of argument two movie script-writers who have holed themselves up in a remote location to collaborate on a common script. They're on two devices that are very proximate to one another, so perhaps the latency issues will not be so severe.
>>>
>>> Things to think about, I will look at this more carefully later today.
>>>
>>> All the best,
>>>
>>> John Blossom
>>>
>>> On Fri, Jun 21, 2013 at 8:05 AM, Bruno Gonzalez (aka stenyak) <sten...@gmail.com> wrote:
>>>
>>>> Following Joseph's "A Very Wavey Plan (P2P!)" thread, a couple of discussions have taken place at the irc.freenode.net #wiab channel, all related to P2P.
>>>>
>>>> I've taken the liberty to restructure the IRC logs, remove some chitchat, and divide it into sub-discussions. Feel free to reply to any part of this email to continue a discussion.
>>>>
>>>>
>>>> *Summary of discussions:*
>>>> *====================*
>>>> *1) Underlying protocol for P2P federation*
>>>> Currently XMPP is used. HTTP and raw TCP are two suggested candidates (HTTP allowing to much more easily reach restricted networks).
>>>>
>>>> *2) Message/event types needed for P2P federation to work*
>>>> We'd need something similar in concept to certain git operations (git clone, git push...). All will be based on hashes (not incremental integers).
>>>>
>>>> *3) Routing p2p messages/events in a server-aided network*
>>>> One option is to somehow detect server clusters, send data to one of them, and let the rest of the cluster servers synchronize to it (locally).
>>>> Alternatively, the originator server can naively send stuff to all possible destination servers, regardless of the cost.
>>>>
>>>> *4) Routing p2p messages/events in a pure P2P system (5 parts)*
>>>> How to manage to route all wave-stuff if we want to get rid of servers completely, and only use peers.
>>>> The closest way would be to use a DHT, but huge latency is an unsolved problem, and makes it impossible to use for real-time waving.
>>>> No other solution has been proposed.
>>>>
>>>> *5) Implementing "undo": invertibility, tombstones, edge cases, TP2*
>>>> No server means no canonical order of commits, which means that undo is hard to do correctly.
>>>> (uhm... not sure if that's a good summary, some stuff went over my head :-D, please read the log instead)
>>>>
>>>> *6) Usability of a pure p2p system in Real Life (tm)*
>>>> Being pragmatic, pure P2P is probably only usable in peers with good connectivity. Rest of peers will need to rely on a server/proxy that *does* have good connectivity.
>>>>
>>>> *7) Comparison with BitTorrent and P2P-TV technologies*
>>>> Both technologies are much less restricted than wave with regards to real-time responsiveness. So none are really a good reference for our purposes.
>>>>
>>>> *8) Identifying participants (3 parts)*
>>>> Pure p2p means many peers don't have a n...@centralized-server.com user handle, so an alternative has to be used.
>>>> However, it's easy to provide a traditional friendly handle, if the user prefers the tradeoff of having to often rely on a permanent server. This tradeoff can be mitigated by using a sort of userhandle cache.
>>>>
>>>> *9) P2P anonymity (lurking in a wave) (2 parts)*
>>>> In a pure p2p wave network, anonymous peers may want to read a public wave, without other peers knowing. A solution could be to make private the required wavelets (where the anonymous participants IDs are stored).
>>>>
>>>> *10) Encryption of waves*
>>>> It's been proposed to use an AES key to encrypt all the wave data, and only allow participants to decrypt it.
>>>>
>>>> *11) Addition and removal of participants, and their ability to read past and future wave versions/deltas*
>>>> The aforementioned AES key can change over time, allowing a finer-grained restriction of what deltas new/removed participants can read.
>>>>
>>>>
>>>> *Actual conversations:*
>>>> *====================*
>>>>
>>>> *1) Underlying protocol for P2P federation:*
>>>> [in response to Joseph's email]
>>>> [23:42] <alown> I [...] agree with option 2 (make every root a JSON blob)
>>>> [23:43] <alown> You haven't really detailed (at all) how the P2P federation is actually going to work (beyond 'not like IRC')
>>>> [23:44] <josephg> Personally, I'd love some raw TCP action
>>>> [23:44] <alown> I agree using KISS principle.
>>>> [23:44] <josephg> a few years ago (not long after wave was cancelled) there was a 'wave summit'
>>>> [23:45] <josephg> - and a few of us chatted about how we could make the federation protocol simpler
>>>> [23:45] <josephg> we ended up (somehow) deciding that doing it over http would be a good idea
>>>> [23:45] <josephg> because then we could sneak it into companies past their corporate HTTP firewalls, etc
>>>> [23:45] <josephg> but in any case, I'd like to figure out the protocol and (at least) have a TCP version
>>>> [23:46] <josephg> it should be pretty easy to wrap the same messages in websockets if we want
>>>>
>>>>
>>>> *2) Message/event types needed for P2P federation to work:*
>>>> [23:46] <alown> Do we need anything more complicated than the waveletSubmit/Commit messages used currently?
>>>> [23:46] <alown> (Replace wavelet with 'abstract p2p ot container name')
>>>> [23:46] <josephg> um, yeah.
>>>> [23:47] <josephg> we'll also be able to rip out all the code that deals with managing the tree of servers per wave
>>>> [23:47] <josephg> but yeah - the protocol will get a bit more complicated
>>>> [23:47] <josephg> ... because we'll lose our beautiful integer version numbers
>>>> [23:47] <josephg> so we'll need a protocol for synchronizing ops
>>>> [23:48] <josephg> yeah - ops will each have a hash
>>>> [23:48] <josephg> and two servers could each have ops the other server doesn't have
>>>> [23:48] <josephg> so we have to be able to deal with that
>>>> [23:47] <alown> What other 'events' are cared about by any particular server?
>>>> [23:47] <alown> For a SHA hash?
>>>> [23:48] <josephg> -> we'll need something like git's sync protocol
>>>> [23:48] <alown> So, initial server contact is 'git clone', and then some form of 'git push' on changes?
>>>> [23:49] <josephg> yep.
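To make the hash-based sync idea above concrete, here is a rough Python sketch of ops identified by a content hash, plus a naive "pull whatever the other side has that I lack" step. Everything in it is an assumption for illustration: `op_id`, `OpStore`, the use of SHA-256, the JSON encoding, and the idea that an op records parent hashes the way a git commit does. It is not Wave's federation protocol, and git's actual sync negotiates which objects to send rather than comparing full key sets.

```python
import hashlib
import json


def op_id(parents, payload):
    """Content hash of an op: depends on its parent hashes and its payload."""
    body = json.dumps({"parents": sorted(parents), "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class OpStore:
    """Hypothetical per-server store mapping op hash -> op."""

    def __init__(self):
        self.ops = {}

    def add(self, parents, payload):
        oid = op_id(parents, payload)
        self.ops[oid] = {"parents": list(parents), "payload": payload}
        return oid

    def missing_from(self, other):
        """Hashes that `other` has and this store lacks."""
        return set(other.ops) - set(self.ops)

    def pull(self, other):
        """Copy every op we lack from `other`; returns the hashes copied."""
        wanted = self.missing_from(other)
        for oid in wanted:
            self.ops[oid] = other.ops[oid]
        return wanted


a, b = OpStore(), OpStore()
root = a.add([], "create document")
b.pull(a)                                  # first contact, roughly 'git clone'
left = a.add([root], "insert 'x' at 0")    # each server now has an op
right = b.add([root], "insert 'y' at 0")   # that the other one lacks
a.pull(b)                                  # exchange in both directions
b.pull(a)
```

After the two pulls both stores hold the same three ops, even though neither server ever assigned a version number. Ordering and transforming the two concurrent ops is a separate problem that this sketch does not touch.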
>>>> [23:49] <josephg> push on changes is easy - it's basically the same thing we have now
>>>> [23:49] <josephg> just instead of saying "This should be applied at version 10" we say "This op has parents [abc123, def456]"
>>>>
>>>>
>>>> *3) Routing p2p messages/events in a server-aided network:*
>>>> [23:49] <alown> With P2P do we have to broadcast to all peers? How do we coordinate that between them?
>>>> [23:50] <josephg> between servers? I dunno.
>>>> [23:50] <alown> How does BT handle this?
>>>> [23:50] <josephg> should we just connect every server to every other server? That'd work fine...
>>>> [23:50] <josephg> I guess every server can address every other server
>>>> [23:50] <josephg> because the wave will have al...@a.com and josephg@b.com and so on on it
>>>> [23:50] <alown> This feels very inefficient...
>>>> [23:51] <josephg> so if you submit an op to your server, your server can go "Oh, I need to tell b.com about this too"
>>>> [23:51] <josephg> well, if there's 10 servers, presumably all 10 servers need to find out about ops somehow.
>>>> [23:51] <josephg> - assuming we stick with the current model of having servers store all your operations
>>>> [23:51] <josephg> .. and documents for all the users at their domain
>>>> [23:51] <alown> But server 'b' and 'c' might both be part of a wave, but also know each other, and know that they are 'closer' to each other than 'a' is. So, we would want a->b/c then b<->c
>>>> [23:52] <josephg> so actually, having the server which originates an operation send it to all the other servers on that wave is actually close to ideal.
>>>> [23:52] <josephg> yeah maybe.
>>>>
>>>>
>>>> *4) Routing p2p messages/events in a pure P2P system (part 1):*
>>>> [23:54] <alown> BT uses DHT for its P2P stuff...
>>>> [23:54] <josephg> ...I guess we could use a DHT storing all the ops, but that's pretty slow
>>>> [23:55] <josephg> and you still need to notify all servers with users on the wave that the wave was updated.
>>>> [23:55] <alown> Maybe, or perhaps only notify those within a certain 'distance', with each server doing that. (Though could mean some servers are never updated)
>>>> [23:58] <alown> Perhaps we could make the network setup 'SuperWaves' which broadcast to all peers, and carry all information, but normal wave servers do not reach this status?
>>>> [23:58] <alown> By having it decide itself based on how 'connected' a server is, this could find the most efficient ways to route it.
>>>> [00:01] <josephg> Do you think it'll really be a problem?
>>>> [00:01] <josephg> I mean, thinking about it - how many servers will be on a given wave?
>>>> [00:01] <alown> Depends.
>>>> [00:01] <alown> No idea.
>>>> [00:01] <josephg> If it were a public wave, I can imagine clients just connecting to one (or more) centralized servers
>>>> [00:01] * josephg nods
>>>> [00:02] <josephg> ... But say if we were having a conversation on wave-dev@apache, there's like, at most 20 people in a discussion from 5 or so domains
>>>> [00:03] <josephg> ... I think we can deal with that kind of load.
>>>> [00:04] <josephg> but if the protocol lets any server tell any other server about an operation, then it should be pretty easy to set up something like that.
>>>> [00:04] <josephg> maybe.
>>>> [00:04] * josephg thinks
>>>> [00:05] <josephg> hm - you're right. I think I've just gotten used to the crappy state of doing routing for broadcasting messages to a network
>>>> [00:05] <josephg> if you can find / think of a better solution, I'm in.
>>>> [00:12] <alown> Heh, anyway replacing the network layer code SHOULD be easy, since it SHOULD be cleanly separated.
>>>> [00:13] <alown> Getting an initial implementation up using broadcast is fine.
>>>> [00:13] <alown> (I was thinking of Wave's use in other apps as a reason you could have a lot of different participant domains)
>>>>
>>>> *...4) Routing p2p messages/events in a pure P2P system (part 2):*
>>>> [08:53] <stenyak> as for the "how to *really* do p2p", i see two options: a) use a dht-like algorithm and/or b) use a helper server to route stuff for you
>>>> [08:54] <stenyak> a) can be pretty slow if you want all OPs to reach all peers (if I'm not mistaken)
>>>> [08:54] <stenyak> and b) essentially makes it not-p2p
>>>> [08:55] <stenyak> additionally, using p2p, how are we going to deal with routing problems (such as firewalls on both sides, etc)?
>>>> [08:56] <stenyak> in my mind, the only universal solution is to have a third party server available to go through if we want speed or if we want to work on all edge cases
>>>> [08:56] <stenyak> and wave being advertised as realtime, i don't see how something like dht can ever fly
>>>> [11:20] <alown> stenyak: This is why I was wondering about a DHT system with 'Superwave' servers (to act as a first point of contact).
>>>> [11:59] <stenyak> that would be like skype dynamic supernode list?
>>>> [11:59] <alown> The original system, yes.
>>>> [12:02] <stenyak> so we would devise a method to identify candidates to being a supernode, in order to prevent cellphone wave peers from becoming one, and in order to promote certain other nodes (like major peers that have 99% uptime, e.g. wave.google.com or whatever) to become one
>>>> [12:03] <stenyak> bandwidth, latency, open ports, uptime...
>>>> [12:04] <alown> Once a network has been bootstrapped using something, it is relatively easy to identify the hosts which are most densely connected (and would be good supernode candidates)
>>>> [12:05] <stenyak> what do you mean with "using something"?
>>>> [12:06] <alown> Somehow the network has to initially be able to make contact with other nodes (before it knows anything about them)
>>>> [12:07] <alown> For a LAN you could get away with a broadcast 'announce', but it is a bit less clear on an internet-sized scale.
>>>> [12:08] <stenyak> bittorrent sync uses a broadcast for LAN. for internet it uses a tracker server for fast discovery of peers, or you can disable that and force to use DHT (with the long wait that means)
>>>> [12:09] <stenyak> the tracker can also act as a meeting-point for firewalled peer pairs (which in my experience is a lot of them)
>>>> [12:09] <alown> Precisely the problem, because we don't really want long waits or trackers.
>>>>
>>>> *...4) Routing p2p messages/events in a pure P2P system (part 3):*
>>>> [12:42] <stenyak> hmmm... i'm not sure how a peer gets a list of waves in which he's a participant
>>>> [12:43] <alown> Having a canonical source makes it all so much easier. :P
>>>> [12:44] <stenyak> for pure p2p peers to "receive" new waves, either the FROM or the TO peer (or both) would need to try to find their way to the other
>>>> [12:44] <stenyak> and we're assuming here that each person only runs one peer
>>>> [12:45] <stenyak> e.g. my privatekey may be used by 5 wave peers at the same time, and we must make sure the new wave reaches all of them
>>>> [12:46] <alown> Looks like we may need to have multiple DHTs then (one for ops, one for waves)
>>>> [12:46] <stenyak> in BT, it's the receiver end who actively looks for peers to receive from. in wave, it's not like that..
>>>> [12:46] <alown> Or could we have a pubkey->wave mapping in one?
>>>> [12:46] <stenyak> and in BT, you can assume *many* people has the data you want
>>>> [12:46] <stenyak> in wave, it's possible and probable that only one other peer in the universe has the wave
>>>> [12:46] <stenyak> (because it's a personal wave sent to you)
>>>> [12:47] <alown> I would expect any long-running supernodes to be implicitly part of all waves they know about.
>>>> [12:47] <alown> Though on second thought, this seems like it would add its own problems to authentication, storage, promotion of supernodes etc.
>>>>
>>>> *...4) Routing p2p messages/events in a pure P2P system (part 4):*
>>>> [12:51] <alown> Does it make sense for a peer to have your privkey, since you could be logged in anywhere, so it would be down to the place you are logged in, to 'subscribe' to that wave on the network, and attempt to retrieve all data from it...
>>>> [12:55] <alown> I was expecting the network as a whole to act like a WaveBus pubsub system, whereby once 'logged in' at some server (which means it gets your privkey from the authentication system), that server then 'subscribes' to your waves on the 'network'. If somebody else at some other server changes it, then that server would be announcing to the network of a change (doesn't necessarily have to be a broadcast), which your server would 'hear'.
>>>> [12:56] <alown> You could do this from any server where you logged in (hence the concept of a domain is lost).
>>>> [12:57] <stenyak> by "server" you mean supernodes?
>>>> [12:57] <alown> Not necessarily.
>>>> [12:59] <stenyak> this pubsub network must be aware of nodes that are in it, in order to directly route wave updates to them, correct?
>>>> [12:59] <stenyak> and also, this network wouldn't be very volatile, but would rather ideally be long-lived peers?
>>>> [13:00] <alown> It has no reason to have to directly route updates, (though it would hopefully be able to identify the best routes automatically).
>>>> [13:00] <alown> Yes it would require a few long-lived peers (which would be part of the requirement to be a supernode).
>>>> [13:01] <stenyak> so let's say i connect my laptop wave peer to the "server" in the living room, at my firewalled home. this "server" would be already subscribed to the pubsub network, and in this specific case it would route all wave updates to me
>>>> [13:02] <stenyak> in other cases (let's say, ipv6-enabled nodes everywhere, no firewall at home), the living room server could simply notify the original "FROM" peer to send stuff to my laptop ipv6 ip, right?
>>>> [13:03] <alown> That sounds right. Supernodes are really only needed for getting the routing right.
>>>> [13:05] <stenyak> ok. in both these theoretical cases, the "server" hasn't necessarily been a wave node per se (nor a supernode either), but rather a second type of wave node that helps get stuff quickly wherever it's needed
>>>> [13:05] <alown> Yes.
>>>> [13:05] <alown> I am not even sure where OT should be happening in this picture...
>>>> [13:05] <stenyak> if OT happens, the "server" is a blind proxy i think
>>>> [13:06] <stenyak> so does not need the privkey to work
>>>> [13:07] <stenyak> unless we're also using OT in the wavebus pubsub network for some reason?
>>>> [13:07] <alown> Supernodes can be blind (though they might also just be normal well-connected wave servers). I would expect normal servers to still be doing OT. The question is whether the 'client' (whatever that means) should be doing it also.
>>>> [13:08] <alown> The network shouldn't need OT. (Algorithms exist that allow the incoming ops to be arbitrarily queued and only processed when needed).
>>>> [...]
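The pubsub picture alown describes above (a server subscribes to a user's waves at login, and "hears" changes announced by other servers) can be sketched in a few lines of Python. This is only an in-process toy under loud assumptions: `WaveBus`, `Server`, `login`, `submit` and the wave ids are hypothetical names, there is no networking, routing, authentication or OT here, and incoming ops are simply queued in an inbox for later processing.

```python
from collections import defaultdict


class WaveBus:
    """Toy in-process stand-in for the pubsub network (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # wave id -> list of callbacks

    def subscribe(self, wave_id, callback):
        self.subscribers[wave_id].append(callback)

    def announce(self, wave_id, op, sender=None):
        """Deliver op to every subscriber of wave_id except the sender."""
        for callback in self.subscribers[wave_id]:
            if callback != sender:
                callback(wave_id, op)


class Server:
    """A node that subscribes on behalf of whoever logs in to it."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.inbox = []                        # ops queued for later processing

    def login(self, wave_ids):
        for wave_id in wave_ids:
            self.bus.subscribe(wave_id, self.receive)

    def receive(self, wave_id, op):
        self.inbox.append((wave_id, op))

    def submit(self, wave_id, op):
        self.bus.announce(wave_id, op, sender=self.receive)


bus = WaveBus()
home, office = Server("home", bus), Server("office", bus)
home.login(["wave-1"])
office.login(["wave-1", "wave-2"])
office.submit("wave-1", "insert 'hi' at 0")
office.submit("wave-2", "insert 'yo' at 0")
```

Here `home` ends up with only the `wave-1` op in its inbox, because it never subscribed to `wave-2`. How announcements would actually travel between real nodes is exactly the open question in the log.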
>>>> [21:21] <josephg> alown: the client always needs to do OT because otherwise they can't both edit a document live and receive operations from people who didn't have their ops.
>>>> [21:22] <josephg> the server doesn't need to do OT, although if it doesn't do OT, it'll punt the OT work to its clients - which will result in a higher CPU utilization on mobile devices.
>>>> [...]
>>>> [13:08] <stenyak> i pictured this "server" as being an optional item that shortcuts the long waits of DHT, rather than something necessary for "clients"?
>>>> [13:08] <alown> Hmm.
>>>> [13:08] <alown> I suppose we should define what a 'client' is then...
>>>> [13:09] <alown> We have at least 2 layers of stuff going on here: 1) Wave OT/operation layer 2) Network routing/P2P layer
>>>> [13:13] <alown> But it is quite plausible something might be doing both of those
>>>> [13:10] <stenyak> with your pubsub net suggestion, i was picturing 2 kinds: a regular pure p2p peer, and a helper kind of node to route stuff quickly when a peer is connected to it
>>>> [13:13] <stenyak> so with that picture in mind, layer 1 stuff could go directly from peer to peer (if connectivity/firewalls allows), or through the "helper node" if available
>>>> [...]
>>>> [13:20] <stenyak> [...] all this discussion looks very similar to discussing how to design internet+dns, i think the problems are the same really
>>>> [13:20] <stenyak> or at least we could take some inspiration from it maybe
>>>> [13:20] <alown> This was my conclusion last night with josephg. ('The problems should already be solved (see The Internet)')
>>>> [14:09] <stenyak> and The Internets solved the problem how? By having a large set of supernodes (dns servers), that may take a whole day to propagate updates.
>>>> The alternative being having the actual IP address in the first place, or to centralize stuff
>>>> [14:10] <stenyak> (aka use servers everywhere)
>>>> [14:22] <alown> Maybe, but the internet's design is X (where X > 20) years old, so may not represent the most modern thinking of how to make distributed networks.
>>>> [14:59] <alown> (Don't forget that our aim for Wave is at the cutting-edge of academic research also).
>>>> [...]
>>>> [14:50] <stenyak> i just threw the question at some friends who should be more up-to-date with networking technologies than me... hopefully they come back with some revolutionary dns-2 design or something that we can copy
>>>> [15:18] <stenyak> could give us some ideas: http://openpeer.org/
>>>> [15:18] <stenyak> (it's not a solution, but maybe they did the same reasoning we're going through)
>>>> [15:46] <stenyak> another response i got goes along the lines of... hard as fuck, but if you manage to do it, you are a hero
>>>> [...]
>>>> [15:02] <stenyak> looking at it from a wider perspective, what we want is similar to having each peer shout at the whole world "here i am, anything got something for meeee?" in some way that doesn't clog the internet tubes, and that is as fast as shouting would be. i start to think it's not physically possible to do that...
>>>> [15:03] <stenyak> if publickeys were handed to people based on the location, then we could have routing tables similar to how internet currently works
>>>> [15:03] <stenyak> but pubkeys are... well, random. so that kind of routing that allows anyone to connect to an arbitrary IP in a matter of milliseconds is impossible, i believe
>>>> [15:04] <alown> So, we end up with DNS for public keys?
>>>> [15:04] <stenyak> something like dns, but much faster [wrt.
>>>> propagation times]
>>>> [15:05] <stenyak> so in essence, a tree of servers or whatever (which is similar to how wave currently works, right?)
>>>> [15:05] <alown> Heh. But the whole point was to avoid the tree system currently (since it is susceptible to netsplits)
>>>> [...]
>>>> [15:56] <stenyak> maybe the real question could be: how do we make DHT much faster?
>>>> [16:14] <stenyak> once the initial discovery process is finished, the transmission of data will not have the lag associated with DHT, so even if DHT takes 10 seconds, that could be acceptable
>>>> [16:15] <stenyak> i.e. a new peer takes 10 seconds to be discovered by the rest of participants collaborating in a wave
>>>> [16:16] <stenyak> (or vice versa.. the new peer takes 10 seconds to discover the participants)
>>>> [...]
>>>> [16:25] <stenyak> this could shed some light: http://en.wikipedia.org/wiki/Distributed_hash_table#Algorithms_for_overlay_networks
>>>> [19:06] <stenyak> http://dsn.tm.kit.edu/english/2936.php
>>>>
>>>> *...4) Routing p2p messages/events in a pure P2P system (part 5):*
>>>> [21:03] <josephg> [...] For now, I want wave to be p2p in the same way that git is p2p.
>>>> [21:04] <josephg> that is, I want the core algorithms & data structures to use P2P-capable algorithms, and probably the wave servers will do p2p between themselves (this is easy because they'll all be both named and accessible)
>>>> [21:06] <josephg> as for client-to-client p2p, there's a few options depending on what kind of use cases we want to support - but I want to worry about getting the algorithms p2p-capable first. If you're keen to set up an anonymous, distributed wave system over a DHT - well, I want to first make that possible
>>>> [21:15] <josephg> ....
>>>> and as for ipv6, network admins _love_ NAT now that we have it
>>>>
>>>>
>>>> *5) Implementing "undo": invertibility, tombstones, edge cases, TP2:*
>>>> [00:17] <alown> I am not sure how an 'undo stack' is going to work (at all) with federation...
>>>> [00:18] <josephg> well, you just do undo at the application level
>>>> [00:19] <josephg> "submit op which inserts text" ... later "submit op which removes text"
>>>> [00:19] <josephg> you don't need OT for that.
>>>> [00:20] <josephg> I imagine like, a semantic undo. In the client you can imagine making an undo op (which might not necessarily rollback an operation (because of tombstones and all that))
>>>> [00:20] <josephg> ... but would seem that way as far as the user is concerned
>>>> [00:21] <josephg> then if the user hits ctrl+z, you can transform that operation up to the current version and apply it
>>>> [00:21] <josephg> - the fact that its an undo isn't really relevant.
>>>> [00:21] <josephg> the bad thing about losing invertability is doing playback
>>>> [00:21] <josephg> - because you can't scrub back through time
>>>> [00:21] <alown> But you have all the operations since the start, so you can play forward at least?
>>>> [00:23] <josephg> yeah exactly.
>>>> [00:23] <josephg> ... and make like, keyframes of the document
>>>> [00:23] <josephg> - and play forward from them or something.
>>>> [00:23] <alown> Hmm, so you can do the step-back without recalculating the entire document?
>>>> [00:24] <alown> I don't really like the idea of then having another datastructure to have to pass around...
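The "keyframes, then play forward" idea for playback without invertible ops can be sketched as follows. This is an illustration under assumptions, not Wave or sharejs code: the op format (`('ins', pos, text)` / `('del', pos, length)` on a plain string), the `History` class and its `keyframe_every` parameter are all made up for the example.

```python
def apply_op(doc, op):
    """Apply a simplified op: ('ins', pos, text) or ('del', pos, length)."""
    kind, pos, arg = op
    if kind == "ins":
        return doc[:pos] + arg + doc[pos:]
    if kind == "del":
        return doc[:pos] + doc[pos + arg:]
    raise ValueError(kind)


class History:
    """Linear op log with periodic full-document snapshots (keyframes)."""

    def __init__(self, keyframe_every=3):
        self.ops = []                  # ops[i] takes version i to version i + 1
        self.keyframes = {0: ""}       # version -> full document text
        self.keyframe_every = keyframe_every
        self.head = ""

    def submit(self, op):
        self.head = apply_op(self.head, op)
        self.ops.append(op)
        version = len(self.ops)
        if version % self.keyframe_every == 0:
            self.keyframes[version] = self.head

    def at(self, version):
        """Rebuild `version` by replaying forward from the nearest earlier keyframe."""
        if not 0 <= version <= len(self.ops):
            raise IndexError(version)
        base = max(v for v in self.keyframes if v <= version)
        doc = self.keyframes[base]
        for op in self.ops[base:version]:
            doc = apply_op(doc, op)
        return doc


h = History(keyframe_every=3)
for op in [("ins", 0, "hello"), ("ins", 5, " world"), ("del", 0, 1),
           ("ins", 0, "J"), ("del", 5, 6)]:
    h.submit(op)

previous = h.at(4)   # "step back" from version 5 by replaying from keyframe 3
```

Stepping back one version never inverts an op; it replays at most `keyframe_every - 1` ops from the closest snapshot. The cost is the extra snapshot storage that alown is wary of.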
>>>> [00:24] <josephg> right - if you have a snapshot at version 1000, and the user is looking at 1010 and they try to step back to 1009, you can just replay ops 1001-1009 on that version 1000 snapshot
>>>> [00:24] <alown> What was the problem with invertible operations (I don't understand OT enough yet to be able to properly comment on that side).
>>>> [00:25] <alown> (Other than it confuses people?)
>>>> [00:25] <josephg> hahaha actually people seem to love invertibility
>>>> [00:25] <josephg> I don't know why.
>>>> [00:25] <josephg> I've been trying to remove it from sharejs, and everyone gets sad.
>>>> [00:26] <josephg> the problem is that if I make an op which deletes the whole document (version 100, say) then I undo that operation
>>>> [00:26] <josephg> and you insert in the middle of the document at version 100, then your op gets transformed to do that insert at the start of the document instead at version 101 (because the content has disappeared)
>>>> [00:26] <josephg> and it never goes back to the middle of the document.
>>>> [00:27] <josephg> so, with tombstones you can get around that by having a 'resurrect' operation
>>>> [00:27] <josephg> (so deleting the whole document turns the whole document into tombstones, then we can resurrect them all again in the inverse)
>>>> [00:28] <josephg> but you can't invert an insert - because deleting leaves the tombstone there
>>>> [00:28] <josephg> and if you have a 'real delete' operation, then yeah, you're back in the hole
>>>> [00:28] <josephg> also, with wave in particular, inverting is really complicated
>>>> [00:29] <josephg> - see, if the wave says "<annotation bold:true>blah blah<annotation bold:false> not bolded"
>>>> [00:29] <josephg> then if you insert at the end of the "blah blah", it'll automatically get bolded.
>>>> [00:30] <josephg> ...
so if the text isn't bolded, and then you bold it >>>> while I insert at the end of the text, you need to make sure my text >>>> _isn't_ bolded or something >>>> [00:31] <josephg> .... and yeah, I can't remember - but there's these >>>> horror cases that I remember kept me from sleeping when I tried to >>>> reimplement wave's OT code in C >>>> [00:31] <alown> hmm >>>> [00:31] <josephg> and it would have been fine if it wasn't invertible. >>>> Well, at least it would have been tollerable. >>>> [00:33] <josephg> So yeah. Conclusion: You can make invertability work, >> but >>>> its kind of a bitch, and you can't make it work for TP2 >>>> [00:33] <josephg> which means it won't work if we're federating >>>> [00:33] <alown> How are we hacking around that currently then? >>>> [00:33] <josephg> well, we don't do TP2 >>>> [00:34] <josephg> remember, federation just uses a bad version of the >>>> current client-server protocol >>>> [00:34] <josephg> - arranged in a tree of servers >>>> [00:34] * alown goes and looks up which one TP2 was again >>>> [00:35] <josephg> ... its the one that says you don't need a canonical >>>> ordering of operations >>>> [00:35] <josephg> sharejs and wave both use the server to pick the >> order of >>>> operations (based on which order they reach the server) >>>> [00:35] <josephg> and then they use incrementing version numbers based >> on >>>> that order >>>> [00:35] <alown> ah yep. 
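To make the server-ordering point concrete, here is a minimal Python sketch. It is not the sharejs or wave code: `Insert`, `OrderingServer` and `submit` are hypothetical names, and the only op type is a plain-text insert. Arrival order at the server is the canonical order, and each accepted op bumps the version by one.

```python
from dataclasses import dataclass, field


@dataclass
class Insert:
    pos: int
    text: str


@dataclass
class OrderingServer:
    """Hypothetical central server: arrival order is the canonical order."""
    doc: str = ""
    log: list = field(default_factory=list)  # log[i] turns version i into i + 1

    @property
    def version(self) -> int:
        return len(self.log)

    def submit(self, op: Insert, base_version: int) -> int:
        # Shift the incoming op past every insert that was committed after
        # the version its author was looking at. On a tie, the op that
        # reached the server first stays in front.
        pos = op.pos
        for committed in self.log[base_version:]:
            if committed.pos <= pos:
                pos += len(committed.text)
        self.doc = self.doc[:pos] + op.text + self.doc[pos:]
        self.log.append(Insert(pos, op.text))
        return self.version


server = OrderingServer()
v1 = server.submit(Insert(0, "hello"), base_version=0)  # arrives first
v2 = server.submit(Insert(0, "world"), base_version=0)  # concurrent, arrives second
```

Both clients edited version 0, but the server alone decides that "hello" comes first. Without that single arbiter there is nothing to hand out the version numbers, which is the gap TP2 is meant to cover.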
>>>> [00:35] <josephg> -> for p2p, that doesn't work because you don't have a centralized server, and anyone can send messages to anyone
>>>> [00:36] <josephg> and yeah, you need TP2 for that (which sort of says you can apply ops from 3 different sites in any order and it still works)
>>>> [00:37] <josephg> - and apparently someone proved that if you make it work for 3 sites, it works for any number of sites
>>>> [00:43] <alown> Anyhow, I can see leaving inversion out for simplicity, but don't yet understand why it can't be made to work with TP2.
>>>> [00:59] <alown> Hmm. Seen 'A Sequence Transformation Algorithm for Supporting Cooperative work on Mobile Devices'?
>>>> [01:02] <josephg> http://research.microsoft.com/en-us/um/redmond/groups/connect/cscw_10/docs/p159.pdf ?
>>>> [01:15] <alown> The main feature is its use of storing local/remote operations and processing them much later than receipt time.
>>>> [01:17] <alown> ABT satisfies TP1+2, so looks like this should(?)
>>>> [01:19] <josephg> need to read it
>>>> [01:19] <josephg> ... I'll go through it later
>>>>
>>>>
>>>> *6) Usability of a pure p2p system in Real Life (tm):*
>>>> [12:13] <alown> We also don't know if storing ops in a DHT is efficient enough for our use case...
>>>> [12:14] <stenyak> in any case, let's say i fire up my wavep2p android client and want to check for any new waves
>>>> [12:14] <stenyak> i definitely won't put up with a wait of 30 seconds when i have "this damn fast 4g connection!" in my cellphone
>>>> [12:14] <stenyak> i mean, that's the point of view of six pack joe
>>>> [12:14] <stenyak> and joe is definitely right..
>>>> [12:15] * alown thinks of the hours it took to download the bitcoin blockchain from the p2p system
>>>> [12:15] <stenyak> or browse through freenet, or whatever... it's painfully slow
>>>> [12:16] <stenyak> in the end, i think that most users won't be running a full blown peer, but will be relying on an external server instead
>>>> [12:16] <stenyak> i.e. nobody runs their own email servers nowadays
>>>> [12:16] <stenyak> and the same can happen with wave
>>>> [12:16] <alown> Should a mobile client be doing the full p2p federation, or simply talking to a server which does it...
>>>> [12:16] <stenyak> the few who decide to run a full-blown wave peer, should be aware of the problems
>>>> [12:17] <alown> So, this should be less of a problem since the only nodes doing p2p will be proper full-time connected servers?
>>>> [12:17] <stenyak> the thing is, we can assume most people won't fire up their own xmpp server, but go for jabber.org account
>>>> [12:17] <stenyak> and the same thing will presumably happen for wave, simply because it's easier to do
>>>> [12:18] <stenyak> which doesn't prevent me from running my own full-blown wave server
>>>> [12:18] <stenyak> but that's a use case in which the user knows the limitations
>>>> [12:19] <stenyak> [...] you and i will run several full-blown wave peers at home, at our parents' house, or whatever, but we'll know and accept the problems
>>>> [12:19] <stenyak> i think that's the way to think about the problem
>>>> [12:19] <stenyak> heck, most people use github for permanent [git] connectivity ;-)
>>>> [12:19] <stenyak> instead of opening ports to their laptop in their lan
>>>> [12:19] <stenyak> and those are the tech-savvy people...
>>>> [12:20] <alown> So, we have a p2p system between wave servers and superwave servers, with clients connecting to the server rather than doing the p2p itself...
>>>> [12:20] <stenyak> i'm not saying it's the way we should do it. i'm saying that's the way it most probably will pan out, because it's already happening in 100% of the existing p2p protocols i know of
>>>> [12:20] <alown> Hmm...
>>>> [12:21] <stenyak> so we should plan for that instead of a theoretical pure p2p world
>>>> [12:21] <stenyak> if we assume there's servers like github, bitbucket and sourceforge, then suddenly most of the problems go away, while still not preventing people from running fully p2p if they want
>>>>
>>>>
>>>> *7) Comparison with BitTorrent and P2P-TV technologies:*
>>>> [12:21] <alown> BT doesn't have huge servers (and with magnet has actually moved in the opposite direction).
>>>> [12:21] <stenyak> BT has no real-time needs
>>>> [12:22] <stenyak> that's why they can afford DHT
>>>> [12:22] <stenyak> dht could be used for simulating a forum-like discussion in wave. but we can't force that restriction from the server
>>>> [12:22] <stenyak> (i say forum-like, because people don't expect reaction within seconds there)
>>>> [12:23] <alown> How did iplayer do its live p2p broadcasting?
>>>> [12:23] * stenyak googles what iplayer is
>>>> [12:23] <alown> Sorry, BBC iPlayer is their TV-over-the-internet system.
>>>> [12:24] <alown> Originally it used a p2p system, but got lots of negative press (because of association with BT since it used p2p), so it now uses a centralized system instead. (And their bandwidth costs are much higher).
>>>> [...]
>>>> [12:25] <stenyak> i seem to recall other [p2p] tv clients
>>>> [12:25] <stenyak> http://wiki.xbmc.org/index.php?title=HOW-TO:Play_free_P2P_(peer-to-peer)_online_streaming_TV
>>>> [...]
>>>> [12:26] <alown> Found a paper titled "RT-P2P: A Scalable Real-Time Peer-to-Peer System with Probabilistic Timing Assurances" (google for it)
>>>> [12:28] <alown> Look at the paper I mentioned. It relies on 'super nodes' to enable it to keep low latencies...
>>>> [...]
>>>> [12:27] <stenyak> but i'd be wary of using this (p2p tv) as an inspiration. i know there's delay of 10-30 seconds from my TV Formula1 image to the telemetry that comes through HTTP from formula1.com website. this is regular TV, and they don't care about 30 seconds of lag
>>>> [12:27] <stenyak> the only real problem of p2p tv is avoiding much jitter
>>>> [12:27] <stenyak> as long as the stream arrives and is viewable, a delay of a minute doesn't matter that much
>>>> [12:28] <alown> True.
>>>>
>>>>
>>>> *8) Identifying participants (part 1):*
>>>> [12:09] <alown> I am also no longer sure what an 'account' should look like, since it has no reason to be stuck to a domain...
>>>> [12:10] <stenyak> current wave discovery works by using the domain name of the email-address-like list of participants
>>>> [12:10] <stenyak> but here we're talking about hashes, public keys or whatever
>>>> [12:10] <stenyak> which do not (necessarily) point to a particular IP:PORT or whatever
>>>> [12:10] <alown> Exactly the problem...
>>>> *...8) Identifying participants (part 2):*
>>>> [12:33] <stenyak> would it make sense that, while some participants are identified by a pubkey (or whatever), many of them could be identified by a user@domain address, with which any peer can quickly locate supernodes?
>>>> [12:33] <stenyak> i mean some kind of dual "pubkey and optional domain email-like addr" for the participants list
>>>> [12:34] <stenyak> the optional part being essential in the broader internet
>>>> [12:34] <alown> Isn't that exactly what using Mozilla Persona would do (map user@domain to some public-key we can use)
>>>> [12:34] <alown> Removing the need for us to have to roll yet-another authentication system.
>>>> [...]
>>>> [12:38] <stenyak> the idea would be that, for a person to be a participant in a wave, you *require* his pubkey. optionally, you may have acquired this pubkey by asking "wave.google.com" about the user "joe", getting his pubkey as a result.
>>>> [12:39] <stenyak> and now that you have the pubkey and one of many possible email-like addresses (in this case j...@wave.google.com), then you can use the email-like address for displaying in the UI
>>>> [12:39] <stenyak> this means that, whoever wants to run pure p2p peers, will have to give his pubkey
>>>> [12:39] <stenyak> and whoever uses the more traditional style, can simply give his email-like addr
>>>> [12:39] <stenyak> and the participants list will show a simple email-like address most of the time
>>>> [12:40] <alown> Do we then allow anyone to 'log in' to any wave server running at any domain, since it should no longer make any difference where they are in the network...
>>>> [12:41] <stenyak> yes, that's needed for world-wide-public waves, which is equivalent to a read-only forum on the net
>>>> [12:41] <stenyak> then there could be server-public waves, which is equivalent to requiring sign-in to view a forum (and coincidentally the current implementation of public waves in WiaB, right?)
>>>> [12:43] * alown has never tested what happens with public waves in the current federation system
>>>> *...8) Identifying participants (part 3):*
>>>> [21:35] <josephg> - Who is a user? If a user is sten...@example.com, then we can put a server at example.com and it can hold operations for you
>>>> [21:36] <josephg> ie, if I add you to a wave, my computer (or my wave server or something) can send a message to example.com to say "Yo, here's some ops you should know about"
>>>> [21:36] <josephg> that would be similar to a mailbox
>>>> [21:37] <josephg> ... and it would work pretty well. Bear in mind that there's no reason operations have to go through the wave server at example.com - if we're both on a LAN together, we could discover one another through DNS service discovery and send ops directly
>>>> [21:37] <josephg> .. without going through our respective wave servers
>>>> [21:38] <josephg> However - if our identities aren't tied to a domain (eg bitcoin), then we'll need to use a dht or something.
>>>> [21:42] <stenyak> the conclusion i've arrived at is that "users" ultimately are a publickey (for which they have the privatekey). this is inconvenient for people to "add you to a wave", so a possibility would be to have a friendlyname=>pubkey server converter. this way people can add "sten...@example.com", by first finding out what the pubkey for sten...@example.com really is
>>>> [21:43] <stenyak> the friendlyname would be optional, and in LAN environments you could directly use the pubkey (instead of the friendly name)
>>>> [21:43] <josephg> I think people will be more than happy to use a friendly name in a lan environment too
>>>> [21:43] <stenyak> discovery in a local network could be done with bonjour or something too (not just dns)
>>>> [21:44] <josephg> I <3 dns-sd
>>>> [21:44] <stenyak> [...] maybe they already have a contact list (read, list of friendlyname<>pubkey equivalences) they can use in the UI (even if the underlying system will use pubkeys anyway)
>>>> [21:44] <stenyak> and by contact list, i really mean a cache of some sort
>>>> [21:45] <stenyak> (not some specific, complex roster system)
>>>> [21:45] <josephg> and you can do friendlyname -> pubkey really easily by just storing the pubkey on the user's domain
>>>> [21:45] <josephg> so, have the example.com webserver host https://example.com/.wellknown/stenyak
>>>> [21:46] <josephg> = your public key.
>>>>
>>>>
>>>> *9) P2P anonymity (peers that want to anonymously lurk in a wave) (part 1):*
>>>> [12:48] <stenyak> by the way, what about non-participants that simply want to lurk a wave?
>>>> [12:49] <stenyak> e.g. i'm given a wave uri (wave://look_at_these_kittens_wave), and want to view it
>>>> [12:49] <alown> Whilst a wave is public, as soon as they 'read' the wave, they would have a metadata wavelet created, so would become a participant (if read-only).
>>>> [12:50] <stenyak> and from then on, whenever the wave changes, someone will try to make the change reach the peers with my privkey
>>>> [12:50] <stenyak> supposedly..
>>>> *...9) P2P anonymity (peers that want to anonymously lurk in a wave) (part 2):*
>>>> [21:18] <josephg> stenyak: interesting point about people who want to not participate but follow a wave anyway - it's really bad if other people can tell that they're there (assuming the wave is public).
>>>> [21:18] <josephg> I guess we just need to make sure that the metadata wave is invisible, and then it's ok..
>>>> [21:21] <stenyak> invisible.. to what peer/s? surely those that are transmitting deltas to the lurkers will need to know they exist?
>>>> [21:21] <stenyak> (maybe some of the algorithms behind freenet can help with this)
>>>> [21:21] <stenyak> (or even TOR)
>>>>
>>>>
>>>> *10) Encryption of waves:*
>>>> [21:47] <josephg> for waves themselves, I'm imagining giving each wave an AES key
>>>> [21:47] <josephg> then storing an encrypted version of the key for each participant on the wave
>>>> [21:48] <josephg> .... anyway, that way anyone who has the AES key can read all ops on the wave
>>>> [21:48] <josephg> and can participate (because they can encrypt ops for the wave)
>>>>
>>>>
>>>> *11) Addition and removal of participants, and their ability to read past and future wave versions/deltas:*
>>>> [21:48] <stenyak> what about removing a user from a wave?
>>>> [21:49] <josephg> worst case, we can just make a new key and re-add everyone using the new key
>>>> [21:49] <josephg> and keep around the old key too
>>>> [21:49] <josephg> so people can still read the old ops as well
>>>> [21:49] <stenyak> the user can access their browser cache for all we care. if you ever read it, there will be ways to do it. "download now wave-spy to read waves you were removed from!"
>>>> [21:49] <stenyak> so providing an official way sounds better
>>>> [21:50] <stenyak> the AES key could change at any point in time, e.g. whenever a new user is added (to prevent them accessing the history), or deleting them (to prevent them from reading future history)
>>>> [22:32] <josephg> um - in wave, we let new users see the whole history
>>>> [22:40] <stenyak> but that use case could be desirable, right? and if we support modification/versioning of the AES key, we might as well allow that too? the equivalent in email world would be to forward an email, removing the existing quotes
>>>> [23:17] <josephg> Yep definitely.
>>>>
>>>>
>>>> --
>>>> Saludos,
>>>> Bruno González
>>>>
>>>> _______________________________________________
>>>> Jabber: stenyak AT gmail.com
>>>> http://www.stenyak.com
>>