Re: IRC discussion on P2P waving

Michael MacFadden Fri, 21 Jun 2013 19:05:25 -0700

My humble opinion on the P2P issue is as follows.  I think if we develop
algorithms that can work in a P2P mode, then we can also support a
client/server architecture as well by just controlling who the peers talk
to.  I think the problem with wave was not the client/server architecture,
but rather the way the servers interacted with each other.  The servers
themselves implemented something like a client/server relationship within
federation.  This meant that even if you were connected to your local wave
server, if that wave server could not communicate with the wave server
that initiated the wave, you were out of luck.


I am not against having servers at all.  In fact I think that things get
very complicated if you have no servers whatsoever (document storage,
discovery, users, etc.).  But we need to make sure servers are peers.
So we need a P2P style OT algorithm.  Again, do not confuse a P2P network
topology (DHT, etc.) with P2P OT algorithms.  A P2P OT algorithm can also
easily be made to behave like a client/server OT algorithm, whereas the
reverse is not feasible.

~Michael

On 6/21/13 5:46 PM, "Sam Nelson" <so...@orcon.net.nz> wrote:

>Wow, that was some heavy reading (:
>
>This section raised some questions for me:
>
>*4) Routing p2p messages/events in a pure P2P system (5 parts)*
>How to manage to route all wave-stuff if we want to get rid of
>servers completely, and only use peers.
>The closest way would be to use a DHT, but huge latency is an unsolved
>problem, and makes it impossible to use for real-time waving.
>No other solution has been proposed.
>
>My question is simply, perhaps naively: why pure p2p?
>Originally when I heard of p2p OT I saw it as a way to collaboratively
>work offline in a LAN environment, and to sync pairs that are almost
>always offline, by means of a proxy peer that moves between WAN and the
>offline LAN. The peers would talk in much the same way that two
>federating servers would, using their offline caches as a datasource
>instead. I'm guessing this is like the MESH network John refers to.
>
>When talking about P2P between peers /over the internet/ - could
>somebody please explain to me the purpose of and vision for this? Why
>not just use a server, it seems to simplify things a lot? (Firewalls,
>authentication - can do offline like Windows 8 does with Microsoft
>Account)
>
>Is pure p2p just for privacy? Or is it really for alternate uses of the
>protocol - other than the documents and conversation use cases we
>saw in Google Wave?
>
>Just an idea, in order to "open the eyes" of those drawn to this mailing
>list, might it be beneficial to build up a wiki page of accepted use
>cases so that everyone can read them and take them into account when
>considering different ideas? That'd facilitate discussions like "well
>this works for all our use cases except #13.... <discussion ensues about
>this case>"
>
>Sam
>
>
>On 22/06/2013 01:06, John Blossom wrote:
>> Bruno,
>>
>> Thanks, this is an excellent summary. It helps me to get the gist of
>> things more clearly.
>>
>> On the P2P latency, I don't think that it would be unacceptable to draw
>> a line and say that P2P provides limited, non-guaranteed realtime OT or
>> that it's not realtime OT and more of a syncing mode than a conversation
>> mode. That would probably be sufficient for what needs to be done,
>> especially since in some instances P2P-enabled Wave sessions may be
>> using MESH networks for transport - a key factor in how a lot of
>> experimental communications services are being deployed in developing
>> nations (not just the Project Loon concept). In the MESH model, you're
>> likely to have one node within range of another temporarily, which may
>> sync with it, and then pass along data to another node when it comes in
>> range of it. That's the most probable scenario for P2P in many
>> instances, I would think. The other potential scenario: two people in a
>> remote location, for the sake of argument two movie script-writers who
>> have holed themselves up in a remote location to collaborate on a
>> common script. They're on two devices that are very proximate to one
>> another, so perhaps the latency issues will not be so severe.
>>
>> Things to think about, I will look at this more carefully later today.
>>
>> All the best,
>>
>> John Blossom
>>
>> On Fri, Jun 21, 2013 at 8:05 AM, Bruno Gonzalez (aka stenyak) <
>> sten...@gmail.com> wrote:
>>
>>> Following Joseph's "A Very Wavey Plan (P2P!)" thread, a couple of
>>> discussions have taken place at the irc.freenode.net #wiab channel, all
>>> related to P2P.
>>>
>>> I've taken the liberty to restructure the IRC logs, remove some
>>> chitchat, and divide it into sub-discussions. Feel free to reply to
>>> any part of this email to continue a discussion.
>>>
>>>
>>> *Summary of discussions:*
>>> *====================*
>>> *1) Underlying protocol for P2P federation*
>>> Currently XMPP is used. HTTP and raw TCP are two suggested candidates
>>> (HTTP making it much easier to reach restricted networks).
>>>
>>> *2) Message/event types needed for P2P federation to work*
>>> We'd need something similar in concept to certain git operations (git
>>> clone, git push...). All will be based on hashes (not incremental
>>> integers).
>>>
>>> *3) Routing p2p messages/events in a server-aided network*
>>> One option is to somehow detect server clusters, send data to one of
>>> them, and let the rest of the cluster servers synchronize to it
>>> (locally). Alternatively, the originator server can naively send stuff
>>> to all possible destination servers, regardless of the cost.
>>>
>>> *4) Routing p2p messages/events in a pure P2P system (5 parts)*
>>> How to manage to route all wave-stuff if we want to get rid of
>>> servers completely, and only use peers.
>>> The closest way would be to use a DHT, but huge latency is an unsolved
>>> problem, and makes it impossible to use for real-time waving.
>>> No other solution has been proposed.
>>>
>>> *5) Implementing "undo": invertibility, tombstones, edge cases, TP2*
>>> No server means no canonical order of commits, which means that undo is
>>> hard to do correctly.
>>> (uhm... not sure if that's a good summary, some stuff went over my head
>>> :-D, please read the log instead)
>>>
>>> *6) Usability of a pure p2p system in Real Life (tm)*
>>> Being pragmatic, pure P2P is probably only usable by peers with good
>>> connectivity. The rest of the peers will need to rely on a server/proxy
>>> that *does* have good connectivity.
>>>
>>> *7) Comparison with BitTorrent and P2P-TV technologies*
>>> Both technologies are much less restricted than wave with regards to
>>> real-time responsiveness. So neither is really a good reference for our
>>> purposes.
>>>
>>> *8) Identifying participants (3 parts)*
>>> Pure p2p means many peers don't have a n...@centralized-server.com user
>>> handle, so an alternative has to be used.
>>> However, it's easy to provide a traditional friendly handle, if the
>>> user prefers the tradeoff of having to often rely on a permanent
>>> server. This tradeoff can be mitigated by using a sort of userhandle
>>> cache.
>>>
>>> *9) P2P anonymity (lurking in a wave) (2 parts)*
>>> In a pure p2p wave network, anonymous peers may want to read a public
>>> wave, without other peers knowing. A solution could be to make the
>>> required wavelets (where the anonymous participants' IDs are stored)
>>> private.
>>>
>>> *10) Encryption of waves*
>>> It's been proposed to use an AES key to encrypt all the wave data, and
>>> only allow participants to decrypt it.
>>>
>>> *11) Addition and removal of participants, and their ability to read
>>> past and future wave versions/deltas*
>>> The aforementioned AES key can change over time, allowing a
>>> finer-grained restriction of what deltas new/removed participants can
>>> read.
>>>
>>>
>>>
>>> *Actual conversations:*
>>> *====================*
>>> *1) Underlying protocol for P2P federation:*
>>> [in response to Joseph's email]
>>> [23:42] <alown> I [...] agree with option 2 (make every root a JSON
>>>blob)
>>> [23:43] <alown> You haven't really detailed (at all) how the P2P
>>>federation
>>> is actually going to work (beyond 'not like IRC')
>>> [23:44] <josephg> Personally, I'd love some raw TCP action
>>> [23:44] <alown> I agree using KISS principle.
>>> [23:44] <josephg> a few years ago (not long after wave was cancelled)
>>>there
>>> was a 'wave summit'
>>> [23:45] <josephg> - and a few of us chatted about how we could make the
>>> federation protocol simpler
>>> [23:45] <josephg> we ended up (somehow) deciding that doing it over
>>> http would be a good idea
>>> [23:45] <josephg> because then we could sneak it into companies past
>>>their
>>> corporate HTTP firewalls, etc
>>> [23:45] <josephg> but in any case, I'd like to figure out the protocol
>>>and
>>> (at least) have a TCP version
>>> [23:46] <josephg> it should be pretty easy to wrap the same messages in
>>> websockets if we want
>>>
>>>
>>> *2) **Message/event types needed for P2P federation to work:*
>>> [23:46] <alown> Do we need anything more complicated than the
>>> waveletSubmit/Commit messages used currently?
>>> [23:46] <alown> (Replace wavelet with 'abstract p2p ot container name')
>>> [23:46] <josephg> um, yeah.
>>> [23:47] <josephg> we'll also be able to rip out all the code that deals
>>> with managing the tree of servers per wave
>>> [23:47] <josephg> but yeah - the protocol will get a bit more
>>>complicated
>>> [23:47] <josephg> ... because we'll lose our beautiful integer version
>>> numbers
>>> [23:47] <josephg> so we'll need a protocol for synchronizing ops
>>> [23:48] <josephg> yeah - ops will each have a hash
>>> [23:48] <josephg> and two servers could each have ops the other server
>>> doesn't have
>>> [23:48] <josephg> so we have to be able to deal with that
>>> [23:47] <alown> What other 'events' are cared about by any particular
>>> server?
>>> [23:47] <alown> For a SHA hash?
>>> [23:48] <josephg> -> we'll need something like git's sync protocol
>>> [23:48] <alown> So, initial server contact is 'git clone', and then
>>>some
>>> form of 'git push' on changes?
>>> [23:49] <josephg> yep.
>>> [23:49] <josephg> push on changes is easy - its basically the same
>>>thing we
>>> have now
>>> [23:49] <josephg> just instead of saying "This should be applied at
>>>version
>>> 10" we say "This op has parents [abc123, def456]"
>>>
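As a rough illustration of the idea above (ops identified by a hash and
naming their parents, instead of "apply at version 10"), here is a minimal
Python sketch. The JSON encoding, the choice of SHA-256, and the names
op_id, OpStore, pull and heads are assumptions made for this example; none
of them come from an existing Wave protocol, and the log does not say
which hash would be used.

```python
import hashlib
import json


def op_id(payload, parents):
    """Hash an op together with its parent ids (hypothetical wire format)."""
    body = json.dumps({"payload": payload, "parents": sorted(parents)},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class OpStore:
    """Toy per-server store of ops keyed by their hash."""

    def __init__(self):
        self.ops = {}  # op id -> (payload, list of parent ids)

    def add(self, payload, parents=()):
        parents = list(parents)
        oid = op_id(payload, parents)
        self.ops[oid] = (payload, parents)
        return oid

    def missing_from(self, other):
        """Ids that `other` holds and we do not (what a sync would fetch)."""
        return set(other.ops) - set(self.ops)

    def pull(self, other):
        """Copy over every op we are missing, roughly like a fetch."""
        for oid in self.missing_from(other):
            self.ops[oid] = other.ops[oid]

    def heads(self):
        """Ops that no other known op names as a parent."""
        referenced = {p for _, parents in self.ops.values() for p in parents}
        return set(self.ops) - referenced


if __name__ == "__main__":
    a, b = OpStore(), OpStore()
    root = a.add({"insert": "hello", "pos": 0})
    b.pull(a)                                   # initial contact, like a clone
    a.add({"insert": "!", "pos": 5}, [root])    # concurrent edit on server a
    b.add({"insert": "wave ", "pos": 0}, [root])  # concurrent edit on server b
    a.pull(b)
    b.pull(a)
    print(len(a.heads()), "heads after sync")   # two concurrent ops
```

Two servers that each hold ops the other lacks end up with the same set of
ops after pulling from each other, and a later op can list both heads as
its parents. Transforming the concurrent ops against each other is a
separate problem that this sketch does not attempt.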
>>>
>>> *3) **Routing **p2p **messages/events in a server-aided network:*
>>> [23:49] <alown> With P2P do we have to broadcast to all peers? How do
>>>we
>>> coordinate that between them?
>>> [23:50] <josephg> between servers? I dunno.
>>> [23:50] <alown> How does BT handle this?
>>> [23:50] <josephg> should we just connect every server to every other
>>> server? That'd work fine...
>>> [23:50] <josephg> I guess every server can address every other server
>>> [23:50] <josephg> because the wave will have al...@a.com and
>>> josephg@b.com and so on on it
>>> [23:50] <alown> This feels very inefficient...
>>> [23:51] <josephg> so if you submit an op to your server, your server
>>>can go
>>> "Oh, I need to tell b.com about this too"
>>> [23:51] <josephg> well, if there's 10 servers, presumably all 10
>>>servers
>>> need to find out about ops somehow.
>>> [23:51] <josephg> - assuming we stick with the current model of having
>>> servers store all your operations
>>> [23:51] <josephg> .. and documents for all the users at their domain
>>> [23:51] <alown> But server 'b' and 'c' might both be part of a wave,
>>>but
>>> also know each other, and know that they are 'closer' to each other
>>>than
>>> 'a' is. So, we would want a->b/c then b<->c
>>> [23:52] <josephg> so actually, having the server which originates an
>>> operation send it to all the other servers on that wave is actually
>>>close
>>> to ideal.
>>> [23:52] <josephg> yeah maybe.
>>>
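The naive fan-out discussed above (the originating server tells every
other server that has a participant on the wave) can be sketched in a few
lines of Python. The function names, the example addresses and the `send`
callback are hypothetical and only serve the illustration.

```python
def destination_domains(participants, local_domain):
    """Domains that must hear about an op, excluding the originating server."""
    domains = {addr.rsplit("@", 1)[1].lower() for addr in participants}
    domains.discard(local_domain.lower())
    return sorted(domains)


def broadcast(op, participants, local_domain, send):
    """Naive fan-out: send the op once to every other domain on the wave."""
    targets = destination_domains(participants, local_domain)
    for domain in targets:
        send(domain, op)  # `send` stands in for whatever transport is used
    return targets


if __name__ == "__main__":
    wave_participants = ["alice@a.com", "josephg@b.com",
                         "bob@b.com", "carol@c.com"]
    broadcast({"insert": "hi"}, wave_participants, "a.com",
              lambda domain, op: print("to", domain, op))
```

Each domain is contacted once regardless of how many of its users are on
the wave. The "closer servers relay to each other" variant from the log
would need extra knowledge about server proximity that this sketch does
not model.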
>>>
>>> *4) **Routing **p2p **messages/events in a pure P2P system (part 1):*
>>> [23:54] <alown> BT uses DHT for its P2P stuff...
>>> [23:54] <josephg> ...I guess we could use a DHT storing all the ops,
>>>but
>>> thats pretty slow
>>> [23:55] <josephg> and you still need to notify all servers with users
>>>on
>>> the wave that the wave was updated.
>>> [23:55] <alown> Maybe, or perhaps only notify those within a certain
>>> 'distance', with each server doing that. (Though could mean some
>>>servers
>>> are never updated)
>>> [23:58] <alown> Perhaps we could make the network setup 'SuperWaves'
>>>which
>>> broadcast to all peers, and carry all information, but normal wave
>>>servers
>>> do not reach this status?
>>> [23:58] <alown> By having it decide itself based on how 'connected' a
>>> server is, this could find the most efficient ways to route it.
>>> [00:01] <josephg> Do you think it'll really be a problem?
>>> [00:01] <josephg> I mean, thinking about it - how many servers will be
>>>on a
>>> given wave?
>>> [00:01] <alown> Depends.
>>> [00:01] <alown> No idea.
>>> [00:01] <josephg> If it were a public wave, I can imagine clients just
>>> connecting to one (or more) centralized servers
>>> [00:01] * josephg nods
>>> [00:02] <josephg> ... But say if we were having a conversation on
>>> wave-dev@apache, there's like, at most 20 people in a discussion from
>>>5 or
>>> so domains
>>> [00:03] <josephg> ... I think we can deal with that kind of load.
>>> [00:04] <josephg> but if the protocol lets any server tell any other
>>>server
>>> about an operation, then it should be pretty easy to set up something
>>>like
>>> that.
>>> [00:04] <josephg> maybe.
>>> [00:04] * josephg thinks
>>> [00:05] <josephg> hm - you're right. I think I've just gotten used to
>>>the
>>> crappy state of doing routing for broadcasting messages to a network
>>> [00:05] <josephg> if you can find / think of a better solution, I'm in.
>>> [00:12] <alown> Heh, anyway replacing the network layer code SHOULD be
>>> easy, since it SHOULD be cleanly separated.
>>> [00:13] <alown> Getting an initial implementation up using broadcast is
>>> fine.
>>> [00:13] <alown> (I was thinking of Wave's use in other apps as a
>>>reason you
>>> could have a lot of different participant domains)
>>> *...4) Routing **p2p **messages/events in a pure P2P system (part 2):*
>>> [08:53] <stenyak> as for the "how to *really* do p2p", i see two
>>>options:
>>> a) use a dht-like algorithm and/or b) use a helper server to route
>>>stuff
>>> for you
>>> [08:54] <stenyak> a) can be pretty slow if you want all OPs to reach
>>>all
>>> peers (if I'm not mistaken)
>>> [08:54] <stenyak> and b) essentially makes it not-p2p
>>> [08:55] <stenyak> additionally, using p2p, how are we going to deal
>>>with
>>> routing problems (such as firewalls on both sides, etc)?
>>> [08:56] <stenyak> in my mind, the only universal solution is to have a
>>> third party server available to go through if we want speed or if we
>>>want
>>> to work on all edge cases
>>> [08:56] <stenyak> and wave being advertised as realtime, i don't see
>>>how
>>> something like dht can ever fly
>>> [11:20] <alown> stenyak: This is why I was wondering about a DHT system
>>> with 'Superwave' servers (to act as a first point of contact).
>>> [11:59] <stenyak> that would be like skype dynamic supernode list?
>>> [11:59] <alown> The original system, yes.
>>> [12:02] <stenyak> so we would devise a method to identify candidates to
>>> being a supernode, in order to prevent cellphone wave peers from
>>> becoming one, and in order to promote certain other nodes (like major
>>> peers that have 99% uptime, e.g. wave.google.com or whatever) to become
>>> one
>>> [12:03] <stenyak> bandwidth, latency, open ports, uptime...
>>> [12:04] <alown> Once a network has been bootstrapped using something,
>>>it is
>>> relatively easy to identify the hosts which are most densely connected
>>>(and
>>> would be good supernode candidates)
>>> [12:05] <stenyak> what do you mean with "using something"?
>>> [12:06] <alown> Somehow the network has to initially be able to make
>>> contact with other nodes (before it knows anything about them)
>>> [12:07] <alown> For a LAN you could get away with a broadcast
>>>'announce',
>>> but it is a bit less clear on an internet-sized scale.
>>> [12:08] <stenyak> bittorrent sync uses a broadcast for LAN. for
>>>internet it
>>> uses a tracker server for fast discovery of peers, or you can disable
>>>that
>>> and force to use DHT (with the long wait that means)
>>> [12:09] <stenyak> the tracker can also act as a meeting-point for
>>> firewalled peer pairs (which in my experience is a lot of them)
>>> [12:09] <alown> Precisely the problem, because we don't really want
>>>long
>>> waits or trackers.
>>> *...4) Routing **p2p **messages/events in a pure P2P system (part 3):*
>>> [12:42] <stenyak> hmmm... i'm not sure how a peer gets a list of waves
>>> in which he's a participant
>>> [12:43] <alown> Having a canonical source makes it all so much easier.
>>>:P
>>> [12:44] <stenyak> for pure p2p peers to "receive" new waves, either the
>>> FROM or the TO peer (or both) would need to try to find their way to
>>>the
>>> other
>>> [12:44] <stenyak> and we're assuming here that each person only runs
>>> one peer
>>> [12:45] <stenyak> e.g. my privatekey may be used by 5 wave peers at the
>>> same time, and we must make sure the new wave reaches all of them
>>> [12:46] <alown> Looks like we may need to have multiple DHTs then (one
>>> for ops, one for waves)
>>> [12:46] <stenyak> in BT, it's the receiver end who actively looks for
>>>peers
>>> to receive from. in wave, it's not like that..
>>> [12:46] <alown> Or could we have a pubkey->wave mapping in one?
>>> [12:46] <stenyak> and in BT, you can assume *many* people have the data
>>> you want
>>> [12:46] <stenyak> in wave, it's possible and probable that only one
>>> other peer in the universe has the wave
>>> [12:46] <stenyak> (because it's a personal wave sent to you)
>>> [12:47] <alown> I would expect any long-running supernodes to be
>>>implicitly
>>> part of all waves they know about.
>>> [12:47] <alown> Though on second thought, this seems like it would add
>>>its
>>> own problems to authentication, storage, promotion of supernodes etc.
>>> *...4) Routing **p2p **messages/events in a pure P2P system (part 4):*
>>> [12:51] <alown> Does it make sense for a peer to have your privkey,
>>>since
>>> you could be logged in anywhere, so it would be down to the place you
>>>are
>>> logged in, to 'subscribe' to that wave on the network, and attempt to
>>> retrieve all data from it...
>>> [12:55] <alown> I was expecting the network as a whole to act like a
>>> WaveBus pubsub system, whereby once 'logged in' at some server (which
>>>means
>>> it gets your privkey from the authentication system), that server then
>>> 'subscribes' to your waves on the 'network'. If somebody else at some
>>>other
>>> server changes it, then that server would be announcing to the network
>>>of a
>>> change (doesn't necessarily have to be a broadcast), which your server
>>> would 'hear'.
>>> [12:56] <alown> You could do this from any server where you logged in
>>> (hence the concept of a domain is lost).
>>> [12:57] <stenyak> by "server" you mean supernodes?
>>> [12:57] <alown> Not necessarily.
>>> [12:59] <stenyak> this pubsub network must be aware of nodes that are
>>>in
>>> it, in order to directly route wave updates to them, correct?
>>> [12:59] <stenyak> and also, this network wouldn't be very volatile, but
>>> would rather ideally be long-lived peers?
>>> [13:00] <alown> It has no reason to have to directly route updates,
>>>(though
>>> it would hopefully be able to identify the best routes automatically).
>>> [13:00] <alown> Yes it would require a few long-lived peers (which
>>>would be
>>> part of the requirement to be a supernode).
>>> [13:01] <stenyak> so let's say i connect my laptop wave peer to the
>>> "server" in the living room, at my firewalled home. this "server"
>>>would be
>>> already subscribed to the pubsub network, and in this specific case it
>>> would route all wave updates to me
>>> [13:02] <stenyak> in other cases (let's say, ipv6-enabled nodes
>>>everywhere,
>>> no firewall at home), the living room server could simply notify the
>>> original "FROM" peer to send stuff to my laptop ipv6 ip, right?
>>> [13:03] <alown> That sounds right. Supernodes are really only needed
>>>for
>>> getting the routing right.
>>> [13:05] <stenyak> ok. in both these theoretical cases, the "server"
>>>hasn't
>>> necessarily been a wave node per se (nor a supernode either), but
>>>rather a
>>> second type of wave node that helps get stuff quickly wherever it's
>>>needed
>>> [13:05] <alown> Yes.
>>> [13:05] <alown> I am not even sure where OT should be happening in this
>>> picture...
>>> [13:05] <stenyak> if OT happens, the "server" is a blind proxy i think
>>> [13:06] <stenyak> so does not need the privkey to work
>>> [13:07] <stenyak> unless we're also using OT in the wavebus pubsub
>>> network for some reason?
>>> [13:07] <alown> Supernodes can be blind (though they might also just be
>>> normal well-connected wave servers). I would expect normal servers to
>>>still
>>> be doing OT. The question is whether the 'client' (whatever that means)
>>> should be doing it also.
>>> [13:08] <alown> The network shouldn't need OT. (Algorithms exist that
>>> allow the incoming ops to be arbitrarily queued and only processed when
>>> needed).
>>> [...]
>>> [21:21] <josephg> alown: the client always needs to do OT because
>>>otherwise
>>> they can't both edit a document live and receive operations from
>>>people who
>>> didn't have their ops.
>>> [21:22] <josephg> the server doesn't need to do OT, although if it
>>>doesn't
>>> do OT, it'll punt the OT work to its clients - which will result in a
>>> higher CPU utilization on mobile devices.
>>> [...]
>>> [13:08] <stenyak> i pictured this "server" as being an optional item
>>>that
>>> shortcuts the long waits of DHT, rather than something necessary for
>>> "clients"?
>>> [13:08] <alown> Hmm.
>>> [13:08] <alown> I suppose we should define what a 'client' is then...
>>> [13:09] <alown> We have at least 2 layers of stuff going on here: 1)
>>>Wave
>>> OT/operation layer 2) Network routing/P2P layer
>>> [13:13] <alown> But it is quite plausible something might be doing
>>>both of
>>> those
>>> [13:10] <stenyak> with your pubsub net suggestion, i was picturing 2
>>>kinds:
>>> a regular pure p2p peer, and a helper kind of node to route stuff
>>>quickly
>>> when a peer is connected to it
>>> [13:13] <stenyak> so with that picture in mind, layer 1 stuff could go
>>> directly from peer to peer (if connectivity/firewalls allows), or
>>>through
>>> the "helper node" if available
>>> [...]
>>> [13:20] <stenyak> [...] all this discussion looks very similar to
>>> discussing how to design internet+dns, i think the problems are the
>>>same
>>> really
>>> [13:20] <stenyak> or at least we could take some inspiration from it
>>>maybe
>>> [13:20] <alown> This was my conclusion last night with josephg. ('The
>>> problems should already be solved (see The Internet)')
>>> [14:09] <stenyak> and The Internets solved the problem how? By having a
>>> large set of supernodes (dns servers), that may take a whole day to
>>> propagate updates. The alternative being having the actual IP address
>>>in
>>> the first place, or to centralize stuff
>>> [14:10] <stenyak> (aka use servers everywhere)
>>> [14:22] <alown> Maybe, but the internet's design is X (where X > 20)
>>>years
>>> old, so may not represent the most modern thinking of how to make
>>> distributed networks.
>>> [14:59] <alown> (Don't forget that our aim for Wave is at the
>>>cutting-edge
>>> of academic research also).
>>> [...]
>>> [14:50] <stenyak> i just threw the question at some friends who should
>>> be more up-to-date with networking technologies than me... hopefully
>>> they come back with some revolutionary dns-2 design or something that
>>> we can copy
>>> [15:18] <stenyak> could give us some ideas: http://openpeer.org/
>>> [15:18] <stenyak> (it's not a solution, but maybe they did the same
>>> reasoning we're going through)
>>> [15:46] <stenyak> another response i got goes along the lines of...
>>>hard as
>>> fuck, but if you manage to do it, you are a hero
>>> [...]
>>> [15:02] <stenyak> looking at it from a wider perspective, what we want
>>> is similar to having each peer shout at the whole world "here i am,
>>> anyone got something for meeee?" in some way that doesn't clog the
>>> internet tubes, and that is as fast as shouting would be. i start to
>>> think it's not physically possible to do that...
>>> [15:03] <stenyak> if publickeys were handed to people based on the
>>> location, then we could have routing tables similar to how internet
>>> currently works
>>> [15:03] <stenyak> but pubkeys are... well, random. so that kind of
>>>routing
>>> that allows anyone to connect to an arbitrary IP in a matter of
>>> milliseconds is impossible, i believe
>>> [15:04] <alown> So, we end up with DNS for public keys?
>>> [15:04] <stenyak> something like dns, but much faster [wrt. propagation
>>> times]
>>> [15:05] <stenyak> so in essence, a tree of servers or whatever (which
>>>is
>>> similar to how wave currently works, right?)
>>> [15:05] <alown> Heh. But the whole point was to avoid the tree system
>>> currently (since it is susceptible to netsplits)
>>> [...]
>>> [15:56] <stenyak> maybe the real question could be: how do we make DHT
>>>much
>>> faster?
>>> [16:14] <stenyak> once the initial discovery process is finished, the
>>> transmission of data will not have the lag associated with DHT, so
>>>even if
>>> DHT takes 10 seconds, that could be acceptable
>>> [16:15] <stenyak> i.e. a new peer takes 10 seconds to be discovered by
>>>the
>>> rest of participants collaborating in a wave
>>> [16:16] <stenyak> (or vice versa.. the new peer takes 10 seconds to
>>> discover the participants)
>>> [...]
>>> [16:25] <stenyak> this could shed some light:
>>> http://en.wikipedia.org/wiki/Distributed_hash_table#Algorithms_for_overlay_networks
>>> [19:06] <stenyak> http://dsn.tm.kit.edu/english/2936.php
>>> *...4) Routing **p2p **messages/events in a pure P2P system (part 5):*
>>> [21:03] <josephg> [...] For now, I want wave to be p2p in the same way
>>>that
>>> git is p2p.
>>> [21:04] <josephg> that is, I want the core algorithms & data
>>> structures to use P2P-capable algorithms, and probably the wave servers
>>> will do p2p between themselves (this is easy because they'll all be
>>> both named and accessible)
>>> [21:06] <josephg> as for client-to-client p2p, there's a few options
>>> depending on what kind of use cases we want to support - but I want to
>>> worry about getting the algorithms p2p-capable first. If you're keen
>>>to set
>>> up an anonymous, distributed wave system over a DHT - well, I want to
>>>first
>>> make that possible
>>> [21:15] <josephg> .... and as for ipv6, network admins _love_ NAT now
>>>that
>>> we have it
>>>
>>>
>>> *5) Implementing "undo": invertibility, tombstones, edge cases, TP2:*
>>> [00:17] <alown> I am not sure how an 'undo stack' is going to work (at
>>>all)
>>> with federation...
>>> [00:18] <josephg> well, you just do undo at the application level
>>> [00:19] <josephg> "submit op which inserts text" ... later "submit op
>>>which
>>> removes text"
>>> [00:19] <josephg> you don't need OT for that.
>>> [00:20] <josephg> I imagine like, a semantic undo. In the client you
>>>can
>>> imagine making an undo op (which might not necessarily rollback an
>>> operation (because of tombstones and all that))
>>> [00:20] <josephg> ... but would seem that way as far as the user is
>>> concerned
>>> [00:21] <josephg> then if the user hits ctrl+z, you can transform that
>>> operation up to the current version and apply it
>>> [00:21] <josephg> - the fact that its an undo isn't really relevant.
>>> [00:21] <josephg> the bad thing about losing invertibility is doing
>>> playback
>>> [00:21] <josephg> - because you can't scrub back through time
>>> [00:21] <alown> But you have all the operations since the start, so
>>>you can
>>> play forward at least?
>>> [00:23] <josephg> yeah exactly.
>>> [00:23] <josephg> ... and make like, keyframes of the document
>>> [00:23] <josephg> - and play forward from them or something.
>>> [00:23] <alown> Hmm, so you can do the step-back without recalculating
>>>the
>>> entire document?
>>> [00:24] <alown> I don't really like the idea of then having another
>>> datastructure to have to pass around...
>>> [00:24] <josephg> right - if you have a snapshot at version 1000, and
>>>the
>>> user is looking at 1010 and they try to step back to 1009, you can just
>>> replay ops 1001-1009 on that version 1000 snapshot
>>> [00:24] <alown> What was the problem with invertible operations? (I
>>> don't understand OT enough yet to be able to properly comment on that
>>> side).
>>> [00:25] <alown> (Other than it confuses people?)
>>> [00:25] <josephg> hahaha actually people seem to love invertibility
>>> [00:25] <josephg> I don't know why.
>>> [00:25] <josephg> I've been trying to remove it from sharejs, and
>>>everyone
>>> gets sad.
>>> [00:26] <josephg> the problem is that if I make an op which deletes the
>>> whole document (version 100, say) then I undo that operation
>>> [00:26] <josephg> and you insert in the middle of the document at
>>>version
>>> 100, then your op gets transformed to do that insert at the start of
>>>the
>>> document instead at version 101 (because the content has disappeared)
>>> [00:26] <josephg> and it never goes back to the middle of the document.
>>> [00:27] <josephg> so, with tombstones you can get around that by
>>>having a
>>> 'resurrect' operation
>>> [00:27] <josephg> (so deleting the whole document turns the whole
>>>document
>>> into tombstones, then we can resurrect them all again in the inverse)
>>> [00:28] <josephg> but you can't invert an insert - because deleting
>>>leaves
>>> the tombstone there
>>> [00:28] <josephg> and if you have a 'real delete' operation, then yeah,
>>> you're back in the hole
>>> [00:28] <josephg> also, with wave in particular, inverting is really
>>> complicated
>>> [00:29] <josephg> - see, if the wave says "<annotation bold:true>blah
>>> blah<annotation bold:false> not bolded"
>>> [00:29] <josephg> then if you insert at the end of the "blah blah",
>>>it'll
>>> automatically get bolded.
>>> [00:30] <josephg> ... so if the text isn't bolded, and then you bold it
>>> while I insert at the end of the text, you need to make sure my text
>>> _isn't_ bolded or something
>>> [00:31] <josephg> .... and yeah, I can't remember - but there's these
>>> horror cases that I remember kept me from sleeping when I tried to
>>> reimplement wave's OT code in C
>>> [00:31] <alown> hmm
>>> [00:31] <josephg> and it would have been fine if it wasn't invertible.
>>> Well, at least it would have been tolerable.
>>> [00:33] <josephg> So yeah. Conclusion: You can make invertibility
>>> work, but it's kind of a bitch, and you can't make it work for TP2
>>> [00:33] <josephg> which means it won't work if we're federating
>>> [00:33] <alown> How are we hacking around that currently then?
>>> [00:33] <josephg> well, we don't do TP2
>>> [00:34] <josephg> remember, federation just uses a bad version of the
>>> current client-server protocol
>>> [00:34] <josephg> - arranged in a tree of servers
>>> [00:34] * alown goes and looks up which one TP2 was again
>>> [00:35] <josephg> ... its the one that says you don't need a canonical
>>> ordering of operations
>>> [00:35] <josephg> sharejs and wave both use the server to pick the
>>>order of
>>> operations (based on which order they reach the server)
>>> [00:35] <josephg> and then they use incrementing version numbers based
>>>on
>>> that order
>>> [00:35] <alown> ah yep.
>>> [00:35] <josephg> -> for p2p, that doesn't work because you don't have
>>>a
>>> centralized server, and anyone can send messages to anyone
>>> [00:36] <josephg> and yeah, you need TP2 for that (which sort of says
>>>you
>>> can apply ops from 3 different sites in any order and it still works)
>>> [00:37] <josephg> - and apparently someone proved that if you make it
>>>work
>>> for 3 sites, it works for any number of sites
>>> [00:43] <alown> Anyhow, I can see leaving inversion out for
>>>simplicity, but
>>> don't yet understand why it can't be made to work with TP2.
>>> [00:59] <alown> Hmm. Seen 'A Sequence Transformation Algorithm for
>>> Supporting Cooperative work on Mobile Devices'?
>>> [01:02] <josephg>
>>> http://research.microsoft.com/en-us/um/redmond/groups/connect/cscw_10/docs/p159.pdf
>>> ?
>>> [01:15] <alown> The main feature is its use of storing local/remote
>>> operations and processing them much later than receipt time.
>>> [01:17] <alown> ABT satisfies TP1+2, so looks like this should(?)
>>> [01:19] <josephg> need to read it
>>> [01:19] <josephg> ... I'll go through it later
>>>
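
As an aside on the ordering discussion above: a minimal sketch, in Python, of the server-picked ordering josephg describes for sharejs and wave, where each op gets the next version number in the order it reaches the server. The class and method names are hypothetical and not taken from either codebase, and no transformation of concurrent ops is shown.

```python
class OrderingServer:
    """Hypothetical central server: arrival order is the canonical order."""

    def __init__(self):
        self.log = []  # ops in the order they reached the server

    def submit(self, op):
        # The version number is just the op's position in arrival order.
        self.log.append(op)
        return len(self.log)

    def ops_since(self, version):
        # Everything a client at `version` has not seen yet, in order.
        return self.log[version:]


server = OrderingServer()
v1 = server.submit("insert 'a' at 0")  # from client A
v2 = server.submit("insert 'b' at 0")  # from client B, arrives second
```

With no central server there is nobody to hand out these numbers, which is why the chat turns to TP2 for the p2p case.
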
>>>
>>> *6) Usability of a pure p2p system in Real Life (tm):*
>>> [12:13] <alown> We also don't know if storing ops in a DHT is efficient
>>> enough for our use case...
>>> [12:14] <stenyak> in any case, let's say i fire up my wavep2p android
>>> client and want to check for any new waves
>>> [12:14] <stenyak> i definitely won't put up with a wait of 30 seconds 
>>>when
>>> i have "this damn fast 4g connection!" in my cellphone
>>> [12:14] <stenyak> i mean, that's the point of view of six pack joe
>>> [12:14] <stenyak> and joe is definitely right..
>>> [12:15] * alown thinks of the hours it took to download the bitcoin
>>> blockchain from the p2p system
>>> [12:15] <stenyak> or browse through freenet, or whatever... it's painfully
>>>slow
>>> [12:16] <stenyak> in the end, i think that most users won't be running 
>>>a
>>> full blown peer, but will be relying on an external server instead
>>> [12:16] <stenyak> i.e. nobody runs their own email servers nowadays
>>> [12:16] <stenyak> and the same can happen with wave
>>> [12:16] <alown> Should a mobile client be doing the full p2p 
>>>federation, or
>>> simply talking to a server which does it...
>>> [12:16] <stenyak> the few who decide to run a full-blown wave peer, 
>>>should
>>> be aware of the problems
>>> [12:17] <alown> So, this should be less of a problem since the only 
>>>nodes
>>> doing p2p will be proper full-time connected servers?
>>> [12:17] <stenyak> the thing is, we can assume most people won't fire up
>>> their own xmpp server, but go for a jabber.org account
>>> [12:17] <stenyak> and the same thing will presumably happen for wave,
>>> simply because it's easier to do
>>> [12:18] <stenyak> which doesn't prevent me from running my own
>>>full-blown
>>> wave server
>>> [12:18] <stenyak> but that's a use case in which the user knows the
>>> limitations
>>> [12:19] <stenyak> [...] you and i will run several full-blown wave 
>>>peers at
>>> home, at our parent's house, or whatever, but we'll know and accept the
>>> problems
>>> [12:19] <stenyak> i think that's the way to think about the problem
>>> [12:19] <stenyak> heck, most people use github for permanent [git]
>>> connectivity ;-)
>>> [12:19] <stenyak> instead of opening ports to their laptop in their lan
>>> [12:19] <stenyak> and those are the tech-savvy people...
>>> [12:20] <alown> So, we have a p2p system between wave servers and 
>>>superwave
>>> servers, with clients connecting to the server rather than doing the 
>>>p2p
>>> itself...
>>> [12:20] <stenyak> i'm not saying it's the way we should do it. i'm
>>>saying
>>> that's the way it most probably will pan out, because it's already
>>> happening in 100% of the existing p2p protocols i know of
>>> [12:20] <alown> Hmm...
>>> [12:21] <stenyak> so we should plan for that instead of a theoretical 
>>>pure
>>> p2p world
>>> [12:21] <stenyak> if we assume there's servers like github, bitbucket
>>>and
>>> sourceforge, then suddenly most of the problems go away, while still
>>>not
>>> preventing people from running fully p2p if they want
>>>
>>>
>>> *7) Comparison with BitTorrent and P2P-TV technologies:*
>>> [12:21] <alown> BT doesn't have huge servers (and with magnet has
>>>actually
>>> moved in the opposite direction).
>>> [12:21] <stenyak> BT has no real-time needs
>>> [12:22] <stenyak> that's why they can afford DHT
>>> [12:22] <stenyak> dht could be used for simulating a forum-like 
>>>discussion
>>> in wave. but we can't force that restriction from the server
>>> [12:22] <stenyak> (i say forum-like, because people don't expect 
>>>reaction
>>> within seconds there)
>>> [12:23] <alown> How did iplayer do its live p2p broadcasting?
>>> [12:23] * stenyak googles what iplayer is
>>> [12:23] <alown> Sorry, BBC iPlayer is their TV-over-the-internet 
>>>system.
>>> [12:24] <alown> Originally it used a p2p system, but got lots of
>>>negative
>>> press (because of association with BT since it used p2p), so it now
>>>uses a
>>> centralized system instead. (And their bandwidth costs are much
>>>higher).
>>> [...]
>>> [12:25] <stenyak> i seem to recall other [p2p] tv clients
>>> [12:25] <stenyak>
>>>
>>> 
>>>http://wiki.xbmc.org/index.php?title=HOW-TO:Play_free_P2P_(peer-to-peer)
>>>_online_streaming_TV
>>> [...]
>>> [12:26] <alown> Found a paper titled "RT-P2P: A Scalable Real-Time
>>> Peer-to-Peer System with Probabilistic Timing Assurances" (google for 
>>>it)
>>> [12:28] <alown> Looked at the paper I mentioned. It relies on 'super
>>>nodes'
>>> to enable it to keep low latencies...
>>> [...]
>>> [12:27] <stenyak> but i'd be wary of using this (p2p tv) as an 
>>>inspiration.
>>> i know there's delay of 10-30 seconds from my TV Formula1 image to the
>>> telemetry that comes through HTTP from formula1.com website. this is
>>> regular TV, and they don't care about 30 seconds of lag
>>> [12:27] <stenyak> the only real problem of p2p tv is avoiding much 
>>>jitter
>>> [12:27] <stenyak> as long as the stream arrives and is viewable, a 
>>>delay of
>>> a minute doesn't matter that much
>>> [12:28] <alown> True.
>>>
>>>
>>> *8) Identifying participants (part 1):*
>>> [12:09] <alown> I am also no longer sure what an 'account' should look
>>> like, since it has no reason to be stuck to a domain...
>>> [12:10] <stenyak> current wave discovery works by using the domain 
>>>name of
>>> the email-address-like list of participants
>>> [12:10] <stenyak> but here we're talking about hashes, public keys or
>>> whatever
>>> [12:10] <stenyak> which do not (necessarily) point to a particular
>>>IP:PORT
>>> or whatever
>>> [12:10] <alown> Exactly the problem...
>>> *...8) Identifying participants (part 2):*
>>> [12:33] <stenyak> would it make sense that, while some participants are
>>> identified by a pubkey (or whatever), many of them could be identified 
>>>by a
>>> user@domain address, with which any peer can quickly locate supernodes?
>>> [12:33] <stenyak> i mean some kind of dual "pubkey and optional domain
>>> email-like addr" for the participants list
>>> [12:34] <stenyak> the optional part being essential in the broader 
>>>internet
>>> [12:34] <alown> Isn't that exactly what using Mozilla Persona would do 
>>>(map
>>> user@domain to some public-key we can use)
>>> [12:34] <alown> Removing the need for us to have to roll yet-another
>>> authentication system.
>>> [...]
>>> [12:38] <stenyak> the idea would be that, for a person to be a
>>>participant
>>> in a wave, you *require* his pubkey. optionally, you may have acquired
>>>this
>>> pubkey by asking "wave.google.com" about the user "joe", getting his
>>> pubkey
>>> as a result.
>>> [12:39] <stenyak> and now that you have the pubkey and one of many 
>>>possible
>>> email-like addresses (in this case j...@wave.google.com), then you can 
>>>use
>>> the email-like address for displaying in the UI
>>> [12:39] <stenyak> this means that, whoever wants to run pure p2p peers,
>>> will have to give his pubkey
>>> [12:39] <stenyak> and whoever uses the more traditional style, can 
>>>simply
>>> give his email-like addr
>>> [12:39] <stenyak> and the participants list will show a simple 
>>>email-like
>>> address most of the time
>>> [12:40] <alown> Do we then allow anyone to 'log in' to any wave server
>>> running at any domain, since it should no-longer make any difference 
>>>where
>>> they are in the network...
>>> [12:41] <stenyak> yes, that's needed for world-wide-public waves, 
>>>which is
>>> equivalent to a read-only forum on the net
>>> [12:41] <stenyak> then there could be server-public waves, which is
>>> equivalent to requiring sign-in to view a forum (and coincidentally the
>>> current implementation of public waves in WiaB, right?)
>>> [12:43] * alown has never tested what happens with public waves in the
>>> current federation system
>>> *...8) Identifying participants (part 3):*
>>> [21:35] <josephg> - Who is a user? If a user is sten...@example.com, 
>>>then
>>> we can put a server at example.com and it can hold operations for you
>>> [21:36] <josephg> ie, if I add you to a wave, my computer (or my wave
>>> server or something) can send a message to example.com to say "Yo, 
>>>here's
>>> some ops you should know about"
>>> [21:36] <josephg> that would be similar to a mailbox
>>> [21:37] <josephg> ... and it would work pretty well. Bear in mind that
>>> there's no reason operations have to go through the wave server at
>>> example.com - if we're both on a LAN together, we could discover one
>>> another through DNS service discovery and send ops directly
>>> [21:37] <josephg> .. without going through our respective wave servers
>>> [21:38] <josephg> However - if our identities aren't tied to a domain 
>>>(eg
>>> bitcoin), then we'll need to use a dht or something.
>>> [21:42] <stenyak> the conclusion i've arrived at is that "users"
>>> ultimately are a publickey (for which they have the privatekey). this 
>>>is
>>> inconvenient for people to "add you to a wave", so a possibility would 
>>>be
>>> to have a friendlyname=>pubkey server converter. this way people can 
>>>add "
>>> sten...@example.com", by first finding out what the pubkey for
>>> sten...@example.com really is
>>> [21:43] <stenyak> the friendlyname would be optional, and in LAN
>>> environments you could directly use the pubkey (instead of the friendly
>>> name)
>>> [21:43] <josephg> I think people will be more than happy to use a
>>>friendly
>>> name in a lan environment too
>>> [21:43] <stenyak> discovery in a local network could be done with 
>>>bonjour
>>> or something too (not just dns)
>>> [21:44] <josephg> I <3 dns-sd
>>> [21:44] <stenyak> [...] maybe they already have a contact list (read, 
>>>list
>>> of friendlyname<>pubkey equivalences) they can use in the UI (even if 
>>>the
>>> underlying system will use pubkeys anyway)
>>> [21:44] <stenyak> and by contact list, i really mean a cache of some 
>>>sort
>>> [21:45] <stenyak> (not some specific, complex roster system)
>>> [21:45] <josephg> and you can do friendlyname -> pubkey really easily 
>>>by
>>> just storing the pubkey on the user's domain
>>> [21:45] <josephg> so, have the example.com webserver host
>>> https://example.com/.wellknown/stenyak
>>> [21:46] <josephg> = your public key.
>>>
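
A rough sketch, in Python, of the friendlyname -> pubkey lookup and the cache-style "contact list" discussed above. The `/.wellknown/<user>` path is copied from the chat as a proposal; `key_url`, `KeyCache`, the `alice@example.com` address and the fake directory are all made up for illustration, and the fetcher is injected so that nothing here touches the network.

```python
def key_url(address):
    """Build the lookup URL for a user@domain address.

    The /.wellknown/<user> path is copied from the chat; it is a
    proposal, not an existing wave endpoint.
    """
    user, sep, domain = address.partition("@")
    if not (user and sep and domain):
        raise ValueError("expected user@domain, got %r" % address)
    return "https://%s/.wellknown/%s" % (domain, user)


class KeyCache:
    """Hypothetical friendlyname -> pubkey cache (the 'contact list')."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable(url) -> pubkey; assumed, injected
        self.known = {}

    def pubkey_for(self, address):
        # Only ask the user's domain the first time; reuse the answer after.
        if address not in self.known:
            self.known[address] = self.fetch(key_url(address))
        return self.known[address]


# Stand-in fetcher so the sketch runs offline; a real one would do an
# HTTPS GET against the user's domain.
fake_directory = {"https://example.com/.wellknown/alice": "PUBKEY-ALICE"}
cache = KeyCache(fake_directory.__getitem__)
alice_key = cache.pubkey_for("alice@example.com")
```

The underlying system would still work on pubkeys; the address is only what gets typed and shown in the UI.
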
>>>
>>> *9) P2P anonymity (peers that want to anonymously lurk in a wave) (part
>>> 1):*
>>> [12:48] <stenyak> by the way, what about non-participants that simply 
>>>want
>>> to lurk a wave?
>>> [12:49] <stenyak> e.g. i'm given a wave uri
>>> (wave://look_at_these_kittens_wave), and want to view it
>>> [12:49] <alown> Whilst a wave is  public, as soon as they 'read' the 
>>>wave,
>>> they would have a metadata wavelet created, so would become a 
>>>participant
>>> (if read-only).
>>> [12:50] <stenyak> and from then on, whenever the wave changes, someone 
>>>will
>>> try to make the change reach the peers with my privkey
>>> [12:50] <stenyak> supposedly..
>>> *...9) P2P anonymity (peers that want to anonymously lurk in a wave) 
>>>(part
>>> 2):*
>>> [21:18] <josephg> stenyak: interesting point about people who want to
>>>not
>>> participate but follow a wave anyway - it's really bad if other people
>>>can
>>> tell that they're there (assuming the wave is public).
>>> [21:18] <josephg> I guess we just need to make sure that the metadata
>>>wave
>>> is invisible, and then it's ok..
>>> [21:21] <stenyak> invisible.. to what peer/s? surely those that are
>>> transmitting deltas to the lurkers will need to know they exist?
>>> [21:21] <stenyak> (maybe some of the algorithms behind freenet can help
>>> with this)
>>> [21:21] <stenyak> (or even TOR)
>>>
>>>
>>> *10) Encryption of waves:*
>>> [21:47] <josephg> for waves themselves, I'm imagining giving each wave 
>>>an
>>> AES key
>>> [21:47] <josephg> then storing an encrypted version of the key for each
>>> participant on the wave
>>> [21:48] <josephg> .... anyway, that way anyone who has the AES key can 
>>>read
>>> all ops on the wave
>>> [21:48] <josephg> and can participate (because they can encrypt ops 
>>>for the
>>> wave)
>>>
>>>
>>> *11) Addition and removal of participants, and their ability to read 
>>>past
>>> and future wave versions/deltas:*
>>> [21:48] <stenyak> what about removing a user from a wave?
>>> [21:49] <josephg> worst case, we can just make a new key and re-add
>>> everyone using the new key
>>> [21:49] <josephg> and keep around the old key too
>>> [21:49] <josephg> so people can still read the old ops as well
>>> [21:49] <stenyak> the user can access their browser cache for all we 
>>>care..
>>> if you ever read it, there will be ways to do it. "download now 
>>>wave-spy to
>>> read waves you were removed from!"
>>> [21:49] <stenyak> so providing an official way sounds better
>>> [21:50] <stenyak> the AES key could change at any point in time, e.g.
>>> whenever a new user is added (to prevent them accessing the history),
>>>or
>>> removed (to prevent them from reading future history)
>>> [22:32] <josephg> um - in wave, we let new users see the whole history
>>> [22:40] <stenyak> but that use case could be desirable, right? and if 
>>>we
>>> support modification/versioning of the AES key, we might as well allow 
>>>that
>>> too? the equivalent in email world would be to forward an email, 
>>>removing
>>> the existing quotes
>>> [23:17] <josephg> Yep definitely.
>>>
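
One possible way to picture the key changes from sections 10 and 11, sketched in Python: the wave keeps a history of keys, a new one is made whenever membership changes, and old keys stay around so earlier ops remain readable. `WaveKeys` and its fields are hypothetical, the keys are just random bytes standing in for AES keys, and the per-participant encryption of each key is not shown. The sketch assumes `rotate` is called with increasing version numbers.

```python
import secrets


class WaveKeys:
    """Hypothetical key history for one wave: new key per membership change."""

    def __init__(self, participants):
        self.epochs = []  # each: {"from_version", "key", "holders"}
        self.rotate(0, participants)

    def rotate(self, from_version, participants):
        # Fresh random 256-bit key; wrapping it per participant is omitted.
        self.epochs.append({
            "from_version": from_version,
            "key": secrets.token_bytes(32),
            "holders": set(participants),
        })

    def key_for(self, user, version):
        # Latest epoch that starts at or before this op version.
        epoch = [e for e in self.epochs if e["from_version"] <= version][-1]
        return epoch["key"] if user in epoch["holders"] else None


keys = WaveKeys({"alice", "bob"})
keys.rotate(10, {"alice"})  # bob removed at version 10
```

Here bob can still get the key for ops before version 10 but not for later ones. Letting a newly added user see the whole history, as wave does today, would amount to adding them to the holders of the older epochs as well.
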
>>>
>>> --
>>> Saludos,
>>>       Bruno González
>>>
>>> _______________________________________________
>>> Jabber: stenyak AT gmail.com
>>> http://www.stenyak.com
>>>
>
