IRC discussion on P2P waving

Bruno Gonzalez (aka stenyak) Fri, 21 Jun 2013 05:06:52 -0700

Following Joseph's "A Very Wavey Plan (P2P!)" thread, a couple of
discussions have taken place in the #wiab channel on irc.freenode.net, all
related to P2P.


I've taken the liberty of restructuring the IRC logs, removing some chitchat,
and dividing them into sub-discussions. Feel free to reply to any part of this
email to continue a discussion.


*Summary of discussions:*
*====================*
*1) Underlying protocol for P2P federation*
Currently XMPP is used. HTTP and raw TCP are the two suggested candidates (HTTP
makes it much easier to reach restricted networks, e.g. behind corporate
firewalls).

*2) Message/event types needed for P2P federation to work*
We'd need something similar in concept to certain git operations (git
clone, git push...). Ops will be identified by hashes rather than by
incrementing integer version numbers.

*3) Routing p2p messages/events in a server-aided network*
One option is to somehow detect server clusters, send data to one of them,
and let the rest of the cluster servers synchronize to it (locally).
Alternatively, the originator server can naively send stuff to all possible
destination servers, regardless of the cost.

*4) Routing p2p messages/events in a pure P2P system (5 parts)*
How to route all wave-stuff if we want to get rid of servers completely,
and only use peers.
The closest fit would be a DHT, but its huge latency is an unsolved
problem that makes it impossible to use for real-time waving.
No other solution has been proposed.

*5) Implementing "undo": invertibility, tombstones, edge cases, TP2*
No server means no canonical order of commits, which means that undo is
hard to do correctly.
(uhm... not sure if that's a good summary, some stuff went over my head
:-D, please read the log instead)

*6) Usability of a pure p2p system in Real Life (tm)*
Being pragmatic, pure P2P is probably only usable by peers with good
connectivity. The rest of the peers will need to rely on a server/proxy that
*does* have good connectivity.

*7) Comparison with BitTorrent and P2P-TV technologies*
Both technologies have much looser real-time responsiveness requirements
than wave, so neither is really a good reference for our purposes.

*8) Identifying participants (3 parts)*
Pure p2p means many peers don't have a n...@centralized-server.com user
handle, so an alternative has to be used.
However, it's easy to provide a traditional friendly handle, if the user
prefers the tradeoff of having to often rely on a permanent server. This
tradeoff can be mitigated by using a sort of userhandle cache.

*9) P2P anonymity (lurking in a wave) (2 parts)*
In a pure p2p wave network, anonymous peers may want to read a public wave
without other peers knowing. A solution could be to make private the
wavelets where the anonymous participants' IDs are stored.

*10) Encryption of waves*
It's been proposed to use an AES key to encrypt all the wave data, and only
allow participants to decrypt it.

*11) Addition and removal of participants, and their ability to read past
and future wave versions/deltas*
The aforementioned AES key can change over time, allowing a finer-grained
restriction of what deltas new/removed participants can read.



*Actual conversations:*
*====================*
*1) Underlying protocol for P2P federation:*
[in response to Joseph's email]
[23:42] <alown> I [...] agree with option 2 (make every root a JSON blob)
[23:43] <alown> You haven't really detailed (at all) how the P2P federation
is actually going to work (beyond 'not like IRC')
[23:44] <josephg> Personally, I'd love some raw TCP action
[23:44] <alown> I agree using KISS principle.
[23:44] <josephg> a few years ago (not long after wave was cancelled) there
was a 'wave summit'
[23:45] <josephg> - and a few of us chatted about how we could make the
federation protocol simpler
[23:45] <josephg> we ended up (somehow) deciding that doing it over http
would be a good idea
[23:45] <josephg> because then we could sneak it into companies past their
corporate HTTP firewalls, etc
[23:45] <josephg> but in any case, I'd like to figure out the protocol and
(at least) have a TCP version
[23:46] <josephg> it should be pretty easy to wrap the same messages in
websockets if we want


*2) Message/event types needed for P2P federation to work:*
[23:46] <alown> Do we need anything more complicated than the
waveletSubmit/Commit messages used currently?
[23:46] <alown> (Replace wavelet with 'abstract p2p ot container name)
[23:46] <josephg> um, yeah.
[23:47] <josephg> we'll also be able to rip out all the code that deals
with managing the tree of servers per wave
[23:47] <josephg> but yeah - the protocol will get a bit more complicated
[23:47] <josephg> ... because we'll lose our beautiful integer version
numbers
[23:47] <josephg> so we'll need a protocol for synchronizing ops
[23:48] <josephg> yeah - ops will each have a hash
[23:48] <josephg> and two servers could each have ops the other server
doesn't have
[23:48] <josephg> so we have to be able to deal with that
[23:47] <alown> What other 'events' are cared about by any particular
server?
[23:47] <alown> For a SHA hash?
[23:48] <josephg> -> we'll need something like git's sync protocol
[23:48] <alown> So, initial server contact is 'git clone', and then some
form of 'git push' on changes?
[23:49] <josephg> yep.
[23:49] <josephg> push on changes is easy - its basically the same thing we
have now
[23:49] <josephg> just instead of saying "This should be applied at version
10" we say "This op has parents [abc123, def456]"
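A minimal sketch of the idea above, assuming ops are identified by a SHA-256 hash over their parents and payload, and that two servers sync by exchanging whatever the other side lacks. The names `Op`, `OpStore` and `sync`, and the payload format, are hypothetical and not part of any existing wave code; a real git-like protocol would negotiate by walking the op graph rather than comparing whole sets.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    parents: tuple  # hashes of the ops this op was created on top of
    payload: str    # opaque operation body (hypothetical format)

    @property
    def op_hash(self) -> str:
        # Sort parents so the hash does not depend on the order they are listed in.
        body = json.dumps(
            {"parents": sorted(self.parents), "payload": self.payload},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class OpStore:
    """Ops known to one server, keyed by hash instead of integer version."""

    def __init__(self):
        self.ops = {}

    def add(self, op: Op) -> str:
        self.ops[op.op_hash] = op
        return op.op_hash

    def missing_from(self, other: "OpStore") -> list:
        # Ops the other store has that this store does not.
        return [op for h, op in other.ops.items() if h not in self.ops]


def sync(a: OpStore, b: OpStore) -> None:
    """Naive two-way sync: each side receives the ops it lacks."""
    for op in a.missing_from(b):
        a.add(op)
    for op in b.missing_from(a):
        b.add(op)
```

With this shape, "push on changes" means sending an `Op` whose `parents` name the ops it was built on, instead of naming a version number.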


*3) Routing p2p messages/events in a server-aided network:*
[23:49] <alown> With P2P do we have to broadcast to all peers? How do we
coordinate that between them?
[23:50] <josephg> between servers? I dunno.
[23:50] <alown> How does BT handle this?
[23:50] <josephg> should we just connect every server to every other
server? That'd work fine...
[23:50] <josephg> I guess every server can address every other server
[23:50] <josephg> because the wave will have al...@a.com and
josephg@b.com and so on on it
[23:50] <alown> This feels very inefficient...
[23:51] <josephg> so if you submit an op to your server, your server can go
"Oh, I need to tell b.com about this too"
[23:51] <josephg> well, if there's 10 servers, presumably all 10 servers
need to find out about ops somehow.
[23:51] <josephg> - assuming we stick with the current model of having
servers store all your operations
[23:51] <josephg> .. and documents for all the users at their domain
[23:51] <alown> But server 'b' and 'c' might both be part of a wave, but
also know each other, and know that they are 'closer' to each other than
'a' is. So, we would want a->b/c then b<->c
[23:52] <josephg> so actually, having the server which originates an
operation send it to all the other servers on that wave is actually close
to ideal.
[23:52] <josephg> yeah maybe.
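As a rough illustration of the naive fan-out discussed above, here is a sketch that derives the set of servers to notify from the participants' user@domain addresses. The function names and the example addresses are assumptions made for illustration; no cluster detection or "closeness" between servers is modeled.

```python
def domain_of(address: str) -> str:
    """Return the domain part of a user@domain participant address."""
    return address.rsplit("@", 1)[1].lower()


def fanout_targets(participants, origin_domain: str) -> list:
    """Domains the originating server must notify about a new op.

    Every participant domain except the origin gets one copy,
    regardless of how many participants live on that domain.
    """
    domains = {domain_of(p) for p in participants}
    domains.discard(origin_domain.lower())
    return sorted(domains)
```

For example, `fanout_targets(["alice@a.example", "bob@b.example", "carol@b.example"], "a.example")` yields a single target, `b.example`, since one copy per server is enough.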


*4) Routing p2p messages/events in a pure P2P system (part 1):*
[23:54] <alown> BT uses DHT for its P2P stuff...
[23:54] <josephg> ...I guess we could use a DHT storing all the ops, but
thats pretty slow
[23:55] <josephg> and you still need to notify all servers with users on
the wave that the wave was updated.
[23:55] <alown> Maybe, or perhaps only notify those within a certain
'distance', with each server doing that. (Though could mean some servers
are never updated)
[23:58] <alown> Perhaps we could make the network setup 'SuperWaves' which
broadcast to all peers, and carry all information, but normal wave servers
do not reach this status?
[23:58] <alown> By having it decide itself based on how 'connected' a
server is, this could find the most efficient ways to route it.
[00:01] <josephg> Do you think it'll really be a problem?
[00:01] <josephg> I mean, thinking about it - how many servers will be on a
given wave?
[00:01] <alown> Depends.
[00:01] <alown> No idea.
[00:01] <josephg> If it were a public wave, I can imagine clients just
connecting to one (or more) centralized servers
[00:01] * josephg nods
[00:02] <josephg> ... But say if we were having a conversation on
wave-dev@apache, there's like, at most 20 people in a discussion from 5 or
so domains
[00:03] <josephg> ... I think we can deal with that kind of load.
[00:04] <josephg> but if the protocol lets any server tell any other server
about an operation, then it should be pretty easy to set up something like
that.
[00:04] <josephg> maybe.
[00:04] * josephg thinks
[00:05] <josephg> hm - you're right. I think I've just gotten used to the
crappy state of doing routing for broadcasting messages to a network
[00:05] <josephg> if you can find / think of a better solution, I'm in.
[00:12] <alown> Heh, anyway replacing the network layer code SHOULD be
easy, since it SHOULD be cleanly separated.
[00:13] <alown> Getting an initial implementation up using broadcast is
fine.
[00:13] <alown> (I was thinking of Wave's use in other apps as a reason you
could have a lot of different participant domains)
*...4) Routing p2p messages/events in a pure P2P system (part 2):*
[08:53] <stenyak> as for the "how to *really* do p2p", i see two options:
a) use a dht-like algorithm and/or b) use a helper server to route stuff
for you
[08:54] <stenyak> a) can be pretty slow if you want all OPs to reach all
peers (if I'm not mistaken)
[08:54] <stenyak> and b) essentially makes it not-p2p
[08:55] <stenyak> additionally, using p2p, how are we going to deal with
routing problems (such as firewalls on both sides, etc)?
[08:56] <stenyak> in my mind, the only universal solution is to have a
third party server available to go through if we want speed or if we want
to work on all edge cases
[08:56] <stenyak> and wave being advertised as realtime, i don't see how
something like dht can ever fly
[11:20] <alown> stenyak: This is why I was wondering about a DHT system
with 'Superwave' servers (to act as a first point of contact).
[11:59] <stenyak> that would be like skype dynamic supernode list?
[11:59] <alown> The original system, yes.
[12:02] <stenyak> so we would devise a method to identify candidates for
being a supernode, in order to prevent cellphone wave peers from becoming
one, and in order to promote certain other nodes (like major peers that have
99% uptime, e.g. wave.google.com or whatever) to become one
[12:03] <stenyak> bandwidth, latency, open ports, uptime...
[12:04] <alown> Once a network has been bootstrapped using something, it is
relatively easy to identify the hosts which are most densely connected (and
would be good supernode candidates)
[12:05] <stenyak> what do you mean with "using something"?
[12:06] <alown> Somehow the network has to initially be able to make
contact with other nodes (before it knows anything about them)
[12:07] <alown> For a LAN you could get away with a broadcast 'announce',
but it is a bit less clear on an internet-sized scale.
[12:08] <stenyak> bittorrent sync uses a broadcast for LAN. for internet it
uses a tracker server for fast discovery of peers, or you can disable that
and force to use DHT (with the long wait that means)
[12:09] <stenyak> the tracker can also act as a meeting-point for
firewalled peer pairs (which in my experience is a lot of them)
[12:09] <alown> Precisely the problem, because we don't really want long
waits or trackers.
*...4) Routing p2p messages/events in a pure P2P system (part 3):*
[12:42] <stenyak> hmmm... i'm not sure how a peer gets a list of waves in
which he's a participant of
[12:43] <alown> Having a canonical source makes it all so much easier. :P
[12:44] <stenyak> for pure p2p peers to "receive" new waves, either the
FROM or the TO peer (or both) would need to try to find their way to the
other
[12:44] <stenyak> and we're assuming here that each person only runs one
peer
[12:45] <stenyak> e.g. my privatekey may be used by 5 wave peers at the
same time, and we must make sure the new wave reaches all of them
[12:46] <alown> Looks like we may need to have multiple DHTs then (one for
ops, one for waves)
[12:46] <stenyak> in BT, it's the receiver end who actively looks for peers
to receive from. in wave, it's not like that..
[12:46] <alown> Or could we have a pubkey->wave mapping in one?
[12:46] <stenyak> and in BT, you can assume *many* people have the data you
want
[12:46] <stenyak> in wave, it's possible and probable that only one other
peer in the universe has the wave
[12:46] <stenyak> (because it's a personal wave sent to you)
[12:47] <alown> I would expect any long-running supernodes to be implicitly
part of all waves they know about.
[12:47] <alown> Though on second thought, this seems like it would add its
own problems to authentication, storage, promotion of supernodes etc.
*...4) Routing p2p messages/events in a pure P2P system (part 4):*
[12:51] <alown> Does it make sense for a peer to have your privkey, since
you could be logged in anywhere, so it would be down to the place you are
logged in, to 'subscribe' to that wave on the network, and attempt to
retrieve all data from it...
[12:55] <alown> I was expecting the network as a whole to act like a
WaveBus pubsub system, whereby once 'logged in' at some server (which means
it gets your privkey from the authentication system), that server then
'subscribes' to your waves on the 'network'. If somebody else at some other
server changes it, then that server would be announcing to the network of a
change (doesn't necessarily have to be a broadcast), which your server
would 'hear'.
[12:56] <alown> You could do this from any server where you logged in
(hence the concept of a domain is lost).
[12:57] <stenyak> by "server" you mean supernodes?
[12:57] <alown> Not necessarily.
[12:59] <stenyak> this pubsub network must be aware of nodes that are in
it, in order to directly route wave updates to them, correct?
[12:59] <stenyak> and also, this network wouldn't be very volatile, but
would rather ideally be long-lived peers?
[13:00] <alown> It has no reason to have to directly route updates, (though
it would hopefully be able to identify the best routes automatically).
[13:00] <alown> Yes it would require a few long-lived peers (which would be
part of the requirement to be a supernode).
[13:01] <stenyak> so let's say i connect my laptop wave peer to the
"server" in the living room, at my firewalled home. this "server" would be
already subscribed to the pubsub network, and in this specific case it
would route all wave updates to me
[13:02] <stenyak> in other cases (let's say, ipv6-enabled nodes everywhere,
no firewall at home), the living room server could simply notify the
original "FROM" peer to send stuff to my laptop ipv6 ip, right?
[13:03] <alown> That sounds right. Supernodes are really only needed for
getting the routing right.
[13:05] <stenyak> ok. in both these theoretical cases, the "server" hasn't
necessarily been a wave node per se (nor a supernode either), but rather a
second type of wave node that helps get stuff quickly wherever it's needed
[13:05] <alown> Yes.
[13:05] <alown> I am not even sure where OT should be happening in this
picture...
[13:05] <stenyak> if OT happens, the "server" is a blind proxy i think
[13:06] <stenyak> so does not need the privkey to work
[13:07] <stenyak> unless we're also using OT in the wavebus pubsub network
for some reason?
[13:07] <alown> Supernodes can be blind (though they might also just be
normal well-connected wave servers). I would expect normal servers to still
be doing OT. The question is whether the 'client' (whatever that means)
should be doing it also.
[13:08] <alown> The network shouldn't need OT. (Algorithms exist that allow
the incoming ops to be arbitrarily queued and only processed when needed).
[...]
[21:21] <josephg> alown: the client always needs to do OT because otherwise
they can't both edit a document live and receive operations from people who
didn't have their ops.
[21:22] <josephg> the server doesn't need to do OT, although if it doesn't
do OT, it'll punt the OT work to its clients - which will result in a
higher CPU utilization on mobile devices.
[...]
[13:08] <stenyak> i pictured this "server" as being an optional item that
shortcuts the long waits of DHT, rather than something necessary for
"clients"?
[13:08] <alown> Hmm.
[13:08] <alown> I suppose we should define what a 'client' is then...
[13:09] <alown> We have at least 2 layers of stuff going on here: 1) Wave
OT/operation layer 2) Network routing/P2P layer
[13:13] <alown> But it is quite plausible something might be doing both of
those
[13:10] <stenyak> with your pubsub net suggestion, i was picturing 2 kinds:
a regular pure p2p peer, and a helper kind of node to route stuff quickly
when a peer is connected to it
[13:13] <stenyak> so with that picture in mind, layer 1 stuff could go
directly from peer to peer (if connectivity/firewalls allows), or through
the "helper node" if available
[...]
[13:20] <stenyak> [...] all this discussion looks very similar to
discussing how to design internet+dns, i think the problems are the same
really
[13:20] <stenyak> or at least we could take some inspiration from it maybe
[13:20] <alown> This was my conclusion last night with josephg. ('The
problems should already be solved (see The Internet)')
[14:09] <stenyak> and The Internets solved the problem how? By having a
large set of supernodes (dns servers), that may take a whole day to
propagate updates. The alternative being having the actual IP address in
the first place, or to centralize stuff
[14:10] <stenyak> (aka use servers everywhere)
[14:22] <alown> Maybe, but the internet's design is X (where X > 20) years
old, so may not represent the most modern thinking of how to make
distributed networks.
[14:59] <alown> (Don't forget that our aim for Wave is at the cutting-edge
of academic research also).
[...]
[14:50] <stenyak> i just threw the question at some friends who should be
more up-to-date with networking technologies than me... hopefully they
come back with some revolutionary dns-2 design or something that we can copy
[15:18] <stenyak> could give us some ideas: http://openpeer.org/
[15:18] <stenyak> (it's not a solution, but maybe they did the same
reasoning we're going through)
[15:46] <stenyak> another response i got goes along the lines of... hard as
fuck, but if you manage to do it, you are a hero
[...]
[15:02] <stenyak> looking at it from a wider perspective, what we want is
similar to having each peer shout at the whole world "here i am, anyone
got something for meeee?" in some way that doesn't clog the internet tubes,
and that is as fast as shouting would be. i start to think it's not
physically possible to do that...
[15:03] <stenyak> if publickeys were handed to people based on the
location, then we could have routing tables similar to how internet
currently works
[15:03] <stenyak> but pubkeys are... well, random. so that kind of routing
that allows anyone to connect to an arbitrary IP in a matter of
milliseconds is impossible, i believe
[15:04] <alown> So, we end up with DNS for public keys?
[15:04] <stenyak> something like dns, but much faster [wrt. propagation
times]
[15:05] <stenyak> so in essence, a tree of servers or whatever (which is
similar to how wave currently works, right?)
[15:05] <alown> Heh. But the whole point was to avoid the tree system
currently (since it is susceptible to netsplits)
[...]
[15:56] <stenyak> maybe the real question could be: how do we make DHT much
faster?
[16:14] <stenyak> once the initial discovery process is finished, the
transmission of data will not have the lag associated with DHT, so even if
DHT takes 10 seconds, that could be acceptable
[16:15] <stenyak> i.e. a new peer takes 10 seconds to be discovered by the
rest of participants collaborating in a wave
[16:16] <stenyak> (or viceversa.. the new peer takes 10 seconds to discover
the participants)
[...]
[16:25] <stenyak> this could shed some light:
http://en.wikipedia.org/wiki/Distributed_hash_table#Algorithms_for_overlay_networks
[19:06] <stenyak> http://dsn.tm.kit.edu/english/2936.php
*...4) Routing p2p messages/events in a pure P2P system (part 5):*
[21:03] <josephg> [...] For now, I want wave to be p2p in the same way that
git is p2p.
[21:04] <josephg> that is, I want the core algorithms & data structures to
use P2P-capable algorithms, and probably the wave servers will do p2p
between themselves (this is easy because they'll all be both named and
accessible)
[21:06] <josephg> as for client-to-client p2p, there's a few options
depending on what kind of use cases we want to support - but I want to
worry about getting the algorithms p2p-capable first. If you're keen to set
up an anonymous, distributed wave system over a DHT - well, I want to first
make that possible
[21:15] <josephg> .... and as for ipv6, network admins _love_ NAT now that
we have it


*5) Implementing "undo": invertibility, tombstones, edge cases, TP2:*
[00:17] <alown> I am not sure how an 'undo stack' is going to work (at all)
with federation...
[00:18] <josephg> well, you just do undo at the application level
[00:19] <josephg> "submit op which inserts text" ... later "submit op which
removes text"
[00:19] <josephg> you don't need OT for that.
[00:20] <josephg> I imagine like, a semantic undo. In the client you can
imagine making an undo op (which might not necessarily rollback an
operation (because of tombstones and all that))
[00:20] <josephg> ... but would seem that way as far as the user is
concerned
[00:21] <josephg> then if the user hits ctrl+z, you can transform that
operation up to the current version and apply it
[00:21] <josephg> - the fact that its an undo isn't really relevant.
[00:21] <josephg> the bad thing about losing invertibility is doing playback
[00:21] <josephg> - because you can't scrub back through time
[00:21] <alown> But you have all the operations since the start, so you can
play forward at least?
[00:23] <josephg> yeah exactly.
[00:23] <josephg> ... and make like, keyframes of the document
[00:23] <josephg> - and play forward from them or something.
[00:23] <alown> Hmm, so you can do the step-back without recalculating the
entire document?
[00:24] <alown> I don't really like the idea of then having another
datastructure to have to pass around...
[00:24] <josephg> right - if you have a snapshot at version 1000, and the
user is looking at 1010 and they try to step back to 1009, you can just
replay ops 1001-1009 on that version 1000 snapshot
[00:24] <alown> What was the problem with invertible operations (I don't
understand OT enough yet to be able to properly comment on that side).
[00:25] <alown> (Other than it confuses people?)
[00:25] <josephg> hahaha actually people seem to love invertibility
[00:25] <josephg> I don't know why.
[00:25] <josephg> I've been trying to remove it from sharejs, and everyone
gets sad.
[00:26] <josephg> the problem is that if I make an op which deletes the
whole document (version 100, say) then I undo that operation
[00:26] <josephg> and you insert in the middle of the document at version
100, then your op gets transformed to do that insert at the start of the
document instead at version 101 (because the content has disappeared)
[00:26] <josephg> and it never goes back to the middle of the document.
[00:27] <josephg> so, with tombstones you can get around that by having a
'resurrect' operation
[00:27] <josephg> (so deleting the whole document turns the whole document
into tombstones, then we can resurrect them all again in the inverse)
[00:28] <josephg> but you can't invert an insert - because deleting leaves
the tombstone there
[00:28] <josephg> and if you have a 'real delete' operation, then yeah,
you're back in the hole
[00:28] <josephg> also, with wave in particular, inverting is really
complicated
[00:29] <josephg> - see, if the wave says "<annotation bold:true>blah
blah<annotation bold:false> not bolded"
[00:29] <josephg> then if you insert at the end of the "blah blah", it'll
automatically get bolded.
[00:30] <josephg> ... so if the text isn't bolded, and then you bold it
while I insert at the end of the text, you need to make sure my text
_isn't_ bolded or something
[00:31] <josephg> .... and yeah, I can't remember - but there's these
horror cases that I remember kept me from sleeping when I tried to
reimplement wave's OT code in C
[00:31] <alown> hmm
[00:31] <josephg> and it would have been fine if it wasn't invertible.
Well, at least it would have been tolerable.
[00:33] <josephg> So yeah. Conclusion: You can make invertibility work, but
it's kind of a bitch, and you can't make it work for TP2
[00:33] <josephg> which means it won't work if we're federating
[00:33] <alown> How are we hacking around that currently then?
[00:33] <josephg> well, we don't do TP2
[00:34] <josephg> remember, federation just uses a bad version of the
current client-server protocol
[00:34] <josephg> - arranged in a tree of servers
[00:34] * alown goes and looks up which one TP2 was again
[00:35] <josephg> ... its the one that says you don't need a canonical
ordering of operations
[00:35] <josephg> sharejs and wave both use the server to pick the order of
operations (based on which order they reach the server)
[00:35] <josephg> and then they use incrementing version numbers based on
that order
[00:35] <alown> ah yep.
[00:35] <josephg> -> for p2p, that doesn't work because you don't have a
centralized server, and anyone can send messages to anyone
[00:36] <josephg> and yeah, you need TP2 for that (which sort of says you
can apply ops from 3 different sites in any order and it still works)
[00:37] <josephg> - and apparently someone proved that if you make it work
for 3 sites, it works for any number of sites
[00:43] <alown> Anyhow, I can see leaving inversion out for simplicity, but
don't yet understand why it can't be made to work with TP2.
[00:59] <alown> Hmm. Seen 'A Sequence Transformation Algorithm for
Supporting Cooperative work on Mobile Devices'?
[01:02] <josephg>
http://research.microsoft.com/en-us/um/redmond/groups/connect/cscw_10/docs/p159.pdf?
[01:15] <alown> The main feature is its use of storing local/remote
operations and processing them much later than receipt time.
[01:17] <alown> ABT satisfies TP1+2, so looks like this should(?)
[01:19] <josephg> need to read it
[01:19] <josephg> ... I'll go through it later
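The keyframe idea from the playback part of this discussion can be sketched as follows, assuming a plain-text document, a canonical integer version order (as in the current server-based model), and a toy op format. The `apply_op` and `document_at` names and the `("insert", pos, text)` / `("delete", pos, length)` tuples are hypothetical; wave's real operations are richer.

```python
def apply_op(doc: str, op) -> str:
    """Apply a toy op: ("insert", pos, text) or ("delete", pos, length)."""
    kind, pos, arg = op
    if kind == "insert":
        return doc[:pos] + arg + doc[pos:]
    if kind == "delete":
        return doc[:pos] + doc[pos + arg:]
    raise ValueError("unknown op kind: %r" % (kind,))


def document_at(version: int, ops: dict, snapshots: dict) -> str:
    """Rebuild the document at `version` without inverting any op.

    ops[n] is the op that turns version n-1 into version n.
    snapshots maps a version number to the full document at that version;
    a snapshot at or before `version` is assumed to exist (e.g. version 0).
    """
    base = max(v for v in snapshots if v <= version)
    doc = snapshots[base]
    for n in range(base + 1, version + 1):
        doc = apply_op(doc, ops[n])
    return doc
```

Stepping back from 1010 to 1009 with a snapshot at 1000 then replays ops 1001 to 1009 only, which is the case josephg describes.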


*6) Usability of a pure p2p system in Real Life (tm):*
[12:13] <alown> We also don't know if storing ops in a DHT is efficient
enough for our use case...
[12:14] <stenyak> in any case, let's say i fire up my wavep2p android
client and want to check for any new waves
[12:14] <stenyak> i definitely won't put up with a wait of 30 seconds when
i have "this damn fast 4g connection!" in my cellphone
[12:14] <stenyak> i mean, that's the point of view of six pack joe
[12:14] <stenyak> and joe is definitely right..
[12:15] * alown thinks of the hours it took to download the bitcoin
blockchain from the p2p system
[12:15] <stenyak> or browse through freenet, or whatever... it's painfully slow
[12:16] <stenyak> in the end, i think that most users won't be running a
full blown peer, but will be relying on an external server instead
[12:16] <stenyak> i.e. nobody runs their own email servers nowadays
[12:16] <stenyak> and the same can happen with wave
[12:16] <alown> Should a mobile client be doing the full p2p federation, or
simply talking to a server which does it...
[12:16] <stenyak> the few who decide to run a full-blown wave peer, should
be aware of the problems
[12:17] <alown> So, this should be less of a problem since the only nodes
doing p2p will be proper full-time connected servers?
[12:17] <stenyak> the thing is, we can assume most people wont fire up
their own xmpp server, but go for jabber.org account
[12:17] <stenyak> and the same thing will presumably happen for wave,
simply because it's easier to do
[12:18] <stenyak> which doesn't prevent me from running my own full-blown
wave server
[12:18] <stenyak> but that's a use case in which the user knows the
limitations
[12:19] <stenyak> [...] you and i will run several full-blown wave peers at
home, at our parent's house, or whatever, but we'll know and accept the
problems
[12:19] <stenyak> i think that's the way to think about the problem
[12:19] <stenyak> heck, most people use github for permanent [git]
connectivity ;-)
[12:19] <stenyak> instead of opening ports to their laptop in their lan
[12:19] <stenyak> and those are the tech-savvy people...
[12:20] <alown> So, we have a p2p system between wave servers and superwave
servers, with clients connecting to the server rather than doing the p2p
itself...
[12:20] <stenyak> i'm not saying it's the way we should do it. i'm saying
that's the way it most probably will pan out, because it's already
happening in 100% of the existing p2p protocols i know of
[12:20] <alown> Hmm...
[12:21] <stenyak> so we should plan for that instead of a theoretical pure
p2p world
[12:21] <stenyak> if we assume there's servers like github, bitbucket and
sourceforge, then suddenly most of the problems go away, while still not
preventing people from running fully p2p if they want


*7) Comparison with BitTorrent and P2P-TV technologies:*
[12:21] <alown> BT doesn't have huge servers (and with magnet has actually
moved in the opposite direction).
[12:21] <stenyak> BT has no real-time needs
[12:22] <stenyak> that's why they can afford DHT
[12:22] <stenyak> dht could be used for simulating a forum-like discussion
in wave. but we can't force that restriction from the server
[12:22] <stenyak> (i say forum-like, because people don't expect reaction
within seconds there)
[12:23] <alown> How did iplayer do its live p2p broadcasting?
[12:23] * stenyak googles what iplayer is
[12:23] <alown> Sorry, BBC iPlayer is their TV-over-the-internet system.
[12:24] <alown> Originally it used a p2p system, but got lots of negative
press (because of association with BT since it used p2p), so it now uses a
centralized system instead. (And their bandwidth costs are much higher).
[...]
[12:25] <stenyak> i seem to recall other [p2p] tv clients
[12:25] <stenyak>
http://wiki.xbmc.org/index.php?title=HOW-TO:Play_free_P2P_(peer-to-peer)_online_streaming_TV
[...]
[12:26] <alown> Found a paper titled "RT-P2P: A Scalable Real-Time
Peer-to-Peer System with Probabilistic Timing Assurances" (google for it)
[12:28] <alown> Look at the paper I mentioned. It relies on 'super nodes'
to enable it to keep low latencies...
[...]
[12:27] <stenyak> but i'd be wary of using this (p2p tv) as an inspiration.
i know there's delay of 10-30 seconds from my TV Formula1 image to the
telemetry that comes through HTTP from formula1.com website. this is
regular TV, and they don't care about 30 seconds of lag
[12:27] <stenyak> the only real problem of p2p tv is avoiding much jitter
[12:27] <stenyak> as long as the stream arrives and is viewable, a delay of
a minute doesn't matter that much
[12:28] <alown> True.


*8) Identifying participants (part 1):*
[12:09] <alown> I am also no longer sure what an 'account' should look
like, since it has no reason to be stuck to a domain...
[12:10] <stenyak> current wave discovery works by using the domain name of
the email-address-like list of participants
[12:10] <stenyak> but here we're talking about hashes, public keys or
whatever
[12:10] <stenyak> which do not (necessarily) point to an particular IP:PORT
or whatever
[12:10] <alown> Exactly the problem...
*...8) Identifying participants (part 2):*
[12:33] <stenyak> would it make sense that, while some participants are
identified by a pubkey (or whatever), many of them could be identified by a
user@domain address, with which any peer can quickly locate supernodes?
[12:33] <stenyak> i mean some kind of dual "pubkey and optional domain
email-like addr" for the participants list
[12:34] <stenyak> the optional part being essential in the broader internet
[12:34] <alown> Isn't that exactly what using Mozilla Persona would do (map
user@domain to some public-key we can use)
[12:34] <alown> Removing the need for us to have to roll yet-another
authentication system.
[...]
[12:38] <stenyak> the idea would be that, for a person to be a participant
in a wave, you *require* his pubkey. optionally, you may have acquired this
pubkey by asking "wave.google.com" about the user "joe", getting his pubkey
as a result.
[12:39] <stenyak> and now that you have the pubkey and one of many possible
email-like addresses (in this case j...@wave.google.com), then you can use
the email-like address for displaying in the UI
[12:39] <stenyak> this means that, whoever wants to run pure p2p peers,
will have to give his pubkey
[12:39] <stenyak> and whoever uses the more traditional style, can simply
give his email-like addr
[12:39] <stenyak> and the participants list will show a simple email-like
address most of the time
[12:40] <alown> Do we then allow anyone to 'log in' to any wave server
running at any domain, since it should no-longer make any difference where
they are in the network...
[12:41] <stenyak> yes, that's needed for world-wide-public waves, which is
equivalent to a read-only forum on the net
[12:41] <stenyak> then there could be server-public waves, which is
equivalent to requiring sign-in to view a forum (and coincidentally the
current implementation of public waves in WiaB, right?)
[12:43] * alown has never tested what happens with public waves in the
current federation system
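
A minimal sketch of the "pubkey plus optional email-like address" idea from
part 2, in Python. It is not from the discussion: the class name
`Participant`, the `display_name` helper and the "pubkey:" fingerprint format
are all assumptions made for illustration. The pubkey is the identity; the
address is only a label for the UI.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Participant:
    # Required: the public key is the participant's real identity.
    pubkey: bytes
    # Optional email-like address; excluded from equality and hashing,
    # because one pubkey may be known under several addresses.
    address: Optional[str] = field(default=None, compare=False)

    def display_name(self) -> str:
        """Name for the participants list: the address when we have one."""
        if self.address is not None:
            return self.address
        # Pure-P2P peer without an address: show a short key fingerprint.
        # (Hypothetical format, chosen only for this sketch.)
        return "pubkey:" + hashlib.sha256(self.pubkey).hexdigest()[:16]


server_user = Participant(pubkey=b"key-bytes", address="joe@example.com")
pure_peer = Participant(pubkey=b"key-bytes")
print(server_user.display_name())
print(pure_peer.display_name())
```

Because equality only looks at the pubkey, the two objects above compare as
the same participant even though only one of them carries an address.
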
*...8) Identifying participants (part 3):*
[21:35] <josephg> - Who is a user? If a user is sten...@example.com, then
we can put a server at example.com and it can hold operations for you
[21:36] <josephg> ie, if I add you to a wave, my computer (or my wave
server or something) can send a message to example.com to say "Yo, here's
some ops you should know about"
[21:36] <josephg> that would be similar to a mailbox
[21:37] <josephg> ... and it would work pretty well. Bear in mind that
there's no reason operations have to go through the wave server at
example.com - if we're both on a LAN together, we could discover one
another through DNS service discovery and send ops directly
[21:37] <josephg> .. without going through our respective wave servers
[21:38] <josephg> However - if our identities aren't tied to a domain (eg
bitcoin), then we'll need to use a dht or something.
[21:42] <stenyak> the conclusion i've arrived at is that "users"
ultimately are a publickey (for which they have the privatekey). this is
inconvenient for people to "add you to a wave", so a possibility would be
to have a friendlyname=>pubkey server converter. this way people can add "
sten...@example.com", by first finding out what the pubkey for
sten...@example.com really is
[21:43] <stenyak> the friendlyname would be optional, and in LAN
environments you could directly use the pubkey (instead of the friendly
name)
[21:43] <josephg> I think people will be more than happy to use a friendly
name in a lan environment too
[21:43] <stenyak> discovery in a local network could be done with bonjour
or something too (not just dns)
[21:44] <josephg> I <3 dns-sd
[21:44] <stenyak> [...] maybe they already have a contact list (read, list
of friendlyname<>pubkey equivalences) they can use in the UI (even if the
underlying system will use pubkeys anyway)
[21:44] <stenyak> and by contact list, i really mean a cache of some sort
[21:45] <stenyak> (not some specific, complex roster system)
[21:45] <josephg> and you can do friendlyname -> pubkey really easily by
just storing the pubkey on the user's domain
[21:45] <josephg> so, have the example.com webserver host
https://example.com/.wellknown/stenyak
[21:46] <josephg> = your public key.
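
A rough Python sketch of the friendlyname -> pubkey lookup with a local cache,
as described in part 3. Nothing here was specified in the discussion beyond
the URL shape: `pubkey_url`, `http_fetch` and `ContactCache` are hypothetical
names, the response is assumed to be the raw key bytes, and the path is copied
as written in the log (the standardised prefix for such URLs is
"/.well-known/", RFC 8615).

```python
from typing import Callable, Dict
from urllib.parse import quote
from urllib.request import urlopen


def pubkey_url(address: str) -> str:
    """Build the lookup URL for an email-like address such as user@domain."""
    user, sep, domain = address.rpartition("@")
    if not (user and sep and domain):
        raise ValueError(f"not an email-like address: {address!r}")
    # Path layout taken from the log as written; treat it as an assumption.
    return f"https://{domain}/.wellknown/{quote(user, safe='')}"


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTPS GET returning the response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


class ContactCache:
    """A cache of friendlyname -> pubkey pairs, not a full roster system."""

    def __init__(self, fetch: Callable[[str], bytes] = http_fetch) -> None:
        self._fetch = fetch
        self._known: Dict[str, bytes] = {}

    def pubkey_for(self, address: str) -> bytes:
        # Ask the user's domain only the first time we see an address.
        if address not in self._known:
            self._known[address] = self._fetch(pubkey_url(address)).strip()
        return self._known[address]
```

The fetcher is passed in so that a LAN peer could swap in something else
(dns-sd, bonjour) without touching the cache itself.
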


*9) P2P anonymity (peers that want to anonymously lurk in a wave) (part 1):*
[12:48] <stenyak> by the way, what about non-participants that simply want
to lurk a wave?
[12:49] <stenyak> e.g. i'm given a wave uri
(wave://look_at_these_kittens_wave), and want to view it
[12:49] <alown> Whilst a wave is public, as soon as they 'read' the wave,
they would have a metadata wavelet created, so would become a participant
(if read-only).
[12:50] <stenyak> and from then on, whenever the wave changes, someone will
try to make the change reach the peers with my privkey
[12:50] <stenyak> supposedly..
*...9) P2P anonymity (peers that want to anonymously lurk in a wave) (part
2):*
[21:18] <josephg> stenyak: interesting point about people who want to not
participate but follow a wave anyway - it's really bad if other people can
tell that they're there (assuming the wave is public).
[21:18] <josephg> I guess we just need to make sure that the metadata wave
is invisible, and then it's ok..
[21:21] <stenyak> invisible.. to what peer/s? surely those that are
transmitting deltas to the lurkers will need to know they exist?
[21:21] <stenyak> (maybe some of the algorithms behind freenet can help
with this)
[21:21] <stenyak> (or even TOR)


*10) Encryption of waves:*
[21:47] <josephg> for waves themselves, I'm imagining giving each wave an
AES key
[21:47] <josephg> then storing an encrypted version of the key for each
participant on the wave
[21:48] <josephg> .... anyway, that way anyone who has the AES key can read
all ops on the wave
[21:48] <josephg> and can participate (because they can encrypt ops for the
wave)


*11) Addition and removal of participants, and their ability to read past
and future wave versions/deltas:*
[21:48] <stenyak> what about removing a user from a wave?
[21:49] <josephg> worst case, we can just make a new key and re-add
everyone using the new key
[21:49] <josephg> and keep around the old key too
[21:49] <josephg> so people can still read the old ops as well
[21:49] <stenyak> the user can access their browser cache for all we care.
if you ever read it, there will be ways to do it. "download now wave-spy to
read waves you were removed from!"
[21:49] <stenyak> so providing an official way sounds better
[21:50] <stenyak> the AES key could change at any point in time, e.g.
whenever a new user is added (to prevent them accessing the history), or
deleting them (to prevent them from reading future history)
[22:32] <josephg> um - in wave, we let new users see the whole history
[22:40] <stenyak> but that use case could be desirable, right? and if we
support modification/versioning of the AES key, we might as well allow that
too? the equivalent in email world would be to forward an email, removing
the existing quotes
[23:17] <josephg> Yep definitely.
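
A small Python sketch of the key versioning discussed in 10) and 11). It only
models the bookkeeping (which participant holds which version of the wave
key); the actual AES encryption of ops and the per-participant encryption of
each key are left out. The names `WaveKeys`, `rotate`, `add`, `remove` and
the notion of numbered "epochs" are assumptions made for this sketch.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WaveKeys:
    # One random 256-bit key per epoch; old keys are kept so that old ops
    # stay readable by whoever already holds them.
    epochs: List[bytes] = field(
        default_factory=lambda: [secrets.token_bytes(32)])
    members: Set[str] = field(default_factory=set)
    # Which epochs each person has ever been given a key for.
    held: Dict[str, Set[int]] = field(default_factory=dict)

    @property
    def current(self) -> int:
        return len(self.epochs) - 1

    def rotate(self) -> None:
        """Make a new key and give it to everyone currently on the wave."""
        self.epochs.append(secrets.token_bytes(32))
        for who in self.members:
            self.held[who].add(self.current)

    def add(self, who: str, share_history: bool = True) -> None:
        if share_history:
            # Current wave behaviour: new users see the whole history.
            self.held[who] = set(range(len(self.epochs)))
        else:
            # New key first, so the newcomer cannot read earlier ops.
            self.rotate()
            self.held.setdefault(who, set()).add(self.current)
        self.members.add(who)

    def remove(self, who: str) -> None:
        # The removed user keeps the old keys (think browser cache),
        # but is not given the new one.
        self.members.discard(who)
        self.rotate()


keys = WaveKeys()
keys.add("alice")
keys.add("bob")
keys.remove("bob")
keys.add("carol", share_history=False)
print(keys.held)
```

In this model removing someone only protects future ops, which matches the
point above that anything a user has already read cannot be taken back.
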


-- 
Saludos,
     Bruno González

_______________________________________________
Jabber: stenyak AT gmail.com
http://www.stenyak.com
