On Fri, Mar 9, 2012 at 8:52 PM, Paul Syverson <[email protected]> wrote:
> On Thu, Mar 08, 2012 at 06:41:25AM +0000, The23rd Raccoon wrote:
>> has blinded the tor devs to a very serious type of active attack
>> that actually will: the crypto-tagging attack.
>
> Nobody's blinded to the possibility. Many of us knew long ago that
> several things like this are easy to do.
I meant blinded to the severity.

>> In 2009, the devs dismissed a version of the crypto-tagging attack
>> presented by Xinwen Fu as being equivalent to correlation back when
>> the "One Cell is Enough to Break Tor's Anonymity" attack came out[1].
>
> As noted earlier in the post and in many other places,
> it's trivial to put in active timing signatures if they are needed.

Only if you have enough data to encode a timing signature into. One
cell is not very much data. You'll see why this matters in a few
paragraphs.

> So, to convince me that your analysis shows we should revisit tagging
> for Tor you would have to show three things:

Your requirements don't seem to match the goal of revisiting tagging,
so I bent them slightly. They seem instead to invite revisiting
correlation entirely. But that's OK. I want to deal with tagging too,
so I'll deal with it first. It's along the way, as they say.

As we'll see, tagging allows a type of amplification attack that can
be *simulated* with a timing attack, but I'll argue it is simulated
poorly. I have not yet provided a full Bayesian analysis of the bounds
on the accuracy of that simulation, but I have written the dominating
components, and I'll finish it if you like.

If you want me to analyze active timing attacks using similar Bayesian
analysis, that might be a taller order. I'd need to scavenge the local
dumpster archives for a while to collect a representative sample of
attacks and pore over how to interpret their (very likely
misrepresented, or at least embellished) results. If you could select
your favorites, it might speed things along. Either way, just let me
know.

> (1) Convince me that a truly global adversary is realistically worth
> worrying about

Intuitively, tagging attacks create a "half-duplex global" adversary
in places where there was no adversary before, because the
non-colluding entrances and exits of the network start working for
you.
You get to automatically boost your attack's resource utilization by
causing any uncorrelated activity you see to fail immediately, so you
don't even have to worry about it. This effect is by virtue of the tag
being destructive to the circuit if the cell is not untagged, and also
being destructive when a cell is "untagged" on a non-tagged circuit.

In other words: in the EFF's graphic, tagging attacks create a second,
translucent NSA dude everywhere in the world *for free*. This
translucent NSA dude is effectively closing circuits that the real NSA
dude didn't want going there in the first place. He makes sure that
your circuits only go through another NSA dude.

So, to answer your question: because of this "half-duplex global"
property, the tagging attack does not require you to worry about a
true global adversary to see that it is worse than correlation (active
or passive). Any amount of resources (global or local) that you devote
to tagging is automatically amplified for free by the global
translucent NSA dude.

How well you are able to correlate afterward is a secondary attack.
Depending upon the nature of the tagging vulnerability you find, you
might be able to encode an arbitrary bitstring to uniquely identify
the user, eliminating the need for any subsequent correlation at all.
In fact, I'm pretty sure this is possible.

> (2) convince me that an adversary that does active timing correlation
> would not remain a significant threat even if tagging were no longer
> possible

I'm going to bend the rules again and instead try to convince you that
an attacker who tags can observe more compromised traffic than an
active timing attacker who attempts to simulate his attack, making
tagging qualify as an amplification attack in a separate class
entirely.
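To make the destructive tag/untag property concrete, here is a hedged
Python sketch of the underlying idea. It models the XOR-malleability of
a counter-mode stream cipher abstractly, not Tor's actual relay crypto:
the tag value and cell size are illustrative assumptions.

```python
import os

# Sketch of XOR tagging on a malleable stream cipher (e.g. AES-CTR):
# flipping ciphertext bits flips the same plaintext bits, so a colluding
# entry can XOR a tag in and a colluding exit can XOR it back out. A
# non-colluding exit sees a garbled cell, fails its integrity check, and
# tears down the circuit -- the "destructive" property described above.

TAG = bytes([0xAB] * 8)  # arbitrary attacker-chosen tag (assumed value)

def xor_at(cell: bytes, tag: bytes, offset: int = 0) -> bytes:
    out = bytearray(cell)
    for i, b in enumerate(tag):
        out[offset + i] ^= b
    return bytes(out)

def entry_tags(cell: bytes) -> bytes:
    """Colluding entry guard XORs the tag into the encrypted cell."""
    return xor_at(cell, TAG)

def exit_untags(cell: bytes) -> bytes:
    """Colluding exit XORs the same tag out, restoring the cell."""
    return xor_at(cell, TAG)

cell = os.urandom(512)
# Tag then untag restores the cell exactly: the colluding pair survives.
assert exit_untags(entry_tags(cell)) == cell
# A non-colluding exit never untags, so the cell stays corrupted and the
# circuit dies; likewise "untagging" an untagged cell corrupts it.
assert entry_tags(cell) != cell
assert exit_untags(cell) != cell
```

The same XOR also works in reverse (a colluding exit corrupting cells
toward a non-colluding entry), which is what makes the non-colluding
relays "work for you" as described above.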
To simulate the same amplification attack with correlation (active or
passive), you have to correlate every circuit at your first NSA dude
against every other circuit at your second NSA dude, and kill the
circuits that don't have a match on both sides. You also have the
added challenge of doing the initial correlation with few enough cells
to kill the circuit before any streams are attached (so users don't
notice).

The need for early detection rules out virtually all of the benefits
of active timing attacks for this step, because they require quite a
lot of data to encode their fingerprints (especially when making them
provably effective or practically invisible). Therefore, we are back
to analysis dominated by passive correlation for the circuit-killing
step (the crux of the simulation).

In order to kill the circuits that don't match, NSAdude1 has to ask
NSAdude2 out of band whether NSAdude2 has seen a match for each
circuit that NSAdude1 sees, and vice versa. The probability P(M|C) of
the NSA dudes seeing a true match, given that their correlator
predicted one, trends down in proportion to P(M) = (c/n)^2 * (1/M)^2,
similar to my Example 3 but with an extra 1/M factor in there, since
we're talking about a fully correlating adversary.

Note that M doesn't change (from 5000 in my examples) just because you
see fewer streams locally. Your probability of seeing a match is
pretty low compared to all the other things you see. (This piece will
also be key for any later analysis of active timing attacks, which
will still be dominated by 1/M^2.) Therefore, even if (or perhaps
*especially if*) you don't devote global-scale resources to the
attack, you're going to be crushed by the base rate.

To complete the simulation, at the circuit-killing stage your choice
is either to over-estimate, take the union of the first-pass
mismatches, and waste resources; or to pick only the intersection of
1:1 matches on the first pass and kill off quite a few actual matches.
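The base-rate point above can be sketched numerically. All numbers here
are illustrative assumptions, not measurements: a correlator with a 1%
equal error rate, an adversary observing c/n = 0.1 of the network, and
M = 5000 concurrent streams as in the earlier examples, plugged into
Bayes' theorem.

```python
# Hedged numeric sketch of the base-rate argument. P(M|C) is the
# probability of a true match given the correlator claimed one.

def p_match_given_correlation(p_c_given_m, p_c_given_not_m, p_m):
    """P(M|C) = P(C|M)P(M) / [P(C|M)P(M) + P(C|~M)(1 - P(M))]"""
    num = p_c_given_m * p_m
    return num / (num + p_c_given_not_m * (1.0 - p_m))

c_over_n = 0.1   # fraction of the network observed (assumed)
M = 5000         # concurrent streams, as in the earlier examples
prior = (c_over_n ** 2) * (1.0 / M) ** 2  # P(M) with the extra 1/M factor

posterior = p_match_given_correlation(0.99, 0.01, prior)
print(f"prior P(M)       = {prior:.2e}")
print(f"posterior P(M|C) = {posterior:.2e}")
# Even with a 99%-accurate correlator, the posterior stays tiny because
# the prior is crushed by the (1/M)^2 base rate.
```

The tagger never pays this base-rate penalty, because the network
itself discards non-matching circuits for him.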
Therefore, you lose either resource amplification or the omnipresent
"half-duplex" translucent NSA dude that the tagger gets for free, and
depending on the implementation choice you might even end up doing
worse than the active attack by itself, without attempting the
circuit-closing amplification at all. The exact tradeoff depends on
how global versus how local you are, and on whether you choose to be
lenient or aggressive in your uncorrelated killing.

I conclude that the superiority of true tagging over simulated tagging
clearly makes true tagging qualify as a resource amplification attack,
which is indeed considered a different class of attack than
correlation alone. Would you like a Bayesian proof with some real
numbers, or do you concede that we should move on to active timing
attacks?

> (3) convince me that your numbers correspond to reality and that the
> results are robust to intersection attacks and ancillary information.

Is this a trick question? Dude, you realize I'm a Raccoon, right?...

Nothing is robust to intersection attacks. If you add up enough pieces
of info over time, you deanonymize someone. The game's all about
collecting enough bits from wherever you can (or about scattering
those bits to the wind, if you're on the other side of the line).

> I also want to comment on your consideration of an adversary looking
> for the clients visiting a given website. Let's accept for the moment
> the idea of full GPA and accept your numbers. Even if we accept your
> EER that is at least an order of magnitude worse than experiments have
> found (i.e., 99%) you come up with initial anonymity sets of who is
> visiting a particular website (respectively which destinations a given
> client is visiting) of around 50. That is essentially zero for a big
> and powerful adversary. Then add in any ancillary information
> geographic location of IP addresses, prior knowledge about the
> clients, nature of the destination servers, etc. not to mention
> intersections over time.
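The lenient-versus-aggressive killing tradeoff above can be illustrated
with a toy Monte Carlo sketch. The match rate and correlator error
rates below are assumptions chosen for illustration only, not derived
from any measurement of Tor.

```python
import random

# Toy model: each circuit is a true end-to-end match with probability
# p_match; each side's first-pass correlator flags it with true-positive
# rate TPR and false-positive rate FPR.
random.seed(1)
N, p_match, TPR, FPR = 100_000, 0.001, 0.99, 0.01

kept_lenient = kept_aggressive = 0
true_kept_lenient = true_kept_aggressive = 0
for _ in range(N):
    is_match = random.random() < p_match
    flag1 = random.random() < (TPR if is_match else FPR)
    flag2 = random.random() < (TPR if is_match else FPR)
    if flag1 or flag2:   # lenient: kill only if BOTH sides say mismatch
        kept_lenient += 1
        true_kept_lenient += is_match
    if flag1 and flag2:  # aggressive: keep only 1:1 two-sided matches
        kept_aggressive += 1
        true_kept_aggressive += is_match

print(f"lenient:    kept {kept_lenient}, of which {true_kept_lenient} real")
print(f"aggressive: kept {kept_aggressive}, of which {true_kept_aggressive} real")
# Lenient keeps roughly 2% of the non-matches alive (wasted resources);
# aggressive drops roughly 2% of the true matches (lost targets).
```

The tagger pays neither cost: non-matching circuits die on their own,
and tagged circuits survive exactly when both endpoints collude.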
> Rather than undermine the adequacy of passive correlation, you have
> supported its effectiveness.

You (and others in this thread) misunderstand me. I'm not saying that
correlation never works, or that all three of my examples are safe
places to be if you want anonymity from the Tor network as it is
currently deployed and used. I'm merely saying that sweeping all types
of end-to-end attacks under the rug blinds you to the very real effect
that adding more concurrent users to the network has on correlation,
and that the difference is in fact substantial enough to alter at
least some aspects of the threat model, taking user base size and
activity into account before evaluating attacks.

_______________________________________________
tor-talk mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
