(What is this mail? See my explanation from the May mail I sent: https://lists.torproject.org/pipermail/tor-talk/2012-June/024572.html )
------------------------------------------------------------------------

Here's what I said at the beginning of June that I hoped to do:

> - Participate in the Q2 Tor directors board meeting, including approving
> the updated 2012 budget.

Done. We now have a budget for the rest of 2012, including new positions we want to fill -- but see below.

> - Understand the open positions in our current budget, what funders
> each one maps to, and what the priorities are in terms of spending the
> money. Then we can start putting some calls-for-resumes-and-code-samples
> up on the website, as well as prioritizing which calls we want to do
> when. Unless I ignore all of this until July.

I made a map of funders-to-expenses, to try to get a handle on how much money from each grant could go towards new hires/contractors. Alas, that's not sufficient -- we need to know not just how much we planned to spend on the grant, but how much we have actually billed to the funder so far on each grant. Melissa is our official juggler-of-these-numbers: she makes sure all of our invoices are legit so we can continue to pass our non-profit audits. I need to get up-to-date numbers from her before we can plan for the future. I hope to do this at the dev meeting in July. (Aren't you glad you get to learn about the operations details for Tor? :)

> - Get 0.2.3.16-alpha out. Get 0.2.2.37 out. Get 0.2.3.17-alpha out.

Done. We even called 0.2.3.17 a beta, and put out an 0.2.3.18-rc too. We also moved Debian Wheezy to Tor 0.2.3.x, since Wheezy is stabilizing soon and we want to be sure to have the new Tor version in it. That means everybody who uses the torproject.org debs got upgraded to 0.2.3.x without much warning. Hopefully it went smoothly.

> - Orchestrate the FOCI discussion and select the program.

Done. We ended up picking 10 of the 20 papers:
https://www.usenix.org/conference/foci12/tech-schedule/workshop-program
I think we have a great selection of people and papers.
Please drop by if you're in the Seattle area in August!

> - Tell Micah Sherr and Chris Wacek (Georgetown) about the open
> simulation questions; and get Rob Jansen (UMN/NRL), Mashael AlSabah
> (Waterloo), etc a good summary of the current situation.

Not done. But I now have a third target audience -- Nick and Andrea want to learn which performance-related dev tasks they should be thinking about. That's great, because "write Tor patches that we can try in the simulators" is a big component of the upcoming performance work.

> - Read and consider http://microsoftjobsblog.com/zen-of-pm so I can
> help Adam Shostack help us get a good project manager.

Read; now mulling it over. Initial thoughts are that this is not the right context for Tor -- the MS PM position has a huge "understand the space and come up with a competitive product" component, whereas we have plenty of competitive products and not enough coordination helping us make them reality. Or said another way, the projects in question are already imagined, sold, and have a funder; now we need to follow through.

That said, a lot of the properties described in the article are useful to have in people who make sure we follow through. And in the future, we need more people to imagine and "sell" more research/development projects. So I sure wouldn't *mind* having more of these 'program manager' people. Adam will be at PETS in Vigo, and I'll talk to him more about the topic there.

> - I have a three hour slot at the SponsorF meeting this month. I'm going
> to try to bring everybody there up to speed on everything. While also
> letting other people talk for most of the time. Preparing for this talk
> will be a big part of my June.

Done. I put my slides here:
https://svn.torproject.org/svn/projects/presentations/slides-june2012.pdf
I'm afraid they don't work particularly well as stand-alone slides without me talking next to them.
I'll try to get the information from them down into blog posts sometime; but alas, many other items are higher on the list.

The talk went very well -- we got a lot of the other researchers asking questions and making comments, and we ended up running out of time. I feel a little bit bad that I have twice now left the "attacks on anonymity" slides for the end, and twice now run out of time to cover them well. If they schedule me for another slot next time, I should be sure to start with them.

One of the nice things about working with this group of professors and grad students is that they're excited to talk to people who a) have a good handle on the real-world problems and adversaries, and b) know how to present that information in a way that makes sense to security academics. Basically every follow-on talk at the meeting had a reference to something from my talk in it.

> - Meet with Kevin Dyer's lab at Portland State before the SponsorF
> meeting. Rob Jansen and Aaron Johnson (NRL) will be joining me.

Done. We talked about the state of website fingerprinting attacks in the literature -- most studied attacks look at the "closed world" question where you know that the user went to 1 of n web pages (chosen uniformly at random), and you try to guess which one. It comes down to how similar the page she picked is to the other pages she might have picked.
http://freehaven.net/anonbib/#oakland2012-peekaboo

The first issue I raised is that all these website fingerprinting attacks are going about it the wrong way, at least for Tor. They look at the packet level, and try to train their machine learning algorithms on various properties of the packets. Seems to me that since we know it's the Tor protocol underneath, what we should do is reconstruct the TCP stream, and learn how many Tor cells are sent in each direction. Then see which web pages in our set would use that number of cells.
This approach should lead to much simpler attacks, since it's just a question of "how many pages fall into the same bucket as our target page?" rather than doing machine learning over a trace of IP packets.

The approach should also lead to a new set of defenses to test: try to make the set of pages that would use the same number of cells as the target page ("collisions") as big as possible. This might be done for example by having the entry guard add padding cells to "snap up" to the next largest common size.

The second issue was one that Mike Perry keeps raising: just because your closed-world data set doesn't include any web page similar to your target page doesn't mean that your attack will work well in the wild. In practice, there could be many other pages that the user visits, and if you don't know about them, you don't know what your false positive rate is. Knowing that there's an unknown level of inaccuracy should give at least some attackers pause.

In fact, all the stats from the papers show how likely you are to guess correctly, but they don't explore the distribution of which pages are often guessed wrong and which are often guessed right. We need to explore what properties of pages make them more/less likely to collide with many other pages. It could even turn into a set of recommendations for website authors -- "how to write your website in a way that doesn't make your Tor users especially vulnerable to website fingerprinting".

Kevin acknowledged these issues as 'future work'. Some time I'll flesh out the above two paragraphs and turn them into a blog post, to try to draw more attention to this very important research area.

> - Help prepare for the SponsorF site visit that will occur a few days
> before PETS. We'll need to provide slides/etc, and likely even call in
> and do the phone presentation thing.

Started to prepare. I need to write a set of slides that will work well even if somebody else presents them.
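
(To make the cell-counting idea from the website fingerprinting discussion above concrete, here is a minimal Python sketch. It is not code from any paper or from Tor. It assumes the attacker has already reconstructed the TCP stream and can turn per-direction byte totals into counts of fixed-size 512-byte cells, ignoring TLS framing. The page names, byte totals, and the COMMON_SIZES list are made up for illustration.)

```python
import bisect
from collections import defaultdict

CELL_SIZE = 512  # fixed Tor cell size in bytes; TLS overhead is ignored here

def to_cells(nbytes):
    """Number of fixed-size cells needed to carry nbytes (rounded up)."""
    return -(-nbytes // CELL_SIZE)

def snap_up(cells, sizes):
    """Pad a cell count up to the next size in the sorted list `sizes`."""
    i = bisect.bisect_left(sizes, cells)
    return sizes[i] if i < len(sizes) else cells

def build_buckets(pages, pad=None):
    """Group pages by their (cells sent, cells received) fingerprint."""
    buckets = defaultdict(set)
    for name, (sent, received) in pages.items():
        fp = (to_cells(sent), to_cells(received))
        if pad is not None:
            fp = tuple(pad(c) for c in fp)
        buckets[fp].add(name)
    return buckets

def collision_set(target, pages, pad=None):
    """Pages that land in the same bucket as `target` (including itself)."""
    for members in build_buckets(pages, pad).values():
        if target in members:
            return members
    raise KeyError(target)

# Hypothetical closed world: page -> (bytes sent, bytes received).
PAGES = {
    "a.example": (3000, 48000),
    "b.example": (2900, 51000),
    "c.example": (3100, 60000),
    "d.example": (9000, 400000),
}

# Hypothetical "common sizes" the entry guard would pad up to.
COMMON_SIZES = [8, 16, 32, 64, 128, 256, 512, 1024]

unpadded = collision_set("a.example", PAGES)
padded = collision_set("a.example", PAGES,
                       pad=lambda c: snap_up(c, COMMON_SIZES))
print(len(unpadded), len(padded))
```

(In this toy data set the unpadded target sits alone in its bucket, while snapping up to the next common size puts it in a bucket with two other pages. Bigger buckets mean more "collisions" and a weaker attack, at the cost of padding overhead.)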
> - Go to Stamford CT to do a Tor talk for one of Ian's past students.
> http://privacyandsecurity.pbiresearch.com/agenda.html

Done. I used a subset of the "june2012" slides from above. It turns out the talk slot was only 35 minutes, so I ended up smashing together fractions of three different talks. I'm told it went well.

I also got a chance to chat with Ian about the vulnerability of obfsproxy's "obfs2" protocol to smarter DPI approaches. I think we need a new obfs3 protocol that uses ECDHE handshakes, so passive mitm attacks won't be effective at looking for redundancy in the protocol (just because each byte in the obfs2 traffic flow is uniformly random doesn't mean that the flow as a whole has no redundancy). I'll file a bug sometime in July to expand on this paragraph, for those who didn't understand it.

> - Write an abstract for the ecrypt talk I'm doing at the workshop
> before PETS:
> https://www.cosic.esat.kuleuven.be/ecrypt/provpriv2012/invited.html

Done:
"""
Title: Tor, real-world attackers, and (un)provable privacy

Abstract: Tor's approach to threat models is to try to understand the capabilities of realistic attackers we expect to encounter, rather than picking adversaries our protocols can withstand. This strategy has led us to deploy systems that are not amenable to security proofs. Or to say it even more strongly, we deploy provably _insecure_ systems relative to real-world adversaries, because they're still the safest ones we can deploy. In this talk I'll explain some realistic attacks against Tor's anonymity and blocking-resistance properties, and discuss some reasons why it's hard to produce accurate and useful models for these attacks (and thus hard to prove things about them).
"""

I tried not to present it as "you people who do proofs are useless to us", because I want to draw them in and help them realize that the real world is messy and hard to model cleanly. It's a long-shot.
> - Fly to Florence, for the Tor developers meeting and hackfest in July.
> https://trac.torproject.org/projects/tor/wiki/org/meetings/2012SummerDevMeeting
> https://trac.torproject.org/projects/tor/wiki/org/meetings/2012FlorenceHackfest

Done. I also had a nice chat with Gunner about our internal politics and landmines he should be aware of, so he can facilitate the meeting more effectively. More in July on how it goes.

> - Launch a working-group of pluggable transport developers and
> researchers, and make sure they all know about each other.

Done. I also set up a webpage with links as I know them right now:
https://www.torproject.org/docs/pluggable-transports
Please let me know if I missed anything!

> - Help SponsorF come up with metrics by which the SponsorF Red Team
> will judge the project's success.

Here are some early thoughts on two "claims" we should explore. The idea is that we describe security properties we think we can back up with our tools, and then they analyze them and try to find contradictions and vulnerabilities.

1) Tor: "the adversary can't learn which user is communicating with which destination."

Except: if he can see/measure the traffic flow between the user and the Tor network, and also the traffic flow between the Tor network and the destination.

Except: lots of other subtle things from the various anonbib papers that it's probably not worth the red team's time to explore, since anonymity researchers have already worked on them for years.

2) Obfsproxy: "the adversary can't DPI for the flows made by obfs2, when it's using the shared-secret extension (#3 at https://gitweb.torproject.org/obfsproxy.git/blob/HEAD:/doc/obfs2/protocol-spec.txt#l96) That is, the most the adversary can learn from the bytes in the flows is that they're random."

Except: after the handshake, timing and volume characteristics are still like the underlying flow (Tor in this case).
More broadly, here are some notes on a wide variety of metrics we might consider for each component in our blocking-resistance world, with more emphasis on Tor's components and less emphasis on components I'm hoping other projects will make progress on:

1) tor anonymity:
   "chance the adversary can learn that alice is talking to bob"
   computed by measuring...
     diversity of relay location against traffic confirmation
     anonymity against selective DoS attacks
     anonymity against website fingerprinting attacks
     ... against many other anonymity attacks
     each wrt an adversary with various capabilities
   diversity of user types ("deniability")
     within the "nearby" subset of users
   performance of network (bw, jitter, latency, etc)
     mean/median, but also consider long tails
   load on network (how many users, what they are doing)

2) obfuscating transport
   how much "similar" background traffic exists
     how similar it is
     relative to how much scrutiny by the adversary
   pain from blocking false positives ("value" of background traffic)
     ...over what timescales
   space and computational efficiency for embedding/disembedding

3) rate-limited credential distribution (e.g. how we use bridgedb)
   how many honest users we can support for a given adversary level
   how much work we require the adversary to do per credential he gets
   ratio of credentials we can support (bridge addresses we have) to credentials we can afford the adversary to get

4) scanning-resistance / being an innocent service until credential is shown
   [related questions to the obfuscating transport]

5) address allocation strategy (reachability testing)
   for various strategies, how quickly do addresses get blocked?
     with what distribution?
   how much does our reachability testing help the adversary?

------------------------------------------------------------------------

Here are some other things I did in June:

Reviewed Philipp Winter's upcoming FOCI paper "How the Great Firewall of China is Blocking Tor" and gave him comments.
You can read more at http://www.cs.kau.se/philwint/static/gfc/ or wait for the revised version of the paper.

Met with Nathan Freitas (of the Guardian Project) in NYC, to discuss the state of Tor on mobile, and let him know about the upcoming pluggable transports we're going to experiment with. So far it's not entirely trivial to put Python-based Tor components on Android, but the variety of pluggable transports that are going to be Python-based in the near future means he's going to have to add it at some point.

Got up to speed on Tor design proposals 188-191, and sent comments to tor-dev. I should read / comment on the later proposals one day too.

Talked to Nadia Heninger about her student Deepika Gopal's thesis at UCSD entitled "Torchestra: Reducing interactive traffic delays over Tor". It's like Rob Jansen's "Throttling Tor Parasites" paper, except she proposes to have each pair of relays use several TCP sessions in parallel, and migrate circuits between the sessions based on how loud they are. Then TCP can push back on the session full of loud circuits, while we still read freely from the session full of polite circuits. More analysis required, but it sounds worth exploring.

Had a chat with a network security friend who pointed me at http://code.google.com/p/appid/ which uses a huge regexp to identify traffic flows by protocol. Apparently each DPI vendor has their own version of this tool. It's really easy to make it think that a given flow is http, since it doesn't aim to resist attacks like our obfuscating transports. It would be neat to make an automated pluggable transport that uses this regexp to automatically generate flows that appid thinks are a given protocol -- not with the intent of building a perfectly indistinguishable flow, but rather with the intent of driving up the false positives and uncertainty.
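
(A rough Python sketch of why a regexp-based classifier of this kind is easy to mislead. The two signatures below are hypothetical, heavily simplified stand-ins written for this example. They are NOT appid's real rules, and `classify` and `disguise_as_http` are made-up helper names.)

```python
import re

# Hypothetical, simplified signatures in the spirit of a regexp-based
# protocol identifier. These are NOT appid's actual rules.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
}

def classify(first_bytes):
    """Return the first protocol whose signature matches the flow's start."""
    for proto, pattern in SIGNATURES.items():
        if pattern.search(first_bytes):
            return proto
    return "unknown"

def disguise_as_http(payload):
    """Prefix an opaque payload with a header that satisfies the http rule."""
    header = (
        b"POST /upload HTTP/1.1\r\n"
        b"Host: example.com\r\n"
        b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
        b"\r\n"
    )
    return header + payload

# Stand-in for an obfuscated flow: bytes that match no signature.
opaque = bytes(range(128, 192))

print(classify(opaque))
print(classify(disguise_as_http(opaque)))
```

(The classifier only inspects the first bytes of the flow, so a fixed prefix is enough to flip its answer from "unknown" to "http". A real transport built on this idea would generate prefixes from the classifier's own rules rather than hard-coding one.)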
The same friend also pointed me at http://jon.oberheide.org/0trace/ which can apparently traceroute over an established TCP connection -- it might be perfect for trying to figure out where China's GFW bridge probes are coming from. I passed the link on to Philipp and George, who I hope will do something neat with it.

Talked to Nick Feamster about posting a job advertisement for him -- he wants to hire a researcher/developer to work on Bismark, which is quite related to OONI. We (Tor) need to figure out our policy for posting job descriptions on behalf of other parties -- it seems to me that at least in this case we should do it.

------------------------------------------------------------------------

Here are some items I expect to do in July:

- Attend the dev meeting and hackfest in Florence. Help everybody understand our upcoming grants, and the upcoming deliverables that go with them.

- Attend PETS plus do a talk at the 'provable privacy' workshop in Vigo.

- Probably go to Berkeley for the last week in July.

- Summarize open simulation tickets and open performance tickets, so we can prioritize them and get more developer attention on them.

- Publicize one or more new job openings on our jobs page:
  https://www.torproject.org/about/jobs.html.en
  and start collecting applications.

- Make sure our new core dev gets added to the people page, and make sure we do some sort of announcement so there's closure. Follow up on the original core dev job announcements to say we've got one (but leave the job announcement up, because we wouldn't mind having another if the perfect person came along).

- Ian told me that Tariq's "Changing of the Guards" paper was flawed. I don't yet agree that it's flawed -- I should follow up with them and see which parts of the design need to be discarded and which I can resurrect.

- Get Tor 0.2.3.x closer to stable.

- Organize and announce (hopefully in that order) our upcoming plans for encouraging more exit relays.
- Track down all the plans for my November trip to Amsterdam. The original plan was to speak in Rotterdam at their CA conference (organized after the DigiNotar thing), but that expanded to maybe talking to Dutch law enforcement, and then maybe Austrian law enforcement, and now the Belgian law enforcement want me to come explain the Internet to them too. All of these things are worth doing (the more law enforcement groups understand Tor, the less they hassle our exit relay operators and the less they lobby for laws to outlaw privacy), but we'll see how many I can fit in.

- Start looking into properties we want for a more DPI-resistant "obfs3" protocol.

------------------------------------------------------------------------

Things I'm still dropping the ball on:

- Transparently document the secteam process, especially since we have decided to use it far less often and only for critical security things.

- Answer the thread between Karsten and Jake where we had an excited volunteer with a clearly useful contribution that we totally dropped on the floor. Try to generalize the experience to improve our response to new contributors. We used to be great at it, and lately we're all overloaded.

- Add a "scientific papers" exception to our trademark-faq: I want to give blanket permission to scientific papers to use the word Tor in their paper name, so long as they don't go and write software under that name too.
  https://www.torproject.org/docs/trademark-faq

- Make a plan for fixing all the "CBT sometimes breaks Tor" issues.
  https://trac.torproject.org/projects/tor/ticket/3443

- Start summarizing Tor research papers on the blog more regularly. There have been a huge number of really important research papers lately, and most Tor people don't know about them. Should I summarize them on the blog (for a broader audience), or on tor-dev (for the rest of the Tor developers), or what?

- I need new business cards.
- Get https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorA through D back up on the wiki somewhere (Andrew took them down since they were concluded, and since they just listed contract deliverables rather than the progress reports and trac ticket links that we've been doing for later funders; but we should keep them there for posterity).

_______________________________________________
tor-talk mailing list
tor-talk@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk