Dear all,

I'm very happy to see the direction this conversation has taken. It seems we've moved on towards focussing on solutions and outcomes - this is encouraging.

On Mon, Oct 01, 2018 at 05:44:17PM +0100, Nick Hilliard wrote:
> John Curran wrote on 01/10/2018 00:21:
> > There is likely some on the nanog mailing list who have a view on
> > this matter, so I pose the question of "who should be responsible"
> > for consequences of RPKI RIR CA failure to this list for further
> > discussion.
>
> other replies in this thread have assumed that RPKI CA failure modes
> are restricted to loss of availability, but there are others failure
> modes, for example:
>
> - fraud: rogue CA employee / external threat actor signs ROAs
> illegitimately
>
> - negligence: CA accidentally signs illegitimate ROAs due to e.g.
> software bug
>
> - force majeure: e.g. court orders CA to sign prefix with AS0,
> complicated by NIR RPKI delegation in jurisdictions which may have
> difficult relations with other parts of the world.
>
> These types of situations are well-trodden territory for other types
> of PKI CA, where users
>
> Otherwise, as other people have pointed out, catastrophic systems
> failure at the CA is designed to be fail-safe. I.e. if the CA goes
> away, ROAs will be evaluated as "unknown" and life will continue on.
> If people misconfigure their networks and do silly things with this
> specific failure mode, that's their problem. You can't stop people
> from aiming guns at their feet and pulling the trigger.

There are a number of failure modes and I believe the operational community has yet to fully explore how to mitigate most risks. Over time I expect we'll develop BCPs on how to improve the robustness of the system; these BCPs can only come into existence when driven by actual operational experience.

A positive development that addresses some aspects of the concerns raised is Certificate Transparency. Cloudflare set up a CT log (https://groups.google.com/forum/#!topic/certificate-transparency/_deL5iGB5sY) and I hope others like Google will also consider doing this.
CT is a great tool to help keep the roots performing in line with community expectations.

I consider it the operator community's responsibility to figure out how to deal with outages. I don't intend to hold the RIRs liable - we'll need to learn to protect ourselves.

Kind regards,

Job