Michael StJohns <m...@nthpermutation.com> writes:

Hi Mike,

Thanks for explaining your thinking.  After reading it, I think we're
actually in agreement but using different terms for where to put in the
slop you're worried about.

Specifically:

> A perfectly operating resolver with perfect clock and perfect
> connectivity and no outages MIGHT possibly keep a perfect interval
> between each query it makes (making your activeRefreshOffset
> meaningful), but 10000 resolvers ALL keeping perfect intervals?

Yes, I agree.  But this is why I want the majority of the equation to
define the mathematically perfect case.  Then, *after* that, we add the
operational slop factor (safetyMargin) to account for both problems and
reality (you forgot to add "speed of light issues" in your text above,
for example).

Thus, I break the equation into two critical parts:

addWallClockTime = lastSigExpirationTime
                   + addHoldDownTime
                   + activeRefresh                   ^
                   + activeRefreshOffset             |
                                                     |
Precise Math                                         |
-----------------------------------------------------|
Needed Fuzz                                          |
                                                     |
                   + safetyMargin                    |
                                                     v
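
As a rough illustration of the split above, here is a minimal Python
sketch that keeps the precise terms and the fuzz term separate.  The
function name and the example durations (a 1-day activeRefresh, a zero
activeRefreshOffset, a 2-day safetyMargin) and the example date are
assumptions made purely for illustration, not values from the draft.

```python
from datetime import datetime, timedelta, timezone


def add_wall_clock_time(last_sig_expiration_time, add_hold_down_time,
                        active_refresh, active_refresh_offset,
                        safety_margin):
    """Return (precise_bound, padded_bound) as datetimes.

    precise_bound is the "Precise Math" part of the equation;
    padded_bound adds the "Needed Fuzz" (safetyMargin) on top of it.
    """
    precise_bound = (last_sig_expiration_time
                     + add_hold_down_time
                     + active_refresh
                     + active_refresh_offset)
    padded_bound = precise_bound + safety_margin
    return precise_bound, padded_bound


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example_precise, example_padded = add_wall_clock_time(
    last_sig_expiration_time=datetime(2017, 12, 1, tzinfo=timezone.utc),
    add_hold_down_time=timedelta(days=30),
    active_refresh=timedelta(days=1),
    active_refresh_offset=timedelta(0),   # assumed: 30 days divides evenly
    safety_margin=timedelta(days=2),      # assumed slop value
)
print("precise:", example_precise.isoformat())
print("padded: ", example_padded.isoformat())
```

Keeping the two results separate mirrors the line drawn in the diagram:
the first value is what the security analysis reasons about, and the
second is what an operator would actually wait for.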


I.e., take a perfect resolver hitting an RFC 5011 zone with an
activeRefresh that evenly divides into 30 days.  It:

  1) queries at T--- = lastSigExpirationTime - .000001
  2) queries at T+1--- = lastSigExpirationTime - .000001 + activeRefresh
  3) notes that it just saw a new key (assuming the worst case, where the
     answer to #1 was replayed)
  4) starts timer
  5) will query again at lastSigExpirationTime + 30 days - .000001
  6) notes this is still in waiting period
  7) will query again at lastSigExpirationTime + 30 days - .000001
     + activeRefresh
  8) now notes that it's been 30 days and accepts key

There is only one activeRefresh in that sequence, and that's what's in
the equation, because the time between #2 and #7 is still exactly 30
days when the queries are timed with perfect, sub-nanosecond precision.
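
The sequence above can be sketched as a small simulation.  This is only
a sketch under stated assumptions: the function name is hypothetical,
times are integer microseconds relative to lastSigExpirationTime, the
resolver is "perfect" (no jitter, no retries), and the key is accepted
at the first poll at which a full hold-down period has elapsed since the
key was first seen.

```python
DAY = 24 * 60 * 60 * 1_000_000  # one day in microseconds


def perfect_resolver_accept_time(last_sig_expiration, active_refresh,
                                 hold_down, epsilon=1):
    """Return the time at which a perfect resolver accepts the new key.

    All arguments are integer microseconds.  epsilon models the
    ".000001" in the steps above: the last query that can still be
    answered with replayed old data.
    """
    t = last_sig_expiration - epsilon  # step 1: still sees replayed data
    t += active_refresh                # step 2: next regular query
    first_seen = t                     # steps 3-4: new key seen, timer starts
    while t - first_seen < hold_down:  # steps 5-7: keep polling
        t += active_refresh
    return t                           # step 8: hold-down satisfied


# Evenly divisible case: 1-day activeRefresh, 30-day hold-down.
accept = perfect_resolver_accept_time(0, 1 * DAY, 30 * DAY)
bound = 0 + 30 * DAY + 1 * DAY  # lastSigExpirationTime + hold-down + activeRefresh
print(accept <= bound, bound - accept)

# Non-divisible case (assumed 7-day activeRefresh): the resolver lands
# later than that bound, which is the gap activeRefreshOffset is for.
late = perfect_resolver_accept_time(0, 7 * DAY, 30 * DAY)
print(late > 30 * DAY + 7 * DAY)
```

In the divisible case the acceptance time falls one microsecond short of
lastSigExpirationTime + 30 days + activeRefresh, so exactly one
activeRefresh shows up in the bound.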

Then, in the real world, a bunch of delays get inserted: network
timeouts, etc.  That's where the safetyMargin should come in and catch
all the issues with the imprecision of the real world.  Now, if you want
to add an activeRefresh to the safetyMargin term already suggested, I'm
willing to consider that.  But, for clarity of the security analysis, it
shouldn't be listed as part of anything but the slop term.

Would you like to add more time to the safetyMargin to deal with the
non-perfect world, including clock drift because of time delays in a
bunch of queries back to back or any other reason?


A closing note about the precise timeline: when 30 days isn't evenly
divisible by the activeRefresh, you need to add the other term we
haven't talked about much, the activeRefreshOffset, which accounts for
this case.

Cheers,
Wes

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
