On 12/12/2017 12:24 PM, Wes Hardaker wrote:
Michael StJohns <m...@nthpermutation.com> writes:

2) T + activeRefresh  is the time at which the server sees the last
query from the last resolver just starting their trust anchor
installation.
3) T + activeRefresh + addHoldDownTime is the time at which the server
sees the first query from any resolver finalizing its trust anchor
installation.
Here is where we disagree.  Given 2, where you state "last query from
the last resolver is just starting", I argue that exactly an
addHoldDownTime beyond that is when that *exact same resolver* will
finish because it will have sampled the first at (2) and again at
exactly T + activeRefresh + addHoldDownTime and accept it, per 5011:

    Once the timer expires, the new key will be added as a trust anchor
    the next time the validated RRSet with the new key is seen at the
    resolver.

And the last query from the last resolver will be at T + activeRefresh +
addHoldDownTime.

Seriously no.

This isn't that hard.  You need to stop thinking about what's happening from the point of view of one client and think about how the server views the behavior of the collection of clients.

Dealing with your "attack" scenario and assuming no retransmissions for any client before it gets a response to its first query, the earliest time that the server can assume that the first client can start its addHoldDown timer is right after T (the lastExpirationTime).   The latest time that the server can assume that the last client will start its addHoldDown timer is the activeRefresh interval after T.

(Assume a queryInterval of 14 hours and a set of 840 clients evenly distributed with their refreshes happening one per minute. The last client (#840) will make its query at T + activeRefresh and start its addHoldDown timer then.)
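
A rough sketch of that example, assuming T is time zero and the clients are spaced exactly evenly (the function name is made up for illustration, it is not from any draft):

```python
from datetime import timedelta


def first_query_offsets(active_refresh: timedelta, n_clients: int) -> list:
    """Offsets after T at which each client makes its first query after T,
    and therefore starts its addHoldDown timer (illustrative model only)."""
    step = active_refresh / n_clients
    return [step * i for i in range(1, n_clients + 1)]


# Values taken from the example: 14 hour interval, 840 clients.
offsets = first_query_offsets(timedelta(hours=14), 840)
print(offsets[0])   # 0:01:00  (client #1 starts its timer one minute after T)
print(offsets[-1])  # 14:00:00 (client #840 starts its timer at T + activeRefresh)
```

The server cannot tell which client is which, so it has to plan around the last entry in that list.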

So dealing only with that last client, the server has to wait until at least T + activeRefresh before it can assume that the client has started its addHoldDown timer, and until T + activeRefresh + addHoldDown before it can assume that the client's addHoldDown timer has expired and the client is about to make its last query.

The best case scenario (from the server's point of view) is when that same client makes its last query before its addHoldDown timer expires at just under the activeRefresh interval before the expiration (e.g. if the expiration is at noon, and the active refresh is 1 hour, then the best case is if the last query was at 11:00:00.00001), causing the query after the addHoldDown time to occur at 12:00:00.00001.   The worst case scenario is when ANY client has its last query at .00001 before the addHoldDown time expires, making the final query happen just under one activeRefresh interval after the expiration, or in the example at 12:59:59.99999.
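
The noon example can be checked with a few lines of arithmetic. This is only a sketch: the date is arbitrary, the helper name is hypothetical, and it assumes the client queries exactly once per activeRefresh with no losses.

```python
from datetime import datetime, timedelta


def final_query_time(last_query_before_expiry: datetime,
                     active_refresh: timedelta) -> datetime:
    """The first query after expiry is one activeRefresh after the last query before it."""
    return last_query_before_expiry + active_refresh


expiry = datetime(2017, 12, 12, 12, 0, 0)   # "noon"; the date itself is arbitrary
active_refresh = timedelta(hours=1)
eps = timedelta(microseconds=10)            # the .00001 seconds from the example

# Best case: last query at 11:00:00.00001.
best = final_query_time(expiry - active_refresh + eps, active_refresh)
# Worst case: last query at 11:59:59.99999.
worst = final_query_time(expiry - eps, active_refresh)

print(best - expiry)   # 0:00:00.000010
print(worst - expiry)  # 0:59:59.999990
```

The gap between the two cases is essentially one full activeRefresh.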

For a given client assuming no query losses, there are FLOOR (addHoldDown/activeRefresh) queries in the addHoldDown interval (between when the client starts its timer and when it goes off). The difference addHoldDown - (FLOOR (addHoldDown/activeRefresh) * activeRefresh) is this activeRefreshOffset interval you keep trying to put in.  However, we do assume losses and we do (and MUST) assume the worst case: that at least one client out of 10K, 100K or 1M is going to end up doing fast queries and changing that difference such that its last query before its addHoldDown timer expires occurs JUST before the expiration.
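
For concreteness, here is that calculation with numbers borrowed from elsewhere in this thread (a 30 day addHoldDown and a 14 hour activeRefresh). Both values are assumptions for illustration only:

```python
from datetime import timedelta

add_hold_down = timedelta(days=30)    # assumed value
active_refresh = timedelta(hours=14)  # assumed value

# FLOOR(addHoldDown / activeRefresh): queries inside the hold-down interval, no losses.
queries_in_interval = add_hold_down // active_refresh

# addHoldDown - (FLOOR(addHoldDown / activeRefresh) * activeRefresh)
active_refresh_offset = add_hold_down - queries_in_interval * active_refresh

print(queries_in_interval)    # 51
print(active_refresh_offset)  # 6:00:00
```

That 6 hour figure only holds for a client that never loses or repeats a query, which is why it can't be relied on for the worst case.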


Between (2) and (3) any given resolver may drift/retransmit with the
result that any given resolver may end up making a query just before
(3) placing its next and final query at (3) plus activeRefresh.
Please forget drift in the top half of the equation.  There is zero
drift in the mathematically precise section.  We will deal with drift,
delays, and everything else in the safetyFactor alone, with many terms
or concepts within it to get it right.

And here's where you go off the rails.   You don't need to include the safety factor to deal with the (2) to (3) interval as retransmit can't reduce the addHoldDown period, but can reduce - for a given client - the number of queries in the period.  Retransmits and drift also can change *when* in that interval the given client produces its last query before expiration - e.g. cause a "phase shift".
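
A toy model of the phase shift, under loud assumptions: the net effect of retransmits and drift is collapsed into a single offset ("shift") on an otherwise regular query schedule, and the 30 day / 14 hour values are just illustrative. The function name is hypothetical.

```python
from datetime import timedelta


def lag_after_expiry(add_hold_down: timedelta, active_refresh: timedelta,
                     shift: timedelta = timedelta(0)) -> timedelta:
    """How long after its addHoldDown expiry a client makes its next query.

    The client starts its timer at time zero and queries at
    shift + k * activeRefresh; the expiry itself never moves."""
    expiry = add_hold_down
    t = shift
    while t < expiry:
        t += active_refresh
    return t - expiry


hold = timedelta(days=30)
refresh = timedelta(hours=14)
eps = timedelta(microseconds=10)

print(lag_after_expiry(hold, refresh))                            # 8:00:00
print(lag_after_expiry(hold, refresh, timedelta(hours=6) - eps))  # 13:59:59.999990
```

The hold-down length is the same in both runs; only the position of the queries inside it changes, and that alone moves the final query to nearly a full activeRefresh after expiry.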



    5) will query again at lastSigExpirationTime + 30 days - .000001
No - from the server's point of view, the worst client (which is the
only one the server cares about) will make its last query before trust
anchor installation at lastSigExpirationTime + activeRefresh (when the
last CLIENT saw its first valid update)  + 30 days -.0000001.
Yes, I said that in 6 stating that it was *still waiting*.  IE, #5 was
supposed to describe the second to last query.

    6) notes this is still in waiting period
Let me put together something, per Paul's request, to work at this from
another angle where one of us can be shown right or wrong.

[... retry, delay text by me deleted ...]

And again. NO.  The retransmits over a given set of clients in the
addHoldDown period will result in at least one client (the "worst"
client) ending up making a query just before the expiration of ITS
addHoldDown timer.  Assuming the worst case of at least one client
making a query just before the lastSigExpirationTime and that same
client drifting/retransmitting enough to make a query just before its
addHoldDown time the activeRefreshOffset is a useless value to
calculate.
If you want to put an extra activeRefresh into the safetyMargin to
account for drift, I'm willing to do so.  Or we can insert a new term
labeled "driftSafetyMargin" and define it as activeRefresh if you want.
But that goes below my math line, not above it (and we can relabel
safetyMargin as retryFailureSafetyMargin).


 From a purely security analysis point of view, the first thing we have
to agree upon is the precise moment at which all clients in a perfect
world, with *no errors at all* (no drift, no retries, no transmission
line delays, no CPU processing delays, no clock failures, etc).  Once we
have this line in the sand in place, then we can introduce real-world
correctional elements to account for reality sneaking into our perfect
world.  I'm trying to talk only about the perfectionist's world line in
the sand first, and then introduce needed operational components *after
that line*.  You keep inserting "drift" (eg) everywhere in the process
of this argument, which I absolutely agree needs to be dealt with.  But
below my perfect-world line only.  The way I keep reading everything
you've written is that your perfect line includes two activeRefreshes,
which I (still) argue is incorrect.  As I said last time and this time,
I'd be happy to insert a "drift" term, but let's please label it what it
is.  If you agree with that, I'll make that change and push.


*sigh*

No.   You screw up the analysis doing it that way because it doesn't account for the possible "phase shift" that can happen with a client during its addHoldDown period.

I've been using "sigExpire + activeRefresh + addHoldDown + activeRefresh + safetyFactor" because it's actually easier to calculate.  The actual formula is

"sigExpire + (activeRefresh + retransSlop) + addHoldDown + (activeRefresh + retransSlop)"  where retransSlop is about 1/2 the safety factor.

Both the interval after sigExpire and before addHoldDown and the interval after the last activeRefresh require a safety factor to account for retransmissions.   The addHoldDown interval does not because retrans and drift result in a phase shift within the interval, but do not affect the length of the total interval.
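
To show the two forms agree when retransSlop is exactly half the safety factor, here is a small sketch. The 2 hour safety factor is a made-up number, the other values are illustrative, and both results are waits measured from sigExpire:

```python
from datetime import timedelta


def wait_simplified(active_refresh, add_hold_down, safety_factor):
    """activeRefresh + addHoldDown + activeRefresh + safetyFactor, measured from sigExpire."""
    return active_refresh + add_hold_down + active_refresh + safety_factor


def wait_actual(active_refresh, add_hold_down, retrans_slop):
    """(activeRefresh + retransSlop) + addHoldDown + (activeRefresh + retransSlop)."""
    return (active_refresh + retrans_slop) + add_hold_down + (active_refresh + retrans_slop)


active_refresh = timedelta(hours=14)  # illustrative
add_hold_down = timedelta(days=30)    # illustrative
safety_factor = timedelta(hours=2)    # hypothetical value
retrans_slop = safety_factor / 2      # taking "about 1/2" as exactly 1/2

simplified = wait_simplified(active_refresh, add_hold_down, safety_factor)
actual = wait_actual(active_refresh, add_hold_down, retrans_slop)
print(simplified)            # 31 days, 6:00:00
print(simplified == actual)  # True
```

Note that no slop is attached to the addHoldDown term in either form.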


A "perfect" system will behave the way you've described - but adding a safety factor while ignoring the phase shift brought on by retransmits within the addHoldDown interval will not characterize the actual system.




I hope this is visible.  The first group is the "perfect" one where we start at a random point in the interval [0..activeRefresh].  The second group is the one with retransmits inside the addHoldDown and where the signature expired just after the last refresh.  The third is Wes' perfect with the queries starting just as the sigPeriod expires.  Wes would add the safety factor at the end of the third one and call it done.  The second one represents the actual worst case to which we'd add the safetyFactor to account for drift and retransmits for the two activeRefresh intervals.

Can we stop now?


Mike


_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
