On 12/12/2017 12:24 PM, Wes Hardaker wrote:
Michael StJohns <m...@nthpermutation.com> writes:
2) T + activeRefresh is the time at which the server sees the last
query from the last resolver just starting their trust anchor
installation.
3) T + activeRefresh + addHoldDownTime is the time at which the server
sees the first query from any resolver finalizing its trust anchor
installation.
Here is where we disagree. Given 2, where you state "last query from
the last resolver is just starting", I argue that exactly an
addHoldDownTime beyond that is when that *exact same resolver* will
finish because it will have sampled the first at (2) and again at
exactly T + activeRefresh + addHoldDownTime and accept it, per 5011:
Once the timer expires, the new key will be added as a trust anchor
the next time the validated RRSet with the new key is seen at the
resolver.
And the last query from the last resolver will be at T + activeRefresh +
addHoldDownTime.
Seriously no.
This isn't this hard. You need to stop thinking about what's happening
from the point of view of one client and think about how the server
views the behavior of the collection of clients.
Dealing with your "attack" scenario and assuming no retransmissions for
any client before it gets a response to its first query, the earliest
time that the server can assume that the first client can start its
addHoldDown timer is right after T (the lastExpirationTime). The
latest time that the server can assume that the last client will start
its addHoldDown timer is the activeRefresh interval after T.
(Assume a queryInterval of 14 hours and a set of 840 clients evenly
distributed with their refreshes happening one a minute - the last
client (#840) will make its query at T + activeRefresh and start its
addHoldDown timer then).
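
To make the parenthetical example concrete, here is a minimal sketch of
that client distribution. It assumes activeRefresh equals the 14-hour
queryInterval in the example; the variable names are mine and are only
illustrative.

```python
from datetime import timedelta

# Assumption: activeRefresh equals the 14-hour queryInterval from the example.
active_refresh = timedelta(hours=14)
num_clients = 840

# Evenly distributed clients: one refresh per minute across the interval.
spacing = active_refresh / num_clients

# Offset from T (lastSigExpirationTime) of each client's first query after T.
# Client 1 queries one minute after T; client 840 queries at T + activeRefresh.
first_query_offsets = [spacing * n for n in range(1, num_clients + 1)]

print("spacing between clients:", spacing)
print("first client starts its addHoldDown timer at T +", first_query_offsets[0])
print("last client starts its addHoldDown timer at T +", first_query_offsets[-1])
```

The last offset equals activeRefresh, which is the latest point at which
the server can assume the final client has started its timer.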
So dealing only with that last client the server has to wait at least T
+ activeRefresh before it assumes that the client has started its
addHoldDown and T + activeRefresh + addHoldDown before the client has
finished its addHoldDown and is about to make its last query.
The best case scenario (from the server's point of view) is when that
same client has its last query before its addHoldDown time expires at
just under the activeRefresh interval (e.g. if the expiration is at
noon, and the active refresh is 1 hour, then the best case is if the
last query was at 11:00:00.00001) causing the query after the
addHoldDown time to occur at 12:00:00.00001. The worst case scenario is
when ANY client has its last query at .00001 before the addHoldDown time
expires, making the final query happen at the activeRefresh interval
after the expiration or in the example at 12:59:59.99999.
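
The noon example can be checked with a few lines of arithmetic. This is
only a sketch: the calendar date is arbitrary, and the names are
hypothetical rather than anything from 5011.

```python
from datetime import datetime, timedelta

active_refresh = timedelta(hours=1)
# Arbitrary date; only the noon expiration matters for the example.
expiry = datetime(2017, 12, 12, 12, 0, 0)
eps = timedelta(microseconds=10)  # the ".00001" seconds in the example

# Best case: last query before expiry is just under activeRefresh early.
best_last_query = expiry - active_refresh + eps
# Worst case: last query before expiry is just before the timer goes off.
worst_last_query = expiry - eps

# The accepting query is the next one, a full activeRefresh later.
best_final_query = best_last_query + active_refresh
worst_final_query = worst_last_query + active_refresh

print("best case final query: ", best_final_query.time())
print("worst case final query:", worst_final_query.time())
```

The gap between the two final queries is just under one activeRefresh,
which is the spread the server has to allow for.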
For a given client assuming no query losses, there are FLOOR
(addHoldDown/activeRefresh) queries in the addHoldDown interval (between
when the client starts its timer and when it goes off). The difference
addHoldDown - (FLOOR (addHoldDown/activeRefresh) * activeRefresh) is
this activeRefreshOffset interval you keep trying to put in. However,
we do assume losses and we do (and MUST) assume the worst case that at
least one client out of 10K, 100K or 1M is going to end up doing fast
queries and changing that difference such that they end up with their
last query before their addHoldDown timer occurs JUST before it expires.
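
As a worked instance of the FLOOR expression above, here is a small
sketch for the loss-free case. It assumes a 30-day addHoldDown and the
14-hour refresh from the earlier example; the function name is
hypothetical.

```python
from datetime import timedelta

def queries_and_offset(add_hold_down, active_refresh):
    """Return (FLOOR(addHoldDown/activeRefresh), leftover offset)."""
    # timedelta // timedelta yields an int: the whole refreshes that fit.
    n = add_hold_down // active_refresh
    # addHoldDown - (FLOOR(addHoldDown/activeRefresh) * activeRefresh)
    offset = add_hold_down - n * active_refresh
    return n, offset

# Assumed values: 30-day addHoldDown, 14-hour activeRefresh.
n, offset = queries_and_offset(timedelta(days=30), timedelta(hours=14))
print("queries inside the addHoldDown interval:", n)
print("activeRefreshOffset with no losses:", offset)
```

This offset only holds for a client with no losses or fast queries; a
single retransmit changes where in the interval the last query falls.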
Between (2) and (3) any given resolver may drift/retransmit with the
result that any given resolver may end up making a query just before
(3) placing its next and final query at (3) plus activeRefresh.
Please forget drift in the top half of the equation. There is zero
drift in the mathematically precise section. We will deal with drift,
delays, and everything else in the safetyFactor alone, with many terms
or concepts within it to get it right.
And here's where you go off the rails. You don't need to include the
safety factor to deal with the (2) to (3) interval as retransmit can't
reduce the addHoldDown period, but can reduce - for a given client - the
number of queries in the period. Retransmits and drift also can change
*when* in that interval the given client produces its last query before
expiration - e.g. cause a "phase shift".
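
The "phase shift" can be illustrated with a toy model. This is a sketch
under my own simplifying assumptions, not a description of any real
resolver: the client starts its timer at its first query, a single
retransmit pulls every later query forward by a fixed amount, and the
key is accepted on the first query at or after timer expiry.

```python
def acceptance_delay(add_hold_down, active_refresh, early=0):
    """Seconds past timer expiry at which the accepting query lands.

    Toy model (assumption): the timer starts at time 0 and later queries
    land at k * active_refresh - early for k = 1, 2, ..., where `early`
    (0 <= early < active_refresh) is the shift caused by one retransmit.
    """
    # Smallest k whose query time is at or after add_hold_down.
    k = -(-(add_hold_down + early) // active_refresh)  # integer ceiling
    return k * active_refresh - early - add_hold_down

HOLD = 30 * 24 * 3600   # 30-day addHoldDown, in seconds
REFRESH = 3600          # 1-hour activeRefresh, in seconds

perfect = acceptance_delay(HOLD, REFRESH)            # no phase shift
shifted = acceptance_delay(HOLD, REFRESH, early=1)   # shifted by 1 second

print("perfect client accepts", perfect, "seconds after expiry")
print("shifted client accepts", shifted, "seconds after expiry")
```

In this model the length of the addHoldDown interval never changes, but a
one-second shift moves the accepting query almost a full activeRefresh
past expiry.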
5) will query again at lastSigExpirationTime + 30 days - .000001
No - from the server's point of view, the worst client (which is the
only one the server cares about) will make its last query before trust
anchor installation at lastSigExpirationTime + activeRefresh (when the
last CLIENT saw its first valid update) + 30 days -.0000001.
Yes, I said that in 6 stating that it was *still waiting*. I.e., #5 was
supposed to describe the second to last query.
6) notes this is still in waiting period
Let me put together something, per Paul's request, to work at this from
another angle where one of us can be shown right or wrong.
[... retry, delay text by me deleted ...]
And again. NO. The retransmits over a given set of clients in the
addHoldDown period will result in at least one client (the "worst"
client) ending up making a query just before the expiration of ITS
addHoldDown timer. Assuming the worst case of at least one client
making a query just before the lastSigExpirationTime and that same
client drifting/retransmitting enough to make a query just before its
addHoldDown time, the activeRefreshOffset is a useless value to
calculate.
If you want to put an extra activeRefresh into the safetyMargin to
account for drift, I'm willing to do so. Or we can insert a new term
labeled "driftSafetyMargin" and define it as activeRefresh if you want.
But that goes below my math line, not above it (and we can relabel
safetyMargin as retryFailureSafetyMargin).
From a purely security analysis point of view, the first thing we have
to agree upon is the precise moment at which all clients in a perfect
world, with *no errors at all* (no drift, no retries, no transmission
line delays, no CPU processing delays, no clock failures, etc), will
have installed the new trust anchor. Once we
have this line in the sand in place, then we can introduce real-world
correctional elements to account for reality sneaking into our perfect
world. I'm trying to talk only about the perfectionist's world line in
the sand first, and then introduce needed operational components *after
that line*. You keep inserting "drift" (e.g.) everywhere in the process
of this argument, which I absolutely agree needs to be dealt with. But
below my perfect-world line only. The way I keep reading everything
you've written is that your perfect line includes two activeRefreshes,
which I (still) argue is incorrect. As I said last time and this time,
I'd be happy to insert a "drift" term, but let's please label it what it
is. If you agree with that, I'll make that change and push.
*sigh*
No. You screw up the analysis doing it that way because it doesn't
account for the possible "phase shift" that can happen with a client
during its addHoldDown period.
I've been using "sigExpire + activeRefresh + addHoldDown + activeRefresh
+ safetyFactor" because it's actually easier to calculate. The actual
formula is
"sigExpire + (activeRefresh + retransSlop) + addHoldDown +
(activeRefresh + retransSlop)" where retransSlop is about 1/2 the
safety factor.
Both the interval after sigExpire and before addHoldDown and the
interval after the last activeRefresh require a safety factor to account
for retransmissions. The addHoldDown interval does not because retrans
and drift result in a phase shift within the interval, but do not affect
the length of the total interval.
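
The two ways of writing the formula can be compared directly. This
sketch assumes retransSlop is exactly half the safety factor (the text
only says "about 1/2"), and the date, the 14-hour refresh and the 2-hour
safety factor are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def wait_until_detailed(sig_expire, active_refresh, add_hold_down, retrans_slop):
    # sigExpire + (activeRefresh + retransSlop) + addHoldDown
    #           + (activeRefresh + retransSlop)
    return (sig_expire + (active_refresh + retrans_slop)
            + add_hold_down + (active_refresh + retrans_slop))

def wait_until_simple(sig_expire, active_refresh, add_hold_down, safety_factor):
    # sigExpire + activeRefresh + addHoldDown + activeRefresh + safetyFactor
    return (sig_expire + active_refresh + add_hold_down
            + active_refresh + safety_factor)

# Hypothetical inputs, for illustration only.
sig_expire = datetime(2017, 12, 12, 12, 0, 0)
active_refresh = timedelta(hours=14)
add_hold_down = timedelta(days=30)
safety_factor = timedelta(hours=2)
retrans_slop = safety_factor / 2  # assumption: exactly half

detailed = wait_until_detailed(sig_expire, active_refresh, add_hold_down, retrans_slop)
simple = wait_until_simple(sig_expire, active_refresh, add_hold_down, safety_factor)

print("detailed formula:", detailed)
print("simple formula:  ", simple)
```

Under that assumption the two forms give the same instant; the detailed
form just shows that the slop attaches to the two activeRefresh
intervals and not to the addHoldDown interval.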
A "perfect" system will behave the way you've described - but adding a
safety factor while ignoring the phase shift brought on by retransmits
within the addHoldDown interval will not characterize the actual system.
I hope this is visible. The first group is the "perfect" one where we
start at a random point in the interval [0..activeRefresh]. The second
group is the one with retransmits inside the addHoldDown and where the
signature expired just after the last refresh. The third is Wes'
perfect with the queries starting just as the sigPeriod expires. Wes
would add the safety factor at the end of the third one and call it
done. The second one represents the actual worst case to which we'd add
the safetyFactor to account for drift and retransmits for the two
activeRefresh intervals.
Can we stop now?
Mike
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop