Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

Michael StJohns Tue, 12 Dec 2017 11:35:10 -0800

On 12/12/2017 12:24 PM, Wes Hardaker wrote:

Michael StJohns <m...@nthpermutation.com> writes:

2) T + activeRefresh  is the time at which the server sees the last
query from the last resolver just starting their trust anchor
installation.
3) T + activeRefresh + addHoldDownTime is the time at which the server
sees the first query from any resolver finalizing its trust anchor
installation.

There is where we disagree.  Given 2, where you state "last query from
the last resolver is just starting", I argue that exactly an
addHoldDownTime beyond that is when that *exact same resolver* will
finish because it will have sampled the first at (2) and again at
exactly T + activeRefresh + addHoldDownTime and accept it, per 5011:

    Once the timer expires, the new key will be added as a trust anchor
    the next time the validated RRSet with the new key is seen at the
    resolver.

And the last query from the last resolver will be at T + activeRefresh +
addHoldDownTime.


Seriously no.

This isn't this hard. You need to stop thinking about what's happening
from the point of view of one client and think about how the server
views the behavior of the collection of clients.

Dealing with your "attack" scenario and assuming no retransmissions for
any client before it gets a response to its first query, the earliest
time that the server can assume that the first client can start its
addHoldDown timer is right after T (the lastExpirationTime). The
latest time that the server can assume that the last client will start
its addHoldDown timer is the activeRefresh interval after T.

(Assume a queryInterval of 14 hours and a set of 840 clients evenly
distributed with their refreshes happening one a minute - the last
client (#840) will make its query at T + activeRefresh and start its
addHoldDown timer then).
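
A minimal sketch of that 840-client example, assuming (as the example
does) that activeRefresh equals the 14-hour queryInterval and that no
query is lost. The function name first_query_offsets is hypothetical;
offsets are measured from T.

```python
from datetime import timedelta

# Values taken from the example above; treating activeRefresh as equal
# to queryInterval is an assumption made for this illustration only.
ACTIVE_REFRESH = timedelta(hours=14)
N_CLIENTS = 840


def first_query_offsets(n_clients, active_refresh):
    """Offsets after T at which each client makes its first query.

    Clients are spread evenly over one activeRefresh interval, so
    client #i queries at T + i * (active_refresh / n_clients).
    """
    step = active_refresh / n_clients
    return [step * i for i in range(1, n_clients + 1)]


offsets = first_query_offsets(N_CLIENTS, ACTIVE_REFRESH)

# Client #1 starts its addHoldDown timer one minute after T;
# client #840 starts its timer a full activeRefresh after T.
print("first client:", offsets[0])
print("last client: ", offsets[-1])
```

The server cannot tell which client is which, so it has to plan around
the last offset, not the first.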

So dealing only with that last client the server has to wait at least T
+ activeRefresh before it assumes that the client has started its
addHoldDown and T + activeRefresh + addHoldDown before the client has
finished its addHoldDown and is about to make its last query.

The best case scenario (from the server's point of view) is when that
same client makes its last query before its addHoldDown timer expires
just under one activeRefresh interval ahead of the expiration (e.g. if
the expiration is at noon, and the active refresh is 1 hour, then the
best case is when the last query was at 11:00:00.00001), causing the
query after the addHoldDown time to occur at 12:00:00.00001. The worst
case scenario is when ANY client has its last query at .00001 before
the addHoldDown time expires, making the final query happen at the
activeRefresh interval after the expiration, or in the example at
12:59:59.99999.
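
The noon example can be checked with a short sketch. It assumes a
client that queries exactly once per activeRefresh with no losses; the
date and the function name accepting_query are hypothetical.

```python
from datetime import datetime, timedelta


def accepting_query(expiry, last_query_before_expiry, active_refresh):
    """Time of the first query after the addHoldDown timer expires.

    Assumes the client's last pre-expiry query falls within one
    activeRefresh interval before the expiry.
    """
    if not (expiry - active_refresh < last_query_before_expiry < expiry):
        raise ValueError("last query must fall within one refresh before expiry")
    return last_query_before_expiry + active_refresh


# Arbitrary date; only the time of day matters for the example.
expiry = datetime(2017, 12, 12, 12, 0, 0)
refresh = timedelta(hours=1)

# Best case: last query just under one activeRefresh before expiry.
best = accepting_query(expiry, datetime(2017, 12, 12, 11, 0, 0, 10), refresh)

# Worst case: last query .00001 s before expiry.
worst = accepting_query(expiry, datetime(2017, 12, 12, 11, 59, 59, 999990), refresh)

print("best: ", best.time())
print("worst:", worst.time())
```

The gap between the two results is just under one activeRefresh, which
is the extra interval the server has to allow for.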

For a given client assuming no query losses, there are FLOOR
(addHoldDown/activeRefresh) queries in the addHoldDown interval (between
when the client starts its timer and when it goes off). The difference
addHoldDown - (FLOOR (addHoldDown/activeRefresh) * activeRefresh) is
this activeRefreshOffset interval you keep trying to put in. However,
we do assume losses and we do (and MUST) assume the worst case that at
least one client out of 10K, 100K or 1M is going to end up doing fast
queries and changing that difference such that they end up with their
last query before their addHoldDown timer occurs JUST before it expires.
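
As a worked instance of the FLOOR arithmetic, here is a sketch using a
30-day addHoldDown and a 14-hour activeRefresh. Both numbers appear in
this thread, but pairing them here is an assumption made purely for
illustration.

```python
from datetime import timedelta

# Illustrative values (assumed pairing, see the note above).
add_hold_down = timedelta(days=30)
active_refresh = timedelta(hours=14)

# FLOOR(addHoldDown / activeRefresh): queries inside the hold-down
# interval for a loss-free client.
queries_in_hold_down = add_hold_down // active_refresh

# addHoldDown - FLOOR(addHoldDown / activeRefresh) * activeRefresh
active_refresh_offset = add_hold_down - queries_in_hold_down * active_refresh

print(queries_in_hold_down)    # 51
print(active_refresh_offset)   # 6:00:00
```

This offset only holds for a client that never retransmits. A single
early query shifts the phase, so the offset gives no usable bound for
the worst client.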

Between (2) and (3) any given resolver may drift/retransmit with the
result that any given resolver may end up making a query just before
(3) placing its next and final query at (3) plus activeRefresh.

Please forget drift in the top half of the equation.  There is zero
drift in the mathematically precise section.  We will deal with drift,
delays, and everything else in the safetyFactor alone, with many terms
or concepts within it to get it right.

And here's where you go off the rails. You don't need to include the
safety factor to deal with the (2) to (3) interval as retransmit can't
reduce the addHoldDown period, but can reduce - for a given client - the
number of queries in the period. Retransmits and drift also can change
*when* in that interval the given client produces its last query before
expiration - e.g. cause a "phase shift".

    5) will query again at lastSigExpirationTime + 30 days - .000001

No - from the server's point of view, the worst client (which is the
only one the server cares about) will make its last query before trust
anchor installation at lastSigExpirationTime + activeRefresh (when the
last CLIENT saw its first valid update)  + 30 days -.0000001.

Yes, I said that in 6 stating that it was *still waiting*.  IE, #5 was
supposed to describe the second to last query.

    6) notes this is still in waiting period

Let me put together something, per Paul's request, to work at this from
another angle where one of us can be shown right or wrong.

[... retry, delay text by me deleted ...]

And again. NO.  The retransmits over a given set of clients in the
addHoldDown period will result in at least one client (the "worst"
client) ending up making a query just before the expiration of ITS
addHoldDown timer.  Assuming the worst case of at least one client
making a query just before the lastSigExpirationTime and that same
client drifting/retransmitting enough to make a query just before its
addHoldDown time the activeRefreshOffset is a useless value to
calculate.

If you want to put an extra activeRefresh into the safetyMargin to
account for drift, I'm willing to do so.  Or we can insert a new term
labeled "driftSafetyMargin" and define it as activeRefresh if you want.
But that goes below my math line, not above it (and we can relabel
safetyMargin as retryFailureSafetyMargin).


From a purely security analysis point of view, the first thing we have
to agree upon is the precise moment at which all clients in a perfect
world, with *no errors at all* (no drift, no retries, no transmission
line delays, no CPU processing delays, no clock failures, etc).  Once we
have this line in the sand in place, then we can introduce real-world
correctional elements to account for reality sneaking into our perfect
world.  I'm trying to talk only about the perfectionist's world line in
the sand first, and then introduce needed operational components *after
that line*.  You keep inserting "drift" (eg) everywhere in the process
of this argument, which I absolutely agree needs to be dealt with.  But
below my perfect-world line only.  The way I keep reading everything
you've written is that your perfect line includes two activeRefreshes,
which I (still) argue is incorrect.  As I said last time and this time,
I'd be happy to insert a "drift" term, but let's please label it what it
is.  If you agree with that, I'll make that change and push.


*sigh*

No. You screw up the analysis doing it that way because it doesn't
account for the possible "phase shift" that can happen with a client
during its addHoldDown period.

I've been using "sigExpire + activeRefresh + addHoldDown + activeRefresh
+ safetyFactor" because it's actually easier to calculate. The actual
formula is

"sigExpire + (activeRefresh + retransSlop) + addHoldDown +
(activeRefresh + retransSlop)" where retransSlop is about 1/2 the
safety factor.

Both the interval after sigExpire and before addHoldDown and the
interval after the last activeRefresh require a safety factor to account
for retransmissions. The addHoldDown interval does not because retrans
and drift result in a phase shift within the interval, but do not affect
the length of the total interval.
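
A short sketch comparing the two ways of writing the formula. It
assumes retransSlop is exactly half the safety factor (the text says
"about 1/2"), and the 1-day safety factor, the date, and both function
names are hypothetical.

```python
from datetime import datetime, timedelta


def wait_with_slop(sig_expire, active_refresh, add_hold_down, safety_factor):
    """sigExpire + (activeRefresh + retransSlop) + addHoldDown
    + (activeRefresh + retransSlop), with retransSlop = safetyFactor / 2
    (an assumption: the text only says "about 1/2")."""
    retrans_slop = safety_factor / 2
    return (sig_expire
            + (active_refresh + retrans_slop)
            + add_hold_down
            + (active_refresh + retrans_slop))


def wait_shorthand(sig_expire, active_refresh, add_hold_down, safety_factor):
    """sigExpire + activeRefresh + addHoldDown + activeRefresh + safetyFactor."""
    return (sig_expire + active_refresh + add_hold_down
            + active_refresh + safety_factor)


# Illustrative inputs only; none of these values come from the draft.
args = (
    datetime(2017, 12, 12, 12, 0),  # sigExpire
    timedelta(hours=14),            # activeRefresh
    timedelta(days=30),             # addHoldDown
    timedelta(days=1),              # safetyFactor (hypothetical)
)

print(wait_with_slop(*args))
print(wait_shorthand(*args))
```

With the slop split evenly the two forms give the same end time; the
longer form only makes explicit that the slack belongs to the two
activeRefresh intervals and not to the addHoldDown interval.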

A "perfect" system will behave the way you've described - but adding a
safety factor while ignoring the phase shift brought on by retransmits
within the addHoldDown interval will not characterize the actual system.

I hope this is visible. The first group is the "perfect" one where we
start at a random point in the interval [0..activeRefresh]. The second
group is the one with retransmits inside the addHoldDown and where the
signature expired just after the last refresh. The third is Wes'
perfect with the queries starting just as the sigPeriod expires. Wes
would add the safety factor at the end of the third one and call it
done. The second one represents the actual worst case to which we'd add
the safetyFactor to account for drift and retransmits for the two
activeRefresh intervals.


Can we stop now?


Mike

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
