Hi Wes -

Interesting approach, but characterizing a single resolver isn't useful.  You need to characterize the entire set of resolvers doing the query.

Also, you're still missing the fact that a given resolver can start its addHoldDown timer anywhere in the range of [0..activeRefresh] AFTER the signature expiration, depending on when it started its clock.  All of your calculations are starting from the wrong point.

You're also continuing to miss the point that a given resolver makes its last query (assuming no retransmits) anywhere in the range of [0..activeRefresh] after the addHoldDown timer expires.

Your calculations represent the best case [e.g. triggers at 0 for both ends of the problem] when what you want is worst case.

Below you've calculated activeRefresh as 43200


Timestamp  Human Time       Note
0          0d 0h 0m 0s      Resolver Queries
                            *New DNSKEY Publication*
604800     7d 0h 0m 0s      Resolver Queries
                            *sigExpirationTime = Original DNSKEY RRSIG Expires*
604800     7d 0h 0m 0s      DNSKEY First seen

648000     7d 12h 0m 0s     Latest time DNSKEY first seen
                            (sigExpirationTime + activeRefresh)

3196800    37d 0h 0m 0s     Resolver Queries
                            *sigExpirationTime + addHoldDownTimer*
3196800    37d 0h 0m 0s     DNSKEY Accepted

3240000    37d 12h 0m 0s    *Add hold-down expired for last client (without retransmits)*
                            *sigExpirationTime + activeRefresh + addHoldDownTimer*

3240000    37d 12h 0m 0s    Resolver Queries
                            *sigExpirationTime + addHoldDownTimer + activeRefresh*
3240000    37d 12h 0m 0s    *sigExpirationTime + addHoldDownTimer + activeRefresh + activeRefreshOffset*

3283200    38d 0h 0m 0s     *DNSKEY accepted by last client (without retransmits)*
                            *sigExpirationTime + activeRefresh + addHoldDownTimer + activeRefresh*

3326400    38d 12h 0m 0s    Resolver Queries
                            *sigExpirationTime + addHoldDownTimer + activeRefresh + driftSafety*
3326400    38d 12h 0m 0s    *sigExpirationTime + addHoldDownTimer + activeRefresh + activeRefreshOffset + driftSafety*
3412800    39d 12h 0m 0s    Resolver Queries
                            *sigExpirationTime + addHoldDownTimer + activeRefresh + driftSafety + retrySafety*
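
The worst-case rows (648000, 3240000, 3283200) come from stacking one [0..activeRefresh] window on each side of the hold-down. A small Python sketch of that arithmetic follows, using the numbers from the table. The function names are illustrative only and not taken from any draft or program in this thread, and retransmits are ignored.

```python
DAY = 86400

def human(seconds):
    """Format a second count as 'Nd Nh Nm Ns', like the Human Time column."""
    d, rem = divmod(seconds, DAY)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}d {h}h {m}m {s}s"

def worst_case_timeline(sig_expiration_time, active_refresh, add_hold_down):
    """Worst case without retransmits: each window is assumed fully used."""
    # The resolver may not query until a full activeRefresh after the old RRSIG expires.
    first_seen = sig_expiration_time + active_refresh
    # The hold-down timer only starts once the new key has been seen.
    hold_down_expired = first_seen + add_hold_down
    # The accepting query may come a full activeRefresh after the timer expires.
    accepted = hold_down_expired + active_refresh
    return first_seen, hold_down_expired, accepted

if __name__ == "__main__":
    labels = ("latest first seen", "hold-down expired", "latest accepted")
    for label, t in zip(labels, worst_case_timeline(604800, 43200, 30 * DAY)):
        print(f"{t:>8}  {human(t):<15}  {label}")
```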



For the rest - drift safety is probably no more than 10 seconds per query.  For this formula, call it about 620 seconds in the worst case (not assuming retransmits).  It could mostly be ignored - the worst case is going to be less than a fastQueryInterval in all but pathological cases.  Basically 10s * (2 * activeRefresh + addHoldDown)/activeRefresh.
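
Plugging in the table's numbers (activeRefresh = 43200, addHoldDown = 30 days) gives the 620 second figure. Here is a quick Python sketch of the formula; the function and parameter names are made up for illustration, and the 10 seconds per query is the rough guess from the text, not a measured value.

```python
def drift_safety(active_refresh, add_hold_down, drift_per_query=10):
    """Upper bound on accumulated drift, in seconds, over the whole interval.

    Number of queries in (2 * activeRefresh + addHoldDown), times an assumed
    worst-case drift per query.
    """
    queries = (2 * active_refresh + add_hold_down) / active_refresh
    return drift_per_query * queries

if __name__ == "__main__":
    print(drift_safety(43200, 30 * 86400))  # 62 queries * 10 s = 620.0
```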

RetrySafety needs to be calculated on the set of clients as we're looking for the worst case of all of the clients (rightmost point of the normal distribution curve).

I'd suggest redoing this as a simulation.  I used two sets of params (sig expire and TTL of 30 days and 24 hours vs 30 days and 28 hours), plus a .05 failure rate, plus 25000 clients, and ran multiple trials for each.  The values in my table represent - for each trial - the latest time a client got to that point.  For the last four columns, the values represent the number of clients that exceeded the calculated safe interval (divide by the total number of clients to get a percentage).
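
A much-simplified sketch of that kind of simulation, in Python rather than Java, might look like the following. This is not the program that produced the table. The client model (uniformly random clock phase, independent per-query failure, fixed retry interval) and all function names are assumptions for illustration. The two interval formulas are the queryInterval and retryTime definitions from RFC 5011 section 2.3, and time 0 is taken to be sigExpirationTime.

```python
import random

DAY = 86400

def query_interval(orig_ttl, sig_expire_interval):
    """activeRefresh in seconds (RFC 5011 section 2.3 queryInterval)."""
    return max(3600, min(15 * DAY, orig_ttl // 2, sig_expire_interval // 2))

def retry_time(orig_ttl, sig_expire_interval):
    """Retry interval in seconds (RFC 5011 section 2.3 retryTime)."""
    return max(3600, min(DAY, orig_ttl // 10, sig_expire_interval // 10))

def first_answer(t, retry, p_fail, rng):
    """Time at which a query first sent at t finally gets a valid answer."""
    while rng.random() < p_fail:
        t += retry
    return t

def simulate(n_clients, orig_ttl, sig_expire_interval,
             add_hold_down=30 * DAY, p_fail=0.05, seed=1):
    """Return each client's key-acceptance time, in seconds after sigExpirationTime."""
    rng = random.Random(seed)
    refresh = query_interval(orig_ttl, sig_expire_interval)
    retry = retry_time(orig_ttl, sig_expire_interval)
    accepted = []
    for _ in range(n_clients):
        # Assumption: each client's refresh clock has a uniformly random phase.
        t = first_answer(rng.randrange(refresh), retry, p_fail, rng)
        hold_down_end = t + add_hold_down
        # Keep refreshing until a valid answer arrives at or after hold-down end.
        while t < hold_down_end:
            t = first_answer(t + refresh, retry, p_fail, rng)
        accepted.append(t)
    return accepted

if __name__ == "__main__":
    for ttl_hours in (24, 28):
        ttl = ttl_hours * 3600
        times = simulate(25000, ttl, 30 * DAY)
        refresh = query_interval(ttl, 30 * DAY)
        safe = refresh + 30 * DAY + refresh  # aR + aHD + aR, no safety factor
        late = sum(t > safe for t in times)
        print(f"ttl={ttl_hours}h latest={max(times) / DAY:.2f}d "
              f"late={late}/{len(times)}")
```

Counting clients past a longer interval (for example with a safety factor added) only requires changing the `safe` value, which is the comparison the SF1/SF2 style columns are making.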

Later, Mike


On 12/15/2017 6:55 PM, Wes Hardaker wrote:
Michael StJohns <m...@nthpermutation.com> writes:

Below is a java program I wrote to model this stuff.  In the table,
SF2 represents the number of clients that blew past twice the safety
factor (for aR+aHD+aR), SF1 represents the number of clients that blew
past the single safety factor.  OF is the number of clients using the
activeRefreshOffset calculation that finished after the calculated
interval (e.g. aR+aHD+aRO).  OF+s is the number of clients that
finished after the activeRefreshOffset + safetyFactor (in the first
table these are the same because of perfect responses).   In the
second table, compare SF1 to OF+s - SF1 < OF+s suggesting that
activeRefresh is a better choice than activeRefreshQuery for the third
term of the equation.  You can try a lot of different combinations,
but I haven't found any case where OF+s performs better than SF1.

The difference between lastStart and lAddHoldBegin represents the
retransmits after the first query.  The differences between
lAddHoldEnd and lFinalQuery represent retransmits after the last
normal query before the end of the add hold down time until a valid
answer was received after the addHoldDown time expired.

Feel free to twiddle with this.

Work has bogged me down too much to be able to write anything back so
far.  Thanks for the java code; I'll respond with the java*script* code
I've been hacking up at the same time:

https://www.isi.edu/~hardaker/projects/5011/


I didn't add the re-transmit time issue that your code takes into
account, but I did add a query drift that nicely shows one of your
concerns.  In particular, with various values of query drift (including
-1) you can reproduce the real world situation that you're worried
about, which is (as I've mentioned) an important one to call out.


_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
