Hi Wes -
Interesting approach, but characterizing a single resolver isn't
useful. You need to characterize the entire set of resolvers doing the
query.
Also, you're still missing the fact that a given resolver can start its
addHoldDown timer anywhere in the range of [0..activeRefresh] AFTER the
signature expiration, depending on when it started its clock. All
of your calculations are starting from the wrong point.
You're also continuing to miss the point that a given resolver makes its
last query (assuming no retransmits) anywhere in the range of
[0..activeRefresh] after the addHoldDown timer expires.
Your calculations represent the best case (e.g., triggers at offset 0 at
both ends of the problem) when what you want is the worst case.
Below you've calculated activeRefresh as 43200 seconds (12 hours):
Timestamp  Human Time      Note
0          0d 0h 0m 0s     Resolver Queries
                           *New DNSKEY Publication*
604800     7d 0h 0m 0s     Resolver Queries
                           *sigExpirationTime = Original DNSKEY RRSIG Expires*
604800     7d 0h 0m 0s     DNSKEY First seen
648000     7d 12h 0m 0s    Latest time DNSKEY first seen.
                           (sigExpirationTime + activeRefresh)
3196800    37d 0h 0m 0s    Resolver Queries
                           *sigExpirationTime + addHoldDownTimer*
3196800    37d 0h 0m 0s    DNSKEY Accepted
*3240000   37d 12h         Add Hold down expired for last client
                           sigExpirationTime + activeRefresh +
                           addHoldDownTimer (without retransmits)*
3240000    37d 12h 0m 0s   Resolver Queries
                           *sigExpirationTime + addHoldDownTimer +
                           activeRefresh*
3240000    37d 12h 0m 0s   *sigExpirationTime + addHoldDownTimer +
                           activeRefresh + activeRefreshOffset*
*3283200   38d 0h 0m 0s    DNSKey accepted by last client
                           sigExpirationTime + activeRefresh +
                           addHoldDownTimer (without retransmits)*
3326400    38d 12h 0m 0s   Resolver Queries
                           *sigExpirationTime + addHoldDownTimer +
                           activeRefresh + driftSafety*
3326400    38d 12h 0m 0s   *sigExpirationTime + addHoldDownTimer +
                           activeRefresh + activeRefreshOffset + driftSafety*
3412800    39d 12h 0m 0s   Resolver Queries
                           *sigExpirationTime + addHoldDownTimer +
                           activeRefresh + driftSafety + retrySafety*
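
To make the arithmetic behind those timestamps easy to check, here is a
rough Python sketch that recomputes the no-retransmit milestones. The
constant names and the human() helper are mine, and the 30 day
addHoldDown is inferred from the 604800 -> 3196800 step rather than
stated anywhere, so treat it as an illustration only. Note that the
3283200 figure only works out if activeRefresh is added a second time.

```python
# Assumed inputs, all in seconds (names are illustrative only).
SIG_EXPIRATION = 7 * 86400    # 604800: original DNSKEY RRSIG expires
ACTIVE_REFRESH = 43200        # 12 hours, as calculated in the table
ADD_HOLD_DOWN = 30 * 86400    # 2592000: inferred from 3196800 - 604800


def human(seconds):
    """Format a second count the way the table does, e.g. '7d 12h 0m 0s'."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# Best case: the resolver queries exactly at signature expiration.
best_first_seen = SIG_EXPIRATION
best_accepted = best_first_seen + ADD_HOLD_DOWN

# Worst case without retransmits: up to one activeRefresh of delay at
# each end of the hold-down period.
worst_first_seen = SIG_EXPIRATION + ACTIVE_REFRESH
worst_hold_down_end = worst_first_seen + ADD_HOLD_DOWN
worst_accepted = worst_hold_down_end + ACTIVE_REFRESH

if __name__ == "__main__":
    for label, t in [
        ("best case first seen", best_first_seen),
        ("best case accepted", best_accepted),
        ("worst case first seen", worst_first_seen),
        ("worst case hold-down end", worst_hold_down_end),
        ("worst case accepted", worst_accepted),
    ]:
        print(f"{t:>8}  {human(t):<14}  {label}")
```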
For the rest - drift safety is probably no more than 10 seconds per
query - call it about 620 seconds for this formula in the worst case
(not assuming retransmits). It could mostly be ignored - the worst case
is going to be less than a fastQueryInterval in all but pathological
cases. Basically 10s * (2 * activeRefresh + addHoldDown)/activeRefresh.
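
As a quick check of that estimate, the formula can be evaluated
directly. This is only a sketch: the function name is made up, the 10
second per-query drift is the rough guess from above, and addHoldDown
is assumed to be 30 days.

```python
def drift_safety(active_refresh, add_hold_down, drift_per_query=10):
    """Worst-case accumulated drift in seconds, ignoring retransmits.

    The number of queries across the whole window is approximated as
    (2 * activeRefresh + addHoldDown) / activeRefresh, and each query is
    assumed to drift by at most drift_per_query seconds.
    """
    queries = (2 * active_refresh + add_hold_down) / active_refresh
    return drift_per_query * queries


if __name__ == "__main__":
    # activeRefresh = 12 hours, addHoldDown = 30 days (assumed)
    print(drift_safety(43200, 30 * 86400))
```

With activeRefresh = 43200 and addHoldDown = 2592000 that is 62 queries,
which is where the 620 second figure comes from.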
RetrySafety needs to be calculated on the set of clients as we're
looking for the worst case of all of the clients (rightmost point of the
normal distribution curve).
I'd suggest redoing this as a simulation. I used two sets of params
(sig expire and TTL): 30 days and 24 hours vs 30 days and 28 hours,
plus a .05 failure rate and 25000 clients, and ran multiple trials for each.
The values in my table represent - for each trial - the latest time a
client got to that point. For the last four columns, those values
represent the number of times a client exceeded the calculated safe
interval (and divide by the total number of clients to get a
percentage...).
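
For anyone who wants a starting point, here is a stripped-down Python
sketch of that kind of simulation. It is not the Java program discussed
in this thread: the function names, the 8640 second retry interval, the
whole-second start offsets and the choice of aR + aHD + aR as the "safe"
interval are all assumptions made for illustration, and it tracks only
the final acceptance time rather than every column in my table.

```python
import random


def simulate_client(rng, active_refresh, add_hold_down, retry, fail_prob):
    """Seconds after the old RRSIG expires at which one client accepts the key."""
    # Each client's polling clock sits somewhere in [0..activeRefresh].
    t = rng.randrange(active_refresh + 1)
    # Retry until a query succeeds; that response is the first sight of the key.
    while rng.random() < fail_prob:
        t += retry
    hold_down_end = t + add_hold_down
    t += active_refresh
    # Keep polling; the key is accepted on the first successful query
    # at or after the end of the hold-down period.
    while True:
        ok = rng.random() >= fail_prob
        if ok and t >= hold_down_end:
            return t
        t += active_refresh if ok else retry


def run_trial(clients, active_refresh, add_hold_down, retry, fail_prob, seed=0):
    """Return (latest acceptance time, clients later than aR + aHD + aR)."""
    rng = random.Random(seed)
    safe = active_refresh + add_hold_down + active_refresh
    times = [
        simulate_client(rng, active_refresh, add_hold_down, retry, fail_prob)
        for _ in range(clients)
    ]
    late = sum(1 for t in times if t > safe)
    return max(times), late


if __name__ == "__main__":
    # 24 hour TTL -> activeRefresh of 12 hours; 30 day addHoldDown (assumed).
    latest, late = run_trial(25000, 43200, 30 * 86400, 8640, 0.05)
    print(f"latest acceptance: {latest}s, late clients: {late} "
          f"({100.0 * late / 25000:.2f}%)")
```

Running several trials with different seeds, and with the failure rate
set to 0 as a sanity check, gives a feel for how far the slowest clients
fall outside the calculated interval.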
Later, Mike
On 12/15/2017 6:55 PM, Wes Hardaker wrote:
Michael StJohns <m...@nthpermutation.com> writes:
Below is a Java program I wrote to model this stuff. In the table,
SF2 represents the number of clients that blew past twice the safety
factor (for aR+aHD+aR), SF1 represents the number of clients that blew
past the single safety factor. OF is the number of clients using the
activeRefreshOffset calculation that finished after the calculated
interval (e.g. aR+aHD+aRO). OF+s is the number of clients that
finished after the activeRefreshOffset + safetyFactor (in the first
table these are the same because of perfect responses). In the
second table, compare SF1 to OF+s: SF1 < OF+s, suggesting that
activeRefresh is a better choice than activeRefreshQuery for the third
term of the equation. You can try a lot of different combinations,
but I haven't found any case where OF+s performs better than SF1.
The difference between lastStart and lAddHoldBegin represents the
retransmits after the first query. The differences between
lAddHoldEnd and lFinalQuery represent retransmits after the last
normal query before the end of the add hold down time until a valid
answer was received after the addHoldDown time expired.
Feel free to twiddle with this.
Work has bogged me down too much to write anything back so far. Thanks for
the Java code; I'll respond with the java*script* code I've been hacking
up at the same time:
https://www.isi.edu/~hardaker/projects/5011/
I didn't add the re-transmit time issue that your code takes into
account, but I did add a query drift that nicely shows one of your
concerns. In particular, with various values of query drift (including
-1) you can reproduce the real world situation that you're worried
about, which is (as I've mentioned) an important one to call out.
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop