Re: [DNSOP] I-D Action: draft-ietf-dnsop-kskroll-sentinel-00.txt

2017-12-12 Thread Bob Harold
On Sun, Dec 10, 2017 at 8:21 PM,  wrote:

>
> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
> This draft is a work item of the Domain Name System Operations WG of the
> IETF.
>
> Title   : A Sentinel for Detecting Trusted Keys in DNSSEC
> Authors : Geoff Huston
>   Joao Silva Damas
>   Warren Kumari
> Filename: draft-ietf-dnsop-kskroll-sentinel-00.txt
> Pages   : 8
> Date: 2017-12-10
>
> Abstract:
>The DNS Security Extensions (DNSSEC) were developed to provide origin
>authentication and integrity protection for DNS data by using digital
>signatures.  These digital signatures can be verified by building a
>chain of trust starting from a trust anchor and proceeding down to a
>particular node in the DNS.  This document specifies a mechanism that
>will allow an end user to determine the trusted key state of the
>resolvers that handle the user's DNS queries.
>
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-dnsop-kskroll-sentinel/
>
> There are also htmlized versions available at:
> https://tools.ietf.org/html/draft-ietf-dnsop-kskroll-sentinel-00
> https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-kskroll-sentinel-00
>
>
>
Looks good to me.  One minor typo:

 4. Sentinel Test Result Considerations
paragraph 6

"If the resolver is non-validating, and it has a single forwarder
clause, then the resolver will presumably mirror the capabilities of
the forwarder target resolver. If this non-validating resolver it
has multiple forwarders, then the above considerations will apply."

"it" at end of the third line should be deleted.

-- 
Bob Harold
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

2017-12-12 Thread Wes Hardaker
Michael StJohns  writes:

> 2) T + activeRefresh  is the time at which the server sees the last
> query from the last resolver just starting their trust anchor
> installation.
> 3) T + activeRefresh + addHoldDownTime is the time at which the server
> sees the first query from any resolver finalizing its trust anchor
> installation.

Here is where we disagree.  Given 2, where you state "last query from
the last resolver is just starting", I argue that exactly an
addHoldDownTime beyond that is when that *exact same resolver* will
finish, because it will have sampled it first at (2) and again at
exactly T + activeRefresh + addHoldDownTime and accepted it, per 5011:

   Once the timer expires, the new key will be added as a trust anchor
   the next time the validated RRSet with the new key is seen at the
   resolver.

And the last query from the last resolver will be at T + activeRefresh +
addHoldDownTime.

> Between (2) and (3) any given resolver may drift/retransmit with the
> result that any given resolver may end up making a query just before
> (3) placing its next and final query at (3) plus activeRefresh.

Please forget drift in the top half of the equation.  There is zero
drift in the mathematically precise section.  We will deal with drift,
delays, and everything else in the safetyFactor alone, with many terms
or concepts within it to get it right.

>>5) will query again at lastSigExpirationTime + 30 days - .01
> No - from the servers point of view, the worst client (which is the
> only one the server cares about) will make its last query before trust
> anchor installation at lastSigExpirationTime + activeRefresh (when the
> last CLIENT saw its first valid update)  + 30 days -.001.

Yes, I said that in 6 stating that it was *still waiting*.  IE, #5 was
supposed to describe the second to last query.

>>6) notes this is still in waiting period

Let me put together something, per Paul's request, to work at this from
another angle where one of us can be shown right or wrong. 

[... retry, delay text by me deleted ...]  

> And again. NO.  The retransmits over a given set of clients in the
> addHoldDown period will result in at least one client (the "worst"
> client) ending up making a query just before the expiration of ITS
> addHoldDown timer.  Assuming the worst case of at least one client
> making a query just before the lastSigExpirationTime and that same
> client drifting/retransmitting enough to make a query just before its
> addHoldDown time the activeRefreshOffset is a useless value to
> calculate.

If you want to put an extra activeRefresh into the safetyMargin to
account for drift, I'm willing to do so.  Or we can insert a new term
labeled "driftSafetyMargin" and define it as activeRefresh if you want.
But that goes below my math line, not above it (and we can relabel
safetyMargin as retryFailureSafetyMargin).


From a purely security analysis point of view, the first thing we have
to agree upon is the precise moment at which all clients, in a perfect
world with *no errors at all* (no drift, no retries, no transmission
line delays, no CPU processing delays, no clock failures, etc.), would
have finished installing the new trust anchor.  Once we have this line
in the sand in place, then we can introduce real-world correctional
elements to account for reality sneaking into our perfect world.  I'm
trying to talk only about the perfect-world line in the sand first, and
then introduce the needed operational components *after that line*.
You keep inserting "drift" (e.g.) everywhere in the process of this
argument, which I absolutely agree needs to be dealt with, but only
below my perfect-world line.  The way I keep reading everything
you've written is that your perfect line includes two activeRefreshes,
which I (still) argue is incorrect.  As I said last time and this time,
I'd be happy to insert a "drift" term, but let's please label it what it
is.  If you agree with that, I'll make that change and push.

-- 
Wes Hardaker
USC/ISI

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

2017-12-12 Thread Wes Hardaker
Paul Vixie  writes:

> This timing based approach to online DNSSEC signing key changes is
> subtle beyond anybody's expectations, and because it will be used by
> the root zone, it is vital that we do more than simply whiteboard our
> proposed methods.

I have a thought about a demonstration. Will try to work on it later today.
-- 
Wes Hardaker
USC/ISI

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

2017-12-12 Thread Michael StJohns

On 12/12/2017 12:24 PM, Wes Hardaker wrote:

Michael StJohns  writes:


2) T + activeRefresh  is the time at which the server sees the last
query from the last resolver just starting their trust anchor
installation.
3) T + activeRefresh + addHoldDownTime is the time at which the server
sees the first query from any resolver finalizing its trust anchor
installation.

Here is where we disagree.  Given 2, where you state "last query from
the last resolver is just starting", I argue that exactly an
addHoldDownTime beyond that is when that *exact same resolver* will
finish, because it will have sampled it first at (2) and again at
exactly T + activeRefresh + addHoldDownTime and accepted it, per 5011:

Once the timer expires, the new key will be added as a trust anchor
the next time the validated RRSet with the new key is seen at the
resolver.

And the last query from the last resolver will be at T + activeRefresh +
addHoldDownTime.


Seriously no.

This isn't this hard.  You need to stop thinking about what's happening 
from the point of view of one client and think about how the server 
views the behavior of the collection of clients.


Dealing with your "attack" scenario and assuming no retransmissions for 
any client before it gets a response to its first query, the earliest 
time that the server can assume that the first client can start its 
addHoldDown timer is right after T (the lastExpirationTime).   The 
latest time that the server can assume that the last client will start 
its addHoldDown timer is the activeRefresh interval after T.


(Assume a queryInterval of 14 hours and a set of 840 clients evenly 
distributed with their refreshes happening one per minute - the last 
client (#840) will make its query at T + activeRefresh and start its 
addHoldDown timer then).
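
A quick sanity check of this example (a sketch only, taking T = 0 and
working in minutes):

    # Sketch of the 840-client example above: activeRefresh = 14 hours
    # (840 minutes), client refresh times spread evenly, one per minute after T.
    ACTIVE_REFRESH_MIN = 14 * 60  # 840 minutes

    # Client k (k = 1..840) makes its first query after T at minute k and
    # starts its addHoldDown timer at that moment.
    start_times = {k: k for k in range(1, 841)}

    # The server can only assume every client has started once the *last* one has.
    print(max(start_times.values()) == ACTIVE_REFRESH_MIN)  # True: T + activeRefresh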


So dealing only with that last client the server has to wait at least T 
+ activeRefresh before it assumes that the client has started its 
addHoldDown and T + activeRefresh + addHoldDown before the client has 
finished its addHoldDown and is about to make its last query.


The best case scenario (from the server's point of view) is when that 
same client makes its last query before its addHoldDown time expires at 
just under an activeRefresh interval before the expiration (e.g. if the 
expiration is at noon, and the activeRefresh is 1 hour, the best case is 
a last query at 11:00:00.1), causing the query after the addHoldDown 
time to occur at 12:00:00.1.   The worst case scenario is when ANY 
client makes its last query 0.1 seconds before the addHoldDown time expires, 
making the final query happen an activeRefresh interval after the 
expiration, or in the example at 12:59:59.9.


For a given client, assuming no query losses, there are FLOOR 
(addHoldDown/activeRefresh) queries in the addHoldDown interval (between 
when the client starts its timer and when it goes off). The difference 
addHoldDown - (FLOOR (addHoldDown/activeRefresh) * activeRefresh) is 
this activeRefreshOffset interval you keep trying to put in.  However, 
we do assume losses, and we do (and MUST) assume the worst case: that at 
least one client out of 10K, 100K or 1M is going to end up doing fast 
queries, changing that difference so that its last query before its 
addHoldDown timer goes off lands JUST before the timer expires.
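
For concreteness, the same arithmetic with illustrative values (the
14-hour refresh interval from the example above and RFC 5011's 30-day add
hold-down; real deployments will differ):

    # Illustrative FLOOR()/offset arithmetic for the paragraph above (values assumed).
    ACTIVE_REFRESH = 14        # hours, from the 840-client example
    ADD_HOLD_DOWN = 30 * 24    # hours, the RFC 5011 add hold-down time (30 days)

    queries_in_hold_down = ADD_HOLD_DOWN // ACTIVE_REFRESH   # FLOOR(addHoldDown/activeRefresh)
    active_refresh_offset = ADD_HOLD_DOWN - queries_in_hold_down * ACTIVE_REFRESH

    print(queries_in_hold_down)    # 51 lossless queries inside the hold-down interval
    print(active_refresh_offset)   # 6 hours; retries and losses can shift when the last
                                   # query lands, so this offset cannot be relied on in
                                   # a worst-case analysis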





Between (2) and (3) any given resolver may drift/retransmit with the
result that any given resolver may end up making a query just before
(3) placing its next and final query at (3) plus activeRefresh.

Please forget drift in the top half of the equation.  There is zero
drift in the mathematically precise section.  We will deal with drift,
delays, and everything else in the safetyFactor alone, with many terms
or concepts within it to get it right.


And here's where you go off the rails.   You don't need to include the 
safety factor to deal with the (2) to (3) interval, as retransmits can't 
reduce the addHoldDown period but can reduce - for a given client - the 
number of queries in the period.  Retransmits and drift can also change 
*when* in that interval the given client produces its last query before 
expiration - e.g. cause a "phase shift".






5) will query again at lastSigExpirationTime + 30 days - .01

No - from the servers point of view, the worst client (which is the
only one the server cares about) will make its last query before trust
anchor installation at lastSigExpirationTime + activeRefresh (when the
last CLIENT saw its first valid update)  + 30 days -.001.

Yes, I said that in 6 stating that it was *still waiting*.  IE, #5 was
supposed to describe the second to last query.


6) notes this is still in waiting period

Let me put together something, per Paul's request, to work at this from
another angle where one of us can be shown right or wrong.

[... retry, delay text by me deleted ...]


And again. NO.  The retransmits over a given set of clients in the
addHoldDown period will result in at least one client (the "worst"
client) ending up making a query just before the expiration of ITS
addHoldDown timer.

Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

2017-12-12 Thread Wes Hardaker
Michael StJohns  writes:

> A "perfect" system will behave the way you've described - but adding a
> safety factor while ignoring the phase shift brought on by retransmits
> within the addHoldDown interval will not characterize the actual
> system.

Ah ha!  So, you do actually agree that my description of the perfect
case is true, which means we really have been in violent agreement about
the "perfect" line in the sand.  [As I've said: I absolutely understand
the point you're making about real-world issues, such as timing drift.]

And as I said in the last message, I agreed that a delay/phase-shift
made sense and I'd be happy to produce a new term for it.  Ideally I'd
like to add that into the safetyMargin because it reflects real-world
conditions, and I was trying to contain that to just one particular
term.  But I understand you don't want it buried in there, so I'll
create a new term to deal with timing phase shifts and add that before
the existing safetyMargin.

> Can we stop now?

I think so as I believe I can add words that we'll both agree to.  But
I've said that before, so...


The only remaining questions, just to be doubly sure:

1) "is one activeRefresh period long enough to account for the slop
   associated with time clock drifts?"

   I'd argue yes, as any clock that drifted longer than an activeRefresh
   (min = 1hr) is a seriously broken clock.  I think you agree.

2) "is one activeRefresh period long enough to account for network
   delays and other elements, aside from 'retries and missing queries'?"

   I think you and I agree on this too, that one should be sufficient to
   cover network delays too.

-- 
Wes Hardaker
USC/ISI

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

2017-12-12 Thread Michael StJohns

On 12/12/2017 4:03 PM, Wes Hardaker wrote:

Michael StJohns  writes:


A "perfect" system will behave the way you've described - but adding a
safety factor while ignoring the phase shift brought on by retransmits
within the addHoldDown interval will not characterize the actual
system.

Ah ha!  So, you do actually agree that my description of the perfect
case is true, which means we really have been in violent agreement about
the "perfect" line in the sand.  [As I've said: I absolutely understand
the point you're making about real-world issues, such as timing drift.]


This is what you got out of all of that?   What I said was that doing 
any analysis starting from the "perfect" model would not lead you to the 
right place.  Your description of the perfect case is correct in its 
behavior and totally irrelevant to figuring out the final answer.



And as I said in the last message, I agreed that a delay/phase-shift
made sense and I'd be happy to produce a new term for it.  Ideally I'd
like to add that into the safetyMargin because it reflects real-world
conditions,
No - it doesn't.  Please look at the diagrams I provided and think about 
them for a day before responding.    I think you're still confusing 
system behavior with client behavior.




and I was trying to contain that to just one particular
term.  But I understand you don't want it buried in there, so I'll
create a new term to deal with timing phase shifts and add that before
the existing safetyMargin.

No - please don't. Please use the math I gave you.  It is correct.

Here's the whole diagram including retransmits in all the possible 
places and assuming a best case start compared against a "perfect" model 
with the same best case start:

   [diagram attachment not included in the archived message]
The only effect of the retransmissions within the addHoldDown time is to 
shift when the final query happens.  In a pool of 10K clients, you'll 
get a percentage of those clients taking an entire activeRefresh 
interval after THEIR addHoldDown expires.


In no case does any calculation that involves an "activeRefreshOffset" 
provide a worst case analysis.
In no case does a "phaseShift" term provide any value over assuming that 
the addHoldDown expires just after the last query in the 
addHoldDownInterval.


So again:

sigExpiration + activeRefresh + addHoldDown + activeRefresh + 
retransmission slop (aka safetyFactor).
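
Plugging illustrative numbers into that expression (the activeRefresh and
safetyFactor values below are assumptions, not values taken from the draft):

    # Worked example of the wait-time expression above, with assumed values.
    from datetime import timedelta

    active_refresh = timedelta(hours=14)   # assumed; RFC 5011 derives it from signature validity
    add_hold_down = timedelta(days=30)     # RFC 5011 add hold-down time
    safety_factor = timedelta(days=2)      # retransmission slop: a deliberately rough guess

    wait_after_sig_expiration = (active_refresh + add_hold_down
                                 + active_refresh + safety_factor)
    print(wait_after_sig_expiration)       # 33 days, 4:00:00 after sigExpiration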









Can we stop now?

I think so as I believe I can add words that we'll both agree to.  But
I've said that before, so...


The only remaining questions, just to be doubly sure:

1) "is one activeRefresh period long enough to account for the slop
associated with time clock drifts?"

I'd argue yes, as any clock that drifted longer than an activeRefresh
(min = 1hr) is a seriously broken clock.  I think you agree.
I don't know why you think this matters.  Assume that drift + 
retransmissions is enough to get the phase shifted to the worst case 
(e.g. just before the end of the addHoldDown period for at least one 
client) and we're done with this analysis.




2) "is one activeRefresh period long enough to account for network
delays and other elements, aside from 'retries and missing queries'?"

I think you and I agree on this too, that one should be sufficient to
cover network delays too.

Again, I don't know why you think this matters.   If this is the safety 
factor, then no - you're asking the wrong question.  If this is just 
about accounting for the actual time to accomplish a query then you only 
have to account for the round trip delays for the query before the 
addHoldDown and the round trip delays for the query after.  Any delays 
during the addHoldDown time only cause the phase to shift.  Those delays 
are more than covered by a single fastQuery interval and can be ignored 
with any reasonable safetyFactor.


For the safetyFactor you want to consider how large the pool of queriers 
is and how often queries fail, in order to figure out how big to make the 
safety factor.  This ends up looking like a cumulative distribution function, 
and you're looking to pick a cutoff far enough along the curve that you 
get most (tm) of the clients having completed the installation before 
moving on. 
https://en.wikipedia.org/wiki/Binomial_distribution#/media/File:Binomial_distribution_cdf.svg 
for example.
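
One deliberately simplified way to model that cutoff, assuming (purely for
illustration) that each query fails independently with some probability and
that a client finishes on its first successful query after its own
addHoldDown expires:

    # Toy model of sizing the safetyFactor: after k extra activeRefresh intervals,
    # the chance a given client is still waiting is P_FAIL**k, so the expected
    # number of stragglers in the pool is POOL * P_FAIL**k.
    P_FAIL = 0.20            # assumed per-query failure probability
    POOL = 1_000_000         # assumed number of validating resolvers being waited on
    MAX_STRAGGLERS = 1       # accept roughly one expected straggler

    k = 1
    while POOL * P_FAIL ** k > MAX_STRAGGLERS:
        k += 1
    print(k)                 # 9 extra activeRefresh intervals under these assumptions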


Later, Mike



___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] I-D Action: draft-ietf-dnsop-kskroll-sentinel-00.txt

2017-12-12 Thread Geoff Huston


> On 13 Dec 2017, at 3:44 am, Bob Harold  wrote:
> 
> 
> On Sun, Dec 10, 2017 at 8:21 PM,  wrote:
> 
> A New Internet-Draft is available from the on-line Internet-Drafts 
> directories.
> This draft is a work item of the Domain Name System Operations WG of the IETF.
> 
> Title   : A Sentinel for Detecting Trusted Keys in DNSSEC
> Authors : Geoff Huston
>   Joao Silva Damas
>   Warren Kumari
> Filename: draft-ietf-dnsop-kskroll-sentinel-00.txt
> Pages   : 8
> Date: 2017-12-10
> 
> Abstract:
>The DNS Security Extensions (DNSSEC) were developed to provide origin
>authentication and integrity protection for DNS data by using digital
>signatures.  These digital signatures can be verified by building a
>chain of trust starting from a trust anchor and proceeding down to a
>particular node in the DNS.  This document specifies a mechanism that
>will allow an end user to determine the trusted key state of the
>resolvers that handle the user's DNS queries.
> 
> 
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-dnsop-kskroll-sentinel/ 
> 
> 
> There are also htmlized versions available at:
> https://tools.ietf.org/html/draft-ietf-dnsop-kskroll-sentinel-00 
> 
> https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-kskroll-sentinel-00 
> 
> 
> 
> 
> Looks good to me.  One minor typo:
> 
>  4. Sentinel Test Result Considerations
> paragraph 6
> 
> "If the resolver is non-validating, and it has a single forwarder
> clause, then the resolver will presumably mirror the capabilities of
> the forwarder target resolver. If this non-validating resolver it
> has multiple forwarders, then the above considerations will apply."
> 
> "it" at end of the third line should be deleted.


noted - “it” will be removed in the next version


Geoff





___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] Ask for advice of 3 new RRs for precise traffic scheduling

2017-12-12 Thread zuop...@cnnic.cn
Hi  everyone,
 
 Here’s a problem about CDN traffic scheduling.  As far as I know, many 
companies use multi-CDN to speed up their websites, and CDN providers need 
precise traffic scheduling, especially during peak traffic hours. 

 As CDN providers usually manage authoritative DNS for their clients, the most 
common method for real-time traffic scheduling is to change the A records of 
CDN nodes. This has some positive effect, but because of the lack of DNS 
protocol support (especially on the recursive server side), a CDN company can’t 
schedule traffic very precisely. 
 For example, a CDN provider can’t direct 70% of traffic to node A and 30% of 
traffic to node B.  Even if it publishes the addresses of both A and B, it 
can’t control the recursive server’s responses to clients; for example, some 
recursive servers may simply round-robin the addresses to clients.
 
For more precise CDN traffic scheduling, I have an idea that defines 3 new 
records by extending 3 existing DNS resource records (A, AAAA and CNAME) with 
a “weight” attribute, as below:
   [ CNAME ]  ->  [ CNAMEX ]   (CNAME plus a weight)
   [ A ]      ->  [ AX ]       (A plus a weight)
   [ AAAA ]   ->  [ AAAAX ]    (AAAA plus a weight)
 
The reasons for doing this are:
(1) By adding “weight” in CNAMEX, a multi-CDN user can easily manage the traffic 
ratio among different CDN providers by itself.
(2) By adding “weight” in A/AAAA, a CDN provider can easily manage the traffic 
ratio among different nodes by itself.
 
 For compatibility, an authoritative server should place the CNAMEX/AX/AAAAX 
records in the additional section of a DNS response to an A/AAAA query. A 
“weight-aware” recursive server should make use of the CNAMEX/AX/AAAAX records 
in the additional section to shape its answers to clients according to the 
weight of each RR. A “weight-not-aware” recursive server can simply ignore 
these RRs and still work normally.
 
 Here is an example: if a CDN provider configures AX records for 
“www.example.com” as below, indicating that “1.1.1.1” should account for 80% 
of responses and “2.2.2.2” for 20%, then a “weight-aware” recursive server 
should reply to clients accordingly.
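
A sketch of what such a configuration and the weight-aware selection might
look like (the AX presentation format here, weight followed by address, is
purely hypothetical, chosen only to illustrate the 80%/20% split):

    import random

    # Hypothetical AX data for www.example.com, matching the 80%/20% example,
    # e.g. something like:
    #   www.example.com.  600  IN  AX  80  1.1.1.1
    #   www.example.com.  600  IN  AX  20  2.2.2.2
    ax_records = [
        ("1.1.1.1", 80),
        ("2.2.2.2", 20),
    ]

    def pick_address(records):
        """Weight-aware selection: return one address chosen in proportion to its weight."""
        addresses = [addr for addr, _ in records]
        weights = [weight for _, weight in records]
        return random.choices(addresses, weights=weights, k=1)[0]

    # Over many queries, roughly 80% of the answers should be 1.1.1.1.
    answers = [pick_address(ax_records) for _ in range(10_000)]
    print(answers.count("1.1.1.1") / len(answers))   # approximately 0.8
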
  Any comment or advice is highly appreciated!
 Thank you!!
 



zuop...@cnnic.cn
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop