Re: [Gen-art] Gen-ART Telechat review of draft-ietf-anima-grasp-api-08

Alissa Cooper Thu, 03 Dec 2020 05:53:26 -0800

Paul, thanks for your reviews. Brian, thanks for your responses. I entered a No 
Objection ballot and would like to see resolution for Paul’s outstanding 
concerns, particularly as they pertain to Section 2.3.5.


Thanks,
Alissa


> On Dec 2, 2020, at 12:29 PM, Paul Kyzivat <pkyzi...@alum.mit.edu> wrote:
> 
> Brian,
> 
> On 12/1/20 8:13 PM, Brian E Carpenter wrote:
>> Hi Paul,
>> Comments in line. There's one definite good catch in your
>> review, and obviously more clarifications are needed.
> 
> I'll comment this one last time. Then I will be satisfied that I have 
> communicated my thoughts, and I'm happy with whatever you decide to do.
> 
>> On 01-Dec-20 15:06, Paul Kyzivat wrote:
>>> I am the assigned Gen-ART reviewer for this draft. The General Area
>>> Review Team (Gen-ART) reviews all IETF documents being processed by the
>>> IESG for the IETF Chair. Please wait for direction from your document
>>> shepherd or AD before posting a new version of the draft. For more
>>> information, please see the FAQ at <
>>> http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>>> 
>>> Document: draft-ietf-anima-grasp-api-08
>>> Reviewer: Paul Kyzivat
>>> Review Date: 2020-11-30
>>> IETF LC End Date: 2020-10-28
>>> IESG Telechat date: 2020-12-01
>>> 
>>> Summary:
>>> 
>>> This draft is on the right track but has open issues, described in the
>>> review.
>>> 
>>> General:
>>> 
>>> This document has addressed some of the concerns I had during the last
>>> call review. However some of my concerns remain and some new ones have
>>> arisen in this version.
>>> 
>>> Issues:
>>> 
>>> Major: 3
>>> Minor: 6
>>> Nits:  1
>>> 
>>> 1) MAJOR: Negotiation
>>> 
>>> The text in section 2.3.5 now makes clear that the sequence of steps in
>>> the negotiation is non-deterministic - both sides can call
>>> negotiate_step and negotiate_wait. I believe this can result in the two
>>> sides not agreeing on what values have been negotiated. (For instance,
>>> what if one side calls negotiate_step concurrently with the other side
>>> calling end_negotiate? Which value has been agreed upon?)
>> The negotiate_step calls alternate between the two peers, until one of them
>> calls end_negotiate (or a timeout kills the session). I hoped that
>> was clear in the protocol diagram. We can make it explicit, for people
>> who haven't fully digested the protocol spec. Since that's ~50 pages, it
>> certainly takes some digestion.
>> (negotiate_wait would be interjected by the peer who has the next go, simply
>> indicating that the next step is delayed. That could happen, for example,
>> if the ASA needed to negotiate something with a third party before
>> continuing.)
> 
> The figure shows alternation, but IIUC that is only an example. I understood 
> the text as permitting either side to perform negotiate_step or 
> negotiate_wait at will. A requirement for alternation would solve the problem.
> 
> Reading it again, I see how it can be interpreted as you say is intended. But 
> it isn't clear. To clarify, is the intent that a negotiate_step, 
> negotiate_wait, or end_negotiate is required in response to receipt of 
> request_negotiate or negotiate_step? And a negotiate_step or end_negotiate is 
> also required following the sending of negotiate_wait. And those are the only 
> permitted sequences?
> 
> A state machine would be helpful to show this in an unambiguous way.
> 
>>> The loop_count
>>> adds to the confusion. Are the two sides intended to have independent
>>> loop count values? It seems these too can become unsynchronized.
>> The loop count also bounces backwards and forwards in alternate steps.
>> Again, we can underline that in the text.
> 
> That would be helpful.
> 
>>> Also, the goal of negotiation isn't clear to me. I gather it must be for
>>> the two sides to agree on a particular value for the objective. But for
>>> that to work there must be some rules about how values can change in
>>> each step so that the result stabililizes, rather than causing a battle
>>> that ends with loop count exhaustion. This could be achieved by always
>>> negotiating *down*, or always *up*. But that requires that the objective
>>> value type have an ordering function. Given the general nature of the
>>> objective I don't think that can be assumed.
>> No, it explicitly is not defined either in the protocol nor the API.
>> The syntax and semantics of the objective value are defined per-objective,
>> and the objective might or might not be ordered. So there is intentionally
>> no answer to your question.
>> In most cases I'd expect that there would be an ordering but we didn't want
>> to constrain the use cases in that way. Also note that a failed negotiation
>> (e.g. the loop count expires, or where one end simply rejects the other's
>> offer) is not a protocol failure.
>>> 
>>> ISTM that more work is needed to define the negotiation process in a way
>>> that ensures it ends with both sides agreeing on a single value for the
>>> objective.
>> As noted, that is per-objective. The most complicated case I've coded
>> is IP prefix assignment, and it works fine, except that if there is
>> no prefix available of the maximum desired length, the requester ends
>> up unsatisfied - as intended. There should be no condition in which
>> the negotiation loops indefinitely; it either succeeds or fails.
> 
> Without that the result in non-deterministic. The two sides may have 
> conflicting goals, and then the result will only be determined by the loop 
> count and timeout.
> 
> Alternately, implementors will establish side agreements that aren't governed 
> by standards.
> 
> That seems like an undesirable state of affairs.
> 
>>> 2) MINOR: Dry Run Negotiation
>>> 
>>> Dry Run negotiation is very under-specified. Why would it be used? I
>>> guess that an ASA might use dry run negotiation to inform future actual
>>> negotiation. Can anything be inferred from a dry run negotiation about
>>> how an actual negotiation will go? When participating in a dry run
>>> negotiation, how should an ASA decide what response to make? Should it
>>> take into account current resource availability? Or should it respond
>>> based on best-case or worst-case resource availability? Or what?
>>> 
>>> This requires further clarification.
>> Again, *all* these issues are specific to the objective in question,
>> and they are intentionally not addressed in the protocol design or
>> the API.
> 
> So this is a semantics free feature? Then all semantics are determined by 
> side agreements? Again, IMO this is a bad thing.
> 
>>> 3) MAJOR: Confusing semantics of 'request_negotiate'
>>> 
>>> In section 2.3.5 I don't understand the following:
>>> 
>>>           1.  The 'session_nonce' parameter is null.  In this case the
>>>               negotiation has succeeded in one step and the peer has
>>>               accepted the request.  The returned 'proffered_objective'
>>>               contains the value accepted by the peer, which is therefore
>>>               equal to the value in the requested 'objective'.  For this
>>>               reason, no session nonce is needed, since the session has
>>>               ended.
>>> 
>>> IIUC this requires a network exchange with the peer. I don't see how
>>> this can complete *immediately*. ISTM that this could only complete
>>> immediately if it were satisfied from a local cache. That doesn't seem
>>> appropriate for this function.
>> I changed that at your request from "immediately" to "in one step": the 
>> request_negotiate() from A gets back an immediate end_negotiate()
>> from B and we're done. If "in one step" still isn't clear, we could
>> change to "in one message exchange".
>>> Similarly, in bullet 2 I don't see how the proffered_objective would be
>>> available in the initial call, before a response has been received from
>>> the peer..
>> It can't. The call doesn't complete until the first response has arrived
>> (in a threaded implementation).
>>> Does "immediately" here simply mean that the negotiation is completed in
>>> one exchange between the two ends? If so, isn't a session nonce still
>>> required in an event loop implementation in order to handle the one
>>> response?
>> I think that's a good catch. It's fixable, because the session nonce is
>> generated anyway when the request message is generated, but if the
>> return code is noReply, the nonce is needed. The same applies to
>> synchronize(). The issue is implied by this bullet:
>>  * Event loop implementation: An additional read/write 'session_nonce'
>>    parameter is used.
>> but that's incomplete.
>> My bad. I missed this point because I only coded the threaded version,
>> which is the natural approach in Python. We need to add an extra case
>> that only applies in the event loop case:
>> * If the 'errorcode' parameter has the value 2 ('noReply'), no response
>> has been received so far. The 'session_nonce' parameter must be presented
>> in subsequent calls.
> 
> IIUC, the session nonce isn't needed at all in threaded implementations. And 
> in the event loop implementation it is needed, even in this case. So ISTM 
> this need not be singled out as a special case.
> 
>>> Bullet 2 also says:
>>> 
>>>               ... The
>>>               returned 'proffered_objective' contains the first value
>>>               proffered by the negotiation peer.  The contents of this
>>>               instance of the objective must be used to prepare the next
>>>               negotiation step (see negotiate_step() below) because it
>>>               contains the updated loop count, sent by the negotiation
>>>               peer.  The GRASP code automatically decrements the loop
>>>               count by 1 at each step, and returns an error if it becomes
>>>               zero.
>>> 
>>> I guess that the 'proffered_objective' in the return parameters is the
>>> counter-offer to the objective passed in the call. And that you expect
>>> the objective value used in any subsequent negotiate_step to be derived
>>> by modifying this value. So far this new wording has improved my
>>> understanding.
>> Yes, and it's probably worth adding the term "counter-offer" for clarity.
>>> But the loop_count in the objective is especially confusing. It seems
>>> that it is handled quite differently from the rest of the objective. You
>>> specify (in 2.3.2.3) that it has a default value of GRASP_DEP_LOOPCT.
>>> But who is expected to initialize this? (Is it simply that the ASP
>>> should use this value if it doesn't have any particular preference?)
>> Yes, that's the intention (but the details depend on the
>> implementation; in my Python code the class definition for
>> 'objective' sets the default). The default is set quite low
>> since we assumed that normal negotations require few steps.
>>> Then you say that the GRASP decrements this. Is this decrementing done
>>> on the calling side before sending the message, the calling side after
>>> receiving the response? Or by the peer, on receipt or when sending the
>>> response? Is it permissible for the ASA to modify this value during
>>> negotiation? Since this seems intended to prevent a loop, having clarity
>>> about how this value is managed seems important.
>> That is well specified in the protocol spec itself. The sender
>> of each M_NEGOTIATE message decrements it, i.e. it happens
>> inside negotiate_step(). It would do no harm to mention that.
>> What happens if an ASA messes with the loop count? Bad things
>> could happen, but just as one end can prolong the timeout with
>> negotiate_wait(), it could prolong the negotiation with
>> obj.loop_count += 1 if it's useful. That, for example, enables
>> using GRASP for bulk transfer, if you don't like FTP (joke
>> intended). (Not really a joke, though; I have running code for
>> simple file transfer using GRASP, and I find it a handy way to
>> move stuff from Windows to my Linux test machine.)
>> A malicious ASA could do all sorts of damage, so I don't think
>> that loop count manipulation is a serious concern. The other
>> end, if paranoid, can always check if the loop count is behaving
>> strangely.
> 
> IIUC the only involvement the application should have with the loop count is 
> to override the default at the start of negotiation. After that, it is really 
> up to the GRASP to manage it. If that is right, why expose it as part of the 
> objective? Why not simply allow the override value in the request_negotiate 
> call, and otherwise let the GRASP manage it for the duration of the 
> negotiation?
> 
>>> 4) MINOR: negotiate_wait
>>> 
>>> The negotiate_wait call allows one ASA to extend the timeout of another
>>> ASA. This could, in perverse cases, cause an ASA to wait indefinitely.
>>> ISTM that this is dangerous. I would think it better make the other ASA
>>> aware of the desire to extend the timeout and let it decide whether to
>>> do so.
>> That's more of a comment on the GRASP spec & implementation than
>> on the API. But our trust model is that if a node can get access
>> to the autonomic control plane, we trust the ASAs in that node.
>> There is new text on that aspect following the Security Area review.
> 
> As you wish.
> 
>>> 5) MAJOR: Consistency of Objective definitions
>>> 
>>> In section 2.3.2.3 and elsewhere, presumably all parties that use a
>>> particular objective must agree on the values of synch, neg, dry, and
>>> the size and structure of the value.
>>> 
>>> There is no communication of the size and structure in the abstract API.
>>> Presumably the implementation of a language binding to the API is
>>> required to at least communicate the size and alignment requirements to
>>> the core.
>> No, it's not needed. That's the great advantage of using CBOR;
>> the CBOR object is self-defining and flows opaquely through the
>> GRASP code (modulo the maximum message size, which should prevent
>> buffer overflow issues).
> 
> Sorry. I was thinking that CBOR was just a possibility for the value. Now I 
> see that the value be representable in CBOR and in fact will be so 
> represented on the wire.
> 
> Based on that I withdraw my concern.
> 
>>> The matching of definitions between nodes must be achieved
>>> solely by the name,
>> Yes.
>>> the respective language bindings at the two ends,
>> No, again CBOR bridges over this. In a C implementation,
>> the value part of the objective is delivered to the ASA
>> as a CBOR-encoded byte string, and the programmer uses
>> a CBOR library to de-serialize it. In Python, Ruby, etc.
>> the API can deserialize it automatically.
>>> and out of band mutual agreement.
>> Yes, that's why each objective needs a spec of its syntax and semantics
>> and an IANA registration of its name. (Or it can use the name format
>> for privately defined objectives, but it still needs a defined syntax
>> and semantics).
> 
> Ah, got it! While I read draft-ietf-anima-grasp in preparation for my last 
> call review I obviously didn't absorb it all, and missed this point.
> 
>>> Furthermore, different language
>>> bindings may use different in-memory representations of the value. In
>>> such cases, how is the on-wire format to be determined?
>> CBOR (specified using CDDL). For the protocol itself, that is all
>> you need to specify. The semantics needs English.
> 
> OK. Got it.
> 
>>> If the two ends disagree on size and structure then problems will occur.
>>> Perhaps the core can identify size mismatches based on size communicated
>>> on the wire vs the size defined by the language binding, but there are
>>> no error codes defined for this situation. And of course differing
>>> structures with the same size would not be detectable.
>>> 
>>> Furthermore, there is potential for different ASAs to (accidentally)
>>> have incompatible definitions for the same objective. What happens in
>>> this case? How can blame be ascribed so that the problem can be fixed?
>> You have to find out which one doesn't meet the spec. It's app-level
>> debugging. (It's also one of the reasons I made sure my prototype
>> could run two separate instances in one machine: makes debugging
>> a whole lot easier.)
> 
> The IANA registration of names/formats resolves my concern. I suppose that 
> means that revisions to the format will require choosing a new name.
> 
>>> IMO more needs to be said about all of this. At the least a number of
>>> disclaimers that put the burden on the ASAs to recognize the risk, take
>>> these potential problems into account and avoid them. But there could be
>>> some requirements placed on API language bindings and core
>>> implementations to deal with some of these. And probably some added
>>> error codes to report what problems can be detected.
>> The only one I found necessary was CBORfail ("CBOR decode failure")
>> and that's pretty fatal. It means the other party has sent a byte
>> string that is not in fact valid CBOR. Beyond that, the value
>> is passed up to the consumer (the ASA) which does indeed have to
>> validate the contents. That's CBOR 101, but it should be stated.
> 
> OK.
> 
>>> 6) MINOR/MAJOR: Session State
>>> 
>>> I continue to find the lifetime and state of a session to be unclear.
>>> The API calls that return session_nonce seem to signal creation of a new
>>> session. The end_negotiate() call seems to terminate a negotiation
>>> session. But what causes other sessions to end? This seems important
>>> because there is state associated with a session that consumes resources
>>> and can't be reclaimed until the session ends. So it should be important
>>> for the ASA to end all sessions. Some clarification of this seems
>>> important both for core implementors and for ASA developers that will be
>>> using the API.
>> I'm not sure this belongs in the API though. The ASA doesn't need
>> to worry about this. A session ends either when it completes normally
>> by end_negotiate(), or a synchronize() reply is received, or discovery
>> terminates, or a session times out, or a loop count is exhausted, or
>> something causes a socket error, etc. There are 29 calls to
>> _disactivate_session() in my GRASP implementation.
> 
> So IIUC the GRASP will is expected to discard all state either when it sees 
> an end_negotiate, the loop count is exhausted, or the timeout expires. In 
> that case my concerns are resolved.
> 
>>> (Or is this document only for implementors of core and those
>>> instantiating a particular language binding of the API, with
>>> documentation for end users left to others?)
>> I'm pretty sure we'll need an O'Reilly book one of these days.
>> But in fact draft-ietf-anima-asa-guidelines is on its way,
>> and will be recommended reading for ASA implementers, along
>> with the API.
>>> 7) MINOR/MAJOR: Timeout
>>> 
>>> Section 2.3.2.2 indicates that the API returns an error response to the
>>> ASA if the timeout expires. But the other end is presumably still
>>> working on the request and will eventually send a response. What does
>>> the core do when it receives this? Must it retain state so that it can
>>> detect the case and ignore the message? It seems that this could result
>>> in the two peers disagreeing on some state.
>> The timeout expiring is fatal to the session, so any further
>> messages for that session will not be processed. Depending on
>> the sequence of events, the actual error code returned at both
>> ends might vary. The most likely case is that it will show up
>> as a socket error at both ends - a read() timeout at the
>> receiving end and a dead socket at the sending end.
>> So the two ends will both see a failed negotiation.
> 
> As long as the GRASP discards incoming messages for sessions it has no state 
> for then this seems fine.
> 
>>> 8) MINOR: Text regarding "minimum_TTL"
>>> 
>>> There is a small problem with the following in section 2.3.4:
>>> 
>>>        -  If the parameter 'minimum_TTL' is greater than zero, any
>>>           locally cached locators for the objective whose remaining time
>>>           to live in milliseconds is less than or equal to 'minimum_TTL'
>>>           are deleted first.  Thus 'minimum_TTL' = 0 will flush all
>>>           entries.
>>> 
>>> The first sentence qualifies the paragraph to cases where minimum_TTL is
>>> greater than zero. But the final sentence then infers the behavior when
>>> minimum_TTL is equal to zero.
>>> 
>>> Also, minimum_TTL is typed as an integer, which permits negative values.
>>> I gather that negative values are not allowed. I can suggest two ways to
>>> fix this:
>>> 
>>>        -  The parameter 'minimum_TTL' MUST be greater than or equal to
>>>           zero. Any locally cached locators for the objective whose
>>>           remaining time to live in milliseconds is less than or equal to
>>>           'minimum_TTL' are deleted first.  Thus 'minimum_TTL' = 0 will
>>>           flush all entries.
>> OK.
>>> Or, change they type to unsigned integer.
>> Not all languages have unsigned integers, so I'd prefer your version.
>>> Then the statement can be
>>> simplified by removing the first sentence:
>>> 
>>>        -  Any locally cached locators for the objective whose remaining
>>>           time to live in milliseconds is less than or equal to
>>>           'minimum_TTL' are deleted first.  Thus 'minimum_TTL' = 0 will
>>>           flush all entries.
>>> 
>>> 9) MINOR: Terminology - Session nonce
>>> 
>>> The new first paragraph of section 2.2.3 talks about identifying the
>>> session by a pseudo-random session identifier, and tagging it with an IP
>>> for further uniqueness. The 2nd paragraph talks about a session_nonce.
>>> It isn't clear at this point in the text if these the same thing. Or is
>>> the session id shared on the wire, the IP tag added by the core, and the
>>> session_nonce an artifact of the API, shared only between the ASA and
>>> the core?
>> Yes, it's locally munfactured, but the natural implementation is
>> specified in the GRASP spec:
>>  However, there is a finite probability that two nodes might generate
>>  the same Session ID value.  For that reason, when a Session ID is
>>  communicated via GRASP, the receiving node MUST tag it with the
>>  initiator's IP address to allow disambiguation.
>> Will clarify.
>>> 
>>> Section 2.3.2.7 seems to confirm that the nonce is just an identifier
>>> used between the core and the ASA. But here it says that using the id
>>> plus the IP is simply one possible implementation choice.
>> Right. Because although GRASP says what I just quoted, an implementor
>> might choose anything else that they preferred. Is there any reason
>> to specify this more tightly?
> 
> No. No need. Its just a matter of being clear. My confusion arose from the 
> text in 2.2.3:
> 
>   ... It is
>   identified by a pseudo-random session identifier tagged with an IP
>   address of the initiator of the session to guarantee uniqueness.
>   ...
>   On the first call in a new GRASP session, the API returns a
>   'session_nonce' value based on the GRASP session identifier.
> 
> I took this as specifying the representation of the session noncee. (Though 
> on rereading I see the "based on" does give some wiggle room.)
> 
> Perhaps a minor tweak in wording would help:
> 
>   On the first call in a new GRASP session, the API returns a
>   'session_nonce' value used to identify the session in the API.
> 
> Then 2.3.2.7 spells out implementation options for this.
> 
>>> Further, I question whether "nonce" is the best term to use here. ISTM
>>> that "handle" (session_handle) would more clearly reflect the purpose of
>>> this item.
>> We could debate that at length, I think. To me 'nonce' suggests
>> something that's intentionally meaningless and for a single session.
>> 'Handle' suggests something with longer term significance.
>> https://en.wikipedia.org/wiki/Handle_(computing) seems to be
>> on my side. Anyway, we will await advice from the AD before
>> changing the terminology.
> 
> Good naming is difficult. This was only a suggestion.
> 
>>> I think it would be helpful to be clearer in distinguishing what is
>>> fundamental vs what is implementation choice. For instance, in section
>>> 2.2.3:
>>> 
>>>     A GRASP session consists of a finite sequence of messages (for
>>>     discovery, synchronization, or negotiation) between a pair of ASAs.
>>>     The core identifies it on the wire by a pseudo-random session
>>>     identifier. Further details are given in [I-D.ietf-anima-grasp].
>>> 
>>>     On the first call in a new GRASP session, the API returns a
>>>     'session_handle' value used to identify the session. This
>>>     value must be used in all subsequent calls for the same session, and
>>>     will be provided as a parameter in the callback functions.  By this
>>>     mechanism, multiple overlapping sessions can be distinguished, both
>>>     in the ASA and in the GRASP core.  The value of the 'session_handle"
>>>     is opaque to the ASA.
>>> 
>>> This establishes the role and relationship of the two terms, while
>>> section 2.3.2.7 gives a possible implementation without as much
>>> confusion. (It will require some rewording to switch from session_nonce
>>> to session_handle. It already uses "session handle" in passing.)
>>> 
>>> 10) NIT: Terminology - ASA nonce
>>> 
>>> For similar reasons to those above for session_nonce/session_handle, IMO
>>> it would be clearer to use asa_handle rather than asa_nonce. But this is
>>> only a suggestion.
>> Roman suggested using different terms, e.g. ASA "nonce" and
>> Session "handle".
> 
> Again, only a suggestion.
> 
>> Thanks again for all your work on this.
> 
> You are welcome.
> 
>       Thanks,
>       Paul
> 
> _______________________________________________
> Gen-art mailing list
> Gen-art@ietf.org
> https://www.ietf.org/mailman/listinfo/gen-art

_______________________________________________
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art

Re: [Gen-art] Gen-ART Telechat review of draft-ietf-anima-grasp-api-08

Reply via email to