Paul, thanks for your reviews. Brian, thanks for your responses. I entered a No Objection ballot and would like to see resolution for Paul’s outstanding concerns, particularly as they pertain to Section 2.3.5.
Thanks, Alissa > On Dec 2, 2020, at 12:29 PM, Paul Kyzivat <pkyzi...@alum.mit.edu> wrote: > > Brian, > > On 12/1/20 8:13 PM, Brian E Carpenter wrote: >> Hi Paul, >> Comments in line. There's one definite good catch in your >> review, and obviously more clarifications are needed. > > I'll comment this one last time. Then I will be satisfied that I have > communicated my thoughts, and I'm happy with whatever you decide to do. > >> On 01-Dec-20 15:06, Paul Kyzivat wrote: >>> I am the assigned Gen-ART reviewer for this draft. The General Area >>> Review Team (Gen-ART) reviews all IETF documents being processed by the >>> IESG for the IETF Chair. Please wait for direction from your document >>> shepherd or AD before posting a new version of the draft. For more >>> information, please see the FAQ at < >>> http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. >>> >>> Document: draft-ietf-anima-grasp-api-08 >>> Reviewer: Paul Kyzivat >>> Review Date: 2020-11-30 >>> IETF LC End Date: 2020-10-28 >>> IESG Telechat date: 2020-12-01 >>> >>> Summary: >>> >>> This draft is on the right track but has open issues, described in the >>> review. >>> >>> General: >>> >>> This document has addressed some of the concerns I had during the last >>> call review. However some of my concerns remain and some new ones have >>> arisen in this version. >>> >>> Issues: >>> >>> Major: 3 >>> Minor: 6 >>> Nits: 1 >>> >>> 1) MAJOR: Negotiation >>> >>> The text in section 2.3.5 now makes clear that the sequence of steps in >>> the negotiation is non-deterministic - both sides can call >>> negotiate_step and negotiate_wait. I believe this can result in the two >>> sides not agreeing on what values have been negotiated. (For instance, >>> what if one side calls negotiate_step concurrently with the other side >>> calling end_negotiate? Which value has been agreed upon?) >> The negotiate_step calls alternate between the two peers, until one of them >> calls end_negotiate (or a timeout kills the session). I hoped that >> was clear in the protocol diagram. We can make it explicit, for people >> who haven't fully digested the protocol spec. Since that's ~50 pages, it >> certainly takes some digestion. >> (negotiate_wait would be interjected by the peer who has the next go, simply >> indicating that the next step is delayed. That could happen, for example, >> if the ASA needed to negotiate something with a third party before >> continuing.) > > The figure shows alternation, but IIUC that is only an example. I understood > the text as permitting either side to perform negotiate_step or > negotiate_wait at will. A requirement for alternation would solve the problem. > > Reading it again, I see how it can be interpreted as you say is intended. But > it isn't clear. To clarify, is the intent that a negotiate_step, > negotiate_wait, or end_negotiate is required in response to receipt of > request_negotiate or negotiate_step? And a negotiate_step or end_negotiate is > also required following the sending of negotiate_wait. And those are the only > permitted sequences? > > A state machine would be helpful to show this in an unambiguous way. > >>> The loop_count >>> adds to the confusion. Are the two sides intended to have independent >>> loop count values? It seems these too can become unsynchronized. >> The loop count also bounces backwards and forwards in alternate steps. >> Again, we can underline that in the text. > > That would be helpful. > >>> Also, the goal of negotiation isn't clear to me. I gather it must be for >>> the two sides to agree on a particular value for the objective. But for >>> that to work there must be some rules about how values can change in >>> each step so that the result stabililizes, rather than causing a battle >>> that ends with loop count exhaustion. This could be achieved by always >>> negotiating *down*, or always *up*. But that requires that the objective >>> value type have an ordering function. Given the general nature of the >>> objective I don't think that can be assumed. >> No, it explicitly is not defined either in the protocol nor the API. >> The syntax and semantics of the objective value are defined per-objective, >> and the objective might or might not be ordered. So there is intentionally >> no answer to your question. >> In most cases I'd expect that there would be an ordering but we didn't want >> to constrain the use cases in that way. Also note that a failed negotiation >> (e.g. the loop count expires, or where one end simply rejects the other's >> offer) is not a protocol failure. >>> >>> ISTM that more work is needed to define the negotiation process in a way >>> that ensures it ends with both sides agreeing on a single value for the >>> objective. >> As noted, that is per-objective. The most complicated case I've coded >> is IP prefix assignment, and it works fine, except that if there is >> no prefix available of the maximum desired length, the requester ends >> up unsatisfied - as intended. There should be no condition in which >> the negotiation loops indefinitely; it either succeeds or fails. > > Without that the result in non-deterministic. The two sides may have > conflicting goals, and then the result will only be determined by the loop > count and timeout. > > Alternately, implementors will establish side agreements that aren't governed > by standards. > > That seems like an undesirable state of affairs. > >>> 2) MINOR: Dry Run Negotiation >>> >>> Dry Run negotiation is very under-specified. Why would it be used? I >>> guess that an ASA might use dry run negotiation to inform future actual >>> negotiation. Can anything be inferred from a dry run negotiation about >>> how an actual negotiation will go? When participating in a dry run >>> negotiation, how should an ASA decide what response to make? Should it >>> take into account current resource availability? Or should it respond >>> based on best-case or worst-case resource availability? Or what? >>> >>> This requires further clarification. >> Again, *all* these issues are specific to the objective in question, >> and they are intentionally not addressed in the protocol design or >> the API. > > So this is a semantics free feature? Then all semantics are determined by > side agreements? Again, IMO this is a bad thing. > >>> 3) MAJOR: Confusing semantics of 'request_negotiate' >>> >>> In section 2.3.5 I don't understand the following: >>> >>> 1. The 'session_nonce' parameter is null. In this case the >>> negotiation has succeeded in one step and the peer has >>> accepted the request. The returned 'proffered_objective' >>> contains the value accepted by the peer, which is therefore >>> equal to the value in the requested 'objective'. For this >>> reason, no session nonce is needed, since the session has >>> ended. >>> >>> IIUC this requires a network exchange with the peer. I don't see how >>> this can complete *immediately*. ISTM that this could only complete >>> immediately if it were satisfied from a local cache. That doesn't seem >>> appropriate for this function. >> I changed that at your request from "immediately" to "in one step": the >> request_negotiate() from A gets back an immediate end_negotiate() >> from B and we're done. If "in one step" still isn't clear, we could >> change to "in one message exchange". >>> Similarly, in bullet 2 I don't see how the proffered_objective would be >>> available in the initial call, before a response has been received from >>> the peer.. >> It can't. The call doesn't complete until the first response has arrived >> (in a threaded implementation). >>> Does "immediately" here simply mean that the negotiation is completed in >>> one exchange between the two ends? If so, isn't a session nonce still >>> required in an event loop implementation in order to handle the one >>> response? >> I think that's a good catch. It's fixable, because the session nonce is >> generated anyway when the request message is generated, but if the >> return code is noReply, the nonce is needed. The same applies to >> synchronize(). The issue is implied by this bullet: >> * Event loop implementation: An additional read/write 'session_nonce' >> parameter is used. >> but that's incomplete. >> My bad. I missed this point because I only coded the threaded version, >> which is the natural approach in Python. We need to add an extra case >> that only applies in the event loop case: >> * If the 'errorcode' parameter has the value 2 ('noReply'), no response >> has been received so far. The 'session_nonce' parameter must be presented >> in subsequent calls. > > IIUC, the session nonce isn't needed at all in threaded implementations. And > in the event loop implementation it is needed, even in this case. So ISTM > this need not be singled out as a special case. > >>> Bullet 2 also says: >>> >>> ... The >>> returned 'proffered_objective' contains the first value >>> proffered by the negotiation peer. The contents of this >>> instance of the objective must be used to prepare the next >>> negotiation step (see negotiate_step() below) because it >>> contains the updated loop count, sent by the negotiation >>> peer. The GRASP code automatically decrements the loop >>> count by 1 at each step, and returns an error if it becomes >>> zero. >>> >>> I guess that the 'proffered_objective' in the return parameters is the >>> counter-offer to the objective passed in the call. And that you expect >>> the objective value used in any subsequent negotiate_step to be derived >>> by modifying this value. So far this new wording has improved my >>> understanding. >> Yes, and it's probably worth adding the term "counter-offer" for clarity. >>> But the loop_count in the objective is especially confusing. It seems >>> that it is handled quite differently from the rest of the objective. You >>> specify (in 2.3.2.3) that it has a default value of GRASP_DEP_LOOPCT. >>> But who is expected to initialize this? (Is it simply that the ASP >>> should use this value if it doesn't have any particular preference?) >> Yes, that's the intention (but the details depend on the >> implementation; in my Python code the class definition for >> 'objective' sets the default). The default is set quite low >> since we assumed that normal negotations require few steps. >>> Then you say that the GRASP decrements this. Is this decrementing done >>> on the calling side before sending the message, the calling side after >>> receiving the response? Or by the peer, on receipt or when sending the >>> response? Is it permissible for the ASA to modify this value during >>> negotiation? Since this seems intended to prevent a loop, having clarity >>> about how this value is managed seems important. >> That is well specified in the protocol spec itself. The sender >> of each M_NEGOTIATE message decrements it, i.e. it happens >> inside negotiate_step(). It would do no harm to mention that. >> What happens if an ASA messes with the loop count? Bad things >> could happen, but just as one end can prolong the timeout with >> negotiate_wait(), it could prolong the negotiation with >> obj.loop_count += 1 if it's useful. That, for example, enables >> using GRASP for bulk transfer, if you don't like FTP (joke >> intended). (Not really a joke, though; I have running code for >> simple file transfer using GRASP, and I find it a handy way to >> move stuff from Windows to my Linux test machine.) >> A malicious ASA could do all sorts of damage, so I don't think >> that loop count manipulation is a serious concern. The other >> end, if paranoid, can always check if the loop count is behaving >> strangely. > > IIUC the only involvement the application should have with the loop count is > to override the default at the start of negotiation. After that, it is really > up to the GRASP to manage it. If that is right, why expose it as part of the > objective? Why not simply allow the override value in the request_negotiate > call, and otherwise let the GRASP manage it for the duration of the > negotiation? > >>> 4) MINOR: negotiate_wait >>> >>> The negotiate_wait call allows one ASA to extend the timeout of another >>> ASA. This could, in perverse cases, cause an ASA to wait indefinitely. >>> ISTM that this is dangerous. I would think it better make the other ASA >>> aware of the desire to extend the timeout and let it decide whether to >>> do so. >> That's more of a comment on the GRASP spec & implementation than >> on the API. But our trust model is that if a node can get access >> to the autonomic control plane, we trust the ASAs in that node. >> There is new text on that aspect following the Security Area review. > > As you wish. > >>> 5) MAJOR: Consistency of Objective definitions >>> >>> In section 2.3.2.3 and elsewhere, presumably all parties that use a >>> particular objective must agree on the values of synch, neg, dry, and >>> the size and structure of the value. >>> >>> There is no communication of the size and structure in the abstract API. >>> Presumably the implementation of a language binding to the API is >>> required to at least communicate the size and alignment requirements to >>> the core. >> No, it's not needed. That's the great advantage of using CBOR; >> the CBOR object is self-defining and flows opaquely through the >> GRASP code (modulo the maximum message size, which should prevent >> buffer overflow issues). > > Sorry. I was thinking that CBOR was just a possibility for the value. Now I > see that the value be representable in CBOR and in fact will be so > represented on the wire. > > Based on that I withdraw my concern. > >>> The matching of definitions between nodes must be achieved >>> solely by the name, >> Yes. >>> the respective language bindings at the two ends, >> No, again CBOR bridges over this. In a C implementation, >> the value part of the objective is delivered to the ASA >> as a CBOR-encoded byte string, and the programmer uses >> a CBOR library to de-serialize it. In Python, Ruby, etc. >> the API can deserialize it automatically. >>> and out of band mutual agreement. >> Yes, that's why each objective needs a spec of its syntax and semantics >> and an IANA registration of its name. (Or it can use the name format >> for privately defined objectives, but it still needs a defined syntax >> and semantics). > > Ah, got it! While I read draft-ietf-anima-grasp in preparation for my last > call review I obviously didn't absorb it all, and missed this point. > >>> Furthermore, different language >>> bindings may use different in-memory representations of the value. In >>> such cases, how is the on-wire format to be determined? >> CBOR (specified using CDDL). For the protocol itself, that is all >> you need to specify. The semantics needs English. > > OK. Got it. > >>> If the two ends disagree on size and structure then problems will occur. >>> Perhaps the core can identify size mismatches based on size communicated >>> on the wire vs the size defined by the language binding, but there are >>> no error codes defined for this situation. And of course differing >>> structures with the same size would not be detectable. >>> >>> Furthermore, there is potential for different ASAs to (accidentally) >>> have incompatible definitions for the same objective. What happens in >>> this case? How can blame be ascribed so that the problem can be fixed? >> You have to find out which one doesn't meet the spec. It's app-level >> debugging. (It's also one of the reasons I made sure my prototype >> could run two separate instances in one machine: makes debugging >> a whole lot easier.) > > The IANA registration of names/formats resolves my concern. I suppose that > means that revisions to the format will require choosing a new name. > >>> IMO more needs to be said about all of this. At the least a number of >>> disclaimers that put the burden on the ASAs to recognize the risk, take >>> these potential problems into account and avoid them. But there could be >>> some requirements placed on API language bindings and core >>> implementations to deal with some of these. And probably some added >>> error codes to report what problems can be detected. >> The only one I found necessary was CBORfail ("CBOR decode failure") >> and that's pretty fatal. It means the other party has sent a byte >> string that is not in fact valid CBOR. Beyond that, the value >> is passed up to the consumer (the ASA) which does indeed have to >> validate the contents. That's CBOR 101, but it should be stated. > > OK. > >>> 6) MINOR/MAJOR: Session State >>> >>> I continue to find the lifetime and state of a session to be unclear. >>> The API calls that return session_nonce seem to signal creation of a new >>> session. The end_negotiate() call seems to terminate a negotiation >>> session. But what causes other sessions to end? This seems important >>> because there is state associated with a session that consumes resources >>> and can't be reclaimed until the session ends. So it should be important >>> for the ASA to end all sessions. Some clarification of this seems >>> important both for core implementors and for ASA developers that will be >>> using the API. >> I'm not sure this belongs in the API though. The ASA doesn't need >> to worry about this. A session ends either when it completes normally >> by end_negotiate(), or a synchronize() reply is received, or discovery >> terminates, or a session times out, or a loop count is exhausted, or >> something causes a socket error, etc. There are 29 calls to >> _disactivate_session() in my GRASP implementation. > > So IIUC the GRASP will is expected to discard all state either when it sees > an end_negotiate, the loop count is exhausted, or the timeout expires. In > that case my concerns are resolved. > >>> (Or is this document only for implementors of core and those >>> instantiating a particular language binding of the API, with >>> documentation for end users left to others?) >> I'm pretty sure we'll need an O'Reilly book one of these days. >> But in fact draft-ietf-anima-asa-guidelines is on its way, >> and will be recommended reading for ASA implementers, along >> with the API. >>> 7) MINOR/MAJOR: Timeout >>> >>> Section 2.3.2.2 indicates that the API returns an error response to the >>> ASA if the timeout expires. But the other end is presumably still >>> working on the request and will eventually send a response. What does >>> the core do when it receives this? Must it retain state so that it can >>> detect the case and ignore the message? It seems that this could result >>> in the two peers disagreeing on some state. >> The timeout expiring is fatal to the session, so any further >> messages for that session will not be processed. Depending on >> the sequence of events, the actual error code returned at both >> ends might vary. The most likely case is that it will show up >> as a socket error at both ends - a read() timeout at the >> receiving end and a dead socket at the sending end. >> So the two ends will both see a failed negotiation. > > As long as the GRASP discards incoming messages for sessions it has no state > for then this seems fine. > >>> 8) MINOR: Text regarding "minimum_TTL" >>> >>> There is a small problem with the following in section 2.3.4: >>> >>> - If the parameter 'minimum_TTL' is greater than zero, any >>> locally cached locators for the objective whose remaining time >>> to live in milliseconds is less than or equal to 'minimum_TTL' >>> are deleted first. Thus 'minimum_TTL' = 0 will flush all >>> entries. >>> >>> The first sentence qualifies the paragraph to cases where minimum_TTL is >>> greater than zero. But the final sentence then infers the behavior when >>> minimum_TTL is equal to zero. >>> >>> Also, minimum_TTL is typed as an integer, which permits negative values. >>> I gather that negative values are not allowed. I can suggest two ways to >>> fix this: >>> >>> - The parameter 'minimum_TTL' MUST be greater than or equal to >>> zero. Any locally cached locators for the objective whose >>> remaining time to live in milliseconds is less than or equal to >>> 'minimum_TTL' are deleted first. Thus 'minimum_TTL' = 0 will >>> flush all entries. >> OK. >>> Or, change they type to unsigned integer. >> Not all languages have unsigned integers, so I'd prefer your version. >>> Then the statement can be >>> simplified by removing the first sentence: >>> >>> - Any locally cached locators for the objective whose remaining >>> time to live in milliseconds is less than or equal to >>> 'minimum_TTL' are deleted first. Thus 'minimum_TTL' = 0 will >>> flush all entries. >>> >>> 9) MINOR: Terminology - Session nonce >>> >>> The new first paragraph of section 2.2.3 talks about identifying the >>> session by a pseudo-random session identifier, and tagging it with an IP >>> for further uniqueness. The 2nd paragraph talks about a session_nonce. >>> It isn't clear at this point in the text if these the same thing. Or is >>> the session id shared on the wire, the IP tag added by the core, and the >>> session_nonce an artifact of the API, shared only between the ASA and >>> the core? >> Yes, it's locally munfactured, but the natural implementation is >> specified in the GRASP spec: >> However, there is a finite probability that two nodes might generate >> the same Session ID value. For that reason, when a Session ID is >> communicated via GRASP, the receiving node MUST tag it with the >> initiator's IP address to allow disambiguation. >> Will clarify. >>> >>> Section 2.3.2.7 seems to confirm that the nonce is just an identifier >>> used between the core and the ASA. But here it says that using the id >>> plus the IP is simply one possible implementation choice. >> Right. Because although GRASP says what I just quoted, an implementor >> might choose anything else that they preferred. Is there any reason >> to specify this more tightly? > > No. No need. Its just a matter of being clear. My confusion arose from the > text in 2.2.3: > > ... It is > identified by a pseudo-random session identifier tagged with an IP > address of the initiator of the session to guarantee uniqueness. > ... > On the first call in a new GRASP session, the API returns a > 'session_nonce' value based on the GRASP session identifier. > > I took this as specifying the representation of the session noncee. (Though > on rereading I see the "based on" does give some wiggle room.) > > Perhaps a minor tweak in wording would help: > > On the first call in a new GRASP session, the API returns a > 'session_nonce' value used to identify the session in the API. > > Then 2.3.2.7 spells out implementation options for this. > >>> Further, I question whether "nonce" is the best term to use here. ISTM >>> that "handle" (session_handle) would more clearly reflect the purpose of >>> this item. >> We could debate that at length, I think. To me 'nonce' suggests >> something that's intentionally meaningless and for a single session. >> 'Handle' suggests something with longer term significance. >> https://en.wikipedia.org/wiki/Handle_(computing) seems to be >> on my side. Anyway, we will await advice from the AD before >> changing the terminology. > > Good naming is difficult. This was only a suggestion. > >>> I think it would be helpful to be clearer in distinguishing what is >>> fundamental vs what is implementation choice. For instance, in section >>> 2.2.3: >>> >>> A GRASP session consists of a finite sequence of messages (for >>> discovery, synchronization, or negotiation) between a pair of ASAs. >>> The core identifies it on the wire by a pseudo-random session >>> identifier. Further details are given in [I-D.ietf-anima-grasp]. >>> >>> On the first call in a new GRASP session, the API returns a >>> 'session_handle' value used to identify the session. This >>> value must be used in all subsequent calls for the same session, and >>> will be provided as a parameter in the callback functions. By this >>> mechanism, multiple overlapping sessions can be distinguished, both >>> in the ASA and in the GRASP core. The value of the 'session_handle" >>> is opaque to the ASA. >>> >>> This establishes the role and relationship of the two terms, while >>> section 2.3.2.7 gives a possible implementation without as much >>> confusion. (It will require some rewording to switch from session_nonce >>> to session_handle. It already uses "session handle" in passing.) >>> >>> 10) NIT: Terminology - ASA nonce >>> >>> For similar reasons to those above for session_nonce/session_handle, IMO >>> it would be clearer to use asa_handle rather than asa_nonce. But this is >>> only a suggestion. >> Roman suggested using different terms, e.g. ASA "nonce" and >> Session "handle". > > Again, only a suggestion. > >> Thanks again for all your work on this. > > You are welcome. > > Thanks, > Paul > > _______________________________________________ > Gen-art mailing list > Gen-art@ietf.org > https://www.ietf.org/mailman/listinfo/gen-art _______________________________________________ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art