Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)

Mario Loffredo Sat, 26 Sep 2020 07:00:46 -0700

Hi Benjamin,

Thanks a lot for your extensive review. I apologize for the delay in replying, but I have been very busy the last two days and your feedback is very detailed.


Please find my comments inline.

On 23/09/2020 23:40, Benjamin Kaduk via Datatracker wrote:

Benjamin Kaduk has entered the following ballot position for
draft-ietf-regext-rdap-sorting-and-paging-17: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-sorting-and-paging/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Should we say something about which order the sorting criteria are
applied (first to last vs last to first) when multiple sortItems are
specified in a query?

[ML] The common interpretation is from left to right, so I don't think we need to clarify this concept.
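
As an illustration only (not text from the draft), the left-to-right interpretation can be sketched in Python as follows. The function name `sort_left_to_right`, the `(property, direction)` tuples with "a"/"d" for ascending/descending, and the sample data are assumptions made for this example:

```python
from operator import itemgetter


def sort_left_to_right(results, sort_items):
    """Sort results by several criteria; the leftmost item is the primary key.

    sort_items is a list of (property, direction) pairs, where direction is
    "a" (ascending) or "d" (descending). Hypothetical helper for illustration.
    """
    ordered = list(results)
    # Python's sort is stable, so applying the criteria from last to first
    # leaves the leftmost criterion as the most significant one.
    for prop, direction in reversed(sort_items):
        ordered.sort(key=itemgetter(prop), reverse=(direction == "d"))
    return ordered


# Made-up sample data
domains = [
    {"name": "b.example", "registrationDate": "2019-01-01"},
    {"name": "a.example", "registrationDate": "2020-01-01"},
    {"name": "c.example", "registrationDate": "2019-01-01"},
]

# Primary key: registrationDate ascending; ties broken by name descending
result = sort_left_to_right(domains, [("registrationDate", "a"), ("name", "d")])
```

Here the two objects sharing the earlier registration date come first, and the second criterion only decides the order among them.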

I recognize that in the HATEOAS model, the actual JSONPaths reported by
the server should be used by the client to determine what a given sort
property does, but it also seems like it would be confusing for this
document to specify (e.g.) an "email" property with specific JSONPath,
and then have a server go off and use "email" to mean something else,
even if that is just the addition of "pref" as discussed at the end of
Section 2.3.1.  Do we want to try to have the properties defined by this
document be universally defined and encourage the use of new/different
property names for variations on them?  (The answer may well be "no",
but the answer is not intuitively clear to me.)  To put it another way,
is the list in Section 2.3.1 normative, or just an example?

[ML] I would say "normative", just to facilitate interoperability and avoid ambiguities. Maybe it would be enough to say that the sorting properties defined in the document are considered reserved, so an RDAP server MUST NOT map them onto other RDAP response values.


Does it work for you?


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

    However, there are some drawbacks associated with the use of the HTTP
    header.  First, the header properties cannot be set directly from a
    web browser.  Moreover, in an HTTP session, the information on the
    status (i.e. the session identifier) is usually inserted in the
    header or a cookie, while the information on the resource
    identification or the search type is included in the query string.
    The second approach is therefore not compliant with the HTTP standard
    [RFC7230].  As a result, this document describes a specification
    based on the use of query parameters.

A few more words (section number from 7230?) on why the second approach
is not compliant with HTTP might help the reader, though it isn't
strictly necessary (we're not using it, after all).

[ML] Would it be better to replace RFC7230 with RFC7231 and add a reference to Section 8.3.1 (https://tools.ietf.org/html/rfc7231#section-8.3.1)?

Section 2.1

       *  "jsonPath": "String" (OPTIONAL) the JSONPath of the RDAP field
          corresponding to the property;

What is this path relative to?  (Does the client have to know from the
other context what type of object it refers to?)

[ML] All the JSONPath expressions defined in the document are relative to the root of an RDAP response. The sorting_metadata object is included in the same response, so I think that the context is clear and no further clarification is needed.


       *  "links": "Link[]" (OPTIONAL) an array of links as described in
          [RFC8288] containing the query string that applies the sort
          criterion.

Just to check: this is going to have the same structure for a Link
object that draft-ietf-regext-rdap-partial-response does?  (I am not
coming up with a great way to deduplicate the definitions, off the top
of my head.)

[ML] Yes. The sorting links have the same structure as the subsetting links (see Section 2.3.2).

    o  "pageSize": "Numeric" (OPTIONAL) a numeric value representing the
       number of objects returned in the current page.  It MUST be
       provided if and only if the total number of objects exceeds the
       page size.  This property is redundant for RDAP clients because
       the page size can be derived from the length of the search results
       array but, it can be helpful if the end user interacts with the
       server through a web browser;

If it's redundant, we should probably say something about error handling
for when the things that are supposed to be identical have different
values.

[ML] I think this situation is very unlikely. Anyway, in this case, the length of the results array is what really counts. Obviously, it is a bit more likely that the totalCount value differs from the sum of the number of results in each page. In fact, even if the registration data can't be considered real-time data, the count parameter might be present in the initial query and it might take time to scroll the result set, so there is a small likelihood that the initial totalCount value becomes obsolete because the result set has changed in the meantime. In this case too, the sum of the lengths of the result arrays is what really counts.


Should I write something about this?

Section 2.3

    Except for sorting IP addresses, servers MUST implement sorting
    according to the JSON value type of the RDAP field the sorting
    property refers to.  That is, JSON strings MUST be sorted
    lexicographically and JSON numbers MUST be sorted numerically.  If IP
    addresses are represented as JSON strings, they MUST be sorted based
    on their numeric conversion.

There are more JSON types than string and number; are those other types
guaranteed to not appear in sortable RDAP fields?  (I can't see how such
a guarantee could be made, given that servers can define their own
sorting properties.)

[ML] The other remaining primitive JSON type is boolean, but I don't think it makes sense to sort by a boolean property. Instead, I missed that values denoting dates and times MUST be sorted in chronological order even if they are strings. I'll update the sentence as follows:


   Except for sorting IP addresses and values denoting dates and times,
   servers MUST implement sorting according to the JSON value type of the
   RDAP field the sorting property refers to.  That is, JSON strings MUST
   be sorted lexicographically and JSON numbers MUST be sorted
   numerically.  Values denoting dates and times MUST be sorted in
   chronological order.  If IP addresses are represented as JSON strings,
   they MUST be sorted based on their numeric conversion.

Does it work for you?
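
To show why the two exceptions matter, here is a minimal Python sketch (an illustration, not the draft's algorithm). The helper `sort_key` and its `kind` argument are hypothetical names; the sketch assumes date-time values are ISO 8601 / RFC 3339 strings and uses only the standard `ipaddress` and `datetime` modules:

```python
import ipaddress
from datetime import datetime


def sort_key(value, kind):
    """Return a comparable key for a JSON value (hypothetical helper).

    kind is "ip", "datetime", or anything else for the plain JSON ordering.
    """
    if kind == "ip":
        # IP addresses carried as JSON strings are compared numerically
        return int(ipaddress.ip_address(value))
    if kind == "datetime":
        # Assumes RFC 3339 style strings; "Z" is rewritten so that
        # datetime.fromisoformat also accepts it on older Python versions
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Strings compare lexicographically, numbers numerically
    return value


ips = ["10.0.0.2", "9.0.0.1"]
ips_as_strings = sorted(ips)
ips_numeric = sorted(ips, key=lambda v: sort_key(v, "ip"))

dates = ["2020-01-01T00:00:00Z", "2020-01-01T00:30:00+01:00"]
dates_as_strings = sorted(dates)
dates_chronological = sorted(dates, key=lambda v: sort_key(v, "datetime"))
```

In both sample lists the plain string ordering and the required ordering disagree: "10.0.0.2" sorts before "9.0.0.1" as a string, and the timestamp with the "+01:00" offset is chronologically earlier even though it sorts later as a string.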

    If the "sort" parameter reports an allowed sorting property, it MUST
    be provided in the "currentSort" field of the "sorting_metadata"
    element.

nit: is "reports" the best word to describe this behavior (which, IIUC,
is "present in the query component of the request URL"?

[ML] Sounds better.

Section 2.3.1

    In the "sort" parameter ABNF syntax, property-ref represents a
    reference to a property of an RDAP object.  Such a reference could be
    expressed by using a JSONPath.  The JSONPath in a JSON document

nit: is there a missing word here ("a JSONPath expression")?

[ML] Just for conciseness, may I use "jsonpath" to mean "JSONPath expression" and keep "JSONPath" to refer to the specification?

I could write something like: "JSONPath expression (named "jsonpath" in the following)"

    o  Note that some of the object specific properties are also defined
       as query paths.  The object specific properties include:

nit: the list structure in this item does not seem parallel to the
structure of the first item.


[ML] OK. I'll change the sentence as follows:

Object specific properties.  Note that some of these properties
      are also defined as query paths.  These properties include:

       as two representations of the same value.  By default, the
       unicodeName value MUST be used while sorting.  When the
       unicodeName is unavailable, the value of the ldhName MUST be used
       instead;

I'm not entirely sure how much value "by default" adds here.  Would the
meaning be different if we said "The unicodeName value MUST be used
while sorting if it is present; when the unicodeName is unavailable, the
value of the ldhName is used instead"?

[ML] No, it wouldn't. I'll change the sentence as you suggest.
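
A short Python sketch of the agreed rule (illustrative only; the helper name `name_sort_key` and the sample objects are made up for this example):

```python
def name_sort_key(obj):
    """Use unicodeName when present, otherwise fall back to ldhName."""
    return obj.get("unicodeName") or obj["ldhName"]


# Made-up search results; only the first object is an IDN with a unicodeName
results = [
    {"ldhName": "xn--caf-dma.example", "unicodeName": "café.example"},
    {"ldhName": "dog.example"},
    {"ldhName": "apple.example"},
]

ordered = sorted(results, key=name_sort_key)
ordered_ldh_names = [obj["ldhName"] for obj in ordered]
```

With this key the IDN is placed according to "café.example" rather than its "xn--" form, so it lands between "apple.example" and "dog.example".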

    o  The jCard "sort-as" parameter MUST be ignored for the sorting
       capability described in this document;

It's a little bit of a juxtaposition to refer to jCard here in the prose
but vcard in the table.

[ML] I would keep it as is. Instead, I would replace all the "vcard" occurrences with "jCard". Since jCard is a transliteration of vCard into JSON, it seems appropriate to me to keep the references to RFC6350 sections and to use the corresponding jCard elements for the mapping between the sorting properties and the RDAP response elements. Besides, I would write a sentence about the fact that jCard is the JSON format of vCard, add a link to RFC7095 and insert RFC7095 among the Normative References.


Do you agree?

    o  Even if a nameserver can have multiple IPv4 and IPv6 addresses,
       the most common configuration includes one address for each IP
       version.  Therefore, the assumption of having a single IPv4 and/or
       IPv6 value for a nameserver cannot be considered too stringent.

I disagree with the flat assertion that it "cannot be considered too
stringent".  It can be so considered, as a matter of difference of
opinion; what is appropriate to do here is to say that this
document/protocol makes the assumption (especially since we go on to
describe the exception-handling procedure when the assumption is
violated).


[ML] May I update that sentence as follows?

OLD

Therefore, the assumption of having a single IPv4 and/or
      IPv6 value for a nameserver cannot be considered too stringent.

NEW

Therefore, this specification makes the assumption that nameservers
      have a single IPv4 and/or IPv6 value.

    o  Multiple events with a given action on an object might be
       returned.  If this occurs, sorting MUST be applied to the most
       recent event;

This makes a lot of sense as the default and I don't propose changing it
now, but I do wonder how hard it would be to add support later for
sorting on (say) the oldest event instead.

[ML] Well, I wrote that sentence because some RDAP events can appear multiple times. For example, a domain might be locked and unlocked repeatedly. The purpose of that sentence is just to avoid ambiguities and to implicitly suggest that RDAP providers arrange events of the same type in descending chronological order.

    The "jsonPath" field in the "sorting_metadata" element is used to
    clarify the RDAP field the sorting property refers to.  The mapping
    between the sorting properties and the JSONPaths of the RDAP fields
    is shown below:
    [...]
       name

          $.domainSearchResults[*].unicodeName

This seems to ignore the subtlety regarding unicodeName vs ldhName.  Is
there a way it could be expressed in JSONPath?

[ML] If unicodeName and ldhName were alternatives, the JSONPath union operator would fit (i.e. $.domainSearchResults[*].[unicodeName,ldhName]). Currently, RFC7483 contains no assumption about when they should/must be present, but the examples seem to recommend presenting unicodeName only for IDNs. When both properties are present, the union operator doesn't fit exactly, and I still haven't found the right JSONPath expression based only on the basic operators. However, since the "jsonPath" member is only for documentation, the aforesaid JSONPath expression could be the most suitable for conveying that sorting is applied on a kind of <unicodeName, ldhName> combination.

    o  Nameserver

       name

          $.domainSearchResults[*].unicodeName

Presumably this is supposed to be nameserverSearchResults?

[ML] Absolutely. It's a cut-and-paste typo :-)


Section 2.4

I think we want another introductory paragraph like:

% The cursor parameter is used by the server to preserve information
% about the pagination state of a given query's results across calls to
% the search API, so that successive requests by the client can return
% page N, N+1, N+2, etc.  Its value is only required to be interpretable
% by the server and could be implemented, for example, as an opaque
% database lookup key.  If a server does use a method for generating
% cursor values that involves internal structure, such as the one
% described below, the server needs to recognize that the value supplied
% by a client could have been modified (maliciously), and implement
% appropriate bounds-checking and similar measures when parsing received
% values.

The current wording strongly suggests that base64-encoding a meaningful
value that the client could inspect or even construct is required, and I
do not think that is very maintainable or what was intended, given the
current second paragraph ("servers can change the method over time
without announcing anything to clients").

(side note) I'm also pretty partial to the way JMAP discusses returning
(paginated, but non-uniformly) changes to a given data stream, e.g., at
https://www.rfc-editor.org/rfc/rfc8620.html#section-5.2 -- any given
state is named, and you can get "stuff starting at <named state>" and
the name to use for the state as of the current reply.


[ML] Maybe I didn't make myself clear.

The Base64 encoding is a simple (not recommended) transformation to make the cursor value opaque to the client. It just seemed suitable to me for use in some examples. But if you take a look at the example of Figure 6, you may note that you can't obtain a meaningful result by simply Base64-decoding the cursor value. Definitely, the method used to encrypt the cursor value must be more complex than a mere Base64 encoding.

Regarding the sentence in brackets, it means that servers can change the underlying pagination strategy without having an impact on clients. A server can initially implement offset pagination and then switch to keyset pagination, but this has no effect on clients' features.

The same concepts about the checks that servers should make in order to validate the cursor value are reported both in the "Negative Answer" section and in Appendix C.3, "Paging".

Anyway, I'll try to integrate your text into the current document and add a sentence with the purpose of discouraging the use of Base64 encoding in the cursor implementations.
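
To make the server-side checks discussed above concrete, here is a minimal Python sketch of one possible approach. It is an assumption for illustration, not the method described in the draft: the cursor carries an offset and a limit, protected by an HMAC tag so that modified values are rejected, and the decoded values are bounds-checked. The names `SECRET`, `MAX_OFFSET`, `MAX_LIMIT`, `encode_cursor` and `decode_cursor` are hypothetical. Note that the tag protects integrity only; the payload is still readable by the client, so a server wanting real opacity would need encryption or a server-side lookup key instead.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"hypothetical server-side secret"  # assumption: kept private by the server
MAX_OFFSET = 10_000  # assumption: illustrative bounds
MAX_LIMIT = 100


def encode_cursor(offset, limit):
    """Build a cursor whose content cannot be altered without detection."""
    payload = json.dumps({"offset": offset, "limit": limit}).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(tag + payload).decode()


def decode_cursor(cursor):
    """Validate a client-supplied cursor; raise ValueError if it is not acceptable."""
    raw = base64.urlsafe_b64decode(cursor.encode())  # raises a ValueError subclass if malformed
    tag, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cursor failed the integrity check")
    state = json.loads(payload)
    offset, limit = state["offset"], state["limit"]
    # Bounds-check even authentic values before using them in a query
    if not (0 <= offset <= MAX_OFFSET and 1 <= limit <= MAX_LIMIT):
        raise ValueError("cursor values out of range")
    return offset, limit


cursor = encode_cursor(20, 10)
state = decode_cursor(cursor)
```

A server using something like this would translate the `ValueError` into the error response it returns for an invalid cursor.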

Section 4

If the server doesn't have access to an efficient (e.g.) counting
operation on the backend, would we recommend that the server not support
sorting/pagination, since there's not much benefit from having the
server pull up all the results and count them just to be able to return
the total count value back to the client, and then go do the same work again
when the client asks for the next page of results?

[ML] In my implementation, the RDAP server doesn't present the count parameter in the sorting and paging links. The number of results doesn't change at all if the result set is sorted by one property rather than another. The same generally holds (as I wrote above) while the client is scrolling the result set pages. So why repeat the count parameter in the links? The totalCount value is returned in the response to the initial query and, as it is not repeated in the links, the counting operation is not executed again. Therefore, we don't need to make particular assumptions about the performance of the counting operation.

Section 7

I suggest noting that (encoded) structured "cursor" values present a new
attack surface on the server that needs to be protected.

[ML] Sorry, could you explain this concept further? AFAIK, it is possible to protect REST API endpoints but not query parameters.

    results in a response.  However, this last security policy can result
    in a higher inefficiency if the RDAP server does not provide any
    functionality to return the truncated results.

I'm not sure I understand (or agree with) this last sentence -- it seems
that unlateral silent truncation of results by the server leads to not
just inefficiency but also potential security considerations in its own
right, with the client not knowing that it has incomplete results.
Also, if the server is truncating the results, by definition it "has
functionality to return the truncated results" -- that's what it's
doing!  So I assume the intent was to say something about negotiating or
indicating that the results are truncated, not actually doing the
truncation.

[ML] I think that servers legitimately truncate the result sets to mitigate the risk of resource exhaustion and consequent denial of service. Implementing the capabilities described in this document lets servers keep managing sustainable result sets and, at the same time, increases clients' chances of avoiding truncation and finding relevant results.


    The new parameters presented in this document provide RDAP operators
    with a way to implement a server that reduces inefficiency risks.

[same question about "inefficiency" being the right word]

[ML] Maybe I can replace the phrase "that reduces inefficiency risks." with the phrase "that reduces the risk of resource exhaustion and consequent denial of service".


Are you ok with it?

Appendix B

    o  It does not allow direct navigation to arbitrary pages because the
       result set must be scrolled in sequential order starting from the
       initial page;

(side note) I didn't follow the references, so maybe this was covered
there, but I don't quite follow why direct navigation is impossible.  If
you use a key field for seeking, can't you just start in the middle from
some known value for that key field?

[ML] Especially when you know the total count of a result set, you can directly jump to a specific point in the result set through offset pagination, but you can't do the same through keyset pagination because you don't know the key value at that point in advance. One may wonder: what is jumping in the result set useful for? Well, for example, if you are looking for a specific item in an ordered collection of items, you could find it through the quicksort algorithm.
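
The difference can be sketched in Python over an in-memory sorted list (purely illustrative; `NAMES`, `page_by_offset` and `page_by_keyset` are made-up names, and a real server would run the equivalent queries against its database):

```python
from bisect import bisect_right

# Made-up ordered result set of 100 names
NAMES = sorted(f"domain{i:03d}.example" for i in range(100))


def page_by_offset(offset, limit):
    """Offset pagination: any page is reachable if its position is known."""
    return NAMES[offset:offset + limit]


def page_by_keyset(last_key, limit):
    """Keyset pagination: return the items that follow the last key seen."""
    start = 0 if last_key is None else bisect_right(NAMES, last_key)
    return NAMES[start:start + limit]


# Offset: jump straight to the fifth page of 10 items
fifth_by_offset = page_by_offset(40, 10)

# Keyset: the key that precedes the fifth page is only learned
# by walking through the first four pages
last_key = None
for _ in range(4):
    page = page_by_keyset(last_key, 10)
    last_key = page[-1]
fifth_by_keyset = page_by_keyset(last_key, 10)
```

Both calls end up with the same page, but the keyset variant had to scroll sequentially from the initial page to get there.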


Appendix C.2

    total count.  Therefore, as "totalCount" is an optional response
    information, fetching always the total number of rows has been

I'm not entirely sure in what sense "optional response information" is
intended -- my reading of Section 2.1 is that it's mandatory to return
totalCount if the client included the 'count' query parameter.

[ML] Exactly, but it isn't always returned. For this reason, it is an optional member of the paging_metadata object.



Looking forward to your reply to my questions/comments.

Best,

Mario

--
Dr. Mario Loffredo
Systems and Technological Development Unit
Institute of Informatics and Telematics (IIT)
National Research Council (CNR)
via G. Moruzzi 1, I-56124 PISA, Italy
Phone: +39.0503153497
Mobile: +39.3462122240
Web: http://www.iit.cnr.it/mario.loffredo

_______________________________________________
regext mailing list
regext@ietf.org
https://www.ietf.org/mailman/listinfo/regext
