Re: [DISCUSS] Describing REST Server capabilities

Jean-Baptiste Onofré Thu, 27 Jun 2024 02:34:08 -0700

Hi Jack

I like Robert's proposal. Back to the topics, I think grouping with
tags is more "flexible" (it was what we included in the REST spec
proposal as well).


Regards
JB

On Wed, Jun 26, 2024 at 6:26 PM Jack Ye <yezhao...@gmail.com> wrote:
>
> It seems like there are 2 sub-topics here:
> 1. should we group operations with tags, or should we do this 
> per-operation/endpoint?
> 2. how should we do the capability/versioning for each unit (either per tag 
> or per operation)
>
> Shall we first conclude on 1?
>
> For 1, my take is that we will need to do it per operation, for 2 reasons:
>
> (1) There are many REST services that would only implement a very small set 
> of APIs, such as just loadTable and loadView. Some will choose to not 
> implement very specific endpoints, such as renameTable. Tags seems convenient 
> but it is mandating people to implement a specific group of APIs together, 
> which is a lot of burdens for especially small organizations, if they just 
> want to support very specific goals like reading through IRC.
>
> (2) Suppose a new tag is added in the future, the server returns that tag, 
> but an older client does not understand it, it might cause mistakes in the 
> client's understanding of what is supported and what is not, when a tag 
> contains both features in existing APIs and also new APIs. If we define that 
> tags do not overlap with each other, this is probably not a concern. However, 
> (1) still is a problem from a usability perspective.
>
> Best,
> Jack Ye
>
>
>
>
> On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks <dwe...@apache.org> wrote:
>>
>> I think Robert's approach is a reasonable compromise here.
>>
>> If we wanted a "per operation/endpoint" versioning, I think I'd prefer 
>> Micah's OpenAPI spec based approach because it's more standardized, but I 
>> feel adds a lot of client complexity.
>>
>> -Dan
>>
>>
>>
>> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>> (I think, compatibility deserves a separate thread - it's a "huge" topic)
>>>
>>> Based on experience, we decided on the following with Nessie:
>>>
>>> Unknown fields/attributes in a structure _DO_ cause (de)serialization 
>>> failures.
>>> "Stable API versions" - endpoint additions and/or added query parameters 
>>> and/or enhanced structures do _NOT_ require a new API version (as in the 
>>> endpoint's route/path).
>>> "Flexible spec versions" - new and updated "capabilities" however might 
>>> cause a bump in the "spec version" that the server announces in its 
>>> `getConfig` result.
>>>
>>> Adding new routes/paths may require new endpoint implementations on the 
>>> server side, which can easily lead to a lot of (unnecessarily boilerplate) 
>>> code. Using different routes/paths is justified if the API is changed 
>>> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API 
>>> version - the server indicates the minimum and maximum supported API 
>>> version, in case a client wants to "upgrade". I recommend to _not_ bump the 
>>> API version in the route/path if it's not really necessary.
>>>
>>> Regarding the requirement to fail on unknown attributes: Unknown attributes 
>>> may contain important information. A client may send a newer version of a 
>>> request object with an important new field, but the (older) server discards 
>>> the new attribute. Think of an attribute that for example defines a "commit 
>>> condition" that the client expects to be respected. "New" attributes must 
>>> be omittable (e.g. don't serialize if null/default) - clients indicate the 
>>> "usage" of an added attribute using some request attribute (for example: 
>>> "boolean returnExtendedInformation").
>>>
>>> The list of capabilities can be indicated with included "spec versions", to 
>>> tell clients which features/functionalities a server supports."Production" 
>>> spec versions could start with 1, and "reserve" 0 for 
>>> experimental/unsupported/poc kind of implementation. It could look like 
>>> this:
>>>   capabilities: [
>>>     "table-spec/2,3",   // but not table-spec v1 here
>>>     "view-spec/1",
>>>     "table-api/1",
>>>     "view-api/1",
>>>     "udf-api/1",
>>>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
>>>     ...
>>>   ]
>>> Incrementing a spec version in the list of capabilities doesn't break any 
>>> client. We could also define a structure to describe each capability:
>>>   components:
>>>     schemas:
>>>       Capability:
>>>         name:
>>>           type: string
>>>           description: Name of the capability
>>>         versions:
>>>           type: array:
>>>           description: List of supported spec versions of this capability. 
>>> 0 means experimental (non-production) without any guarantees about the 
>>> stability of schema for request and response parameters.
>>>           items:
>>>             type: integer
>>>             format: int32
>>>
>>> In Nessie, we ensure backwards and forwards compatibility using a 
>>> specialized test suite that runs the "in tree" client against older server 
>>> versions and older client versions against the "in tree" server version. It 
>>> works fine for us for a few years now - and it did help preventing 
>>> compatibility issues.
>>>
>>>
>>> On 26.06.24 07:44, Péter Váry wrote:
>>>
>>> Hi everyone,
>>>
>>> A few considerations:
>>> - I think we should explicitly state which client/service interoperability 
>>> we are aiming for. I expect that we want to support both old client -> new 
>>> server, and new client -> old server communications.
>>> - I agree with Jack, that we should think about versions in advance - HMS 
>>> tried to be backwards compatible for everything, and that made it hard to 
>>> move forward / deprecate things.
>>> - Still we should try to keep the backwards incompatible changes minimal. 
>>> (All clients should be able to ignore unknown incoming fields / New 
>>> optional input parameter should drive new features / Try to avoid enums in 
>>> responses where we expect changes (?))
>>> - OTOH, it could be important for clients to know which of the backwards 
>>> compatible changes are implemented for the given server - so I would 
>>> decouple the URI from the versioning. Maybe major version change should 
>>> (could) change the URI, but backwards compatible changes should be served 
>>> on the same URI, but could be identified by different minor versions.
>>>
>>> This is exciting stuff!
>>> Thanks for pushing this forward!
>>>
>>> Peter
>>>
>>>
>>> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I feel I do not see a good answer to why not just simply version each API? 
>>>> When using tag, it means I have to offer capabilities per-tagged group. 
>>>> However, I could for example just offer loadTable and nothing else in a 
>>>> catalog, and that should still be Iceberg REST compliant. And I think we 
>>>> need a versioning story anyway, there is no way around it.
>>>>
>>>> Here is the workflow in my mind with versioning:
>>>>
>>>> 1. Going forward, every time the REST catalog spec introduces any new API 
>>>> endpoints or backwards incompatible changes to the existing APIs, the 
>>>> version of the specific API is incremented. So suppose the PlanTable API 
>>>> is added, this API will be at version v1. Suppose UpdateTable is updated 
>>>> with a new update type, that API will be at version v2, but PlanTable will 
>>>> remain at v1.
>>>>
>>>> 2. a catalog must implement getConfig. This API is the only one that is 
>>>> required.
>>>>
>>>> 3. in getConfig, in the defaults map (it could be in some new metadata 
>>>> structure, but since we want strong backwards compatibility guarantee, 
>>>> reusing string maps seems to be the best way), server returns key-value 
>>>> pairs of:
>>>> - key: operation:<operationName>
>>>> - value: version number
>>>>
>>>> 4. the client assumes that the map is ordered, and resolves API versions 
>>>> sequentially. For example, suppose I have the following map:
>>>>
>>>> { "operation:planTable": "1", "operation:loadTable": "2" }
>>>>
>>>> Note that by "supporting", it means to return a response in a predictable 
>>>> way that is compliant with the spec. It can also return 406 
>>>> UnsupportedOperation as a way to support it.
>>>>
>>>> There is also a special version *, that means any version can work.
>>>>
>>>> 5. Backwards compatibility: suppose the client is at a higher version than 
>>>> the server, then the client should always be able to understand the 
>>>> server's full list of capabilities.
>>>>
>>>> 6. Forward compatibility: suppose the client is at a lower version than 
>>>> the server, then the client should parse whatever operation it 
>>>> understands, and use the highest version it could support to execute the 
>>>> operation. Suppose the client only supports loadTable v1, then it will 
>>>> continue to hit the GET v1/namespaces/{ns}/tables/{table} route, instead 
>>>> of GET v2/namespaces/{ns}/tables/{table}. The v1 route could continue to 
>>>> support the client, or it could throw 406 to indicate that this route is 
>>>> deprecated and the client needs to upgrade.
>>>>
>>>> For initial backwards compatibility, I think not returning anything should 
>>>> mean that all API that the client understands are having version *.
>>>>
>>>> What do people think of it, compared to the tag approach?
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <emkornfi...@gmail.com> 
>>>> wrote:
>>>>>
>>>>> I don't have strong opinions either way here, just thought it was worth 
>>>>> raising some concerns over possible evolution here.  Some responses 
>>>>> inline, but if capabilities seem to meet the requirement at hand, then it 
>>>>> does potentially seem the simplest mechanism.
>>>>>
>>>>>
>>>>>> I think we also want to avoid relyance on server specific published 
>>>>>> OpenAPI as they may leak other options/parameters/etc.  This may lead to 
>>>>>> confusion around what the canonical spec is and make clients 
>>>>>> incompatible if they're generated off of a non-standard spec document.
>>>>>
>>>>>
>>>>> Yeah, I wasn't proposing necessarily using built in functionality but a 
>>>>> pre-scrubbed document.  Since there is no reference service 
>>>>> implementation for REST it seems like each implementor would need to 
>>>>> describe the best way of scrubbing there description.
>>>>>
>>>>>
>>>>>>
>>>>>> @Micah this sounds to me as if the client would then have to parse a 
>>>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading 
>>>>>> a view or dropping a table on the given REST server. Rather than having 
>>>>>> a dedicated endpoint we're just using the /config endpoint to provide 
>>>>>> information about what a server supports.
>>>>>
>>>>>
>>>>> I was not suggesting multiple endpoints here, simply different contents  
>>>>> for /config I agree in the short term this does add complexity on the 
>>>>> clients. But given that the canonical REST API clients are being 
>>>>> developed into the standard library, I'm not sure how much toil this 
>>>>> would cause in general. This also does not necessarily need to called 
>>>>> up-front but could be called to verify existence vs a permission issue 
>>>>> after an error was received.
>>>>>
>>>>> What round-trips did you have in mind here?
>>>>>
>>>>>
>>>>>> All good points though, but I'm not aware of a standard way to handle 
>>>>>> this.
>>>>>
>>>>>
>>>>> IIUC, this sounds like a standard service description problem to me, the 
>>>>> solution with capabilities appears to be one level abstraction on top of 
>>>>> this.  Service discovery seems like it has been reimplemented a few 
>>>>> different times depending on the technology [1][2][3]
>>>>>
>>>>>
>>>>>> I think versioning adds another level of complexity, but might be 
>>>>>> necessary since I expect these will evolve to some extent and may even 
>>>>>> require hitting versioned urls.
>>>>>
>>>>>
>>>>> If there is no concrete proposal on versioning, I agree it probably pays 
>>>>> to side step this.  The endpoint transitioning from list of strings to 
>>>>> list of objects, would be an obvious sign to clients that they are out of 
>>>>> date.  I think serving a service description(s), despite its complexity, 
>>>>> is likely the most principled way of versioning items appropriately, but 
>>>>> this definitely requires more in depth thought/design.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
>>>>> [2] https://en.wikipedia.org/wiki/Web_Application_Description_Language
>>>>> [3] https://developers.google.com/discovery/v1/reference/apis
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>>
>>>>>> Hey Micah,
>>>>>>
>>>>>> I think what we're trying to achieve is strike a balance between client 
>>>>>> complexity and ability to support multiple server-side capabilities.  
>>>>>> One challenge we've run into is if a client performs an operation (e.g. 
>>>>>> listViews), but receives a 403 code, it's not clear whether the client 
>>>>>> doesn't have access or the server doesn't support an endpoint but isn't 
>>>>>> sending a 404 for security reasons.  This is a simple way for the client 
>>>>>> to understand what it should expect from the server.
>>>>>>
>>>>>> >  Another option would be just list all endpoints . . . and let clients 
>>>>>> > take appropriate actions
>>>>>> > This could be done by vending the OpenAPI spec the server supports at 
>>>>>> > its own endpoint. I think this avoids the future problem of having to 
>>>>>> > classify new endpoints into a specific capability.
>>>>>>
>>>>>> You're right that this would be the most complete way to handle this, 
>>>>>> but it's really complicated and may require additional "handshake" calls 
>>>>>> even for small interactions with the catalog service.  I think this puts 
>>>>>> a lot of onus on the client, when what we're describing is a set of 
>>>>>> endpoints that correspond to a capability.
>>>>>>
>>>>>> I think we also want to avoid relyance on server specific published 
>>>>>> OpenAPI as they may leak other options/parameters/etc.  This may lead to 
>>>>>> confusion around what the canonical spec is and make clients 
>>>>>> incompatible if they're generated off of a non-standard spec document.
>>>>>>
>>>>>> All good points though, but I'm not aware of a standard way to handle 
>>>>>> this.
>>>>>>
>>>>>> I think versioning adds another level of complexity, but might be 
>>>>>> necessary since I expect these will evolve to some extent and may even 
>>>>>> require hitting versioned urls.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner 
>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>> We had a separate discussion with Dan on the oauth2 flag last week and 
>>>>>>> came to the same conclusion that removing the oauth2 capability is 
>>>>>>> probably the best for now.
>>>>>>> This is mainly because we can't really act on the oauth2 capability 
>>>>>>> right now, because the /tokens endpoint is called before we hit the 
>>>>>>> /config endpoint.
>>>>>>>
>>>>>>> > Another option would be just list all endpoints (and maybe even 
>>>>>>> > further which operations are supported) the server actually supports 
>>>>>>> > and let clients take appropriate actions (i.e. grouping could happen 
>>>>>>> > on the client side).  This could be done by vending the OpenAPI spec 
>>>>>>> > the server supports at its own endpoint. I think this avoids the 
>>>>>>> > future problem of having to classify new endpoints into a specific 
>>>>>>> > capability.
>>>>>>>
>>>>>>> @Micah this sounds to me as if the client would then have to parse a 
>>>>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading 
>>>>>>> a view or dropping a table on the given REST server. Rather than having 
>>>>>>> a dedicated endpoint we're just using the /config endpoint to provide 
>>>>>>> information about what a server supports.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Eduard
>>>>>>>
>>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue <b...@databricks.com.invalid> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Let's remove the oauth2 tag for now until we figure out how to move 
>>>>>>>> forward there. That makes sense to me.
>>>>>>>>
>>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov 
>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>> Hi Eduard,
>>>>>>>>>
>>>>>>>>> The capabilities PR looks good to me overall. I have a concern with 
>>>>>>>>> the "oauth2" tag name though.
>>>>>>>>>
>>>>>>>>> I also commented [1] in GH but the comment appears to be closed by 
>>>>>>>>> default :)
>>>>>>>>>
>>>>>>>>> I believe the term "oauth2" is confusing in this context with respect 
>>>>>>>>> to RFC 6749 [2] as discussed in depth on another thread [3]
>>>>>>>>>
>>>>>>>>> The functionality behind the /tokens endpoint is quite specific to 
>>>>>>>>> the Iceberg REST spec and as the other discussion highlights, there 
>>>>>>>>> are concerns with respect to OAuth2 interoperability with other 
>>>>>>>>> OAuth2 servers.
>>>>>>>>>
>>>>>>>>> What do you think about using a different tag name for it, for 
>>>>>>>>> example "local-tokens" or "auth-tokens"?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dmitri.
>>>>>>>>>
>>>>>>>>> [1] 
>>>>>>>>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
>>>>>>>>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>>>>>>>>
>>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner 
>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> I'd like to bring up the discussion around describing REST server 
>>>>>>>>>> capabilities via the /config endpoint.
>>>>>>>>>> There is PR #9940 that describes the OpenAPI spec changes.
>>>>>>>>>>
>>>>>>>>>> Mainly we'd like to have a capabilities field in the ConfigResponse 
>>>>>>>>>> that allows servers to indicate to clients which capabilities are 
>>>>>>>>>> being supported.
>>>>>>>>>>
>>>>>>>>>> So far we have the following capabilities:
>>>>>>>>>>
>>>>>>>>>> tables
>>>>>>>>>> views
>>>>>>>>>> remote-signing
>>>>>>>>>> vended-credentials
>>>>>>>>>> multi-table-commit
>>>>>>>>>> register-table
>>>>>>>>>> table-metrics
>>>>>>>>>> oauth2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The general idea behind a capability is that if e.g. a server 
>>>>>>>>>> supports views, then that server must implement all endpoints 
>>>>>>>>>> grouped under that capability.
>>>>>>>>>> It's worth noting that the /config endpoint is currently being 
>>>>>>>>>> implicit (meaning that every REST server would have to implement it).
>>>>>>>>>>
>>>>>>>>>> One discussion point that came up during review is how we want to 
>>>>>>>>>> handle capabilities and backwards compatibility and what the default 
>>>>>>>>>> capability would be, since older servers don't know anything about 
>>>>>>>>>> capabilities (in such a case we could assume that the default 
>>>>>>>>>> capabilities would be oauth2 / tables).
>>>>>>>>>>
>>>>>>>>>> Are there any other capabilities that we'd like to include in the 
>>>>>>>>>> list?
>>>>>>>>>>
>>>>>>>>>> Eduard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Databricks
>>>
>>> --
>>> Robert Stupp
>>> @snazy

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to