Hi Jack I like Robert's proposal. Back to the topics, I think grouping with tags is more "flexible" (it was what we included in the REST spec proposal as well).
Regards JB On Wed, Jun 26, 2024 at 6:26 PM Jack Ye <yezhao...@gmail.com> wrote: > > It seems like there are 2 sub-topics here: > 1. should we group operations with tags, or should we do this > per-operation/endpoint? > 2. how should we do the capability/versioning for each unit (either per tag > or per operation) > > Shall we first conclude on 1? > > For 1, my take is that we will need to do it per operation, for 2 reasons: > > (1) There are many REST services that would only implement a very small set > of APIs, such as just loadTable and loadView. Some will choose to not > implement very specific endpoints, such as renameTable. Tags seems convenient > but it is mandating people to implement a specific group of APIs together, > which is a lot of burdens for especially small organizations, if they just > want to support very specific goals like reading through IRC. > > (2) Suppose a new tag is added in the future, the server returns that tag, > but an older client does not understand it, it might cause mistakes in the > client's understanding of what is supported and what is not, when a tag > contains both features in existing APIs and also new APIs. If we define that > tags do not overlap with each other, this is probably not a concern. However, > (1) still is a problem from a usability perspective. > > Best, > Jack Ye > > > > > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks <dwe...@apache.org> wrote: >> >> I think Robert's approach is a reasonable compromise here. >> >> If we wanted a "per operation/endpoint" versioning, I think I'd prefer >> Micah's OpenAPI spec based approach because it's more standardized, but I >> feel adds a lot of client complexity. >> >> -Dan >> >> >> >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote: >>> >>> (I think, compatibility deserves a separate thread - it's a "huge" topic) >>> >>> Based on experience, we decided on the following with Nessie: >>> >>> Unknown fields/attributes in a structure _DO_ cause (de)serialization >>> failures. >>> "Stable API versions" - endpoint additions and/or added query parameters >>> and/or enhanced structures do _NOT_ require a new API version (as in the >>> endpoint's route/path). >>> "Flexible spec versions" - new and updated "capabilities" however might >>> cause a bump in the "spec version" that the server announces in its >>> `getConfig` result. >>> >>> Adding new routes/paths may require new endpoint implementations on the >>> server side, which can easily lead to a lot of (unnecessarily boilerplate) >>> code. Using different routes/paths is justified if the API is changed >>> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API >>> version - the server indicates the minimum and maximum supported API >>> version, in case a client wants to "upgrade". I recommend to _not_ bump the >>> API version in the route/path if it's not really necessary. >>> >>> Regarding the requirement to fail on unknown attributes: Unknown attributes >>> may contain important information. A client may send a newer version of a >>> request object with an important new field, but the (older) server discards >>> the new attribute. Think of an attribute that for example defines a "commit >>> condition" that the client expects to be respected. "New" attributes must >>> be omittable (e.g. don't serialize if null/default) - clients indicate the >>> "usage" of an added attribute using some request attribute (for example: >>> "boolean returnExtendedInformation"). >>> >>> The list of capabilities can be indicated with included "spec versions", to >>> tell clients which features/functionalities a server supports."Production" >>> spec versions could start with 1, and "reserve" 0 for >>> experimental/unsupported/poc kind of implementation. It could look like >>> this: >>> capabilities: [ >>> "table-spec/2,3", // but not table-spec v1 here >>> "view-spec/1", >>> "table-api/1", >>> "view-api/1", >>> "udf-api/1", >>> "super-feature/2,4,6", // but not spec versions 0,1,3,5,7+ >>> ... >>> ] >>> Incrementing a spec version in the list of capabilities doesn't break any >>> client. We could also define a structure to describe each capability: >>> components: >>> schemas: >>> Capability: >>> name: >>> type: string >>> description: Name of the capability >>> versions: >>> type: array: >>> description: List of supported spec versions of this capability. >>> 0 means experimental (non-production) without any guarantees about the >>> stability of schema for request and response parameters. >>> items: >>> type: integer >>> format: int32 >>> >>> In Nessie, we ensure backwards and forwards compatibility using a >>> specialized test suite that runs the "in tree" client against older server >>> versions and older client versions against the "in tree" server version. It >>> works fine for us for a few years now - and it did help preventing >>> compatibility issues. >>> >>> >>> On 26.06.24 07:44, Péter Váry wrote: >>> >>> Hi everyone, >>> >>> A few considerations: >>> - I think we should explicitly state which client/service interoperability >>> we are aiming for. I expect that we want to support both old client -> new >>> server, and new client -> old server communications. >>> - I agree with Jack, that we should think about versions in advance - HMS >>> tried to be backwards compatible for everything, and that made it hard to >>> move forward / deprecate things. >>> - Still we should try to keep the backwards incompatible changes minimal. >>> (All clients should be able to ignore unknown incoming fields / New >>> optional input parameter should drive new features / Try to avoid enums in >>> responses where we expect changes (?)) >>> - OTOH, it could be important for clients to know which of the backwards >>> compatible changes are implemented for the given server - so I would >>> decouple the URI from the versioning. Maybe major version change should >>> (could) change the URI, but backwards compatible changes should be served >>> on the same URI, but could be identified by different minor versions. >>> >>> This is exciting stuff! >>> Thanks for pushing this forward! >>> >>> Peter >>> >>> >>> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote: >>>> >>>> Hi everyone, >>>> >>>> I feel I do not see a good answer to why not just simply version each API? >>>> When using tag, it means I have to offer capabilities per-tagged group. >>>> However, I could for example just offer loadTable and nothing else in a >>>> catalog, and that should still be Iceberg REST compliant. And I think we >>>> need a versioning story anyway, there is no way around it. >>>> >>>> Here is the workflow in my mind with versioning: >>>> >>>> 1. Going forward, every time the REST catalog spec introduces any new API >>>> endpoints or backwards incompatible changes to the existing APIs, the >>>> version of the specific API is incremented. So suppose the PlanTable API >>>> is added, this API will be at version v1. Suppose UpdateTable is updated >>>> with a new update type, that API will be at version v2, but PlanTable will >>>> remain at v1. >>>> >>>> 2. a catalog must implement getConfig. This API is the only one that is >>>> required. >>>> >>>> 3. in getConfig, in the defaults map (it could be in some new metadata >>>> structure, but since we want strong backwards compatibility guarantee, >>>> reusing string maps seems to be the best way), server returns key-value >>>> pairs of: >>>> - key: operation:<operationName> >>>> - value: version number >>>> >>>> 4. the client assumes that the map is ordered, and resolves API versions >>>> sequentially. For example, suppose I have the following map: >>>> >>>> { "operation:planTable": "1", "operation:loadTable": "2" } >>>> >>>> Note that by "supporting", it means to return a response in a predictable >>>> way that is compliant with the spec. It can also return 406 >>>> UnsupportedOperation as a way to support it. >>>> >>>> There is also a special version *, that means any version can work. >>>> >>>> 5. Backwards compatibility: suppose the client is at a higher version than >>>> the server, then the client should always be able to understand the >>>> server's full list of capabilities. >>>> >>>> 6. Forward compatibility: suppose the client is at a lower version than >>>> the server, then the client should parse whatever operation it >>>> understands, and use the highest version it could support to execute the >>>> operation. Suppose the client only supports loadTable v1, then it will >>>> continue to hit the GET v1/namespaces/{ns}/tables/{table} route, instead >>>> of GET v2/namespaces/{ns}/tables/{table}. The v1 route could continue to >>>> support the client, or it could throw 406 to indicate that this route is >>>> deprecated and the client needs to upgrade. >>>> >>>> For initial backwards compatibility, I think not returning anything should >>>> mean that all API that the client understands are having version *. >>>> >>>> What do people think of it, compared to the tag approach? >>>> >>>> Best, >>>> Jack Ye >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <emkornfi...@gmail.com> >>>> wrote: >>>>> >>>>> I don't have strong opinions either way here, just thought it was worth >>>>> raising some concerns over possible evolution here. Some responses >>>>> inline, but if capabilities seem to meet the requirement at hand, then it >>>>> does potentially seem the simplest mechanism. >>>>> >>>>> >>>>>> I think we also want to avoid relyance on server specific published >>>>>> OpenAPI as they may leak other options/parameters/etc. This may lead to >>>>>> confusion around what the canonical spec is and make clients >>>>>> incompatible if they're generated off of a non-standard spec document. >>>>> >>>>> >>>>> Yeah, I wasn't proposing necessarily using built in functionality but a >>>>> pre-scrubbed document. Since there is no reference service >>>>> implementation for REST it seems like each implementor would need to >>>>> describe the best way of scrubbing there description. >>>>> >>>>> >>>>>> >>>>>> @Micah this sounds to me as if the client would then have to parse a >>>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading >>>>>> a view or dropping a table on the given REST server. Rather than having >>>>>> a dedicated endpoint we're just using the /config endpoint to provide >>>>>> information about what a server supports. >>>>> >>>>> >>>>> I was not suggesting multiple endpoints here, simply different contents >>>>> for /config I agree in the short term this does add complexity on the >>>>> clients. But given that the canonical REST API clients are being >>>>> developed into the standard library, I'm not sure how much toil this >>>>> would cause in general. This also does not necessarily need to called >>>>> up-front but could be called to verify existence vs a permission issue >>>>> after an error was received. >>>>> >>>>> What round-trips did you have in mind here? >>>>> >>>>> >>>>>> All good points though, but I'm not aware of a standard way to handle >>>>>> this. >>>>> >>>>> >>>>> IIUC, this sounds like a standard service description problem to me, the >>>>> solution with capabilities appears to be one level abstraction on top of >>>>> this. Service discovery seems like it has been reimplemented a few >>>>> different times depending on the technology [1][2][3] >>>>> >>>>> >>>>>> I think versioning adds another level of complexity, but might be >>>>>> necessary since I expect these will evolve to some extent and may even >>>>>> require hitting versioned urls. >>>>> >>>>> >>>>> If there is no concrete proposal on versioning, I agree it probably pays >>>>> to side step this. The endpoint transitioning from list of strings to >>>>> list of objects, would be an obvious sign to clients that they are out of >>>>> date. I think serving a service description(s), despite its complexity, >>>>> is likely the most principled way of versioning items appropriately, but >>>>> this definitely requires more in depth thought/design. >>>>> >>>>> >>>>> Thanks, >>>>> Micah >>>>> >>>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language >>>>> [2] https://en.wikipedia.org/wiki/Web_Application_Description_Language >>>>> [3] https://developers.google.com/discovery/v1/reference/apis >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org> wrote: >>>>>> >>>>>> Hey Micah, >>>>>> >>>>>> I think what we're trying to achieve is strike a balance between client >>>>>> complexity and ability to support multiple server-side capabilities. >>>>>> One challenge we've run into is if a client performs an operation (e.g. >>>>>> listViews), but receives a 403 code, it's not clear whether the client >>>>>> doesn't have access or the server doesn't support an endpoint but isn't >>>>>> sending a 404 for security reasons. This is a simple way for the client >>>>>> to understand what it should expect from the server. >>>>>> >>>>>> > Another option would be just list all endpoints . . . and let clients >>>>>> > take appropriate actions >>>>>> > This could be done by vending the OpenAPI spec the server supports at >>>>>> > its own endpoint. I think this avoids the future problem of having to >>>>>> > classify new endpoints into a specific capability. >>>>>> >>>>>> You're right that this would be the most complete way to handle this, >>>>>> but it's really complicated and may require additional "handshake" calls >>>>>> even for small interactions with the catalog service. I think this puts >>>>>> a lot of onus on the client, when what we're describing is a set of >>>>>> endpoints that correspond to a capability. >>>>>> >>>>>> I think we also want to avoid relyance on server specific published >>>>>> OpenAPI as they may leak other options/parameters/etc. This may lead to >>>>>> confusion around what the canonical spec is and make clients >>>>>> incompatible if they're generated off of a non-standard spec document. >>>>>> >>>>>> All good points though, but I'm not aware of a standard way to handle >>>>>> this. >>>>>> >>>>>> I think versioning adds another level of complexity, but might be >>>>>> necessary since I expect these will evolve to some extent and may even >>>>>> require hitting versioned urls. >>>>>> >>>>>> -Dan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner >>>>>> <etudenhoef...@apache.org> wrote: >>>>>>> >>>>>>> We had a separate discussion with Dan on the oauth2 flag last week and >>>>>>> came to the same conclusion that removing the oauth2 capability is >>>>>>> probably the best for now. >>>>>>> This is mainly because we can't really act on the oauth2 capability >>>>>>> right now, because the /tokens endpoint is called before we hit the >>>>>>> /config endpoint. >>>>>>> >>>>>>> > Another option would be just list all endpoints (and maybe even >>>>>>> > further which operations are supported) the server actually supports >>>>>>> > and let clients take appropriate actions (i.e. grouping could happen >>>>>>> > on the client side). This could be done by vending the OpenAPI spec >>>>>>> > the server supports at its own endpoint. I think this avoids the >>>>>>> > future problem of having to classify new endpoints into a specific >>>>>>> > capability. >>>>>>> >>>>>>> @Micah this sounds to me as if the client would then have to parse a >>>>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading >>>>>>> a view or dropping a table on the given REST server. Rather than having >>>>>>> a dedicated endpoint we're just using the /config endpoint to provide >>>>>>> information about what a server supports. >>>>>>> >>>>>>> Thanks >>>>>>> Eduard >>>>>>> >>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue <b...@databricks.com.invalid> >>>>>>> wrote: >>>>>>>> >>>>>>>> Let's remove the oauth2 tag for now until we figure out how to move >>>>>>>> forward there. That makes sense to me. >>>>>>>> >>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov >>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>> >>>>>>>>> Hi Eduard, >>>>>>>>> >>>>>>>>> The capabilities PR looks good to me overall. I have a concern with >>>>>>>>> the "oauth2" tag name though. >>>>>>>>> >>>>>>>>> I also commented [1] in GH but the comment appears to be closed by >>>>>>>>> default :) >>>>>>>>> >>>>>>>>> I believe the term "oauth2" is confusing in this context with respect >>>>>>>>> to RFC 6749 [2] as discussed in depth on another thread [3] >>>>>>>>> >>>>>>>>> The functionality behind the /tokens endpoint is quite specific to >>>>>>>>> the Iceberg REST spec and as the other discussion highlights, there >>>>>>>>> are concerns with respect to OAuth2 interoperability with other >>>>>>>>> OAuth2 servers. >>>>>>>>> >>>>>>>>> What do you think about using a different tag name for it, for >>>>>>>>> example "local-tokens" or "auth-tokens"? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Dmitri. >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934 >>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749 >>>>>>>>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50 >>>>>>>>> >>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner >>>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>>>> >>>>>>>>>> Hey everyone, >>>>>>>>>> >>>>>>>>>> I'd like to bring up the discussion around describing REST server >>>>>>>>>> capabilities via the /config endpoint. >>>>>>>>>> There is PR #9940 that describes the OpenAPI spec changes. >>>>>>>>>> >>>>>>>>>> Mainly we'd like to have a capabilities field in the ConfigResponse >>>>>>>>>> that allows servers to indicate to clients which capabilities are >>>>>>>>>> being supported. >>>>>>>>>> >>>>>>>>>> So far we have the following capabilities: >>>>>>>>>> >>>>>>>>>> tables >>>>>>>>>> views >>>>>>>>>> remote-signing >>>>>>>>>> vended-credentials >>>>>>>>>> multi-table-commit >>>>>>>>>> register-table >>>>>>>>>> table-metrics >>>>>>>>>> oauth2 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The general idea behind a capability is that if e.g. a server >>>>>>>>>> supports views, then that server must implement all endpoints >>>>>>>>>> grouped under that capability. >>>>>>>>>> It's worth noting that the /config endpoint is currently being >>>>>>>>>> implicit (meaning that every REST server would have to implement it). >>>>>>>>>> >>>>>>>>>> One discussion point that came up during review is how we want to >>>>>>>>>> handle capabilities and backwards compatibility and what the default >>>>>>>>>> capability would be, since older servers don't know anything about >>>>>>>>>> capabilities (in such a case we could assume that the default >>>>>>>>>> capabilities would be oauth2 / tables). >>>>>>>>>> >>>>>>>>>> Are there any other capabilities that we'd like to include in the >>>>>>>>>> list? >>>>>>>>>> >>>>>>>>>> Eduard >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ryan Blue >>>>>>>> Databricks >>> >>> -- >>> Robert Stupp >>> @snazy