IMO that would be a list of "capability" to "set of versions" tuples. The reason to have a "set of (integer) version" is that you have to plan for the future, now.

I also think we do need "logical" capabilities to express for example which table/view/etc specs a service supports and to express which request /response schema (query params + request/response object params) a service supports. Not doing that will put clients in the spot of guessing whether a service supports a specific feature of an existing functionality, something that's been added after the functionality has been introduced.

Whether we need a capability for each and everything ... I suspect that depends on the actual feature/functionality/change. Many things can be designed in a backwards/forwards compatible way and don't deserve a (REST) spec/capability version bump.

Repeating my point that the service must really fail when it encounters an unknown attribute: Imagine there's a new table-requirement - that has to be reflected using the version of a capability. A service should really not just ignore an unknown capability, otherwise important or data correctness issues will occur.

For new versions that are still in development, it's possibly the easiest to not do anything special, not even announce the new version. Whether we indicate some "beta version" or reserve "Integer.MAX_VALUE" doesn't really help anyways - client and server can have a completely different understanding of "that particular" beta version. I.e. if a client or service use "in development" functionality, it's up to them - whether it works or not.

On 27.06.24 17:55, Jack Ye wrote:
I feel Alex is already tapping into the more complex territory I do not want to go into, because as he says, a "capability" is logical, and it can be a set of overlapping endpoints, small features in some endpoints, etc. We already saw that in the original PR we tried to say "pagination" is a capability, but it is really just a very small feature of an endpoint, and it might also evolve on its own to extend to other endpoints in the future, and maybe one endpoint supports it and one does not...

My fear is that we are getting into the business of defining things that are totally unnecessary. It takes our energy to define that and debate the boundary of each capability, but in the end what does it buy us? As Eduard says, you still need to have documentation to explain what "partial capabilities" are supported for a catalog, and people are supposed to read the documentation to not do the unsupported things.

In the end, the client-server needs to understand exactly (1) what endpoints can be invoked, and (2) using what request-response schema, that is the key to me. If it means returning a response with 20-30ish hard-coded entries, and the client is configured based on that, that seems totally reasonable to me.

-Jack





On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra <alex.du...@dremio.com.invalid> wrote:

    Hi all,

    So far we've been thinking of capabilities as equivalent to a set
    of endpoints.

    That's a rather technical definition. It also brings one important
    limitation: one endpoint can only be "governed" by one capability.

    Granted, most capabilities do require implementing specific
    endpoints. But I wonder if, for the sake of being future-proof, we
    shouldn't broaden the meaning of that term to embrace /logical/ or
    /behavioral/ concepts as well.

    One example that comes to mind: a REST catalog implementor may
    choose to implement the transactions-commit endpoint to fully
    comply with the "tables" capability; but for performance reasons,
    or simply because it's too complex, they could opt for rejecting
    multi-table commits (iow, if a CommitTransactionRequest contains
    one single CommitTableRequest, that's fine, otherwise, the
    endpoint would return an error). It would be nice to express that
    as a capability: this way the client knows that it is safe to call
    the transactions-commit endpoint, but with one CommitTableRequest
    at a time.

    Such a capability would not be defined by a specific endpoint, but
    rather, would influence the behavior exhibited by certain endpoints.

    Thanks,

    Alex

    On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré
    <j...@nanthrax.net> wrote:

        Hi Jack

        I like Robert's proposal. Back to the topics, I think grouping
        with
        tags is more "flexible" (it was what we included in the REST spec
        proposal as well).

        Regards
        JB

        On Wed, Jun 26, 2024 at 6:26 PM Jack Ye <yezhao...@gmail.com>
        wrote:
        >
        > It seems like there are 2 sub-topics here:
        > 1. should we group operations with tags, or should we do
        this per-operation/endpoint?
        > 2. how should we do the capability/versioning for each unit
        (either per tag or per operation)
        >
        > Shall we first conclude on 1?
        >
        > For 1, my take is that we will need to do it per operation,
        for 2 reasons:
        >
        > (1) There are many REST services that would only implement a
        very small set of APIs, such as just loadTable and loadView.
        Some will choose to not implement very specific endpoints,
        such as renameTable. Tags seems convenient but it is mandating
        people to implement a specific group of APIs together, which
        is a lot of burdens for especially small organizations, if
        they just want to support very specific goals like reading
        through IRC.
        >
        > (2) Suppose a new tag is added in the future, the server
        returns that tag, but an older client does not understand it,
        it might cause mistakes in the client's understanding of what
        is supported and what is not, when a tag contains both
        features in existing APIs and also new APIs. If we define that
        tags do not overlap with each other, this is probably not a
        concern. However, (1) still is a problem from a usability
        perspective.
        >
        > Best,
        > Jack Ye
        >
        >
        >
        >
        > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks
        <dwe...@apache.org> wrote:
        >>
        >> I think Robert's approach is a reasonable compromise here.
        >>
        >> If we wanted a "per operation/endpoint" versioning, I think
        I'd prefer Micah's OpenAPI spec based approach because it's
        more standardized, but I feel adds a lot of client complexity.
        >>
        >> -Dan
        >>
        >>
        >>
        >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp
        <sn...@snazy.de> wrote:
        >>>
        >>> (I think, compatibility deserves a separate thread - it's
        a "huge" topic)
        >>>
        >>> Based on experience, we decided on the following with Nessie:
        >>>
        >>> Unknown fields/attributes in a structure _DO_ cause
        (de)serialization failures.
        >>> "Stable API versions" - endpoint additions and/or added
        query parameters and/or enhanced structures do _NOT_ require a
        new API version (as in the endpoint's route/path).
        >>> "Flexible spec versions" - new and updated "capabilities"
        however might cause a bump in the "spec version" that the
        server announces in its `getConfig` result.
        >>>
        >>> Adding new routes/paths may require new endpoint
        implementations on the server side, which can easily lead to a
        lot of (unnecessarily boilerplate) code. Using different
        routes/paths is justified if the API is changed
        "fundamentally". We call the "path component" (api/v1/...,
        api/v2/...) API version - the server indicates the minimum and
        maximum supported API version, in case a client wants to
        "upgrade". I recommend to _not_ bump the API version in the
        route/path if it's not really necessary.
        >>>
        >>> Regarding the requirement to fail on unknown attributes:
        Unknown attributes may contain important information. A client
        may send a newer version of a request object with an important
        new field, but the (older) server discards the new attribute.
        Think of an attribute that for example defines a "commit
        condition" that the client expects to be respected. "New"
        attributes must be omittable (e.g. don't serialize if
        null/default) - clients indicate the "usage" of an added
        attribute using some request attribute (for example: "boolean
        returnExtendedInformation").
        >>>
        >>> The list of capabilities can be indicated with included
        "spec versions", to tell clients which
        features/functionalities a server supports."Production" spec
        versions could start with 1, and "reserve" 0 for
        experimental/unsupported/poc kind of implementation. It could
        look like this:
        >>>   capabilities: [
        >>>     "table-spec/2,3",   // but not table-spec v1 here
        >>>     "view-spec/1",
        >>>     "table-api/1",
        >>>     "view-api/1",
        >>>     "udf-api/1",
        >>>     "super-feature/2,4,6",   // but not spec versions
        0,1,3,5,7+
        >>>     ...
        >>>   ]
        >>> Incrementing a spec version in the list of capabilities
        doesn't break any client. We could also define a structure to
        describe each capability:
        >>>   components:
        >>>     schemas:
        >>>       Capability:
        >>>         name:
        >>>           type: string
        >>>           description: Name of the capability
        >>>         versions:
        >>>           type: array:
        >>>           description: List of supported spec versions of
        this capability. 0 means experimental (non-production) without
        any guarantees about the stability of schema for request and
        response parameters.
        >>>           items:
        >>>             type: integer
        >>>             format: int32
        >>>
        >>> In Nessie, we ensure backwards and forwards compatibility
        using a specialized test suite that runs the "in tree" client
        against older server versions and older client versions
        against the "in tree" server version. It works fine for us for
        a few years now - and it did help preventing compatibility issues.
        >>>
        >>>
        >>> On 26.06.24 07:44, Péter Váry wrote:
        >>>
        >>> Hi everyone,
        >>>
        >>> A few considerations:
        >>> - I think we should explicitly state which client/service
        interoperability we are aiming for. I expect that we want to
        support both old client -> new server, and new client -> old
        server communications.
        >>> - I agree with Jack, that we should think about versions
        in advance - HMS tried to be backwards compatible for
        everything, and that made it hard to move forward / deprecate
        things.
        >>> - Still we should try to keep the backwards incompatible
        changes minimal. (All clients should be able to ignore unknown
        incoming fields / New optional input parameter should drive
        new features / Try to avoid enums in responses where we expect
        changes (?))
        >>> - OTOH, it could be important for clients to know which of
        the backwards compatible changes are implemented for the given
        server - so I would decouple the URI from the versioning.
        Maybe major version change should (could) change the URI, but
        backwards compatible changes should be served on the same URI,
        but could be identified by different minor versions.
        >>>
        >>> This is exciting stuff!
        >>> Thanks for pushing this forward!
        >>>
        >>> Peter
        >>>
        >>>
        >>> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com>
        wrote:
        >>>>
        >>>> Hi everyone,
        >>>>
        >>>> I feel I do not see a good answer to why not just simply
        version each API? When using tag, it means I have to offer
        capabilities per-tagged group. However, I could for example
        just offer loadTable and nothing else in a catalog, and that
        should still be Iceberg REST compliant. And I think we need a
        versioning story anyway, there is no way around it.
        >>>>
        >>>> Here is the workflow in my mind with versioning:
        >>>>
        >>>> 1. Going forward, every time the REST catalog spec
        introduces any new API endpoints or backwards incompatible
        changes to the existing APIs, the version of the specific API
        is incremented. So suppose the PlanTable API is added, this
        API will be at version v1. Suppose UpdateTable is updated with
        a new update type, that API will be at version v2, but
        PlanTable will remain at v1.
        >>>>
        >>>> 2. a catalog must implement getConfig. This API is the
        only one that is required.
        >>>>
        >>>> 3. in getConfig, in the defaults map (it could be in some
        new metadata structure, but since we want strong backwards
        compatibility guarantee, reusing string maps seems to be the
        best way), server returns key-value pairs of:
        >>>> - key: operation:<operationName>
        >>>> - value: version number
        >>>>
        >>>> 4. the client assumes that the map is ordered, and
        resolves API versions sequentially. For example, suppose I
        have the following map:
        >>>>
        >>>> { "operation:planTable": "1", "operation:loadTable": "2" }
        >>>>
        >>>> Note that by "supporting", it means to return a response
        in a predictable way that is compliant with the spec. It can
        also return 406 UnsupportedOperation as a way to support it.
        >>>>
        >>>> There is also a special version *, that means any version
        can work.
        >>>>
        >>>> 5. Backwards compatibility: suppose the client is at a
        higher version than the server, then the client should always
        be able to understand the server's full list of capabilities.
        >>>>
        >>>> 6. Forward compatibility: suppose the client is at a
        lower version than the server, then the client should parse
        whatever operation it understands, and use the highest version
        it could support to execute the operation. Suppose the client
        only supports loadTable v1, then it will continue to hit the
        GET v1/namespaces/{ns}/tables/{table} route, instead of GET
        v2/namespaces/{ns}/tables/{table}. The v1 route could continue
        to support the client, or it could throw 406 to indicate that
        this route is deprecated and the client needs to upgrade.
        >>>>
        >>>> For initial backwards compatibility, I think not
        returning anything should mean that all API that the client
        understands are having version *.
        >>>>
        >>>> What do people think of it, compared to the tag approach?
        >>>>
        >>>> Best,
        >>>> Jack Ye
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>>
        >>>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield
        <emkornfi...@gmail.com> wrote:
        >>>>>
        >>>>> I don't have strong opinions either way here, just
        thought it was worth raising some concerns over possible
        evolution here.  Some responses inline, but if capabilities
        seem to meet the requirement at hand, then it does potentially
        seem the simplest mechanism.
        >>>>>
        >>>>>
        >>>>>> I think we also want to avoid relyance on server
        specific published OpenAPI as they may leak other
        options/parameters/etc.  This may lead to confusion around
        what the canonical spec is and make clients incompatible if
        they're generated off of a non-standard spec document.
        >>>>>
        >>>>>
        >>>>> Yeah, I wasn't proposing necessarily using built in
        functionality but a pre-scrubbed document. Since there is no
        reference service implementation for REST it seems like each
        implementor would need to describe the best way of scrubbing
        there description.
        >>>>>
        >>>>>
        >>>>>>
        >>>>>> @Micah this sounds to me as if the client would then
        have to parse a bunch of endpoints to figure out whether it's
        safe to e.g. call loading a view or dropping a table on the
        given REST server. Rather than having a dedicated endpoint
        we're just using the /config endpoint to provide information
        about what a server supports.
        >>>>>
        >>>>>
        >>>>> I was not suggesting multiple endpoints here, simply
        different contents  for /config I agree in the short term this
        does add complexity on the clients. But given that the
        canonical REST API clients are being developed into the
        standard library, I'm not sure how much toil this would cause
        in general. This also does not necessarily need to called
        up-front but could be called to verify existence vs a
        permission issue after an error was received.
        >>>>>
        >>>>> What round-trips did you have in mind here?
        >>>>>
        >>>>>
        >>>>>> All good points though, but I'm not aware of a standard
        way to handle this.
        >>>>>
        >>>>>
        >>>>> IIUC, this sounds like a standard service description
        problem to me, the solution with capabilities appears to be
        one level abstraction on top of this.  Service discovery seems
        like it has been reimplemented a few different times depending
        on the technology [1][2][3]
        >>>>>
        >>>>>
        >>>>>> I think versioning adds another level of complexity,
        but might be necessary since I expect these will evolve to
        some extent and may even require hitting versioned urls.
        >>>>>
        >>>>>
        >>>>> If there is no concrete proposal on versioning, I agree
        it probably pays to side step this. The endpoint transitioning
        from list of strings to list of objects, would be an obvious
        sign to clients that they are out of date.  I think serving a
        service description(s), despite its complexity, is likely the
        most principled way of versioning items appropriately, but
        this definitely requires more in depth thought/design.
        >>>>>
        >>>>>
        >>>>> Thanks,
        >>>>> Micah
        >>>>>
        >>>>> [1]
        https://en.wikipedia.org/wiki/Web_Services_Description_Language
        >>>>> [2]
        https://en.wikipedia.org/wiki/Web_Application_Description_Language
        >>>>> [3]
        https://developers.google.com/discovery/v1/reference/apis
        >>>>>
        >>>>>
        >>>>>
        >>>>>
        >>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks
        <dwe...@apache.org> wrote:
        >>>>>>
        >>>>>> Hey Micah,
        >>>>>>
        >>>>>> I think what we're trying to achieve is strike a
        balance between client complexity and ability to support
        multiple server-side capabilities.  One challenge we've run
        into is if a client performs an operation (e.g. listViews),
        but receives a 403 code, it's not clear whether the client
        doesn't have access or the server doesn't support an endpoint
        but isn't sending a 404 for security reasons.  This is a
        simple way for the client to understand what it should expect
        from the server.
        >>>>>>
        >>>>>> >  Another option would be just list all endpoints . .
        . and let clients take appropriate actions
        >>>>>> > This could be done by vending the OpenAPI spec the
        server supports at its own endpoint. I think this avoids the
        future problem of having to classify new endpoints into a
        specific capability.
        >>>>>>
        >>>>>> You're right that this would be the most complete way
        to handle this, but it's really complicated and may require
        additional "handshake" calls even for small interactions with
        the catalog service.  I think this puts a lot of onus on the
        client, when what we're describing is a set of endpoints that
        correspond to a capability.
        >>>>>>
        >>>>>> I think we also want to avoid relyance on server
        specific published OpenAPI as they may leak other
        options/parameters/etc.  This may lead to confusion around
        what the canonical spec is and make clients incompatible if
        they're generated off of a non-standard spec document.
        >>>>>>
        >>>>>> All good points though, but I'm not aware of a standard
        way to handle this.
        >>>>>>
        >>>>>> I think versioning adds another level of complexity,
        but might be necessary since I expect these will evolve to
        some extent and may even require hitting versioned urls.
        >>>>>>
        >>>>>> -Dan
        >>>>>>
        >>>>>>
        >>>>>>
        >>>>>>
        >>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner
        <etudenhoef...@apache.org> wrote:
        >>>>>>>
        >>>>>>> We had a separate discussion with Dan on the oauth2
        flag last week and came to the same conclusion that removing
        the oauth2 capability is probably the best for now.
        >>>>>>> This is mainly because we can't really act on the
        oauth2 capability right now, because the /tokens endpoint is
        called before we hit the /config endpoint.
        >>>>>>>
        >>>>>>> > Another option would be just list all endpoints (and
        maybe even further which operations are supported) the server
        actually supports and let clients take appropriate actions
        (i.e. grouping could happen on the client side).  This could
        be done by vending the OpenAPI spec the server supports at its
        own endpoint. I think this avoids the future problem of having
        to classify new endpoints into a specific capability.
        >>>>>>>
        >>>>>>> @Micah this sounds to me as if the client would then
        have to parse a bunch of endpoints to figure out whether it's
        safe to e.g. call loading a view or dropping a table on the
        given REST server. Rather than having a dedicated endpoint
        we're just using the /config endpoint to provide information
        about what a server supports.
        >>>>>>>
        >>>>>>> Thanks
        >>>>>>> Eduard
        >>>>>>>
        >>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
        <b...@databricks.com.invalid> wrote:
        >>>>>>>>
        >>>>>>>> Let's remove the oauth2 tag for now until we figure
        out how to move forward there. That makes sense to me.
        >>>>>>>>
        >>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
        <dmitri.bourlatch...@dremio.com.invalid> wrote:
        >>>>>>>>>
        >>>>>>>>> Hi Eduard,
        >>>>>>>>>
        >>>>>>>>> The capabilities PR looks good to me overall. I have
        a concern with the "oauth2" tag name though.
        >>>>>>>>>
        >>>>>>>>> I also commented [1] in GH but the comment appears
        to be closed by default :)
        >>>>>>>>>
        >>>>>>>>> I believe the term "oauth2" is confusing in this
        context with respect to RFC 6749 [2] as discussed in depth on
        another thread [3]
        >>>>>>>>>
        >>>>>>>>> The functionality behind the /tokens endpoint is
        quite specific to the Iceberg REST spec and as the other
        discussion highlights, there are concerns with respect to
        OAuth2 interoperability with other OAuth2 servers.
        >>>>>>>>>
        >>>>>>>>> What do you think about using a different tag name
        for it, for example "local-tokens" or "auth-tokens"?
        >>>>>>>>>
        >>>>>>>>> Thanks,
        >>>>>>>>> Dmitri.
        >>>>>>>>>
        >>>>>>>>> [1]
        
https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
        >>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
        >>>>>>>>> [3]
        https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
        >>>>>>>>>
        >>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner
        <etudenhoef...@apache.org> wrote:
        >>>>>>>>>>
        >>>>>>>>>> Hey everyone,
        >>>>>>>>>>
        >>>>>>>>>> I'd like to bring up the discussion around
        describing REST server capabilities via the /config endpoint.
        >>>>>>>>>> There is PR #9940 that describes the OpenAPI spec
        changes.
        >>>>>>>>>>
        >>>>>>>>>> Mainly we'd like to have a capabilities field in
        the ConfigResponse that allows servers to indicate to clients
        which capabilities are being supported.
        >>>>>>>>>>
        >>>>>>>>>> So far we have the following capabilities:
        >>>>>>>>>>
        >>>>>>>>>> tables
        >>>>>>>>>> views
        >>>>>>>>>> remote-signing
        >>>>>>>>>> vended-credentials
        >>>>>>>>>> multi-table-commit
        >>>>>>>>>> register-table
        >>>>>>>>>> table-metrics
        >>>>>>>>>> oauth2
        >>>>>>>>>>
        >>>>>>>>>>
        >>>>>>>>>> The general idea behind a capability is that if
        e.g. a server supports views, then that server must implement
        all endpoints grouped under that capability.
        >>>>>>>>>> It's worth noting that the /config endpoint is
        currently being implicit (meaning that every REST server would
        have to implement it).
        >>>>>>>>>>
        >>>>>>>>>> One discussion point that came up during review is
        how we want to handle capabilities and backwards compatibility
        and what the default capability would be, since older servers
        don't know anything about capabilities (in such a case we
        could assume that the default capabilities would be oauth2 /
        tables).
        >>>>>>>>>>
        >>>>>>>>>> Are there any other capabilities that we'd like to
        include in the list?
        >>>>>>>>>>
        >>>>>>>>>> Eduard
        >>>>>>>>
        >>>>>>>>
        >>>>>>>>
        >>>>>>>> --
        >>>>>>>> Ryan Blue
        >>>>>>>> Databricks
        >>>
        >>> --
        >>> Robert Stupp
        >>> @snazy

--
Robert Stupp
@snazy

Reply via email to