I do not see why "old clients are already broken" - they behave according to the REST spec.

The current REST spec states "If parent is a multipart namespace, the parts must be separated by the unit separator (`0x1F`) byte" [1]. Using a different separator, even if it is "backwards compatible", is a change to the Iceberg REST spec.

If _one_ service is subject to new(er) restrictions, it is IMO that service that cannot follow the REST spec. Jetty 12 (Servlet Spec v6) was mentioned as a justification. However, Jetty 12 and other implementations allow operators to disable the "suspicious character validation".

The proposed change to the REST spec and implementation would only address the problem for that service, but not for all services. PMC members said on the "Iceberg Catalog Sync" call on Aug 7 that they want this change for that service implementation and defer a solution that works for all implementations, explicitly mentioning "Nessie", for quite some time.

The "root cause" is that neither the Iceberg REST spec nor the "reference implementation" poses any restriction on the the allowed set of characters in namespace elements, which is therefore also part of the Iceberg REST spec / "reference implementation".

That said, the proposed change does not work for all implementations. Keeping the Iceberg REST spec as is and configuring the (affected) services that they accept the '%1F' separator character seems much easier and less controversial and eliminates the need for these changes entirely.

An alternative to use an appropriate escaping mechanism, was turned down on the same "Iceberg Catalog Sync" call.


Robert


[1] https://github.com/apache/iceberg/blob/bfab2c334e9b4c11de65f1f9bd1de5dab18aae5b/open-api/rest-catalog-open-api.yaml#L225


On 06.08.24 21:24, Yufei Gu wrote:
Thanks Ryan and Daniel for the context. I'm OK that the server provides a separator via config endpoint. Just want to dig a bit more, it does provide more flexibility for the separator, but why do we need this flexibility? Looks like the only benefit is to reconstruct the namespaces. What's the use case of reconstruction? We may talk about it in tomorrow's meeting.

Thanks Eduard for the PR.


Yufei


On Tue, Aug 6, 2024 at 3:10 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:

    It's probably best to make this configurable rather than changing
    the separator in the spec. The server can communicate to the
    client which namespace separator should be used (via a config
    override), but still needs to support *%1F* as a fallback. I've
    opened #10877 <https://github.com/apache/iceberg/pull/10877> to
    address this.

    @Robert: Old clients are already broken with *%1F* so we won't be
    able to avoid an old client failing with a 4xx. Introducing a new
    endpoint that takes a different namespace separator won't fix the
    issue for old clients either, since those clients won't know
    anything about that endpoint.


    On Mon, Aug 5, 2024 at 7:07 PM Daniel Weeks <dwe...@apache.org> wrote:

        I would agree with adding either a server side (config
        override) or client side control (query param with `?delim=.`)
        as it will be compatible with the current v1 endpoint.

        In the future we could introduce a v2 endpoint(s), but I would
        want to wait for OpenAPI 4 because they address this by
        allowing multi-segment pathing via URI templates in RFC 6570
        <https://datatracker.ietf.org/doc/html/rfc6570>, which is the
        original way we wanted to represent namespaces, but it wasn't
        supported (e.g. .../{+namespaces}/tables/{table}). I doubt
        it's really worth the effort though, so I feel like a
        configurable delimiter makes the most sense.

        -Dan

--
Robert Stupp
@snazy

Reply via email to