I do not see why "old clients are already broken" - they behave
according to the REST spec.
The current REST spec states "If parent is a multipart namespace, the
parts must be separated by the unit separator (`0x1F`) byte" [1]. Using
a different separator, even if it is "backwards compatible", is a change
to the Iceberg REST spec.
If _one_ service is subject to new(er) restrictions, it is IMO that
service that cannot follow the REST spec. Jetty 12 (Servlet Spec v6) was
mentioned as a justification. However, Jetty 12 and other
implementations allow operators to disable the "suspicious character
validation".
The proposed change to the REST spec and implementation would only
address the problem for that service, but not for all services. PMC
members said on the "Iceberg Catalog Sync" call on Aug 7 that they want
this change for that service implementation and defer a solution that
works for all implementations, explicitly mentioning "Nessie", for quite
some time.
The "root cause" is that neither the Iceberg REST spec nor the
"reference implementation" poses any restriction on the the allowed set
of characters in namespace elements, which is therefore also part of the
Iceberg REST spec / "reference implementation".
That said, the proposed change does not work for all implementations.
Keeping the Iceberg REST spec as is and configuring the (affected)
services that they accept the '%1F' separator character seems much
easier and less controversial and eliminates the need for these changes
entirely.
An alternative to use an appropriate escaping mechanism, was turned down
on the same "Iceberg Catalog Sync" call.
Robert
[1]
https://github.com/apache/iceberg/blob/bfab2c334e9b4c11de65f1f9bd1de5dab18aae5b/open-api/rest-catalog-open-api.yaml#L225
On 06.08.24 21:24, Yufei Gu wrote:
Thanks Ryan and Daniel for the context. I'm OK that the server
provides a separator via config endpoint. Just want to dig a bit more,
it does provide more flexibility for the separator, but why do we need
this flexibility? Looks like the only benefit is to reconstruct the
namespaces. What's the use case of reconstruction? We may talk about
it in tomorrow's meeting.
Thanks Eduard for the PR.
Yufei
On Tue, Aug 6, 2024 at 3:10 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
It's probably best to make this configurable rather than changing
the separator in the spec. The server can communicate to the
client which namespace separator should be used (via a config
override), but still needs to support *%1F* as a fallback. I've
opened #10877 <https://github.com/apache/iceberg/pull/10877> to
address this.
@Robert: Old clients are already broken with *%1F* so we won't be
able to avoid an old client failing with a 4xx. Introducing a new
endpoint that takes a different namespace separator won't fix the
issue for old clients either, since those clients won't know
anything about that endpoint.
On Mon, Aug 5, 2024 at 7:07 PM Daniel Weeks <dwe...@apache.org> wrote:
I would agree with adding either a server side (config
override) or client side control (query param with `?delim=.`)
as it will be compatible with the current v1 endpoint.
In the future we could introduce a v2 endpoint(s), but I would
want to wait for OpenAPI 4 because they address this by
allowing multi-segment pathing via URI templates in RFC 6570
<https://datatracker.ietf.org/doc/html/rfc6570>, which is the
original way we wanted to represent namespaces, but it wasn't
supported (e.g. .../{+namespaces}/tables/{table}). I doubt
it's really worth the effort though, so I feel like a
configurable delimiter makes the most sense.
-Dan
--
Robert Stupp
@snazy