I feel like we are over complicating this situation. It seems like our specification made a poor choice in terms of separator character, do we have any disagreement on this point? It looks like by choosing a control character, we ended up generating requests which modern server systems define as potentially dangerous. Although it seems like this hasn't come up in the wild as far as I know it seems like it makes sense for us to just change the character to something else, modify the spec and move forward.
The ability to custom define characters for namespace separation wasn't an issue before and I'm not sure why we want to introduce that capability now on either the Server or Client side although the Server side seems less complicated here. I think escaping is probably fine as well if that can work but i'm not sure the Servlet spec allows that? Any escaping seems like it will also cause client and server changes similar to just changing the character so it seems like similar developer complexity there. Overall I really want to just emphasize that I don't think this is a place where users either on the Server or Client side should have to think about configuration. On Tue, Aug 13, 2024 at 4:02 AM Robert Stupp <sn...@snazy.de> wrote: > I do not see why "old clients are already broken" - they behave according > to the REST spec. > > The current REST spec states "If parent is a multipart namespace, the > parts must be separated by the unit separator (`0x1F`) byte" [1]. Using a > different separator, even if it is "backwards compatible", is a change to > the Iceberg REST spec. > > If _one_ service is subject to new(er) restrictions, it is IMO that > service that cannot follow the REST spec. Jetty 12 (Servlet Spec v6) was > mentioned as a justification. However, Jetty 12 and other implementations > allow operators to disable the "suspicious character validation". > > The proposed change to the REST spec and implementation would only address > the problem for that service, but not for all services. PMC members said on > the "Iceberg Catalog Sync" call on Aug 7 that they want this change for > that service implementation and defer a solution that works for all > implementations, explicitly mentioning "Nessie", for quite some time. > > The "root cause" is that neither the Iceberg REST spec nor the "reference > implementation" poses any restriction on the the allowed set of characters > in namespace elements, which is therefore also part of the Iceberg REST > spec / "reference implementation". > > That said, the proposed change does not work for all implementations. > Keeping the Iceberg REST spec as is and configuring the (affected) services > that they accept the '%1F' separator character seems much easier and less > controversial and eliminates the need for these changes entirely. > > An alternative to use an appropriate escaping mechanism, was turned down > on the same "Iceberg Catalog Sync" call. > > > Robert > > > [1] > https://github.com/apache/iceberg/blob/bfab2c334e9b4c11de65f1f9bd1de5dab18aae5b/open-api/rest-catalog-open-api.yaml#L225 > > > On 06.08.24 21:24, Yufei Gu wrote: > > Thanks Ryan and Daniel for the context. I'm OK that the server provides a > separator via config endpoint. Just want to dig a bit more, it does provide > more flexibility for the separator, but why do we need this flexibility? > Looks like the only benefit is to reconstruct the namespaces. What's the > use case of reconstruction? We may talk about it in tomorrow's meeting. > > Thanks Eduard for the PR. > > > Yufei > > > On Tue, Aug 6, 2024 at 3:10 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> It's probably best to make this configurable rather than changing the >> separator in the spec. The server can communicate to the client which >> namespace separator should be used (via a config override), but still needs >> to support *%1F* as a fallback. I've opened #10877 >> <https://github.com/apache/iceberg/pull/10877> to address this. >> >> @Robert: Old clients are already broken with *%1F* so we won't be able >> to avoid an old client failing with a 4xx. Introducing a new endpoint that >> takes a different namespace separator won't fix the issue for old clients >> either, since those clients won't know anything about that endpoint. >> >> >> On Mon, Aug 5, 2024 at 7:07 PM Daniel Weeks <dwe...@apache.org> wrote: >> >>> I would agree with adding either a server side (config override) or >>> client side control (query param with `?delim=.`) as it will be >>> compatible with the current v1 endpoint. >>> >>> In the future we could introduce a v2 endpoint(s), but I would want to >>> wait for OpenAPI 4 because they address this by allowing multi-segment >>> pathing via URI templates in RFC 6570 >>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the original >>> way we wanted to represent namespaces, but it wasn't supported (e.g. >>> .../{+namespaces}/tables/{table}). I doubt it's really worth the >>> effort though, so I feel like a configurable delimiter makes the most sense. >>> >>> -Dan >>> >> -- > Robert Stupp > @snazy > >