I do object the plan to make that separation character configurable,
because there is no need to do this (see below), so there's no need for
any code change. Instead having a proper, mostly human readable escaping
mechanism, which is possible, should be implemented with Iceberg REST v2.
Here's the relevant part of the Servlet Spec v6 - first paragraph of
[1]: "Servlet containers may implement the standard Servlet URI path
canonicalization in any manner they see fit as long as the end result is
identical to the end result of the process described here. Servlet
containers may provide container specific configuration options to vary
the standard canonicalization process. Any such variations may have
security implications and both Servlet container implementors and users
are advised to be sure that they understand the implications of any such
container specific canonicalization options.
There's no mention of "must" - the servlet spec carefully uses "may".
Just 2-3 letters, but a huge difference, which does *not* mean that e.g.
'%1F' has to be rejected, hence Jetty allows users to disable this behavior.
I think, I mentioned my concerns already, but here's a summary:
* There is no spec that forces this change.
* The proposed change only tackles ASCII unit separator. The proposed
change does NOT tackle other characters like '%' or '/' or '\'.
* The proposed change adds yet another configuration option - there are
already quite a lot. Getting that config wrong (and humans make
mistakes, because they are not always aware or the consequences) leads
to all kinds of issues.
* Technical: The proposed change has no validation that the "separator"
is a single character.
* Technical: Allowing clients to configure such separator character
opens the door for inconsistency due to protocol level discrepancies.
* Using servlet spec 6 with the ambiguous/suspiciuous validation breaks
all currently released clients.
There are catalogs that do allow this via Spark SQL:
CREATE NAMESPACE my_catalog.`my/ugly\name%space.!`
On 14.08.24 12:40, Eduard Tudenhöfner wrote:
Keeping the Iceberg REST spec as is and configuring the (affected)
services that they accept the '%1F' separator character seems much
easier and less controversial and eliminates the need for these
changes entirely.
I do not think that this is a viable option for everyone. Such control
characters are being rejected by the new Servlet spec for
security reasons so not every REST server would want to open security
holes by enabling something that the Servlet 6 spec is specifically
trying to avoid.
Here's some additional context on the topic of the *Servlet 6 rules on
Ambiguous URIs *from the Jetty project
<https://github.com/jetty/jetty.project/issues/11448#issuecomment-1969206031>:
Servlet 6 (Jetty 12 / ee10) clarified a ton of vague language and
behaviors around a bunch of core concepts.
Back in Servlet 5 (Jetty 11 or Jetty 12 / ee9) these were
fundamentally broken and lead to a long array of security issues
and broken libraries that worked with Servlet 5.
[1]
https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization