I do object the plan to make that separation character configurable, because there is no need to do this (see below), so there's no need for any code change. Instead having a proper, mostly human readable escaping mechanism, which is possible, should be implemented with Iceberg REST v2.

Here's the relevant part of the Servlet Spec v6 - first paragraph of [1]: "Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options.

There's no mention of "must" - the servlet spec carefully uses "may". Just 2-3 letters, but a huge difference, which does *not* mean that e.g. '%1F' has to be rejected, hence Jetty allows users to disable this behavior.

I think, I mentioned my concerns already, but here's a summary:

* There is no spec that forces this change.

* The proposed change only tackles ASCII unit separator. The proposed change does NOT tackle other characters like '%' or '/' or '\'.

* The proposed change adds yet another configuration option - there are already quite a lot. Getting that config wrong (and humans make mistakes, because they are not always aware or the consequences) leads to all kinds of issues.

* Technical: The proposed change has no validation that the "separator" is a single character.

* Technical: Allowing clients to configure such separator character opens the door for inconsistency due to protocol level discrepancies.

* Using servlet spec 6 with the ambiguous/suspiciuous validation breaks all currently released clients.


There are catalogs that do allow this via Spark SQL:
  CREATE NAMESPACE my_catalog.`my/ugly\name%space.!`


On 14.08.24 12:40, Eduard Tudenhöfner wrote:

    Keeping the Iceberg REST spec as is and configuring the (affected)
    services that they accept the '%1F' separator character seems much
    easier and less controversial and eliminates the need for these
    changes entirely.


I do not think that this is a viable option for everyone. Such control characters are being rejected by the new Servlet spec for security reasons so not every REST server would want to open security holes by enabling something that the Servlet 6 spec is specifically trying to avoid. Here's some additional context on the topic of the *Servlet 6 rules on Ambiguous URIs *from the Jetty project <https://github.com/jetty/jetty.project/issues/11448#issuecomment-1969206031>:

    Servlet 6 (Jetty 12 / ee10) clarified a ton of vague language and
    behaviors around a bunch of core concepts.
    Back in Servlet 5 (Jetty 11 or Jetty 12 / ee9) these were
    fundamentally broken and lead to a long array of security issues
    and broken libraries that worked with Servlet 5.



[1] https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization

Reply via email to