Hey everyone,

I would like to revisit making the namespace separator configurable.
We haven't solved the original issue
<https://github.com/apache/iceberg/issues/10338> where users of the V1 API
run into this when upgrading their server stack. Additionally, Prashant is
working on secure views <https://github.com/apache/iceberg/pull/13810>
where the namespace separator issue also came up.

Hence I'm proposing that we revisit this and unblock Prashant's work but
also provide users a simple path forward when they upgrade their server
stack (I do not think that we should wait until this becomes a wider issue
once the newer Servlet spec is more broadly used by people).

Just a reminder: the configuration part is *entirely optional* for REST
server implementers and there's no behavioral change for existing
installations. Servers would advertise the separator via the */config* option
named `*namespace-separator*`.

#14448 <https://github.com/apache/iceberg/pull/14448> contains the relevant
OpenAPI spec change for this.

Based on earlier discussions we do have general community consensus on this
topic and we only had Robert to object on making this configurable. Robert
suggested having a full escaping mechanism that would allow the use of any
namespace separator.
While this would be great, unfortunately this isn't something we would be
able to implement for users of the V1 API.
I don't think we as the community should leave users of the V1 API with
this issue given that there's a simple path forward for that. For the V2
API we can reconsider our approach and work on an escaping mechanism.

Thanks
Eduard



On Fri, Aug 16, 2024 at 10:29 AM Eduard Tudenhöfner <
[email protected]> wrote:

> I do want to remind us that the original issue
> <https://github.com/apache/iceberg/issues/10338> was reported by users
> from the community before we were internally aware of this issue, meaning
> that there are users of the V1 API that are either running into this issue
> or will eventually run into this issue when they upgrade their server stack.
>
> For the users of the V1 API we have a simple path forward (make it
> *configurable* what's currently *hardcoded*) so I do not see a reason why
> we shouldn't proceed and fix this for these users. I do not think that we
> should wait to fix this problem until we have V2 APIs.
>
> The configuration part is *entirely optional* for REST server
> implementers and there's no behavioral change for existing installations.
>
> Also it seems we have general consensus on the approach from at least 6
> people (me included).
>
> In the meantime the Iceberg community can focus on how a V2 API would look
> and what potential escaping mechanisms would be necessary to solve the
> general problem.
>
> Thanks everyone and I'll go ahead and start a VOTE thread to update the
> wording in the REST spec in #10877
> <https://github.com/apache/iceberg/pull/10877>.
>
> Eduard
>
>
> On Wed, Aug 14, 2024 at 6:46 PM Robert Stupp <[email protected]> wrote:
>
>> I do object the plan to make that separation character configurable,
>> because there is no need to do this (see below), so there's no need for any
>> code change. Instead having a proper, mostly human readable escaping
>> mechanism, which is possible, should be implemented with Iceberg REST v2.
>>
>> Here's the relevant part of the Servlet Spec v6 - first paragraph of [1]:
>> "Servlet containers may implement the standard Servlet URI path
>> canonicalization in any manner they see fit as long as the end result is
>> identical to the end result of the process described here. Servlet
>> containers may provide container specific configuration options to vary the
>> standard canonicalization process. Any such variations may have security
>> implications and both Servlet container implementors and users are advised
>> to be sure that they understand the implications of any such container
>> specific canonicalization options.
>>
>> There's no mention of "must" - the servlet spec carefully uses "may".
>> Just 2-3 letters, but a huge difference, which does *not* mean that e.g.
>> '%1F' has to be rejected, hence Jetty allows users to disable this behavior.
>>
>> I think, I mentioned my concerns already, but here's a summary:
>>
>> * There is no spec that forces this change.
>>
>> * The proposed change only tackles ASCII unit separator. The proposed
>> change does NOT tackle other characters like '%' or '/' or '\'.
>>
>> * The proposed change adds yet another configuration option - there are
>> already quite a lot. Getting that config wrong (and humans make mistakes,
>> because they are not always aware or the consequences) leads to all kinds
>> of issues.
>>
>> * Technical: The proposed change has no validation that the "separator"
>> is a single character.
>>
>> * Technical: Allowing clients to configure such separator character opens
>> the door for inconsistency due to protocol level discrepancies.
>>
>> * Using servlet spec 6 with the ambiguous/suspiciuous validation breaks
>> all currently released clients.
>>
>>
>> There are catalogs that do allow this via Spark SQL:
>>   CREATE NAMESPACE my_catalog.`my/ugly\name%space.!`
>>
>>
>> On 14.08.24 12:40, Eduard Tudenhöfner wrote:
>>
>>
>> Keeping the Iceberg REST spec as is and configuring the (affected)
>>> services that they accept the '%1F' separator character seems much easier
>>> and less controversial and eliminates the need for these changes entirely.
>>
>>
>> I do not think that this is a viable option for everyone. Such control
>> characters are being rejected by the new Servlet spec for security reasons
>> so not every REST server would want to open security holes by enabling
>> something that the Servlet 6 spec is specifically trying to avoid.
>> Here's some additional context on the topic of the *Servlet 6 rules on
>> Ambiguous URIs *from the Jetty project
>> <https://github.com/jetty/jetty.project/issues/11448#issuecomment-1969206031>
>> :
>>
>> Servlet 6 (Jetty 12 / ee10) clarified a ton of vague language and
>>> behaviors around a bunch of core concepts.
>>> Back in Servlet 5 (Jetty 11 or Jetty 12 / ee9) these were fundamentally
>>> broken and lead to a long array of security issues and broken libraries
>>> that worked with Servlet 5.
>>
>>
>>
>> [1]
>> https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization
>>
>

Reply via email to