Just curious—why did we originally introduce %1F as a separator? Was it
because we wanted to allow "." as a valid character in namespaces? If
that’s the case, I get that we couldn't use "." or "%2e" as a separator.
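
To make the ambiguity concrete, here is a minimal Python sketch. The helper names `join_levels` and `split_levels` are made up for illustration and are not Iceberg APIs. It only shows that a "." separator cannot round-trip a namespace whose levels contain ".", while 0x1F can, as long as no level contains that character.

```python
UNIT_SEPARATOR = "\x1f"  # the character that %1F decodes to


def join_levels(levels, sep):
    """Join namespace levels into one string (hypothetical helper)."""
    return sep.join(levels)


def split_levels(joined, sep):
    """Split a joined namespace back into levels (hypothetical helper)."""
    return joined.split(sep)


# The first level legitimately contains a ".".
levels = ["accounting.eu", "tax"]

dotted = join_levels(levels, ".")
unit = join_levels(levels, UNIT_SEPARATOR)

# With "." the original two levels come back as three.
print(split_levels(dotted, "."))
# With 0x1F the original two levels are recovered.
print(split_levels(unit, UNIT_SEPARATOR))
```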

A follow-up question: can we disallow the character "." in namespace
names? For example, Hive doesn't support "." in table or database names.

Yufei


On Fri, Aug 2, 2024 at 9:44 AM Robert Stupp <sn...@snazy.de> wrote:

> I'd be very careful here.
>
> The strings in `Namespace` elements are unconstrained. Neither the
> `Namespace` implementation in Iceberg/Java nor the REST spec restrict the
> contents of the namespace elements. So a '.' can appear in existing
> namespace elements and choosing %2E breaks such existing namespaces.
>
> Changing %1F to some random other char >= 0x20 has the potential to break
> existing namespaces.
>
> What's needed IMHO is likely an escaping mechanism - not a single char.
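
One possible shape for such an escaping mechanism, as a rough Python sketch. The backslash escape character, the "." separator, and the function names are all assumptions for illustration; nothing here is taken from the REST spec or from Iceberg's code.

```python
ESC = "\\"  # assumed escape character
SEP = "."   # assumed separator


def encode_namespace(levels):
    """Escape ESC and SEP inside each level, then join the levels with SEP."""
    escaped = [
        level.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)
        for level in levels
    ]
    return SEP.join(escaped)


def decode_namespace(encoded):
    """Split on unescaped SEP and undo the escaping.

    Handling of the empty namespace is left out of this sketch.
    """
    levels, current, i = [], [], 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == ESC and i + 1 < len(encoded):
            current.append(encoded[i + 1])  # take the escaped char literally
            i += 2
        elif ch == SEP:
            levels.append("".join(current))  # unescaped separator ends a level
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    levels.append("".join(current))
    return levels
```

With this scheme a level such as "a.b" is sent as "a\.b", so a "." inside a level can no longer be confused with the separator.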
>
> On 02.08.24 01:42, Yufei Gu wrote:
>
> +1 on the first option. We shouldn't overuse the config endpoint, but
> it would be suitable in this case. We can introduce a new field like this:
>
> namespace.separator=%2e
>
> Yufei
>
>
> On Thu, Aug 1, 2024 at 3:46 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>
>> I think the simplest way to preserve compatibility is to allow this to be
>> configured on the client and by the config route, and fall back to the
>> current value, 0x1f. Another option is to introduce a set of v2 endpoints
>> that use a different separator character. I prefer the first option since
>> the only way to work with a service that can't support 0x1f is to replace
>> the separator character. Older clients are already broken, so if they don't
>> support the property sent by the config route there is no behavior change.
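
A small Python sketch of the fallback described above. The property name `namespace.separator` is the one floated elsewhere in this thread, not an existing Iceberg property, and letting the server-sent value win over the client setting is an assumption made for this example.

```python
from urllib.parse import unquote

SEPARATOR_PROPERTY = "namespace.separator"  # name proposed in this thread
DEFAULT_SEPARATOR = "%1F"                   # current value, 0x1f


def resolve_separator(client_props, server_config):
    """Return the separator character to use.

    Assumed order: value sent by the config route, then the client
    setting, then the current default. Values are percent-encoded.
    """
    encoded = (
        server_config.get(SEPARATOR_PROPERTY)
        or client_props.get(SEPARATOR_PROPERTY)
        or DEFAULT_SEPARATOR
    )
    return unquote(encoded)
```

A client that knows nothing about the property never calls anything like this and keeps sending 0x1f, which matches the "no behavior change" point for older clients.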
>>
>> Ryan
>>
>> On Thu, Aug 1, 2024 at 9:47 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> How is compatibility with older servers guaranteed?
>>>
>>> On 01.08.24 14:59, Eduard Tudenhöfner wrote:
>>>
>>> Hey everyone,
>>>
>>> The REST spec
>>> <https://github.com/apache/iceberg/blob/6319712b612b724fedbc5bed41942ac3426ffe48/open-api/rest-catalog-open-api.yaml#L225>
>>> currently uses *%1F* as the UTF-8 encoded namespace separator for
>>> multi-part namespaces.
>>> This causes issues <https://github.com/apache/iceberg/issues/10338>,
>>> since it's a control character
>>> <https://www.compart.com/en/unicode/category/Cc> and the Servlet spec
>>> <https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization>
>>> can reject such characters.
>>>
>>> I'm proposing to replace *%1F* with a different character that isn't
>>> problematic (such as *%2E*) and also add some backwards compatible
>>> namespace decoding logic to *RESTUtil* so that older clients sending
>>> *%1F* can still do so.
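
A rough Python sketch of what such backwards-compatible decoding could look like on the server side. The function name and the "legacy separator first" rule are assumptions for illustration; this is not the actual *RESTUtil* code.

```python
from urllib.parse import unquote

LEGACY_SEPARATOR = "\x1f"  # what older clients send as %1F


def decode_namespace_compat(path_segment, separator="."):
    """Decode a namespace path segment, accepting old and new separators.

    Assumption: if the legacy 0x1F character is present, the request came
    from an older client and only 0x1F is treated as the separator.
    """
    decoded = unquote(path_segment)
    if LEGACY_SEPARATOR in decoded:
        return decoded.split(LEGACY_SEPARATOR)
    return decoded.split(separator)
```

Note the limit of this approach: a single-level namespace such as "a.b" sent by an older client contains no 0x1F, so it would be read as two levels. That is the compatibility concern raised in the replies.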
>>>
>>> PS: I also investigated why *%1F* doesn't fail in *TestRESTCatalog* and
>>> it's because we're using Jetty 9.x and the javax.servlet API 4.0 (instead
>>> of 6.x). I'll open a separate PR to upgrade Jetty and use jakarta.servlet
>>> API 6.x, which will reproduce the issue with *%1F* being used as the
>>> namespace separator.
>>>
>>> Eduard
>>>
>>>
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>>
>>
>> --
>> Ryan Blue
>> Databricks
>>
> --
> Robert Stupp
> @snazy
>
>
