I'd be very careful here.

The strings in `Namespace` elements are unconstrained. Neither the `Namespace` implementation in Iceberg/Java nor the REST spec restricts the contents of the namespace elements. So a '.' can already appear in existing namespace elements, and choosing %2E as the separator would break such namespaces.

Changing %1F to any other character >= 0x20 has the same potential to break existing namespaces.
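
To make the ambiguity concrete, here is a minimal Python sketch. The helper names are made up for illustration and are not Iceberg's API, and it assumes the server URL-decodes the path parameter before splitting it on the separator.

```python
from urllib.parse import quote, unquote

def encode_path_param(levels, encoded_separator):
    # Hypothetical client side: percent-encode each level, then join the
    # levels with the already percent-encoded separator.
    return encoded_separator.join(quote(level, safe="") for level in levels)

def decode_path_param(param, separator):
    # Hypothetical server side (assumption): URL-decode first, then split.
    return unquote(param).split(separator)

existing = ["db.prod", "events"]  # two levels, the first one contains a '.'

# Current separator 0x1F: the two levels survive the round trip.
with_unit_separator = decode_path_param(encode_path_param(existing, "%1F"), "\x1f")

# Separator '.': the same namespace comes back as three levels.
with_dot = decode_path_param(encode_path_param(existing, "%2E"), ".")
```

URL encoders generally leave '.' unescaped, so after decoding, the '.' inside "db.prod" and the separator can no longer be told apart.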

What's needed IMHO is likely an escaping mechanism, not just a different single separator character.
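
As a rough sketch of what I mean by escaping (the separator '.', the escape character '\' and all function names are assumptions chosen for illustration, not a concrete proposal for the spec):

```python
SEP = "."   # assumed separator, for illustration only
ESC = "\\"  # assumed escape character, for illustration only

def escape_level(level):
    # Escape the escape character first, then the separator.
    return level.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)

def join_levels(levels):
    return SEP.join(escape_level(level) for level in levels)

def split_levels(encoded):
    levels, current, i = [], [], 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == ESC and i + 1 < len(encoded):
            # The character after an escape is taken literally.
            current.append(encoded[i + 1])
            i += 2
        elif ch == SEP:
            # An unescaped separator ends the current level.
            levels.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    levels.append("".join(current))
    return levels
```

With this, a namespace element may contain the separator and still round-trips, for example `split_levels(join_levels(["db.prod", "events"]))` gives back the original two levels.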


On 02.08.24 01:42, Yufei Gu wrote:
+1 on the first option. We may not want to overuse the config endpoint, but it would be suitable in this case. We can introduce a new field like this:

namespace.separator=%2e
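
On the client side, the fallback could look roughly like the Python sketch below. It assumes the config response is available as a plain dict and that the value is percent-encoded as in the line above; `resolve_separator` is a hypothetical helper, not an existing Iceberg function.

```python
from urllib.parse import unquote

DEFAULT_SEPARATOR = "\x1f"  # current value, sent on the wire as %1F

def resolve_separator(server_config):
    # "namespace.separator" is the field name suggested above; its value is
    # assumed to be percent-encoded, e.g. "%2e".
    encoded = server_config.get("namespace.separator")
    if encoded is None:
        # Servers that do not send the property keep today's behavior.
        return DEFAULT_SEPARATOR
    return unquote(encoded)
```

A client talking to a server that does not send the property would keep using 0x1f.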

Yufei


On Thu, Aug 1, 2024 at 3:46 PM Ryan Blue <b...@databricks.com.invalid> wrote:

    I think the simplest way to preserve compatibility is to allow
    this to be configured on the client and by the config route, and
    fall back to the current value, 0x1f. Another option is to
    introduce a set of v2 endpoints that use a different separator
    character. I prefer the first option since the only way to work
    with a service that can't support 0x1f is to replace the separator
    character. Older clients are already broken, so if they don't
    support the property sent by the config route there is no behavior
    change.

    Ryan

    On Thu, Aug 1, 2024 at 9:47 AM Robert Stupp <sn...@snazy.de> wrote:

        How is compatibility with older servers guaranteed?

        On 01.08.24 14:59, Eduard Tudenhöfner wrote:
        Hey everyone,

        The REST spec
        <https://github.com/apache/iceberg/blob/6319712b612b724fedbc5bed41942ac3426ffe48/open-api/rest-catalog-open-api.yaml#L225>
        currently uses *%1F* as the UTF-8 encoded namespace separator
        for multi-part namespaces.
        This causes issues
        <https://github.com/apache/iceberg/issues/10338>, since it's
        a control character
        <https://www.compart.com/en/unicode/category/Cc> and the
        Servlet spec
        <https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization>
        can reject such characters.

        I'm proposing to replace *%1F* with a different character
        that isn't problematic (such as *%2E*) and also add some
        backwards compatible namespace decoding logic to *RESTUtil*
        so that older clients sending *%1F* can still do so.

        PS: I also investigated why *%1F* doesn't fail in
        *TestRESTCatalog* and it's because we're using Jetty 9.x and
        the javax.servlet API 4.0 (instead of 6.x). I'll open a
        separate PR to upgrade Jetty and use jakarta.servlet API 6.x,
        which will reproduce the issue with *%1F* being used as the
        namespace separator.

        Eduard

        -- Robert Stupp
        @snazy



    -- Ryan Blue
    Databricks

--
Robert Stupp
@snazy
