I'd be very careful here.
The strings in `Namespace` elements are unconstrained. Neither the
`Namespace` implementation in Iceberg/Java nor the REST spec restricts
the contents of the namespace elements. So a '.' can appear in existing
namespace elements, and choosing %2E breaks such existing namespaces.
Changing %1F to any other arbitrary char >= 0x20 has the potential to
break existing namespaces in the same way.
What's needed IMHO is likely an escaping mechanism - not a single char.
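To illustrate what an escaping mechanism could look like, here is a
minimal sketch. It is not an Iceberg API: the class name
`NamespaceEscaping`, the choice of '.' as separator and '\' as escape
character are all assumptions made for the example. The idea is that
any separator or escape character inside an element is prefixed with
the escape character, so every element survives a round trip regardless
of its contents. The sketch assumes at least one namespace level, and
the joined string would still have to be URL-encoded before it goes
into a path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Iceberg or the REST spec.
final class NamespaceEscaping {
  private static final char SEPARATOR = '.'; // assumed separator
  private static final char ESCAPE = '\\';   // assumed escape character

  private NamespaceEscaping() {}

  // Joins the levels with SEPARATOR, escaping SEPARATOR and ESCAPE inside a level.
  static String encode(List<String> levels) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < levels.size(); i++) {
      if (i > 0) {
        out.append(SEPARATOR);
      }
      for (char c : levels.get(i).toCharArray()) {
        if (c == SEPARATOR || c == ESCAPE) {
          out.append(ESCAPE);
        }
        out.append(c);
      }
    }
    return out.toString();
  }

  // Splits on unescaped SEPARATOR characters and removes the escapes.
  static List<String> decode(String encoded) {
    List<String> levels = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean escaped = false;
    for (char c : encoded.toCharArray()) {
      if (escaped) {
        current.append(c); // take the escaped character literally
        escaped = false;
      } else if (c == ESCAPE) {
        escaped = true;
      } else if (c == SEPARATOR) {
        levels.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    levels.add(current.toString());
    return levels;
  }
}
```

With this scheme a two-level namespace whose first element is `a.b`
encodes to `a\.b.c` and decodes back to the same two elements, which a
bare single-character separator cannot guarantee.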
On 02.08.24 01:42, Yufei Gu wrote:
+1 on the first option. We may not overly use the config endpoint, but
it'd be suitable in this case. We can introduce a new field like this:
namespace.separator=%2e
Yufei
On Thu, Aug 1, 2024 at 3:46 PM Ryan Blue <b...@databricks.com.invalid>
wrote:
I think the simplest way to preserve compatibility is to allow
this to be configured on the client and by the config route, and
fall back to the current value, 0x1f. Another option is to
introduce a set of v2 endpoints that use a different separator
character. I prefer the first option since the only way to work
with a service that can't support 0x1f is to replace the separator
character. Older clients are already broken, so if they don't
support the property sent by the config route there is no behavior
change.
Ryan
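A rough sketch of the fallback described above, for illustration only.
The property name `namespace.separator` is the one suggested in this
thread; the class `SeparatorConfig`, the method `resolve`, and the
precedence (value from the config route first, then the client
property, then the current default) are assumptions, not existing
Iceberg behavior.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg.
final class SeparatorConfig {
  // Property name proposed in this thread.
  static final String SEPARATOR_PROPERTY = "namespace.separator";
  // Current separator: the URL-encoded form of 0x1f.
  static final String DEFAULT_SEPARATOR = "%1F";

  private SeparatorConfig() {}

  // Assumed precedence: config route value, then client property, then default.
  static String resolve(Map<String, String> clientProps, Map<String, String> serverConfig) {
    String fromServer = serverConfig.get(SEPARATOR_PROPERTY);
    if (fromServer != null) {
      return fromServer;
    }
    return clientProps.getOrDefault(SEPARATOR_PROPERTY, DEFAULT_SEPARATOR);
  }
}
```

A server that does not send the property and a client that does not
set it end up with `%1F`, so nothing changes for existing deployments.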
On Thu, Aug 1, 2024 at 9:47 AM Robert Stupp <sn...@snazy.de> wrote:
How is compatibility with older servers guaranteed?
On 01.08.24 14:59, Eduard Tudenhöfner wrote:
Hey everyone,
The REST spec
<https://github.com/apache/iceberg/blob/6319712b612b724fedbc5bed41942ac3426ffe48/open-api/rest-catalog-open-api.yaml#L225>
currently uses *%1F* (the URL-encoded unit separator, 0x1F) as the
namespace separator for multi-part namespaces.
This causes issues
<https://github.com/apache/iceberg/issues/10338>, since it's
a control character
<https://www.compart.com/en/unicode/category/Cc> and the
Servlet spec
<https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization>
can reject such characters.
I'm proposing to replace *%1F* with a different character
that isn't problematic (such as *%2E*) and also add some
backwards compatible namespace decoding logic to *RESTUtil*
so that older clients sending *%1F* can still do so.
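As a sketch of what such backwards compatible decoding might look
like (this is not the actual *RESTUtil* code; the class
`CompatNamespaceDecoder` and the method `split` are hypothetical, and
'.' as the new separator is an assumption): the already URL-decoded
namespace string is split on 0x1F if it contains that character, and
on the new separator otherwise.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not the actual RESTUtil implementation.
final class CompatNamespaceDecoder {
  private static final String LEGACY_SEPARATOR = "\u001f"; // current separator (0x1F)
  private static final String NEW_SEPARATOR = ".";         // assumed replacement

  private CompatNamespaceDecoder() {}

  // Expects the URL-decoded namespace string from the request path.
  static List<String> split(String decoded) {
    String separator = decoded.contains(LEGACY_SEPARATOR) ? LEGACY_SEPARATOR : NEW_SEPARATOR;
    // Pattern.quote keeps '.' from being treated as a regex wildcard;
    // the -1 limit keeps trailing empty elements.
    return Arrays.asList(decoded.split(Pattern.quote(separator), -1));
  }
}
```

Note that a heuristic like this cannot tell a single-level namespace
named `a.b` sent by an older client apart from a two-level namespace
sent by a newer one.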
PS: I also investigated why *%1F* doesn't fail in
*TestRESTCatalog* and it's because we're using Jetty 9.x and
the javax.servlet API 4.0 (instead of 6.x). I'll open a
separate PR to upgrade Jetty and use jakarta.servlet API 6.x,
which will reproduce the issue with *%1F* being used as the
namespace separator.
Eduard
--
Robert Stupp
@snazy
--
Ryan Blue
Databricks