Just curious: why did we originally introduce %1F as a separator? Was it because we wanted to allow "." as a valid character in namespaces? If that's the case, I get that we couldn't use "." or "%2e" as a separator.
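To make the concern concrete, here is a minimal sketch of the ambiguity, assuming the separator is applied to the already-decoded path segment (where "%2e" and "." are the same character). The helper names `join_levels` and `split_levels` are hypothetical and are not part of Iceberg or the REST spec.

```python
UNIT_SEPARATOR = "\x1f"  # the character behind %1F


def join_levels(levels, separator):
    """Join namespace levels into one string (hypothetical helper)."""
    return separator.join(levels)


def split_levels(joined, separator):
    """Split a joined string back into namespace levels (hypothetical helper)."""
    return joined.split(separator)


levels = ["a.b", "c"]  # a two-level namespace whose first level contains "."

# With "." as the separator, the level "a.b" cannot be told apart from
# two levels "a" and "b", so the round trip yields three levels.
print(split_levels(join_levels(levels, "."), "."))

# With 0x1F as the separator, the original two levels come back unchanged.
print(split_levels(join_levels(levels, UNIT_SEPARATOR), UNIT_SEPARATOR))
```

The round trip only works when the separator can never occur inside a level, which is the case for "." only if namespace names are restricted.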
A follow-up question: can we restrict the character "." in a namespace name? For example, Hive doesn't support "." in name or database names.

Yufei

On Fri, Aug 2, 2024 at 9:44 AM Robert Stupp <sn...@snazy.de> wrote:

> I'd be very careful here.
>
> The strings in `Namespace` elements are unconstrained. Neither the
> `Namespace` implementation in Iceberg/Java nor the REST spec restrict the
> contents of the namespace elements. So a '.' can appear in existing
> namespace elements and choosing %2E breaks such existing namespaces.
>
> Changing %1F to some random other char >= 0x20 has the potential to break
> existing namespaces.
>
> What's needed IMHO is likely an escaping mechanism - not a single char.
>
> On 02.08.24 01:42, Yufei Gu wrote:
>
> +1 on the first option. We may not overly use the config endpoint, but
> it'd be suitable in this case. We can introduce a new field like this:
>
> namespace.separator=%2e
>
> Yufei
>
>
> On Thu, Aug 1, 2024 at 3:46 PM Ryan Blue <b...@databricks.com.invalid>
> wrote:
>
>> I think the simplest way to preserve compatibility is to allow this to be
>> configured on the client and by the config route, and fall back to the
>> current value, 0x1f. Another option is to introduce a set of v2 endpoints
>> that use a different separator character. I prefer the first option since
>> the only way to work with a service that can't support 0x1f is to replace
>> the separator character. Older clients are already broken, so if they don't
>> support the property sent by the config route there is no behavior change.
>>
>> Ryan
>>
>> On Thu, Aug 1, 2024 at 9:47 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> How is compatibility with older servers guaranteed?
>>> On 01.08.24 14:59, Eduard Tudenhöfner wrote:
>>>
>>> Hey everyone,
>>>
>>> The REST spec
>>> <https://github.com/apache/iceberg/blob/6319712b612b724fedbc5bed41942ac3426ffe48/open-api/rest-catalog-open-api.yaml#L225>
>>> currently uses *%1F* as the UTF-8 encoded namespace separator for
>>> multi-part namespaces.
>>> This causes issues <https://github.com/apache/iceberg/issues/10338>,
>>> since it's a control character
>>> <https://www.compart.com/en/unicode/category/Cc> and the Servlet spec
>>> <https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization>
>>> can reject such characters.
>>>
>>> I'm proposing to replace *%1F* with a different character that isn't
>>> problematic (such as *%2E*) and also add some backwards compatible
>>> namespace decoding logic to *RESTUtil* so that older clients sending
>>> *%1F* can still do so.
>>>
>>> PS: I also investigated why *%1F* doesn't fail in *TestRESTCatalog* and
>>> it's because we're using Jetty 9.x and the javax.servlet API 4.0 (instead
>>> of 6.x). I'll open a separate PR to upgrade Jetty and use jakarta.servlet
>>> API 6.x, which will reproduce the issue with *%1F* being used as the
>>> namespace separator.
>>>
>>> Eduard
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>
>> --
>> Ryan Blue
>> Databricks
>>
> --
> Robert Stupp
> @snazy
>
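For reference, the config-route option quoted above could look roughly like the sketch below. It assumes the `namespace.separator` property suggested earlier in the thread and falls back to the current %1F when a server does not send it. The function names (`resolve_separator`, `encode_namespace`, `decode_namespace`) are hypothetical and not existing Iceberg APIs, and the sketch does not address the collision problem for levels that already contain the chosen separator.

```python
from urllib.parse import quote, unquote

DEFAULT_SEPARATOR = "%1F"  # current separator in the REST spec


def resolve_separator(properties):
    """Use the configured separator if present, else the current default.

    `properties` stands in for the merged client/config-route properties;
    the `namespace.separator` key is the field proposed in this thread.
    """
    return properties.get("namespace.separator", DEFAULT_SEPARATOR)


def encode_namespace(levels, separator=DEFAULT_SEPARATOR):
    """Percent-encode each level and join with the encoded separator."""
    return separator.join(quote(level, safe="") for level in levels)


def decode_namespace(encoded, separator=DEFAULT_SEPARATOR):
    """Split on the encoded separator, then percent-decode each level."""
    return [unquote(part) for part in encoded.split(separator)]


# An older server sends no property, so the client keeps using %1F.
old_sep = resolve_separator({})
print(encode_namespace(["accounting", "tax"], old_sep))

# A server that cannot accept control characters advertises another value.
new_sep = resolve_separator({"namespace.separator": "%2e"})
print(encode_namespace(["accounting", "tax"], new_sep))
```

Because the fallback is the existing value, a client talking to a server that never sends the property behaves exactly as it does today.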