> Do you think it's worth having a separate discussion about guardrails for namespace elements and table/view names? [...]
Completely agree here. I think the slash character in particular
should definitely be banned.

Thanks,
Alex

On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Do you think it's worth having a separate discussion about guardrails for
> > namespace elements and table/view names? [...]
>
> Definitely!
>
> Cheers,
> Dmitri.
>
> On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]> wrote:
>
> > Hi,
> >
> > > spark-sql ()> create namespace `n/s`;
> > > However, the S3 location in this case gets a proper directory breakdown:
> > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > ... but that is probably a different issue.
> >
> > Yea, it's different from the URL en/decoding topic. Do you think it's worth
> > having a separate discussion about guardrails for namespace elements and
> > table/view names? For example, disallowing '/', disallowing empty/blank
> > namespace elements and table/view names, disallowing leading/trailing
> > whitespaces? Sure, some of these checks already happen, but not at every
> > level/layer (defense-in-depth).
> >
> > > when Iceberg itself will introduce configurable separators, we MAY ask
> > > ourselves if Polaris should allow them to be configurable or not. [...]
> > > separator is just a REST layer thing
> >
> > True, the separator is primarily a REST-layer namespace en/decoding
> > thing. What worries me slightly is that (existing) namespace elements with
> > the configured separator character could become inaccessible. However,
> > "configurable separator" is IMO a different discussion.
> >
> > Best,
> > Robert
> >
> >
> > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > My understanding of the need to make namespace separators configurable is
> > > that there exists a rather narrow set of deployment cases where the ASCII
> > > "0x1F" (unit separator) character is not permitted in URL paths by some
> > > infrastructure components.
> > >
> > > It might be worth allowing users to define a different separator, but since
> > > no one has brought this up yet, I assume it is not a priority.
> > >
> > > In any case, using a different separator is completely a REST API
> > > concern and should not affect how Polaris stores data internally.
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > > I wonder how namespace elements and table/view names with a slash ('/')
> > > > > character in the middle behave. Or other characters like '&' or '?' or '#'.
> > > >
> > > > For the REST layer, these will be percent-encoded, and with my PR to
> > > > fix a double-decoding issue, these characters "survive" the REST layer
> > > > just fine.
> > > >
> > > > The issue now is in some layers beneath: as I pointed out and as
> > > > Dmitri demonstrated, we are unfortunately concatenating identifiers
> > > > together to create storage locations, without proper escaping. This
> > > > currently results in corrupted storage locations.
> > > >
> > > > I'm trying to fix the REST layer first, then I'll move to the
> > > > storage layer.
> > > >
> > > > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > >
> > > > For now I don't see a valid usage in Polaris for that, since Jersey
> > > > handles decoding path parameters already.
> > > >
> > > > > I also agree that the "configurable namespace separator" must never
> > > > > change. Is my assumption correct, that it must always be the same
> > > > > character as it is today?
> > > >
> > > > In Polaris, we are using the namespace separator in two different use
> > > > cases:
> > > >
> > > > 1) For path parameters in the REST layer
> > > > 2) For storing namespaces in Polaris entities
> > > >
> > > > What is clear is that in the second use case, the namespace separator
> > > > must NEVER change. I just opened a PR for that:
> > > > https://github.com/apache/polaris/pull/4214
> > > >
> > > > Regarding the first use case, once we solve all our encoding/decoding
> > > > issues, and when Iceberg itself will introduce configurable
> > > > separators, we MAY ask ourselves if Polaris should allow them to be
> > > > configurable or not. I don't have strong opinions, but if the
> > > > separator is just a REST layer thing, it should be possible to change
> > > > it without breaking the storage layer or the metastore.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Slashes in namespace seem to work fine (Spark 3.5 + Iceberg 1.10.0):
> > > > >
> > > > > spark-sql ()> create namespace `n/s`;
> > > > > Time taken: 0.335 seconds
> > > > > spark-sql ()> show namespaces;
> > > > > `n/s`
> > > > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > > > spark-sql ()> use `n/s`;
> > > > > Time taken: 0.028 seconds
> > > > > spark-sql (`n/s`)> create table t1 (n string);
> > > > > Time taken: 0.702 seconds
> > > > >
> > > > > The URLs appear to be encoded properly, e.g.
> > > > > (from Polaris log):
> > > > >
> > > > > 2026-04-15 13:41:17,594 INFO [io.qua.htt.access-log]
> > > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
> > > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400] "GET
> > > > > /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
> > > > >
> > > > > I did not test trickier chars, but adding CI coverage for them would be
> > > > > good.
> > > > >
> > > > > However, the S3 location in this case gets a proper directory breakdown:
> > > > >
> > > > > $ mc ls rustfs/pol/n/s
> > > > > [2026-04-15 13:44:37 EDT] 0B t1/
> > > > >
> > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > >
> > > > > ... but that is probably a different issue.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <[email protected]> wrote:
> > > > >
> > > > > > Thanks Alex for the thorough investigation!
> > > > > >
> > > > > > URL en/decoding is really not that easy.
> > > > > > I wonder how namespace elements and table/view names with a slash ('/')
> > > > > > character in the middle behave. Or other characters like '&' or '?' or '#'.
> > > > > >
> > > > > > Overall, I agree with your idea to implement correct URL encoding/decoding
> > > > > > in the Polaris code base to protect Polaris from upstream behavior changes
> > > > > > that can seriously break or even corrupt things.
> > > > > >
> > > > > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > > > >
> > > > > > I also agree that the "configurable namespace separator" must never change.
> > > > > > Is my assumption correct, that it must always be the same character as it
> > > > > > is today?
> > > > > >
> > > > > > Best,
> > > > > > Robert
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > FYI I created a first PR to address the double-decoding issue:
> > > > > > >
> > > > > > > https://github.com/apache/polaris/pull/4210
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > > > > > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > > > > > > namespace within a NamespaceEntity [1].
> > > > > > > >
> > > > > > > > These methods also exhibit the faulty space encoding behavior.
> > > > > > > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > > > > > > Iceberg project fixes for space-encoding issues. If these methods are
> > > > > > > > modified, it is imperative that we retain the legacy versions
> > > > > > > > specifically for encoding and decoding NamespaceEntity properties –
> > > > > > > > otherwise we could end up with a corrupted database.
> > > > > > > >
> > > > > > > > The same goes for the future namespace separator coming with Iceberg
> > > > > > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > > > > > properties, the separator must never change.
> > > > > > > >
> > > > > > > > I would actually be in favor of proactively internalizing the
> > > > > > > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > > > > > > think?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > [1]: https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > > > > > > segments are not being decoded correctly according to RFC 3986,
> > > > > > > > > specifically regarding space encoding.
> > > > > > > > >
> > > > > > > > > I investigated the situation in Polaris, and found many problems:
> > > > > > > > >
> > > > > > > > > TLDR
> > > > > > > > >
> > > > > > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > >
> > > > > > > > > DISCUSSION
> > > > > > > > >
> > > > > > > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > > > > > > is in the request body). But they cannot be retrieved: when reading
> > > > > > > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > > > > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > > > > > > the REST framework has already decoded it.
> > > > > > > > > Consequently, a client sends "foo%2Bbar" which is first decoded to
> > > > > > > > > "foo+bar" by the framework (correct) and then re-decoded by Polaris
> > > > > > > > > to "foo bar" (incorrect), resulting in a "not found" error.
> > > > > > > > >
> > > > > > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > other similar places, we create locations merely by joining
> > > > > > > > > identifiers together, without any form of URL encoding: see [2] [3].
> > > > > > > > >
> > > > > > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > > > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > > > >
> > > > > > > > > PROPOSAL
> > > > > > > > >
> > > > > > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > > > > > shortly.
> > > > > > > > >
> > > > > > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > > > > > could fix all the methods that incorrectly join together identifiers
> > > > > > > > > without proper URL encoding.
> > > > > > > > >
> > > > > > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > > >
> > > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > > > > > faulty encoding.
> > > > > > > > >
> > > > > > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > > > > > required.
> > > > > > > > >
> > > > > > > > > What are your thoughts on this?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > [1]: https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > > > [2]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > > > [3]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
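
For anyone skimming the thread, here is a minimal sketch of the "+" and space behaviour described in the oldest message above. It is not the Polaris or Iceberg code: the class and method names (PathSegmentDecodingSketch, frameworkDecode, formDecode, formEncode) are hypothetical, and it assumes that java.net.URLDecoder / URLEncoder are a fair stand-in for the form-style encoding the thread attributes to RESTUtil.decodeString and to current Java clients.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathSegmentDecodingSketch {

    // Hypothetical stand-in for the REST framework's path-parameter decoding:
    // RFC 3986 percent-decoding, where '+' is a literal plus sign.
    static String frameworkDecode(String rawSegment) {
        // URLDecoder would turn '+' into a space, so protect literal '+' first.
        return URLDecoder.decode(rawSegment.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    // Form-style decoding (application/x-www-form-urlencoded): '+' means space.
    // Assumed here to approximate the redundant second decoding step.
    static String formDecode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // Form-style encoding: a space becomes '+', not "%20".
    // Assumed here to approximate what current Java clients send.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Case 1: table "foo+bar", client sends "foo%2Bbar" in the URL path.
        String once = frameworkDecode("foo%2Bbar");
        String twice = formDecode(once);
        System.out.println(once);   // foo+bar  (correct name)
        System.out.println(twice);  // foo bar  (wrong name after second decode)

        // Case 2: table "foo bar", Java client form-encodes the name.
        String sent = formEncode("foo bar");
        System.out.println(sent);                       // foo+bar
        System.out.println(frameworkDecode(sent));      // foo+bar  (server looks up the wrong name)
        System.out.println(frameworkDecode("foo%20bar")); // foo bar  (other clients succeed)
    }
}
```

Case 1 is why removing the second decode fixes "+" in table names; case 2 is why the Java-client problem needs either a client-side fix or the server-side workaround sketched in the proposal.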
