> Do you think it's worth having a separate discussion about guardrails for
> namespace elements and table/view names? [...]

Completely agree here. I think the slash character in particular
should definitely be banned.

Thanks,
Alex

On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Do you think it's worth having a separate discussion about guardrails for
> > namespace elements and table/view names? [...]
>
> Definitely!
>
> Cheers,
> Dmitri.
>
> On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]> wrote:
>
> > Hi,
> >
> > > spark-sql ()> create namespace `n/s`;
> > > However, the S3 location in this case gets a proper directory breakdown:
> > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > ... but that is probably a different issue.
> >
> > Yea, it's different from the URL en/decoding topic. Do you think it's worth
> > having a separate discussion about guardrails for namespace elements and
> > table/view names? For example, disallowing '/', disallowing empty/blank
> > namespace elements and table/view names, disallowing leading/trailing
> > whitespace? Sure, some of these checks already happen, but not at every
> > level/layer (defense-in-depth).
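A minimal sketch of the proposed guardrails as one validation helper that each layer (REST, service, persistence) could call for defense-in-depth. The class and method names are hypothetical. Whether "whitespace" should follow `String.strip()` (Unicode whitespace) or a narrower definition is left open:

```java
import java.util.List;

// Hypothetical helper; not an existing Polaris class.
final class IdentifierGuardrails {

  // Rejects empty/blank names, leading/trailing whitespace, and '/' anywhere.
  static void validateName(String name) {
    if (name == null || name.isBlank()) {
      throw new IllegalArgumentException("Name must not be empty or blank");
    }
    if (!name.equals(name.strip())) {
      throw new IllegalArgumentException(
          "Name must not have leading or trailing whitespace: '" + name + "'");
    }
    if (name.indexOf('/') >= 0) {
      throw new IllegalArgumentException("Name must not contain '/': " + name);
    }
  }

  // Applies the same rules to every namespace element.
  static void validateNamespace(List<String> elements) {
    for (String element : elements) {
      validateName(element);
    }
  }

  // Convenience wrapper that returns a boolean instead of throwing.
  static boolean isValid(String name) {
    try {
      validateName(name);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("orders")); // true
    System.out.println(isValid("n/s"));    // false
    System.out.println(isValid(" t1"));    // false
    System.out.println(isValid("   "));    // false
  }
}
```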
> >
> > > when Iceberg itself will introduce configurable separators, we MAY ask
> > > ourselves if Polaris should allow them to be configurable or not. [...]
> > > separator is just a REST layer thing
> >
> > True, the separator is primarily a REST-layer namespace en/decoding
> > thing. What worries me slightly is that (existing) namespace elements with
> > the configured separator character could become inaccessible. However,
> > "configurable separator" is IMO a different discussion.
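To make the inaccessibility concern concrete: if namespace elements are split on a configurable separator, any existing element that already contains the new separator character no longer round-trips. The sketch below checks this with plain `String` joining and splitting. The class name and the `.` example separator are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not Polaris or Iceberg code.
final class SeparatorRoundTrip {

  // REST-layer form: elements joined with the separator.
  static String join(List<String> elements, String separator) {
    return String.join(separator, elements);
  }

  // Inverse operation; the -1 limit keeps trailing empty elements.
  static List<String> split(String joined, String separator) {
    return Arrays.asList(joined.split(Pattern.quote(separator), -1));
  }

  // True if the namespace can still be addressed with this separator.
  static boolean isAddressable(List<String> elements, String separator) {
    return split(join(elements, separator), separator).equals(elements);
  }

  public static void main(String[] args) {
    List<String> existing = List.of("sales", "eu.west");
    System.out.println(isAddressable(existing, "\u001F")); // true
    System.out.println(isAddressable(existing, "."));      // false: splits into 3 elements
  }
}
```

A check along these lines could be used to flag affected namespaces before a separator change takes effect.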
> >
> > Best,
> > Robert
> >
> >
> > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > My understanding of the need to make namespace separators configurable is
> > > that there exists a rather narrow set of deployment cases where the ASCII
> > > "0x1F" (unit separator) character is not permitted in URL paths by some
> > > infrastructure components.
> > >
> > > It might be worth allowing users to define a different separator, but
> > > since no one has brought this up yet, I assume it is not a priority.
> > >
> > > In any case, using a different separator is completely a REST API
> > > concern and should not affect how Polaris stores data internally.
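For context on the infrastructure concern: with the default separator, a multi-level namespace reaches the URL path as a `%1F` escape. This sketch assumes the elements are joined with 0x1F and then percent-encoded as one path segment. `java.net.URLEncoder` is used only for illustration (it applies form encoding, so it would also turn spaces into `+`). Only this string depends on the separator, which is why swapping it could stay a REST-layer change:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration of the wire form of a namespace path parameter.
final class UnitSeparatorOnTheWire {

  static final String UNIT_SEPARATOR = "\u001F"; // ASCII 0x1F

  // Joins the elements with the separator and percent-encodes the result.
  static String toPathParam(List<String> elements, String separator) {
    return URLEncoder.encode(String.join(separator, elements), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(toPathParam(List.of("db", "sales"), UNIT_SEPARATOR)); // db%1Fsales
  }
}
```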
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > > I wonder how namespace elements and table/view names with a slash ('/')
> > > > > character in the middle behave. Or other characters like '&' or '?' or '#'.
> > > >
> > > > For the REST layer, these will be percent-encoded, and with my PR to
> > > > fix a double-decoding issue, these characters "survive" the REST layer
> > > > just fine.
> > > >
> > > > The issue now is in some layers beneath: as I pointed out and as
> > > > Dmitri demonstrated, we are unfortunately concatenating identifiers
> > > > together to create storage locations, without proper escaping. This
> > > > currently results in corrupted storage locations.
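A sketch of the difference between plain concatenation and per-segment escaping when building a storage location. The escaping shown (percent-encode every UTF-8 byte outside the RFC 3986 unreserved set) is one possible choice, not a decided Polaris format, and the helper names are hypothetical. With it, the `n/s` namespace from Dmitri's test would map to a single `n%2Fs` path segment instead of two directories:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helpers; not the actual IcebergCatalog code.
final class LocationBuilder {

  // Naive join: identifiers concatenated as-is (the behavior described above).
  static String naiveLocation(String base, List<String> namespace, String table) {
    StringBuilder sb = new StringBuilder(base);
    for (String element : namespace) sb.append('/').append(element);
    return sb.append('/').append(table).toString();
  }

  // Percent-encodes every UTF-8 byte outside the RFC 3986 "unreserved" set.
  static String escapeSegment(String segment) {
    StringBuilder out = new StringBuilder();
    for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xFF;
      boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
      if (unreserved) {
        out.append((char) c);
      } else {
        out.append('%').append(String.format("%02X", c));
      }
    }
    return out.toString();
  }

  // Same join, but each identifier becomes exactly one path segment.
  static String escapedLocation(String base, List<String> namespace, String table) {
    StringBuilder sb = new StringBuilder(base);
    for (String element : namespace) sb.append('/').append(escapeSegment(element));
    return sb.append('/').append(escapeSegment(table)).toString();
  }

  public static void main(String[] args) {
    System.out.println(naiveLocation("s3://pol", List.of("n/s"), "t1"));   // s3://pol/n/s/t1
    System.out.println(escapedLocation("s3://pol", List.of("n/s"), "t1")); // s3://pol/n%2Fs/t1
    System.out.println(escapeSegment("foo bar"));                          // foo%20bar
  }
}
```

Whether object-store keys should carry percent escapes at all, rather than such names being rejected by guardrails, is a separate design decision.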
> > > >
> > > > I'm trying to fix the REST layer first, then I'll move on to the
> > > > storage layer.
> > > >
> > > > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > >
> > > > For now I don't see a valid usage in Polaris for that, since Jersey
> > > > handles decoding path parameters already.
> > > >
> > > > > I also agree that the "configurable namespace separator" must never
> > > > > change. Is my assumption correct, that it must always be the same
> > > > > character as it is today?
> > > >
> > > > In Polaris, we are using the namespace separator in two different use
> > > > cases:
> > > >
> > > > 1) For path parameters in the REST layer
> > > > 2) For storing namespaces in Polaris entities
> > > >
> > > > What is clear is that in the second use case, the separator must NEVER
> > > > change. I just opened a PR for that:
> > > > https://github.com/apache/polaris/pull/4214
> > > >
> > > > Regarding the first use case, once we solve all our encoding/decoding
> > > > issues, and when Iceberg itself will introduce configurable
> > > > separators, we MAY ask ourselves if Polaris should allow them to be
> > > > configurable or not. I don't have strong opinions, but if the
> > > > separator is just a REST layer thing, it should be possible to change
> > > > it without breaking the storage layer or the metastore.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Slashes in namespace names seem to work fine (Spark 3.5 + Iceberg 1.10.0):
> > > > >
> > > > > spark-sql ()> create namespace `n/s`;
> > > > > Time taken: 0.335 seconds
> > > > > spark-sql ()> show namespaces;
> > > > > `n/s`
> > > > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > > > spark-sql ()> use `n/s`;
> > > > > Time taken: 0.028 seconds
> > > > > spark-sql (`n/s`)> create table t1 (n string);
> > > > > Time taken: 0.702 seconds
> > > > >
> > > > > The URLs appear to be encoded properly, e.g. (from Polaris log):
> > > > >
> > > > > 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> > > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
> > > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400]
> > > > > "GET /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
> > > > >
> > > > > I did not test trickier characters, but adding CI coverage for them
> > > > > would be good.
> > > > >
> > > > > However, the S3 location in this case gets a proper directory breakdown:
> > > > >
> > > > > $ mc ls rustfs/pol/n/s
> > > > > [2026-04-15 13:44:37 EDT]     0B t1/
> > > > >
> > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > >
> > > > > ... but that is probably a different issue.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <[email protected]> wrote:
> > > > >
> > > > > > Thanks Alex for the thorough investigation!
> > > > > >
> > > > > > URL en/decoding is really not that easy.
> > > > > > I wonder how namespace elements and table/view names with a slash ('/')
> > > > > > character in the middle behave. Or other characters like '&' or '?' or '#'.
> > > > > >
> > > > > > Overall, I agree with your idea to implement correct URL encoding/decoding
> > > > > > in the Polaris code base to protect Polaris from upstream behavior changes
> > > > > > that can seriously break or even corrupt things.
> > > > > >
> > > > > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > > > >
> > > > > > I also agree that the "configurable namespace separator" must never
> > > > > > change. Is my assumption correct, that it must always be the same
> > > > > > character as it is today?
> > > > > >
> > > > > > Best,
> > > > > > Robert
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > FYI I created a first PR to address the double-decoding issue:
> > > > > > >
> > > > > > > https://github.com/apache/polaris/pull/4210
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > > > > > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > > > > > > namespace within a NamespaceEntity [1].
> > > > > > > >
> > > > > > > > These methods also exhibit the faulty space encoding behavior.
> > > > > > > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > > > > > > Iceberg project fixes for space-encoding issues. If these methods are
> > > > > > > > modified, it is imperative that we retain the legacy versions
> > > > > > > > specifically for encoding and decoding NamespaceEntity properties –
> > > > > > > > otherwise we could end up with a corrupted database.
> > > > > > > >
> > > > > > > > The same goes for the future namespace separator coming with Iceberg
> > > > > > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > > > > > properties, the separator must never change.
> > > > > > > >
> > > > > > > > I would actually be in favor of proactively internalizing the
> > > > > > > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > > > > > > think?
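A minimal sketch of what "internalizing" could look like: a Polaris-owned codec with a frozen separator, so that upstream fixes to RESTUtil cannot change the persisted format. ASSUMPTION: this mirrors how RESTUtil.encodeNamespace/decodeNamespace behave today (form-encode each element, join with `%1F`), which matches the faulty space encoding described above but should be verified against the pinned Iceberg version. The class name is hypothetical:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical Polaris-owned copy of the persisted parent-namespace format.
// ASSUMED to match current RESTUtil behavior; verify before relying on it.
final class LegacyNamespaceCodec {

  private static final String ESCAPED_SEPARATOR = "%1F"; // frozen: must never change

  // Form-encodes each element (space becomes '+') and joins with "%1F".
  static String encode(List<String> elements) {
    List<String> encoded = new ArrayList<>();
    for (String element : elements) {
      encoded.add(URLEncoder.encode(element, StandardCharsets.UTF_8));
    }
    return String.join(ESCAPED_SEPARATOR, encoded);
  }

  // Splits on "%1F" and form-decodes each part.
  static List<String> decode(String stored) {
    List<String> decoded = new ArrayList<>();
    for (String part : stored.split(Pattern.quote(ESCAPED_SEPARATOR), -1)) {
      decoded.add(URLDecoder.decode(part, StandardCharsets.UTF_8));
    }
    return decoded;
  }

  public static void main(String[] args) {
    String stored = encode(List.of("a", "foo bar"));
    System.out.println(stored);         // a%1Ffoo+bar (legacy space-as-'+' preserved)
    System.out.println(decode(stored)); // [a, foo bar]
  }
}
```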
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > [1]: https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > > > > > > segments are not being decoded correctly according to RFC 3986,
> > > > > > > > > specifically regarding space encoding.
> > > > > > > > >
> > > > > > > > > I investigated the situation in Polaris and found many problems:
> > > > > > > > >
> > > > > > > > > TLDR
> > > > > > > > >
> > > > > > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > >
> > > > > > > > > DISCUSSION
> > > > > > > > >
> > > > > > > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > > > > > > is in the request body). But they cannot be retrieved: when reading
> > > > > > > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > > > > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > > > > > > the REST framework has already decoded it. Consequently, a client
> > > > > > > > > sends "foo%2Bbar", which is first decoded to "foo+bar" by the framework
> > > > > > > > > (correct) and then re-decoded by Polaris to "foo bar" (incorrect),
> > > > > > > > > resulting in a "not found" error.
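The failure reproduces with the JDK alone. This sketch assumes the redundant step has form-decoding semantics (as `java.net.URLDecoder` does, where `+` means space), which matches the behavior described above. `URLDecoder` also stands in for the framework's first pass, which gives the same result for this input. The class name is hypothetical:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustration only: URLDecoder stands in for both decoding steps.
final class DoubleDecoding {

  static String decodeOnce(String value) {
    return URLDecoder.decode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String onTheWire = "foo%2Bbar";
    String afterFramework = decodeOnce(onTheWire);        // "foo+bar" (correct)
    String afterSecondPass = decodeOnce(afterFramework);  // "foo bar" (wrong)
    System.out.println(afterFramework + " -> " + afterSecondPass);
  }
}
```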
> > > > > > > > >
> > > > > > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > other similar places, we create locations merely by joining
> > > > > > > > > identifiers together, without any form of URL encoding: see [2] [3].
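The message does not spell out where creation fails. One plausible mechanism, assuming the joined location is later parsed as a `java.net.URI` (an assumption, not confirmed in this thread), is that a raw space is an illegal URI character while its percent-encoded form is accepted. The class name and locations are made-up examples:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only; the locations are hypothetical.
final class RawLocationParsing {

  static boolean parses(String location) {
    try {
      new URI(location);
      return true;
    } catch (URISyntaxException e) {
      return false; // e.g. "Illegal character in path"
    }
  }

  public static void main(String[] args) {
    System.out.println(parses("s3://bucket/ns/foo bar"));   // false
    System.out.println(parses("s3://bucket/ns/foo%20bar")); // true
  }
}
```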
> > > > > > > > >
> > > > > > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > > > > > send "foo%20bar", which would be correctly decoded by the framework as
> > > > > > > > > "foo bar", and thus it would succeed.)
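The mismatch comes from two different encodings of a space: form encoding (`+`, what `java.net.URLEncoder` produces) versus RFC 3986 path encoding (`%20`). In this sketch, `java.net.URI#getPath()` stands in for the framework's path-parameter decoding (assumed equivalent for these inputs). It decodes `%20` but leaves a literal `+` untouched. The class name is hypothetical:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only.
final class SpaceEncodingMismatch {

  // Form encoding, as used by clients that encode ' ' as '+'.
  static String formEncode(String name) {
    return URLEncoder.encode(name, StandardCharsets.UTF_8);
  }

  // RFC 3986 path decoding of a single raw segment.
  static String pathDecode(String rawSegment) {
    return URI.create("http://host/" + rawSegment).getPath().substring(1);
  }

  public static void main(String[] args) {
    System.out.println(formEncode("foo bar"));   // foo+bar
    System.out.println(pathDecode("foo+bar"));   // foo+bar (lookup fails)
    System.out.println(pathDecode("foo%20bar")); // foo bar
  }
}
```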
> > > > > > > > >
> > > > > > > > > PROPOSAL
> > > > > > > > >
> > > > > > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > > > > > shortly.
> > > > > > > > >
> > > > > > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > > > > > could fix all the methods that incorrectly join together identifiers
> > > > > > > > > without proper URL encoding.
> > > > > > > > >
> > > > > > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > > >
> > > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > > > > > faulty encoding.
> > > > > > > > >
> > > > > > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > > > > > required.
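One detail worth noting for step 1: the `+` replacement has to happen on the raw, still-encoded segment. After the framework has decoded it, a legacy client's `foo+bar` (meaning `foo bar`) and `foo%2Bbar` (meaning `foo+bar`) both arrive as `foo+bar` and can no longer be told apart. This is a minimal sketch. It assumes the raw segment is available (for example via the JAX-RS `@Encoded` annotation) and leaves open how a legacy Java client is detected. The class and method names are hypothetical:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; client detection is out of scope and passed in as a flag.
final class ClientAwareNameDecoder {

  static String decodeTableName(String rawSegment, boolean legacyJavaClient) {
    if (legacyJavaClient) {
      // The client used form encoding: '+' means space, "%2B" means a literal '+'.
      return URLDecoder.decode(rawSegment, StandardCharsets.UTF_8);
    }
    // RFC 3986 path semantics: protect literal '+' so URLDecoder keeps it as '+'.
    return URLDecoder.decode(rawSegment.replace("+", "%2B"), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(decodeTableName("foo+bar", true));    // foo bar
    System.out.println(decodeTableName("foo%2Bbar", true));  // foo+bar
    System.out.println(decodeTableName("foo+bar", false));   // foo+bar
    System.out.println(decodeTableName("foo%20bar", false)); // foo bar
  }
}
```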
> > > > > > > > >
> > > > > > > > > What are your thoughts on this?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > [1]: https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > > > [2]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > > > [3]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > > > > > >
> > > > > >
> > > >
> > >
> >
