Hi Yufei,

Yes, I think we can view storage location sanitizing as a parallel effort.

With that, here is a simple PR that aims to forbid slashes and a few
other pathological cases in the names of Iceberg and Generic Table
entities at creation time:

https://github.com/apache/polaris/pull/4282
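
For illustration only, here is a minimal sketch of what such a
creation-time check could look like. This is not the code from the PR:
the class and method names are hypothetical, and the rules are just the
three discussed in this thread (no empty names, no "/", no leading or
trailing whitespace).

```java
import java.util.List;

// Hypothetical helper, not part of Polaris: checks one namespace level or
// table/view name against the guardrails discussed in this thread.
final class EntityNameGuardrails {

  private EntityNameGuardrails() {}

  /** Returns null if the name is acceptable, otherwise a short reason. */
  static String rejectionReason(String name) {
    if (name == null || name.isEmpty()) {
      return "name must not be empty";
    }
    if (name.indexOf('/') >= 0) {
      return "name must not contain '/'";
    }
    if (Character.isWhitespace(name.charAt(0))
        || Character.isWhitespace(name.charAt(name.length() - 1))) {
      return "name must not have leading or trailing whitespace";
    }
    return null;
  }

  /** Validates every namespace level plus the leaf name; throws on the first violation. */
  static void validate(List<String> namespaceLevels, String leafName) {
    for (String level : namespaceLevels) {
      check(level);
    }
    check(leafName);
  }

  private static void check(String name) {
    String reason = rejectionReason(name);
    if (reason != null) {
      throw new IllegalArgumentException(reason + ": \"" + name + "\"");
    }
  }
}
```

A caller would run validate() on the namespace levels and the table or
view name before persisting the entity, and map the exception to a
client error. Names such as "foo+bar" pass this check, since "+" is an
encoding concern rather than a naming one.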

Thanks,
Alex

On Thu, Apr 23, 2026 at 1:14 AM Yufei Gu <[email protected]> wrote:
>
> Hi Alex, it's a good point that the storage location build is also
> affected, but it feels less controversial and somewhat separate from the
> main question here.
>
> The immediate discussion, at least from my perspective, is about entity
> naming guardrails and externally visible behavior, for example preventing
> names that are ambiguous or likely to break REST access and cross client
> behavior.
>
> Storage location construction is important too, but that feels more like an
> internal implementation hardening task than a spec or user-facing semantics
> question. I would view it as a parallel track rather than something that
> should block agreement on the narrower entity name issue. I'm also fine if
> someone wants to tackle the location building issue first. That could
> provide useful context for resolving the user-facing naming questions.
>
> Yufei
>
>
> On Wed, Apr 22, 2026 at 8:28 AM Alexandre Dutra <[email protected]> wrote:
>
> > Hi all,
> >
> > Disallowing the most problematic cases seems the right way to go. I
> > can provide a PR to quickly implement that.
> >
> > However, we must keep in mind that disallowing a few chars will not
> > solve all our problems. IMHO we need to consistently replace all
> > string concatenations that we use today for creating storage locations
> > with a proper location builder that will take care of proper path
> > escaping and sanitization. That part of the job is way more complex,
> > due to the blast radius.
> >
> > Thanks,
> > Alex
> >
> >
> > On Wed, Apr 22, 2026 at 2:07 AM Yufei Gu <[email protected]> wrote:
> > >
> > > Sorry for jumping into this thread a bit late.
> > >
> > > I’m supportive of introducing some guardrails for namespace and table or
> > > view names. Specifically, I think we should disallow a few problematic
> > > cases to avoid ambiguity and downstream issues:
> > >
> > >    - Disallow the slash character “/”
> > >    - Disallow empty strings
> > >    - Disallow leading or trailing whitespace
> > >
> > > These constraints seem reasonable given the interactions across REST,
> > > storage paths, and different client behaviors. Adding clear guardrails
> > > early can prevent subtle bugs and inconsistencies later on. Curious to
> > hear
> > > if others see any concerns or edge cases with this approach.
> > >
> > > Thanks,
> > >
> > > Yufei
> > >
> > >
> > > On Thu, Apr 16, 2026 at 9:11 AM Alexandre Dutra <[email protected]>
> > wrote:
> > >
> > > > > Do you think it's worth having a separate discussion about
> > guardrails for
> > > > namespace elements and table/view names? [...]
> > > >
> > > > Completely agree here. I think the slash character in particular
> > > > should definitely be banned.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > > >
> > > > > > Do you think it's worth having a separate discussion about
> > guardrails
> > > > for
> > > > > namespace elements and table/view names? [...]
> > > > >
> > > > > Definitely!
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > However, the S3 location in this case gets a proper directory
> > > > breakdown:
> > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > > ... but that is probably a different issue.
> > > > > >
> > > > > > Yea, it's different from the URL en/decoding topic. Do you think
> > it's
> > > > worth
> > > > > > having a separate discussion about guardrails for namespace
> > elements
> > > > and
> > > > > > table/view names? For example, disallowing '/', disallowing
> > empty/blank
> > > > > > namespace elements and table/view names, disallowing
> > leading/trailing
> > > > > > whitespaces? Sure, some of these checks already happen, but not at
> > > > every
> > > > > > level/layer (defense-in-depth).
> > > > > >
> > > > > > > when Iceberg itself will introduce configurable separators, we
> > MAY
> > > > ask
> > > > > > > ourselves if Polaris should allow them to be configurable or not.
> > [...]
> > > > > > separator is just a REST layer thing
> > > > > >
> > > > > > > True, the separator is primarily a REST-layer namespace
> > en/decoding
> > > > > > thing. What worries me slightly is that (existing) namespace
> > elements
> > > > with
> > > > > > the configured separator character could become inaccessible.
> > However,
> > > > > > "configurable separator" is IMO a different discussion.
> > > > > >
> > > > > > Best,
> > > > > > Robert
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > My understanding of the need to make namespace separators
> > > > configurable is
> > > > > > > > that there exists a rather narrow set of deployment cases where
> > the
> > > > ASCII
> > > > > > > "0x1F" (unit separator) character is not permitted in URL paths
> > by
> > > > some
> > > > > > > infrastructure components.
> > > > > > >
> > > > > > > It might be worth allowing users to define a different
> > separator, but
> > > > > > since
> > > > > > > no one has brought this up yet, I assume it is not a priority.
> > > > > > >
> > > > > > > In any case, using a different separator is completely a REST API
> > > > > > > concern and should not affect how Polaris stores data internally.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <
> > [email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > > I wonder how namespace elements and table/view names with a
> > slash
> > > > > > ('/')
> > > > > > > > character in the middle behave. Or other characters like '&' or
> > > > '?' or
> > > > > > > '#'.
> > > > > > > >
> > > > > > > > For the REST layer, these will be percent-encoded, and with my
> > PR
> > > > to
> > > > > > > > fix a double-decoding issue, these characters "survive" the
> > REST
> > > > layer
> > > > > > > > just fine.
> > > > > > > >
> > > > > > > > The issue now is in some layers beneath: as I pointed out and
> > as
> > > > > > > > Dmitri demonstrated, we are unfortunately concatenating
> > identifiers
> > > > > > > > together to create storage locations, without proper escaping.
> > This
> > > > > > > > currently results in corrupted storage locations.
> > > > > > > >
> > > > > > > > > I'm trying to fix the REST layer first, then I'll move
> > to the
> > > > > > > > storage layer.
> > > > > > > >
> > > > > > > > > What's your take on leveraging
> > > > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and
> > have
> > > > > > > > centralized helpers that deal with "proper" URL
> > encoding/decoding?
> > > > > > > >
> > > > > > > > For now I don't see a valid usage in Polaris for that, since
> > Jersey
> > > > > > > > handles decoding path parameters already.
> > > > > > > >
> > > > > > > > > I also agree that the "configurable namespace separator" must
> > > > never
> > > > > > > > change. Is my assumption correct, that it must always be the
> > same
> > > > > > > character
> > > > > > > > as it is today?
> > > > > > > >
> > > > > > > > In Polaris, we are using the namespace separator in two
> > different
> > > > use
> > > > > > > > cases:
> > > > > > > >
> > > > > > > > 1) For path parameters in the REST layer
> > > > > > > > 2) For storing namespaces in Polaris entities
> > > > > > > >
> > > > > > > > What is clear is that in the second use case, the namespace
> > must
> > > > NEVER
> > > > > > > > change. I just opened a PR for that:
> > > > > > > > https://github.com/apache/polaris/pull/4214
> > > > > > > >
> > > > > > > > Regarding the first use case, once we solve all our
> > > > encoding/decoding
> > > > > > > > issues, and when Iceberg itself will introduce configurable
> > > > > > > > separators, we MAY ask ourselves if Polaris should allow them
> > to be
> > > > > > > > configurable or not. I don't have strong opinions, but if the
> > > > > > > > separator is just a REST layer thing, it should be possible to
> > > > change
> > > > > > > > it without breaking the storage layer or the metastore.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <
> > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > Slashes in namespace seem to work fine (Spark 3.5 + Iceberg
> > > > 1.10.0):
> > > > > > > > >
> > > > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > > > Time taken: 0.335 seconds
> > > > > > > > > spark-sql ()> show namespaces;
> > > > > > > > > `n/s`
> > > > > > > > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > > > > > > > spark-sql ()> use `n/s`;
> > > > > > > > > Time taken: 0.028 seconds
> > > > > > > > > spark-sql (`n/s`)> create table t1 (n string);
> > > > > > > > > Time taken: 0.702 seconds
> > > > > > > > >
> > > > > > > > > The URLs appear to be encoded properly, e.g. (from Polaris
> > log):
> > > > > > > > >
> > > > > > > > > 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> > > > > > > > >
> > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS]
> > > > > > > [,,,]
> > > > > > > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17
> > -0400]
> > > > > > "GET
> > > > > > > > > /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken=
> > > > HTTP/1.1"
> > > > > > > 200
> > > > > > > > 74
> > > > > > > > >
> > > > > > > > > I did not test trickier chars, but adding CI coverage for
> > them
> > > > would
> > > > > > be
> > > > > > > > > good.
> > > > > > > > >
> > > > > > > > > However, the S3 location in this case gets a proper directory
> > > > > > > breakdown:
> > > > > > > > >
> > > > > > > > > $ mc ls rustfs/pol/n/s
> > > > > > > > > [2026-04-15 13:44:37 EDT]     0B t1/
> > > > > > > > >
> > > > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > > > >
> > > > > > > > > ... but that is probably a different issue.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Dmitri.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <
> > [email protected]>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Alex for the thorough investigation!
> > > > > > > > > >
> > > > > > > > > > URL en/decoding is really not that easy.
> > > > > > > > > > I wonder how namespace elements and table/view names with a
> > > > slash
> > > > > > > ('/')
> > > > > > > > > > character in the middle behave. Or other characters like
> > '&'
> > > > or '?'
> > > > > > > or
> > > > > > > > '#'.
> > > > > > > > > >
> > > > > > > > > > Overall, I agree with your idea to implement correct URL
> > > > > > > > encoding/decoding
> > > > > > > > > > in the Polaris code base to protect Polaris from upstream
> > > > behavior
> > > > > > > > changes
> > > > > > > > > > that can seriously break or even corrupt things.
> > > > > > > > > >
> > > > > > > > > > What's your take on leveraging
> > > > > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters
> > and
> > > > have
> > > > > > > > > > centralized helpers that deal with "proper" URL
> > > > encoding/decoding?
> > > > > > > > > >
> > > > > > > > > > I also agree that the "configurable namespace separator"
> > must
> > > > never
> > > > > > > > change.
> > > > > > > > > > Is my assumption correct, that it must always be the same
> > > > character
> > > > > > > as
> > > > > > > > it
> > > > > > > > > > is today?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Robert
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <
> > > > [email protected]
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > FYI I created a first PR to address the double-decoding
> > > > issue:
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/polaris/pull/4210
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I would also point out that Polaris uses
> > > > > > RESTUtil.encodeNamespace
> > > > > > > > and
> > > > > > > > > > > > RESTUtil.decodeNamespace for encoding and decoding the
> > > > parent
> > > > > > > > > > > > namespace within a NamespaceEntity [1].
> > > > > > > > > > > >
> > > > > > > > > > > > These methods also exhibit the faulty space encoding
> > > > behavior.
> > > > > > > > > > > > Therefore, we must exercise **extreme caution**
> > regarding
> > > > any
> > > > > > > > upcoming
> > > > > > > > > > > > Iceberg project fixes for space-encoding issues. If
> > these
> > > > > > methods
> > > > > > > > are
> > > > > > > > > > > > modified, it is imperative that we retain the legacy
> > > > versions
> > > > > > > > > > > > specifically for encoding and decoding NamespaceEntity
> > > > > > > properties –
> > > > > > > > > > > > otherwise we could end up with a corrupted database.
> > > > > > > > > > > >
> > > > > > > > > > > > The same goes for the future namespace separator coming
> > > > with
> > > > > > > > Iceberg
> > > > > > > > > > > > 1.11: for the sake of encoding and decoding
> > NamespaceEntity
> > > > > > > > > > > > properties, the separator must never change.
> > > > > > > > > > > >
> > > > > > > > > > > > I would actually be in favor of proactively
> > internalizing
> > > > the
> > > > > > > > > > > > encoding/decoding algorithm used in NamespaceEntity.
> > What
> > > > do
> > > > > > you
> > > > > > > > > > > > think?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > > > > [1]: https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > A discussion on the Iceberg ML [1] recently
> > highlighted
> > > > that
> > > > > > > URL
> > > > > > > > path
> > > > > > > > > > > > > segments are not being decoded correctly according
> > to RFC
> > > > > > 3986,
> > > > > > > > > > > > > specifically regarding space encoding.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I investigated the situation in Polaris, and found
> > many
> > > > > > > problems:
> > > > > > > > > > > > >
> > > > > > > > > > > > > TLDR
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Table names with the + sign can be created but
> > cannot
> > > > be
> > > > > > > > retrieved
> > > > > > > > > > > > > - Namespace names with the + sign are OK (can be
> > created
> > > > and
> > > > > > > > > > retrieved)
> > > > > > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > > > > > >
> > > > > > > > > > > > > DISCUSSION
> > > > > > > > > > > > >
> > > > > > > > > > > > > Table names such as "foo+bar" can be created (via
> > POST,
> > > > where
> > > > > > > the
> > > > > > > > > > name
> > > > > > > > > > > > > is in the request body). But they cannot be
> > retrieved:
> > > > when
> > > > > > > > reading
> > > > > > > > > > > > > tables, the name is part of the URL path. Polaris
> > > > incorrectly
> > > > > > > > > > performs
> > > > > > > > > > > > > a second decoding step using
> > > > RESTUtil.decodeString(table),
> > > > > > even
> > > > > > > > > > though
> > > > > > > > > > > > > the REST framework has already decoded it.
> > Consequently,
> > > > a
> > > > > > > client
> > > > > > > > > > > > > sends "foo%2Bbar" which is first decoded to
> > "foo+bar" by
> > > > the
> > > > > > > > > > framework
> > > > > > > > > > > > > (correct) and then re-decoded by Polaris to "foo bar"
> > > > > > > > (incorrect),
> > > > > > > > > > > > > resulting in a "not found" error.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Table and namespace names like "foo bar" simply
> > cannot be
> > > > > > > > created at
> > > > > > > > > > > > > all. This is because in
> > > > > > > > IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > > > > > other similar places, we create locations merely by
> > > > joining
> > > > > > > > > > > > > identifiers together, without any form of URL
> > encoding:
> > > > see
> > > > > > [2]
> > > > > > > > [3].
> > > > > > > > > > > > >
> > > > > > > > > > > > > And even if tables like "foo bar" could be created,
> > they
> > > > > > > > couldn't be
> > > > > > > > > > > > > retrieved by Java clients. This occurs because
> > current
> > > > Java
> > > > > > > > clients
> > > > > > > > > > > > > incorrectly encode that name as "foo+bar", which the
> > REST
> > > > > > > > framework
> > > > > > > > > > > > > does not modify. Consequently, Polaris would look
> > for a
> > > > table
> > > > > > > > named
> > > > > > > > > > > > > "foo+bar" instead and throw a "not found" error.
> > (Other
> > > > > > clients
> > > > > > > > would
> > > > > > > > > > > > > send "foo%20bar" which would be correctly decoded by
> > the
> > > > > > > > framework as
> > > > > > > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > PROPOSAL
> > > > > > > > > > > > >
> > > > > > > > > > > > > To resolve the issue with the + sign in table names,
> > we
> > > > > > simply
> > > > > > > > need
> > > > > > > > > > to
> > > > > > > > > > > > > eliminate the redundant decoding step. I can open a
> > PR
> > > > for
> > > > > > that
> > > > > > > > > > > > > shortly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To resolve the issue with spaces in table and
> > namespace
> > > > > > names,
> > > > > > > we
> > > > > > > > > > > > > could fix all the methods that incorrectly join
> > together
> > > > > > > > identifiers
> > > > > > > > > > > > > without proper URL encoding.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Finally, addressing the Java clients encoding
> > problem is
> > > > > > > > complex, but
> > > > > > > > > > > > > we could consider implementing a workaround as
> > follows:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) If the client is Java and lacks the upcoming
> > Iceberg
> > > > fix
> > > > > > for
> > > > > > > > space
> > > > > > > > > > > > > encoding, manually replace "+" with a space to
> > correct
> > > > the
> > > > > > > > client's
> > > > > > > > > > > > > faulty encoding.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) For non-Java clients or those with the fix, no
> > > > workaround
> > > > > > > > would be
> > > > > > > > > > > required.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What are your thoughts on this?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]: https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > > > > > > > > [2]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > > > > > > > > [3]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
