Hi all,

Disallowing the most problematic cases seems like the right way to go. I
can provide a PR to implement that quickly.
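
To make the proposal concrete, here is a rough sketch of the three
guardrails listed in Yufei's message below (no "/", no empty strings, no
leading or trailing whitespace). The class and method names are
hypothetical and do not exist in Polaris today; where the check would
live and how errors would surface are left open.

```java
/** Hypothetical helper for illustration only; not an existing Polaris class. */
final class NameGuardrails {

  private NameGuardrails() {}

  /**
   * Returns null when the namespace element or table/view name is acceptable,
   * otherwise a short description of the violated rule.
   */
  static String violation(String name) {
    if (name == null || name.isEmpty()) {
      return "must not be empty";
    }
    if (name.indexOf('/') >= 0) {
      return "must not contain '/'";
    }
    // A name made only of whitespace is rejected by this rule as well.
    if (Character.isWhitespace(name.charAt(0))
        || Character.isWhitespace(name.charAt(name.length() - 1))) {
      return "must not have leading or trailing whitespace";
    }
    return null;
  }

  /** Throws IllegalArgumentException when the name violates a guardrail. */
  static void check(String name) {
    String reason = violation(name);
    if (reason != null) {
      throw new IllegalArgumentException("Invalid name '" + name + "': " + reason);
    }
  }
}
```

Inner spaces (e.g. "foo bar") are deliberately still accepted here, since
the proposal only targets leading and trailing whitespace.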

However, we must keep in mind that disallowing a few characters will not
solve all our problems. IMHO we need to consistently replace all the
string concatenations that we use today for creating storage locations
with a proper location builder that takes care of path escaping and
sanitization. That part of the job is far more complex, because of its
blast radius.
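
As an illustration of what I mean by a location builder, here is a
minimal sketch. It assumes that percent-encoding each identifier as an
RFC 3986 path segment is the escaping we want, which this thread has not
decided; the class name and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not an existing Polaris class. */
final class LocationBuilder {

  private static final String HEX = "0123456789ABCDEF";

  private final StringBuilder location;

  private LocationBuilder(String baseLocation) {
    // Drop a single trailing slash so segments are always joined by exactly one '/'.
    String base = baseLocation.endsWith("/")
        ? baseLocation.substring(0, baseLocation.length() - 1)
        : baseLocation;
    this.location = new StringBuilder(base);
  }

  static LocationBuilder from(String baseLocation) {
    return new LocationBuilder(baseLocation);
  }

  /** Appends one identifier as exactly one path segment. */
  LocationBuilder segment(String identifier) {
    location.append('/').append(encodeSegment(identifier));
    return this;
  }

  String build() {
    return location.toString();
  }

  /** Percent-encodes every byte that is not an RFC 3986 "unreserved" character. */
  static String encodeSegment(String identifier) {
    StringBuilder out = new StringBuilder();
    for (byte b : identifier.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xFF;
      boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
      if (unreserved) {
        out.append((char) c);
      } else {
        out.append('%').append(HEX.charAt(c >> 4)).append(HEX.charAt(c & 0x0F));
      }
    }
    return out.toString();
  }
}
```

With this sketch, LocationBuilder.from("s3://pol").segment("n/s").segment("t1").build()
yields "s3://pol/n%2Fs/t1" rather than the "s3://pol/n/s/t1" that plain
concatenation produces in Dmitri's example below.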

Thanks,
Alex


On Wed, Apr 22, 2026 at 2:07 AM Yufei Gu <[email protected]> wrote:
>
> Sorry for jumping into this thread a bit late.
>
> I’m supportive of introducing some guardrails for namespace and table or
> view names. Specifically, I think we should disallow a few problematic
> cases to avoid ambiguity and downstream issues:
>
>    - Disallow the slash character “/”
>    - Disallow empty strings
>    - Disallow leading or trailing whitespace
>
> These constraints seem reasonable given the interactions across REST,
> storage paths, and different client behaviors. Adding clear guardrails
> early can prevent subtle bugs and inconsistencies later on. Curious to hear
> if others see any concerns or edge cases with this approach.
>
> Thanks,
>
> Yufei
>
>
> On Thu, Apr 16, 2026 at 9:11 AM Alexandre Dutra <[email protected]> wrote:
>
> > > Do you think it's worth having a separate discussion about guardrails for
> > namespace elements and table/view names? [...]
> >
> > Completely agree here. I think the slash character in particular
> > should definitely be banned.
> >
> > Thanks,
> > Alex
> >
> > On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> > >
> > > > Do you think it's worth having a separate discussion about guardrails
> > for
> > > namespace elements and table/view names? [...]
> > >
> > > Definitely!
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > > spark-sql ()> create namespace `n/s`;
> > > > > However, the S3 location in this case gets a proper directory
> > breakdown:
> > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > ... but that is probably a different issue.
> > > >
> > > > Yea, it's different from the URL en/decoding topic. Do you think it's
> > worth
> > > > having a separate discussion about guardrails for namespace elements
> > and
> > > > table/view names? For example, disallowing '/', disallowing empty/blank
> > > > namespace elements and table/view names, disallowing leading/trailing
> > > > > whitespace? Sure, some of these checks already happen, but not at
> > every
> > > > level/layer (defense-in-depth).
> > > >
> > > > > when Iceberg itself will introduce configurable separators, we MAY
> > ask
> > > > > ourselves if Polaris should allow them to be configurable or not. [...]
> > > > separator is just a REST layer thing
> > > >
> > > > > True, the separator is primarily a REST-layer namespace en/decoding
> > > > thing. What worries me slightly is that (existing) namespace elements
> > with
> > > > the configured separator character could become inaccessible. However,
> > > > "configurable separator" is IMO a different discussion.
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > >
> > > > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > My understanding of the need to make namespace separators
> > configurable is
> > > > > > that there exists a rather narrow set of deployment cases where the
> > ASCII
> > > > > "0x1F" (unit separator) character is not permitted in URL paths by
> > some
> > > > > infrastructure components.
> > > > >
> > > > > It might be worth allowing users to define a different separator, but
> > > > since
> > > > > no one has brought this up yet, I assume it is not a priority.
> > > > >
> > > > > In any case, using a different separator is completely a REST API
> > > > > concern and should not affect how Polaris stores data internally.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <[email protected]>
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > > I wonder how namespace elements and table/view names with a slash
> > > > ('/')
> > > > > > character in the middle behave. Or other characters like '&' or
> > '?' or
> > > > > '#'.
> > > > > >
> > > > > > For the REST layer, these will be percent-encoded, and with my PR
> > to
> > > > > > fix a double-decoding issue, these characters "survive" the REST
> > layer
> > > > > > just fine.
> > > > > >
> > > > > > The issue now is in some layers beneath: as I pointed out and as
> > > > > > Dmitri demonstrated, we are unfortunately concatenating identifiers
> > > > > > together to create storage locations, without proper escaping. This
> > > > > > currently results in corrupted storage locations.
> > > > > >
> > > > > > > I'm trying to fix the REST layer first, then I'll move to the
> > > > > > storage layer.
> > > > > >
> > > > > > > What's your take on leveraging
> > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > > > >
> > > > > > For now I don't see a valid usage in Polaris for that, since Jersey
> > > > > > handles decoding path parameters already.
> > > > > >
> > > > > > > I also agree that the "configurable namespace separator" must
> > never
> > > > > > change. Is my assumption correct, that it must always be the same
> > > > > character
> > > > > > as it is today?
> > > > > >
> > > > > > In Polaris, we are using the namespace separator in two different
> > use
> > > > > > cases:
> > > > > >
> > > > > > 1) For path parameters in the REST layer
> > > > > > 2) For storing namespaces in Polaris entities
> > > > > >
> > > > > > > What is clear is that in the second use case, the namespace separator must
> > NEVER
> > > > > > change. I just opened a PR for that:
> > > > > > https://github.com/apache/polaris/pull/4214
> > > > > >
> > > > > > Regarding the first use case, once we solve all our
> > encoding/decoding
> > > > > > issues, and when Iceberg itself will introduce configurable
> > > > > > separators, we MAY ask ourselves if Polaris should allow them to be
> > > > > > configurable or not. I don't have strong opinions, but if the
> > > > > > separator is just a REST layer thing, it should be possible to
> > change
> > > > > > it without breaking the storage layer or the metastore.
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
> > > > > >
> > > > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <
> > [email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Slashes in namespace seem to work fine (Spark 3.5 + Iceberg
> > 1.10.0):
> > > > > > >
> > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > Time taken: 0.335 seconds
> > > > > > > spark-sql ()> show namespaces;
> > > > > > > `n/s`
> > > > > > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > > > > > spark-sql ()> use `n/s`;
> > > > > > > Time taken: 0.028 seconds
> > > > > > > spark-sql (`n/s`)> create table t1 (n string);
> > > > > > > Time taken: 0.702 seconds
> > > > > > >
> > > > > > > The URLs appear to be encoded properly, e.g. (from Polaris log):
> > > > > > >
> > > > > > > 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> > > > > > >
> > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS]
> > > > > [,,,]
> > > > > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400]
> > > > "GET
> > > > > > > /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken=
> > HTTP/1.1"
> > > > > 200
> > > > > > 74
> > > > > > >
> > > > > > > I did not test trickier chars, but adding CI coverage for them
> > would
> > > > be
> > > > > > > good.
> > > > > > >
> > > > > > > However, the S3 location in this case gets a proper directory
> > > > > breakdown:
> > > > > > >
> > > > > > > $ mc ls rustfs/pol/n/s
> > > > > > > [2026-04-15 13:44:37 EDT]     0B t1/
> > > > > > >
> > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > >
> > > > > > > ... but that is probably a different issue.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <[email protected]>
> > > > wrote:
> > > > > > >
> > > > > > > > Thanks Alex for the thorough investigation!
> > > > > > > >
> > > > > > > > URL en/decoding is really not that easy.
> > > > > > > > I wonder how namespace elements and table/view names with a
> > slash
> > > > > ('/')
> > > > > > > > character in the middle behave. Or other characters like '&'
> > or '?'
> > > > > or
> > > > > > '#'.
> > > > > > > >
> > > > > > > > Overall, I agree with your idea to implement correct URL
> > > > > > encoding/decoding
> > > > > > > > in the Polaris code base to protect Polaris from upstream
> > behavior
> > > > > > changes
> > > > > > > > that can seriously break or even corrupt things.
> > > > > > > >
> > > > > > > > What's your take on leveraging
> > > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and
> > have
> > > > > > > > centralized helpers that deal with "proper" URL
> > encoding/decoding?
> > > > > > > >
> > > > > > > > I also agree that the "configurable namespace separator" must
> > never
> > > > > > change.
> > > > > > > > Is my assumption correct, that it must always be the same
> > character
> > > > > as
> > > > > > it
> > > > > > > > is today?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Robert
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <
> > [email protected]
> > > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > FYI I created a first PR to address the double-decoding
> > issue:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/polaris/pull/4210
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I would also point out that Polaris uses
> > > > RESTUtil.encodeNamespace
> > > > > > and
> > > > > > > > > > RESTUtil.decodeNamespace for encoding and decoding the
> > parent
> > > > > > > > > > namespace within a NamespaceEntity [1].
> > > > > > > > > >
> > > > > > > > > > These methods also exhibit the faulty space encoding
> > behavior.
> > > > > > > > > > Therefore, we must exercise **extreme caution** regarding
> > any
> > > > > > upcoming
> > > > > > > > > > Iceberg project fixes for space-encoding issues. If these
> > > > methods
> > > > > > are
> > > > > > > > > > modified, it is imperative that we retain the legacy
> > versions
> > > > > > > > > > specifically for encoding and decoding NamespaceEntity
> > > > > properties –
> > > > > > > > > > otherwise we could end up with a corrupted database.
> > > > > > > > > >
> > > > > > > > > > The same goes for the future namespace separator coming
> > with
> > > > > > Iceberg
> > > > > > > > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > > > > > > > properties, the separator must never change.
> > > > > > > > > >
> > > > > > > > > > I would actually be in favor of proactively internalizing
> > the
> > > > > > > > > > encoding/decoding algorithm used in NamespaceEntity. What
> > do
> > > > you
> > > > > > > > > > think?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > > > [1]:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <
> > > > > [email protected]
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > A discussion on the Iceberg ML [1] recently highlighted
> > that
> > > > > URL
> > > > > > path
> > > > > > > > > > > segments are not being decoded correctly according to RFC
> > > > 3986,
> > > > > > > > > > > specifically regarding space encoding.
> > > > > > > > > > >
> > > > > > > > > > > I investigated the situation in Polaris, and found many
> > > > > problems:
> > > > > > > > > > >
> > > > > > > > > > > TLDR
> > > > > > > > > > >
> > > > > > > > > > > - Table names with the + sign can be created but cannot
> > be
> > > > > > retrieved
> > > > > > > > > > > - Namespace names with the + sign are OK (can be created
> > and
> > > > > > > > retrieved)
> > > > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > > > >
> > > > > > > > > > > DISCUSSION
> > > > > > > > > > >
> > > > > > > > > > > Table names such as "foo+bar" can be created (via POST,
> > where
> > > > > the
> > > > > > > > name
> > > > > > > > > > > is in the request body). But they cannot be retrieved:
> > when
> > > > > > reading
> > > > > > > > > > > tables, the name is part of the URL path. Polaris
> > incorrectly
> > > > > > > > performs
> > > > > > > > > > > a second decoding step using
> > RESTUtil.decodeString(table),
> > > > even
> > > > > > > > though
> > > > > > > > > > > the REST framework has already decoded it. Consequently,
> > a
> > > > > client
> > > > > > > > > > > sends "foo%2Bbar" which is first decoded to "foo+bar" by
> > the
> > > > > > > > framework
> > > > > > > > > > > (correct) and then re-decoded by Polaris to "foo bar"
> > > > > > (incorrect),
> > > > > > > > > > > resulting in a "not found" error.
> > > > > > > > > > >
> > > > > > > > > > > Table and namespace names like "foo bar" simply cannot be
> > > > > > created at
> > > > > > > > > > > all. This is because in
> > > > > > IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > > > other similar places, we create locations merely by
> > joining
> > > > > > > > > > > identifiers together, without any form of URL encoding:
> > see
> > > > [2]
> > > > > > [3].
> > > > > > > > > > >
> > > > > > > > > > > And even if tables like "foo bar" could be created, they
> > > > > > couldn't be
> > > > > > > > > > > retrieved by Java clients. This occurs because current
> > Java
> > > > > > clients
> > > > > > > > > > > incorrectly encode that name as "foo+bar", which the REST
> > > > > > framework
> > > > > > > > > > > does not modify. Consequently, Polaris would look for a
> > table
> > > > > > named
> > > > > > > > > > > "foo+bar" instead and throw a "not found" error. (Other
> > > > clients
> > > > > > would
> > > > > > > > > > > send "foo%20bar" which would be correctly decoded by the
> > > > > > framework as
> > > > > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > > > > > >
> > > > > > > > > > > PROPOSAL
> > > > > > > > > > >
> > > > > > > > > > > To resolve the issue with the + sign in table names, we
> > > > simply
> > > > > > need
> > > > > > > > to
> > > > > > > > > > > eliminate the redundant decoding step. I can open a PR
> > for
> > > > that
> > > > > > > > > > > shortly.
> > > > > > > > > > >
> > > > > > > > > > > To resolve the issue with spaces in table and namespace
> > > > names,
> > > > > we
> > > > > > > > > > > could fix all the methods that incorrectly join together
> > > > > > identifiers
> > > > > > > > > > > without proper URL encoding.
> > > > > > > > > > >
> > > > > > > > > > > Finally, addressing the Java clients encoding problem is
> > > > > > complex, but
> > > > > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > > > > >
> > > > > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg
> > fix
> > > > for
> > > > > > space
> > > > > > > > > > > encoding, manually replace "+" with a space to correct
> > the
> > > > > > client's
> > > > > > > > > > > faulty encoding.
> > > > > > > > > > >
> > > > > > > > > > > 2) For non-Java clients or those with the fix, no
> > workaround
> > > > > > would be
> > > > > > > > > required.
> > > > > > > > > > >
> > > > > > > > > > > What are your thoughts on this?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > > > [1]:
> > > > > > > >
> > https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > > > > > [2]:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > > > > > [3]:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
