Re: [DISCUSS] URL path decoding issues in Polaris

Alexandre Dutra Wed, 15 Apr 2026 11:02:59 -0700

Hi all,

> I wonder how namespace elements and table/view names with a slash ('/') 
> character in the middle behave. Or other characters like '&' or '?' or '#'.


For the REST layer, these will be percent-encoded, and with my PR to
fix a double-decoding issue, these characters "survive" the REST layer
just fine.

The remaining issue is in the layers beneath: as I pointed out and as
Dmitri demonstrated, we are unfortunately concatenating identifiers
together to create storage locations, without proper escaping. This
currently results in corrupted storage locations.

I'm trying to fix the REST layer first; then I'll move on to the
storage layer.
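For illustration, a minimal Java sketch of the concatenation problem (not actual Polaris code; the class and helper names are hypothetical). A one-level namespace named "n/s" and a two-level namespace n.s produce the same location, which matches the s3://pol/n/s/t1 location Dmitri observed.

```java
import java.util.List;

public class NaiveLocationJoin {

  // Hypothetical helper: joins a base location, the namespace levels and the
  // table name with '/', without escaping anything (the behavior described above).
  static String naiveLocation(String base, List<String> namespace, String table) {
    StringBuilder sb = new StringBuilder(base);
    for (String level : namespace) {
      sb.append('/').append(level);
    }
    return sb.append('/').append(table).toString();
  }

  public static void main(String[] args) {
    // One namespace level that contains a slash...
    String a = naiveLocation("s3://pol", List.of("n/s"), "t1");
    // ...and two separate namespace levels "n" and "s".
    String b = naiveLocation("s3://pol", List.of("n", "s"), "t1");
    System.out.println(a);           // s3://pol/n/s/t1
    System.out.println(a.equals(b)); // true: the two tables collide
  }
}
```

Any fix therefore has to escape each identifier individually before joining, so that distinct identifiers can never map to the same location.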

> What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider / 
> jakarta.ws.rs.ext.ParamConverter for the path parameters and have centralized 
> helpers that deal with "proper" URL encoding/decoding?

For now I don't see a use case for that in Polaris, since Jersey
already handles decoding path parameters.

> I also agree that the "configurable namespace separator" must never change. 
> Is my assumption correct, that it must always be the same character as it is 
> today?

In Polaris, we are using the namespace separator in two different use cases:

1) For path parameters in the REST layer
2) For storing namespaces in Polaris entities

What is clear is that in the second use case, the separator must NEVER
change. I just opened a PR for that:
https://github.com/apache/polaris/pull/4214

Regarding the first use case, once we have solved all our encoding/decoding
issues, and once Iceberg itself introduces configurable separators,
we MAY ask ourselves whether Polaris should make them configurable
as well. I don't have strong opinions, but if the separator is purely
a REST layer concern, it should be possible to change it without
breaking the storage layer or the metastore.
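A minimal sketch of that separation, assuming the REST layer uses the separator only to split an already-decoded namespace path parameter into levels. parseLevels and the "." alternative separator are hypothetical, not an actual Polaris or Iceberg API. Because the storage layer would only ever see the resulting levels, changing the REST separator would not touch what is persisted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RestNamespaceParam {

  // Hypothetical REST-layer helper: splits an already-decoded namespace path
  // parameter into levels using a configurable separator. Only this step
  // depends on the separator; the levels are what the storage layer receives.
  static List<String> parseLevels(String decodedParam, String separator) {
    // Pattern.quote treats the separator literally; limit -1 keeps empty levels.
    return Arrays.asList(decodedParam.split(Pattern.quote(separator), -1));
  }

  public static void main(String[] args) {
    // Today's REST separator is the unit separator U+001F (sent as %1F).
    List<String> viaDefault = parseLevels("a\u001Fb", "\u001F");
    // A hypothetical alternative separator yields the same levels.
    List<String> viaDot = parseLevels("a.b", ".");
    System.out.println(viaDefault.equals(viaDot)); // true
  }
}
```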

Thanks,
Alex

On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> Hi All,
>
> Slashes in namespace names seem to work fine (Spark 3.5 + Iceberg 1.10.0):
>
> spark-sql ()> create namespace `n/s`;
> Time taken: 0.335 seconds
> spark-sql ()> show namespaces;
> `n/s`
> Time taken: 0.232 seconds, Fetched 1 row(s)
> spark-sql ()> use `n/s`;
> Time taken: 0.028 seconds
> spark-sql (`n/s`)> create table t1 (n string);
> Time taken: 0.702 seconds
>
> The URLs appear to be encoded properly, e.g. (from Polaris log):
>
> 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
> (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400] "GET
> /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
>
> I did not test trickier chars, but adding CI coverage for them would be
> good.
>
> However, the S3 location in this case is broken down into separate directories:
>
> $ mc ls rustfs/pol/n/s
> [2026-04-15 13:44:37 EDT]     0B t1/
>
> ... and table metadata has: "location":"s3://pol/n/s/t1"
>
> ... but that is probably a different issue.
>
> Cheers,
> Dmitri.
>
>
>
> On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <[email protected]> wrote:
>
> > Thanks Alex for the thorough investigation!
> >
> > URL en/decoding is really not that easy.
> > I wonder how namespace elements and table/view names with a slash ('/')
> > character in the middle behave. Or other characters like '&' or '?' or '#'.
> >
> > Overall, I agree with your idea to implement correct URL encoding/decoding
> > in the Polaris code base to protect Polaris from upstream behavior changes
> > that can seriously break or even corrupt things.
> >
> > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > centralized helpers that deal with "proper" URL encoding/decoding?
> >
> > I also agree that the "configurable namespace separator" must never change.
> > Is my assumption correct, that it must always be the same character as it
> > is today?
> >
> > Best,
> > Robert
> >
> >
> > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > FYI I created a first PR to address the double-decoding issue:
> > >
> > > https://github.com/apache/polaris/pull/4210
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <[email protected]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > > namespace within a NamespaceEntity [1].
> > > >
> > > > These methods also exhibit the faulty space encoding behavior.
> > > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > > Iceberg project fixes for space-encoding issues. If these methods are
> > > > modified, it is imperative that we retain the legacy versions
> > > > specifically for encoding and decoding NamespaceEntity properties –
> > > > otherwise we could end up with a corrupted database.
> > > >
> > > > The same goes for the future namespace separator coming with Iceberg
> > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > properties, the separator must never change.
> > > >
> > > > I would actually be in favor of proactively internalizing the
> > > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > > think?
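A minimal sketch of what an internalized, frozen codec could look like. It assumes the current behavior is: each namespace level is form-encoded with URLEncoder (UTF-8, so a space becomes "+") and the levels are joined with the escaped unit separator "%1F", which is how RESTUtil.encodeNamespace/decodeNamespace appear to work at the time of writing. LegacyNamespaceCodec is a hypothetical name, and edge cases such as the empty namespace would need to be checked against the existing implementation before relying on anything like this.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, frozen copy of the namespace encoding used for NamespaceEntity.
// Assumption: mirrors RESTUtil.encodeNamespace/decodeNamespace; verify before use.
public final class LegacyNamespaceCodec {

  // Escaped unit separator between levels: must never change once data is stored.
  private static final String SEPARATOR = "%1F";

  private LegacyNamespaceCodec() {}

  static String encode(String[] levels) {
    String[] encoded = new String[levels.length];
    for (int i = 0; i < levels.length; i++) {
      // Form encoding, including the legacy "space -> '+'" behavior.
      encoded[i] = URLEncoder.encode(levels[i], StandardCharsets.UTF_8);
    }
    return String.join(SEPARATOR, encoded);
  }

  static String[] decode(String stored) {
    // Limit -1 keeps trailing empty levels instead of silently dropping them.
    return Arrays.stream(stored.split(SEPARATOR, -1))
        .map(part -> URLDecoder.decode(part, StandardCharsets.UTF_8))
        .toArray(String[]::new);
  }

  public static void main(String[] args) {
    String stored = encode(new String[] {"a b", "c"});
    System.out.println(stored);                          // a+b%1Fc
    System.out.println(Arrays.toString(decode(stored))); // [a b, c]
  }
}
```

The point of the copy is that future fixes to the upstream space encoding or separator can no longer change how already-persisted NamespaceEntity properties are read.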
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > [1]: https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > >
> > > >
> > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <[email protected]> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > > segments are not being decoded correctly according to RFC 3986,
> > > > > specifically regarding space encoding.
> > > > >
> > > > > I investigated the situation in Polaris, and found many problems:
> > > > >
> > > > > TLDR
> > > > >
> > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > - Table names with spaces cannot be created
> > > > > - Namespace names with spaces cannot be created
> > > > >
> > > > > DISCUSSION
> > > > >
> > > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > > is in the request body). But they cannot be retrieved: when reading
> > > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > > the REST framework has already decoded it. Consequently, a client
> > > > > sends "foo%2Bbar", which is first decoded to "foo+bar" by the framework
> > > > > (correct) and then re-decoded by Polaris to "foo bar" (incorrect),
> > > > > resulting in a "not found" error.
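The double decoding can be reproduced with plain JDK classes. This sketch assumes RESTUtil.decodeString behaves like URLDecoder.decode with UTF-8 (form decoding), and uses java.net.URI to stand in for the framework's RFC 3986 path decoding; the URL and helper names are illustrative.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DoubleDecoding {

  // Stand-in for the REST framework: RFC 3986 path decoding (%2B -> '+'),
  // then take the last path segment as the table name.
  static String lastPathSegment(String url) {
    String path = URI.create(url).getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Stand-in for the redundant step: form decoding turns '+' into a space.
  static String redundantDecode(String alreadyDecoded) {
    return URLDecoder.decode(alreadyDecoded, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String url = "http://localhost/api/catalog/v1/polaris/namespaces/ns/tables/foo%2Bbar";
    String fromFramework = lastPathSegment(url);
    System.out.println(fromFramework);                  // foo+bar (correct)
    System.out.println(redundantDecode(fromFramework)); // foo bar (wrong table name)
  }
}
```

Dropping the second step, as proposed in this message, leaves "foo+bar" intact.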
> > > > >
> > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > other similar places, we create locations merely by joining
> > > > > identifiers together, without any form of URL encoding: see [2] [3].
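One plausible way the unencoded join fails, assuming the resulting location is parsed as a java.net.URI somewhere downstream (the exact failure point in Polaris may differ). The encodeSegment helper is hypothetical and only approximates RFC 3986 path-segment encoding on top of URLEncoder.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LocationEncoding {

  // Hypothetical helper: percent-encodes one identifier so it is a valid URI
  // path segment. URLEncoder does form encoding (space -> '+'), so '+' is
  // rewritten to %20; a literal '+' in the input has already become %2B.
  static String encodeSegment(String identifier) {
    return URLEncoder.encode(identifier, StandardCharsets.UTF_8).replace("+", "%20");
  }

  static boolean isValidUri(String location) {
    try {
      new URI(location);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String naive = "s3://bucket/warehouse/ns/foo bar";
    String encoded = "s3://bucket/warehouse/ns/" + encodeSegment("foo bar");
    System.out.println(isValidUri(naive));   // false: raw space in the path
    System.out.println(encoded);             // s3://bucket/warehouse/ns/foo%20bar
    System.out.println(isValidUri(encoded)); // true
  }
}
```

The same helper also turns "n/s" into "n%2Fs", so a namespace level containing a slash no longer produces extra directories.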
> > > > >
> > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > > "foo bar", and thus it would succeed.)
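The mismatch can be sketched with JDK classes, assuming affected Java clients form-encode path segments in the style of java.net.URLEncoder, and using java.net.URI to stand in for the framework's RFC 3986 path decoding (helper names are hypothetical).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpaceEncoding {

  // Form encoding, as URLEncoder does it: a space becomes '+'.
  static String formEncode(String name) {
    return URLEncoder.encode(name, StandardCharsets.UTF_8);
  }

  // RFC 3986 path decoding of a single segment via java.net.URI:
  // '+' is an ordinary character and stays as-is; only %XX is decoded.
  static String pathDecode(String rawSegment) {
    return URI.create("http://localhost/" + rawSegment).getPath().substring(1);
  }

  public static void main(String[] args) {
    String sentByJavaClient = formEncode("foo bar");  // "foo+bar"
    System.out.println(pathDecode(sentByJavaClient)); // foo+bar (wrong name)
    System.out.println(pathDecode("foo%20bar"));      // foo bar (correct)
  }
}
```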
> > > > >
> > > > > PROPOSAL
> > > > >
> > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > shortly.
> > > > >
> > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > could fix all the methods that incorrectly join together identifiers
> > > > > without proper URL encoding.
> > > > >
> > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > we could consider implementing a workaround as follows:
> > > > >
> > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > faulty encoding.
> > > > >
> > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > required.
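A sketch of the workaround, under two assumptions: some hypothetical client detection supplies the legacyJavaClient flag (not shown), and the code sees the raw, still-encoded path segment (in JAX-RS, for example, by annotating the parameter with @Encoded). The second assumption seems necessary: once the framework has decoded the segment, a legacy client's "foo%2Bbar" (a real "+") and "foo+bar" (an encoded space) both read as "foo+bar" and can no longer be told apart.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class LegacyClientWorkaround {

  // Hypothetical: decodes a RAW (still percent-encoded) path segment.
  // The legacyJavaClient flag would come from client detection, not shown here.
  static String decodeSegment(String rawSegment, boolean legacyJavaClient) {
    if (legacyJavaClient) {
      // Form decoding is the inverse of the client's form encoding:
      // '+' -> space, "%2B" -> '+'.
      return URLDecoder.decode(rawSegment, StandardCharsets.UTF_8);
    }
    // RFC 3986 path decoding: protect literal '+' before decoding %XX escapes.
    return URLDecoder.decode(rawSegment.replace("+", "%2B"), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(decodeSegment("foo+bar", true));    // foo bar
    System.out.println(decodeSegment("foo%2Bbar", true));  // foo+bar
    System.out.println(decodeSegment("foo+bar", false));   // foo+bar
    System.out.println(decodeSegment("foo%20bar", false)); // foo bar
  }
}
```

For legacy Java clients, form decoding with URLDecoder is exactly the inverse of their form encoding, which is what "replace + with a space" amounts to when applied before percent-decoding.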
> > > > >
> > > > > What are your thoughts on this?
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > [1]: https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > [2]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > [3]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > >
> >
