@Dmitri thanks for the quick review. The query param is actually being handled in #10904 (spec) <https://github.com/apache/iceberg/pull/10904> and in #10905 (impl) <https://github.com/apache/iceberg/pull/10905> and is just a change on top of #10877 <https://github.com/apache/iceberg/pull/10877>.
Eduard On Thu, Aug 8, 2024 at 4:58 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote: > Thanks, Eduard, for 10877 [1] and addressing my concerns quickly :) > > This approach is fine from my POV, although I personally prefer the > flexibility of what Jack proposed. > > Cheers, > Dmitri. > > [1] https://github.com/apache/iceberg/pull/10877 > > On Thu, Aug 8, 2024 at 6:28 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> I've opened #10877 <https://github.com/apache/iceberg/pull/10877> a few >> days ago to make the namespace separator configurable and let servers >> communicate to clients which separator should be used. Worth mentioning >> that this doesn't require any spec chance and it is backwards compatible >> with older clients that send *%1F*. >> >> I'm ok if we want to make the namespace separator also be controllable >> from the client and send a *delim* query param, but I agree with Dmitri >> that the server should still be able to tell a client which separator it >> prefers (as indicated in #10877 >> <https://github.com/apache/iceberg/pull/10877>). >> >> Sending the *delim* query param requires a non-breaking spec change, so >> I've opened #10904 <https://github.com/apache/iceberg/pull/10904>. >> >> After we vote on the spec change I'll add the required impl >> <https://github.com/apache/iceberg/pull/10905> (which just goes on top >> of #10877 <https://github.com/apache/iceberg/pull/10877>) to send the >> *delim* query param. >> >> Eduard >> >> On Thu, Aug 8, 2024 at 12:52 AM Dmitri Bourlatchkov >> <dmitri.bourlatch...@dremio.com.invalid> wrote: >> >>> The idea of client-chosen separator char (?delim=.) sounds pretty >>> reasonable to me. >>> >>> Nonetheless, I do not think this covers all the issues in putting >>> namespaces in URI paths for servers running under the new Servlet spec. In >>> particular, there are other chars that are considered "suspicious" by the >>> servlet spec and may not be allowed. >>> >>> Since Iceberg defines a REST API spec, I think it is reasonable for REST >>> server implementers to expect Iceberg to provide means in the REST spec to >>> ensure compatibility with modern servlet frameworks. This is critical so >>> that REST clients and servers have the same approach to URI >>> construction/interpretation. >>> >>> That said, if the general consensus is to keep this discussion focused >>> only on the namespace element separators, I'd like to add a couple of >>> points: >>> >>> It would be nice to have a default value for this parameter matching >>> current separator to ensure compatibility with older clients. >>> >>> I think we should also allow a client-side configuration setting for >>> end-users to control this. I think it would be ok to allow servers to >>> advertise a default value for the separator. However servers should accept >>> different values according to the `delim` parameter. End-users should be >>> able to override server-side config with their own client-side config since >>> end users ultimately know more about their namespace names. >>> >>> Cheers, >>> Dmitri. >>> >>> On Wed, Aug 7, 2024 at 6:07 PM Jack Ye <yezhao...@gmail.com> wrote: >>> >>>> Sorry a bit late to this thread. >>>> >>>> I would personally prefer the client side separator solution (query >>>> param with `?delim=.`) a bit more than the server side (config >>>> override), just given the experience of handling similar situations for >>>> Glue data catalog which allows any name for database (namespace) and table, >>>> except for white space characters [1]. >>>> >>>> In the Glue-HiveMetaStore connector [2], there is a feature to use a >>>> single string to reference a namespace in a non-default catalog, e.g. >>>> "cat1:ns1" can mean catalog cat1 namespace ns1. This is basically a 2-level >>>> namespace. After a similar discussion thread and exploring what users are >>>> actually using, we realized that no separator could work for everyone, so >>>> we introduced a config value [3] that users can set at client side when >>>> using the connector. If there is a namespace with name "ns1:ns2" in a >>>> catalog "cat1", then the user can choose a different separator like "$' to >>>> write "cat1$ns1:ns2", which can allow us to correctly resolve all the name >>>> references. >>>> >>>> We found this the most flexible, since an organization would typically >>>> stick to just one or two special characters, and can always find something >>>> else as a separator. Even in extreme cases, a user can choose a long string >>>> like "__SEPARATOR__" as the separator. >>>> >>>> -Jack >>>> >>>> [1] https://docs.aws.amazon.com/glue/latest/webapi/API_GlueTable.html >>>> [2] >>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore >>>> [3] >>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/util/AWSGlueConfig.java#L26 >>>> >>>> >>>> On Tue, Aug 6, 2024 at 1:32 PM Ryan Blue <b...@databricks.com.invalid> >>>> wrote: >>>> >>>>> >>>>> 1. Change %1F to %2E: >>>>> >>>>> As I noted in my earlier email, a catalog may choose to use . as a >>>>> one-way conversion so that it doesn’t matter that you can’t split the >>>>> namespace. This does work, but with slightly different behavior. >>>>> >>>>> The original decision on this issue was that the behavior should be a >>>>> catalog choice, which is why we went with what we thought was a safe >>>>> delimiter. For this discussion, the model I originally proposed could be >>>>> supported by setting the delimiter to .. >>>>> >>>>> New option returned from the config-endpoint telling the which >>>>> namespace-element separator to use >>>>> >>>>> As Eduard noted, old clients affected by this problem are already >>>>> broken so it is okay if the fix is only for newer clients. If a client >>>>> isn’t broken, then the service still needs to support 0x1F for >>>>> compatibility. Given that 0x1F is not a common character in names, I >>>>> think that should be safe. >>>>> >>>>> I do not see any other option than changing the REST spec and define >>>>> an escaping mechanism, which requires new endpoints. >>>>> >>>>> I disagree here. I think making the delimiter configurable is a good >>>>> solution that doesn’t require new endpoints. I definitely prefer this to a >>>>> more complicated scheme. >>>>> >>>>> Ryan >>>>> >>>>> On Tue, Aug 6, 2024 at 2:06 AM Robert Stupp <sn...@snazy.de> wrote: >>>>> >>>>>> I'd like to summarize the proposals that came up: >>>>>> >>>>>> 1. Change `%1F` to `%2E`: >>>>>> >>>>>> Already existing namespaces that have the dot-character (`%2E` == >>>>>> `.`) become inaccessible. A namespace ["my", "elem.foo"] is then encoded >>>>>> as >>>>>> `my.elem.foo` and decoded as ["my", "elem", "foo"], which is incorrect. >>>>>> >>>>>> 2. New option returned from the config-endpoint telling the which >>>>>> namespace-element separator to use: >>>>>> >>>>>> (Old) clients won't respect this option - and in turn, the service >>>>>> has to expect both `%1F` and the "proposed" separator character. If an >>>>>> old >>>>>> client sends a namespace element that contains the separator proposed by >>>>>> the service, the namespace representation on the service side is not the >>>>>> same as the one on the client side. Ex: A new service advertises `%2E" >>>>>> via >>>>>> the config endpoint - the old client doesn't know about it and encodes >>>>>> ["my", "elem", "foo"] as `my%1Felem%1Ffoo`, which the service either >>>>>> rejects with a HTTP/4xx or interprets as ["my%1Felem%1Ffoo"] - both are >>>>>> incorrect. >>>>>> >>>>>> 3. Clients send a new query parameter `?delim=.` >>>>>> >>>>>> (Old) services that don't know about this new query parameter will >>>>>> interpret the namespace differently, the namespace representation on the >>>>>> service side is not the same as the one on the client side. A client >>>>>> encodes a namespace ["my", "elem", "foo"] as `my.elem.foo` and adds the >>>>>> `?delim=.` query param. Old services interpret this as ["my.elem.foo"], >>>>>> which is incorrect. >>>>>> >>>>>> >>>>>> In any case, services _have to_ support `%1F` or compatibility w/ >>>>>> older clients will be broken. >>>>>> >>>>>> I do not see any other option than changing the REST spec and define >>>>>> an escaping mechanism, which requires new endpoints. >>>>>> >>>>>> All options proposed so far are potentially breaking REST spec >>>>>> changes. >>>>>> >>>>>> >>>>>> On 05.08.24 19:06, Daniel Weeks wrote: >>>>>> >>>>>> I would agree with adding either a server side (config override) or >>>>>> client side control (query param with `?delim=.`) as it will be >>>>>> compatible with the current v1 endpoint. >>>>>> >>>>>> In the future we could introduce a v2 endpoint(s), but I would want >>>>>> to wait for OpenAPI 4 because they address this by allowing multi-segment >>>>>> pathing via URI templates in RFC 6570 >>>>>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the >>>>>> original way we wanted to represent namespaces, but it wasn't supported >>>>>> (e.g. .../{+namespaces}/tables/{table}). I doubt it's really worth >>>>>> the effort though, so I feel like a configurable delimiter makes the most >>>>>> sense. >>>>>> >>>>>> -Dan >>>>>> >>>>>> -- >>>>>> Robert Stupp >>>>>> @snazy >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Databricks >>>>> >>>>