@Dmitri thanks for the quick review. The query param is actually being
handled in #10904 (spec) <https://github.com/apache/iceberg/pull/10904> and
in #10905 (impl) <https://github.com/apache/iceberg/pull/10905> and is just
a change on top of #10877 <https://github.com/apache/iceberg/pull/10877>.

Eduard

On Thu, Aug 8, 2024 at 4:58 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:

> Thanks, Eduard, for 10877 [1] and addressing my concerns quickly :)
>
> This approach is fine from my POV, although I personally prefer the
> flexibility of what Jack proposed.
>
> Cheers,
> Dmitri.
>
> [1] https://github.com/apache/iceberg/pull/10877
>
> On Thu, Aug 8, 2024 at 6:28 AM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
>
>> I've opened #10877 <https://github.com/apache/iceberg/pull/10877> a few
>> days ago to make the namespace separator configurable and let servers
>> communicate to clients which separator should be used. Worth mentioning
>> that this doesn't require any spec chance and it is backwards compatible
>> with older clients that send *%1F*.
>>
>> I'm ok if we want to make the namespace separator also be controllable
>> from the client and send a *delim* query param, but I agree with Dmitri
>> that the server should still be able to tell a client which separator it
>> prefers (as indicated in #10877
>> <https://github.com/apache/iceberg/pull/10877>).
>>
>> Sending the *delim* query param requires a non-breaking spec change, so
>> I've opened #10904 <https://github.com/apache/iceberg/pull/10904>.
>>
>> After we vote on the spec change I'll add the required impl
>> <https://github.com/apache/iceberg/pull/10905> (which just goes on top
>> of #10877 <https://github.com/apache/iceberg/pull/10877>) to send the
>> *delim* query param.
>>
>> Eduard
>>
>> On Thu, Aug 8, 2024 at 12:52 AM Dmitri Bourlatchkov
>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>
>>> The idea of client-chosen separator char (?delim=.) sounds pretty
>>> reasonable to me.
>>>
>>> Nonetheless, I do not think this covers all the issues in putting
>>> namespaces in URI paths for servers running under the new Servlet spec. In
>>> particular, there are other chars that are considered "suspicious" by the
>>> servlet spec and may not be allowed.
>>>
>>> Since Iceberg defines a REST API spec, I think it is reasonable for REST
>>> server implementers to expect Iceberg to provide means in the REST spec to
>>> ensure compatibility with modern servlet frameworks. This is critical so
>>> that REST clients and servers have the same approach to URI
>>> construction/interpretation.
>>>
>>> That said, if the general consensus is to keep this discussion focused
>>> only on the namespace element separators, I'd like to add a couple of
>>> points:
>>>
>>> It would be nice to have a default value for this parameter matching
>>> current separator to ensure compatibility with older clients.
>>>
>>> I think we should also allow a client-side configuration setting for
>>> end-users to control this. I think it would be ok to allow servers to
>>> advertise a default value for the separator. However servers should accept
>>> different values according to the `delim` parameter. End-users should be
>>> able to override server-side config with their own client-side config since
>>> end users ultimately know more about their namespace names.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> On Wed, Aug 7, 2024 at 6:07 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Sorry a bit late to this thread.
>>>>
>>>> I would personally prefer the client side separator solution (query
>>>> param with `?delim=.`) a bit more than the server side (config
>>>> override), just given the experience of handling similar situations for
>>>> Glue data catalog which allows any name for database (namespace) and table,
>>>> except for white space characters [1].
>>>>
>>>> In the Glue-HiveMetaStore connector [2], there is a feature to use a
>>>> single string to reference a namespace in a non-default catalog, e.g.
>>>> "cat1:ns1" can mean catalog cat1 namespace ns1. This is basically a 2-level
>>>> namespace. After a similar discussion thread and exploring what users are
>>>> actually using, we realized that no separator could work for everyone, so
>>>> we introduced a config value [3] that users can set at client side when
>>>> using the connector. If there is a namespace with name "ns1:ns2" in a
>>>> catalog "cat1", then the user can choose a different separator like "$' to
>>>> write "cat1$ns1:ns2", which can allow us to correctly resolve all the name
>>>> references.
>>>>
>>>> We found this the most flexible, since an organization would typically
>>>> stick to just one or two special characters, and can always find something
>>>> else as a separator. Even in extreme cases, a user can choose a long string
>>>> like "__SEPARATOR__" as the separator.
>>>>
>>>> -Jack
>>>>
>>>> [1] https://docs.aws.amazon.com/glue/latest/webapi/API_GlueTable.html
>>>> [2]
>>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
>>>> [3]
>>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/util/AWSGlueConfig.java#L26
>>>>
>>>>
>>>> On Tue, Aug 6, 2024 at 1:32 PM Ryan Blue <b...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>>
>>>>>    1. Change %1F to %2E:
>>>>>
>>>>> As I noted in my earlier email, a catalog may choose to use . as a
>>>>> one-way conversion so that it doesn’t matter that you can’t split the
>>>>> namespace. This does work, but with slightly different behavior.
>>>>>
>>>>> The original decision on this issue was that the behavior should be a
>>>>> catalog choice, which is why we went with what we thought was a safe
>>>>> delimiter. For this discussion, the model I originally proposed could be
>>>>> supported by setting the delimiter to ..
>>>>>
>>>>> New option returned from the config-endpoint telling the which
>>>>> namespace-element separator to use
>>>>>
>>>>> As Eduard noted, old clients affected by this problem are already
>>>>> broken so it is okay if the fix is only for newer clients. If a client
>>>>> isn’t broken, then the service still needs to support 0x1F for
>>>>> compatibility. Given that 0x1F is not a common character in names, I
>>>>> think that should be safe.
>>>>>
>>>>> I do not see any other option than changing the REST spec and define
>>>>> an escaping mechanism, which requires new endpoints.
>>>>>
>>>>> I disagree here. I think making the delimiter configurable is a good
>>>>> solution that doesn’t require new endpoints. I definitely prefer this to a
>>>>> more complicated scheme.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Tue, Aug 6, 2024 at 2:06 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>
>>>>>> I'd like to summarize the proposals that came up:
>>>>>>
>>>>>> 1. Change `%1F` to `%2E`:
>>>>>>
>>>>>> Already existing namespaces that have the dot-character (`%2E` ==
>>>>>> `.`) become inaccessible. A namespace ["my", "elem.foo"] is then encoded 
>>>>>> as
>>>>>> `my.elem.foo` and decoded as ["my", "elem", "foo"], which is incorrect.
>>>>>>
>>>>>> 2. New option returned from the config-endpoint telling the which
>>>>>> namespace-element separator to use:
>>>>>>
>>>>>> (Old) clients won't respect this option - and in turn, the service
>>>>>> has to expect both `%1F` and the "proposed" separator character. If an 
>>>>>> old
>>>>>> client sends a namespace element that contains the separator proposed by
>>>>>> the service, the namespace representation on the service side is not the
>>>>>> same as the one on the client side. Ex: A new service advertises `%2E" 
>>>>>> via
>>>>>> the config endpoint - the old client doesn't know about it and encodes
>>>>>> ["my", "elem", "foo"] as `my%1Felem%1Ffoo`, which the service either
>>>>>> rejects with a HTTP/4xx or interprets as ["my%1Felem%1Ffoo"] - both are
>>>>>> incorrect.
>>>>>>
>>>>>> 3. Clients send a new query parameter `?delim=.`
>>>>>>
>>>>>> (Old) services that don't know about this new query parameter will
>>>>>> interpret the namespace differently, the namespace representation on the
>>>>>> service side is not the same as the one on the client side. A client
>>>>>> encodes a namespace ["my", "elem", "foo"] as `my.elem.foo` and adds the
>>>>>> `?delim=.` query param. Old services interpret this as ["my.elem.foo"],
>>>>>> which is incorrect.
>>>>>>
>>>>>>
>>>>>> In any case, services _have to_ support `%1F` or compatibility w/
>>>>>> older clients will be broken.
>>>>>>
>>>>>> I do not see any other option than changing the REST spec and define
>>>>>> an escaping mechanism, which requires new endpoints.
>>>>>>
>>>>>> All options proposed so far are potentially breaking REST spec
>>>>>> changes.
>>>>>>
>>>>>>
>>>>>> On 05.08.24 19:06, Daniel Weeks wrote:
>>>>>>
>>>>>> I would agree with adding either a server side (config override) or
>>>>>> client side control (query param with `?delim=.`) as it will be
>>>>>> compatible with the current v1 endpoint.
>>>>>>
>>>>>> In the future we could introduce a v2 endpoint(s), but I would want
>>>>>> to wait for OpenAPI 4 because they address this by allowing multi-segment
>>>>>> pathing via URI templates in RFC 6570
>>>>>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the
>>>>>> original way we wanted to represent namespaces, but it wasn't supported
>>>>>> (e.g. .../{+namespaces}/tables/{table}).  I doubt it's really worth
>>>>>> the effort though, so I feel like a configurable delimiter makes the most
>>>>>> sense.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> --
>>>>>> Robert Stupp
>>>>>> @snazy
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Databricks
>>>>>
>>>>

Reply via email to