Sorry, a bit late to this thread. I personally lean toward the client-side separator solution (query param with `?delim=.`) over the server-side one (config override), given my experience handling similar situations for the Glue Data Catalog, which allows any name for a database (namespace) or table, except for whitespace characters [1].
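As a rough illustration of the client-side option, here is a minimal sketch in Python. It is not actual client code: the function names are made up for this example, and `delim` just stands in for the value a client would send as `?delim=`.

```python
from urllib.parse import quote, unquote

UNIT_SEPARATOR = "\x1f"  # the 0x1F delimiter discussed in this thread


def encode_namespace(levels, delim=UNIT_SEPARATOR):
    """Join namespace levels with `delim` and percent-encode the result.

    Hypothetical helper: refuses a delimiter that occurs inside a level,
    because the joined string could not be split back correctly.
    """
    for level in levels:
        if delim in level:
            raise ValueError(
                f"delimiter {delim!r} occurs in level {level!r}; pick another"
            )
    return quote(delim.join(levels), safe="")


def decode_namespace(segment, delim=UNIT_SEPARATOR):
    """Inverse of encode_namespace: percent-decode, then split on `delim`."""
    return unquote(segment).split(delim)


# Default delimiter: the dot inside "elem.foo" survives the round trip.
print(encode_namespace(["my", "elem.foo"]))        # my%1Felem.foo
print(decode_namespace("my%1Felem.foo"))           # ['my', 'elem.foo']

# A client that knows its names contain dots picks another delimiter.
print(encode_namespace(["my", "elem.foo"], "$"))   # my%24elem.foo

# Splitting on "." cannot tell ["my", "elem.foo"] from ["my", "elem", "foo"].
print(decode_namespace("my.elem.foo", "."))        # ['my', 'elem', 'foo']
```

The point of the check in `encode_namespace` is that the client is the side that knows which characters appear in its own names, so it is in the best position to pick a delimiter that does not collide with them.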
In the Glue-HiveMetaStore connector [2], there is a feature that uses a single string to reference a namespace in a non-default catalog, e.g. "cat1:ns1" means catalog cat1, namespace ns1. This is basically a 2-level namespace. After a similar discussion thread and looking at what users were actually using, we realized that no single separator could work for everyone, so we introduced a config value [3] that users can set on the client side when using the connector. If there is a namespace named "ns1:ns2" in catalog "cat1", the user can choose a different separator such as "$" and write "cat1$ns1:ns2", which lets us resolve all name references correctly. We found this the most flexible approach, since an organization typically sticks to just one or two special characters in its names and can always find something else to use as a separator. Even in extreme cases, a user can choose a long string like "__SEPARATOR__" as the separator.

-Jack

[1] https://docs.aws.amazon.com/glue/latest/webapi/API_GlueTable.html
[2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
[3] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/util/AWSGlueConfig.java#L26

On Tue, Aug 6, 2024 at 1:32 PM Ryan Blue <b...@databricks.com.invalid> wrote:

> 1. Change %1F to %2E:
>
> As I noted in my earlier email, a catalog may choose to use `.` as a
> one-way conversion so that it doesn’t matter that you can’t split the
> namespace. This does work, but with slightly different behavior.
>
> The original decision on this issue was that the behavior should be a
> catalog choice, which is why we went with what we thought was a safe
> delimiter. For this discussion, the model I originally proposed could be
> supported by setting the delimiter to `.`.
>
> New option returned from the config-endpoint telling the client which
> namespace-element separator to use
>
> As Eduard noted, old clients affected by this problem are already broken
> so it is okay if the fix is only for newer clients. If a client isn’t
> broken, then the service still needs to support 0x1F for compatibility.
> Given that 0x1F is not a common character in names, I think that should
> be safe.
>
> I do not see any other option than changing the REST spec and defining an
> escaping mechanism, which requires new endpoints.
>
> I disagree here. I think making the delimiter configurable is a good
> solution that doesn’t require new endpoints. I definitely prefer this to a
> more complicated scheme.
>
> Ryan
>
> On Tue, Aug 6, 2024 at 2:06 AM Robert Stupp <sn...@snazy.de> wrote:
>
>> I'd like to summarize the proposals that came up:
>>
>> 1. Change `%1F` to `%2E`:
>>
>> Already existing namespaces that have the dot-character (`%2E` == `.`)
>> become inaccessible. A namespace ["my", "elem.foo"] is then encoded as
>> `my.elem.foo` and decoded as ["my", "elem", "foo"], which is incorrect.
>>
>> 2. New option returned from the config-endpoint telling the client which
>> namespace-element separator to use:
>>
>> (Old) clients won't respect this option - and in turn, the service has to
>> expect both `%1F` and the "proposed" separator character. If an old client
>> sends a namespace element that contains the separator proposed by the
>> service, the namespace representation on the service side is not the same
>> as the one on the client side. Ex: A new service advertises `%2E` via the
>> config endpoint - the old client doesn't know about it and encodes ["my",
>> "elem", "foo"] as `my%1Felem%1Ffoo`, which the service either rejects with
>> an HTTP/4xx or interprets as ["my%1Felem%1Ffoo"] - both are incorrect.
>>
>> 3. Clients send a new query parameter `?delim=.`
>>
>> (Old) services that don't know about this new query parameter will
>> interpret the namespace differently, so the namespace representation on
>> the service side is not the same as the one on the client side. A client
>> encodes a namespace ["my", "elem", "foo"] as `my.elem.foo` and adds the
>> `?delim=.` query param. Old services interpret this as ["my.elem.foo"],
>> which is incorrect.
>>
>>
>> In any case, services _have to_ support `%1F` or compatibility w/ older
>> clients will be broken.
>>
>> I do not see any other option than changing the REST spec and defining an
>> escaping mechanism, which requires new endpoints.
>>
>> All options proposed so far are potentially breaking REST spec changes.
>>
>>
>> On 05.08.24 19:06, Daniel Weeks wrote:
>>
>> I would agree with adding either a server side (config override) or
>> client side control (query param with `?delim=.`) as it will be
>> compatible with the current v1 endpoint.
>>
>> In the future we could introduce a v2 endpoint(s), but I would want to
>> wait for OpenAPI 4 because they address this by allowing multi-segment
>> pathing via URI templates in RFC 6570
>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the original
>> way we wanted to represent namespaces, but it wasn't supported (e.g.
>> .../{+namespaces}/tables/{table}). I doubt it's really worth the effort
>> though, so I feel like a configurable delimiter makes the most sense.
>>
>> -Dan
>>
>> --
>> Robert Stupp
>> @snazy
>>
>>
>
> --
> Ryan Blue
> Databricks
>