Hi all,

On the topic of allowed characters, we recently settled on alphanumeric
characters only for simple names. The reasoning was that this is the most
portable choice.

With that in mind, allowing hyphens (or underscores) will not unduly
complicate things. I would, however, draw the line at characters that are
normally not found in words, like !@#€$%^&*(){}[],<>/?|"';:...

Specifically, I think we should only extend the spec to allow words of
alphanumeric characters, optionally in snake or kebab case. As a regular
expression, this would be: [a-zA-Z][0-9a-zA-Z_-]*[0-9a-zA-Z]?
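
To make the proposal concrete, here is a minimal Python sketch that applies
the expression above to each dot-separated part of a namespace. The helper
name is_valid_namespace and the STRICT variant are assumptions for
illustration only; they are not part of the spec or of any SDK. STRICT shows
one possible reading in which a name may not end in a hyphen or underscore,
since the expression as written still accepts a name such as "abc-".

```python
import re

# The expression exactly as written in this message.
PROPOSED = re.compile(r"[a-zA-Z][0-9a-zA-Z_-]*[0-9a-zA-Z]?")

# Hypothetical stricter variant (an assumption, not part of the proposal):
# a name longer than one character must end in an alphanumeric character.
STRICT = re.compile(r"[a-zA-Z](?:[0-9a-zA-Z_-]*[0-9a-zA-Z])?")


def is_valid_namespace(namespace: str, pattern: re.Pattern) -> bool:
    """Return True if every dot-separated part fully matches the pattern."""
    return all(pattern.fullmatch(part) for part in namespace.split("."))


if __name__ == "__main__":
    samples = [
        "debezium.abc-123-efg-20231005.table.u_table_dbz",  # from this thread
        "abc-",        # trailing hyphen
        "1abc.table",  # part starting with a digit
        "_abc",        # part starting with an underscore
    ]
    for name in samples:
        print(name,
              is_valid_namespace(name, PROPOSED),
              is_valid_namespace(name, STRICT))
```

Note that both variants reject a leading underscore, which the expression
quoted in the error message below ([A-Za-z_] as the first character) accepts,
so existing names of that form would need to be considered.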


Kind regards,
Oscar


On Mon, 13 Nov 2023 at 21:06, Martin Grigorov <mgrigo...@apache.org> wrote:

> Hi Jon,
>
> Thank you for this email!
>
> On Mon, Nov 13, 2023 at 7:56 PM Jonathan Slusher <jonslus...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I opened an issue in the AVRO project in Jira <
> > https://issues.apache.org/jira/browse/AVRO-3900> and I’ve been asked to
> > submit a topic for discussion to this email group.
> >
> > See this issue in the rust schema_registry_converter repo <
> > https://github.com/gklijs/schema_registry_converter/issues/100> for
> > details specific to the crate in rust that we’re having trouble with:
> >
> > A couple of things to point out here:
> >
> > 1. I understand that at this time the Avro spec does not allow hyphens in
> > its namespaces, but somehow our registry is allowing them to be created
> > from our Debezium connectors. We have been using the confluent_python <
> > https://github.com/confluentinc/confluent-kafka-python> module and since
> > version 1.9.2, its deserializer seems to handle these hyphens without
> > error. We also have several JDBC sink connectors with consumer groups
> > that are able to use these topics.
> >
> > 2. We recently attempted to implement a consumer written in Rust, and the
> > crate <https://github.com/gklijs/schema_registry_converter> above, which
> > is used for deserialization, throws an exception when attempting to
> > connect to these topics.
> >
> > ```
> > thread 'main' panicked at /app/src/utils/kafka.rs:67:35:
> > Error decoding value: Error: Supplied raw value
> > "{\"type\":\"record\",\"name\":\"Envelope\",\"namespace\":\"debezium.abc-123-efg-20231005.table.u_table_dbz\",\"fields\":[{\"name\":\"before\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"Value\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"uid\",\"type\":{\"type\":\"long\",\"connect.default\":0},\"default\":0},{\"name\":\"release_id\",\"type\":{\"type\":\"long\",\"connect.default\":0},\"default\":0},{\"name\":\"notes\",\"type\":\"string\"},{\"name\":\"notes_public\",\"type\":{\"type\":\"string\",\"connect.version\":1,\"connect.parameters\":{\"allowed\":\"Y,N\"},\"connect.default\":\"N\",\"connect.name\":\"io.debezium.data.Enum\"},\"default\":\"N\"},{\"name\":\"added_ts\",\"type\":{\"type\":\"long\",\"connect.version\":1,\"connect.name\":\"io.debezium.time.Timestamp\"}}],\"connect.name\":\"debezium.abc-123-efg-20231005.table.u_table_dbz.Value\"}],\"default\":null},{\"name\":\"after\",\"type\":[\"null\",\"Value\"],\"default\":null},{\"name\":\"source\",\"type\":{\"type\":\"record\",\"name\":\"Source\",\"namespace\":\"io.debezium.connector.mysql\",\"fields\":[{\"name\":\"version\",\"type\":\"string\"},{\"name\":\"connector\",\"type\":\"string\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"ts_ms\",\"type\":\"long\"},{\"name\":\"snapshot\",\"type\":[{\"type\":\"string\",\"connect.version\":1,\"connect.parameters\":{\"allowed\":\"true,last,false,incremental\"},\"connect.default\":\"false\",\"connect.name\":\"io.debezium.data.Enum\"},\"null\"],\"default\":\"false\"},{\"name\":\"db\",\"type\":\"string\"},{\"name\":\"sequence\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"table\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"server_id\",\"type\":\"long\"},{\"name\":\"gtid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"file\",\"type\":\"string\"},{\"name\":\"pos\",\"type\":\"long\"},{\"name\":\"row\",\"type\":\"int\"},{\"name\":\"thread\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"query\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"io.debezium.connector.mysql.Source\"}},{\"name\":\"op\",\"type\":\"string\"},{\"name\":\"ts_ms\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"transaction\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"block\",\"namespace\":\"event\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"total_order\",\"type\":\"long\"},{\"name\":\"data_collection_order\",\"type\":\"long\"}],\"connect.version\":1,\"connect.name\":\"event.block\"}],\"default\":null}],\"connect.version\":1,\"connect.name\":\"debezium.abc-123-efg-20231005.table.u_table_dbz.Envelope\"}"
> > cant be turned into a Schema, was cause by Invalid namespace
> > debezium.abc-123-efg-20231005.table.u_table_dbz. It must match the regex
> > '^([A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*)?$', it's retriable:
> > false, it's cached: false
> > ```
> >
> > Ideally, the Avro spec would just accept hyphens since they’re a pretty
> > common character and unavoidable in certain circumstances. If this is
> > easier said than done, I think at the least any library used for Avro
> > deserialization should account for them, including the Rust library. If
> > this works in Java and Python, shouldn’t it also work in Rust?
> >
>
> This is exactly the reason I asked you to raise this question here in the
> mailing list!
> I also agree that if most/all of the SDKs will allow hyphens in the
> name[space] then it is a better idea to add it to the list of allowed
> characters in the specification instead of adding logic to disable the
> validation.
>
> @Avro devs: What is your opinion?
>
> Martin
>
>
> >
> > Here’s a generic example of a schema created by a Debezium connector:
> >
> > ```
> > {
> >   "type": "record",
> >   "name": "Envelope",
> >   "namespace": "abc-123-efg-20231005.table.u_table_dbz",
> >   "fields": [
> >     {
> >       "name": "before",
> >       "type": [
> >         "null",
> >         {
> > ...
> > ```
> >
> > Please let me know if you need any more details, and thank you!
> >
> > Jon Slusher
> >
> >
> >
> >
>


-- 

✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>
