Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

丛搏 Tue, 13 Dec 2022 23:09:14 -0800

Yunze Xu <y...@streamnative.io.invalid> wrote on Wed, Dec 14, 2022 at 12:40:
>
> > the user only creates one producer to send all Kafka topic data, if
> using Pulsar schema, the user needs to create all schema producers in
> a map
>
> It doesn't make sense to me. If the source topic has messages of
> multiple schemas, why did you try to sink them into the same topic
> with a schema? The key point of AUTO_PRODUCE schema is to download the
> schema to validate the source messages. But if the schema of the topic
> evolved, the remaining messages from the source topic could not be
> sent to the topic.
>
Let me give you an example. An Avro schema can have multiple versions.
Version 0:
Student {
    String name;
}
Version 1:
Student {
    String name;
    int age;
}
How can you create two Student classes with the same namespace in one
Java process? The problem is not only that the schema type can change;
a single schema can also have multiple versions.
In this case, how do you create two producers, one for version 0 and
one for version 1?
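
To make the point concrete, here is a minimal sketch of the alternative to
compiled classes: both versions of Student are described as data, so they can
coexist in one JVM. `VersionedSchema` and `StudentVersions` are hypothetical
names for illustration only; they are not Pulsar or Avro types, and the
field-type check is a deliberately simplified stand-in for real schema
resolution.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a versioned schema; not a Pulsar or Avro type.
final class VersionedSchema {
    final int version;
    final Map<String, Class<?>> fields;

    VersionedSchema(int version, Map<String, Class<?>> fields) {
        this.version = version;
        this.fields = fields;
    }

    // A record matches when it has exactly the schema's fields with the expected types.
    boolean matches(Map<String, Object> record) {
        if (!record.keySet().equals(fields.keySet())) {
            return false;
        }
        for (Map.Entry<String, Class<?>> field : fields.entrySet()) {
            if (!field.getValue().isInstance(record.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }
}

final class StudentVersions {
    // Version 0: Student { String name; }
    static VersionedSchema v0() {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("name", String.class);
        return new VersionedSchema(0, fields);
    }

    // Version 1: Student { String name; int age; }
    static VersionedSchema v1() {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("name", String.class);
        fields.put("age", Integer.class);
        return new VersionedSchema(1, fields);
    }

    // Returns the version of the first schema the record matches, or -1 if none does.
    static int resolve(Map<String, Object> record) {
        for (VersionedSchema schema : List.of(v0(), v1())) {
            if (schema.matches(record)) {
                return schema.version;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("name", "Alice")));          // prints 0
        System.out.println(resolve(Map.of("name", "Bob", "age", 20))); // prints 1
    }
}
```

Both versions live side by side here only because neither one is a compiled
Student class.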


> The most confusing part is that AUTO_PRODUCE schema will perform
> message format validation before send. It's transparent to users and
> intuitive. IMO, it's better to call validate explicitly like
>
> ```java
> producer.newMessage().value(bytes).validate().sendAsync();
> ```
>
> There are two benefits:
> 1. It's clear that the message validation happens before sending.
> 2. If users don't want to validate before sending, they can choose to
> send the bytes directly and validate the message during consumption.
Calling `schema.validate()` is enough. Data validation does not belong
to the Pulsar message, and we can add a usage description to the schema
docs.
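
As a rough sketch of what I mean by keeping validation on the schema side:
the caller validates the bytes once, explicitly, and the producer just sends
whatever it is given. Everything below is hypothetical and only for
illustration: `BytesValidator` stands in for a schema's validate method,
`RawBytesSender` stands in for a bytes producer, and the "name=" rule is a toy
check, not a real Avro or Pulsar validation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a schema that can check raw bytes; not the Pulsar Schema interface.
interface BytesValidator {
    // Throws IllegalArgumentException when the payload does not match the schema.
    void validate(byte[] payload);
}

// Hypothetical producer that sends bytes as-is and never validates on its own.
final class RawBytesSender {
    final List<byte[]> sent = new ArrayList<>();

    void send(byte[] payload) {
        sent.add(payload);
    }
}

final class ExplicitValidationDemo {
    // Toy rule standing in for a real schema check: payload must be "name=<non-empty>".
    static final BytesValidator STUDENT_V0 = payload -> {
        String text = new String(payload, StandardCharsets.UTF_8);
        if (!text.startsWith("name=") || text.length() == "name=".length()) {
            throw new IllegalArgumentException("payload does not match Student v0: " + text);
        }
    };

    // The caller opts in to validation; it runs exactly once, before the send.
    static boolean sendChecked(RawBytesSender sender, BytesValidator validator, byte[] payload) {
        try {
            validator.validate(payload);
        } catch (IllegalArgumentException e) {
            return false; // rejected on the client, nothing is sent
        }
        sender.send(payload);
        return true;
    }
}
```

A caller who does not want the check calls `sender.send(bytes)` directly and
leaves validation to the consumer.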
>
> The performance problem of the AUTO_PRODUCE schema is that the
> validation happens twice and it cannot be controlled.

Data validation is done by the client, not by the broker, so we cannot
reliably verify that the bytes were generated by a specific schema. I
think this is something users should take care of rather than something
Pulsar should guarantee: the check happens only on the client, so you
cannot ensure that the data users send was actually generated by this
schema. So we don't need to verify twice.
