Hi, Yunze:

Yunze Xu <y...@streamnative.io.invalid> wrote on Wed, Dec 14, 2022 at 02:26:
> First, how do you guarantee the schema can be used to encode the raw
> bytes whose format is unknown?

I think this is something the user needs to ensure: the user must know all the schemas of the Kafka topic, so that the data (byte[]) they send can be encoded with a Pulsar schema.

> Second, messages that cannot be encoded by the schema can only be
> discarded, i.e. message lost.

If the encoding fails, it means the user does not know how to convert the Kafka data's schema to a Pulsar schema, which is the user's own problem.

> Third, schema in Pulsar is convenient because it can support sending
> any object of type `T` and the Pulsar client is responsible to
> serialize `T` to the bytes. However, when using AUTO_PRODUCE schema,
> the producer still sends raw bytes.

The user only creates one producer to send all the Kafka topic data. If using typed Pulsar schemas instead, the user would need to create a producer for every schema, keep them in a map, and look up the matching producer to send each message.

In my understanding, AUTO_PRODUCE mainly reduces the number of producers the client has to create, which makes migrating data more convenient; it is not meant for dealing with data whose schema is unknown. To use it correctly, you must know the schema of all the data and be able to convert it into a Pulsar schema. Otherwise, it is best to handle it yourself with the BYTES schema.

Thanks,
Bo
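P.S. A minimal sketch of the producer-count difference discussed above. The Pulsar client types are stubbed here (assumption: no broker is available for a runnable example); in real code the map-based approach would cache `org.apache.pulsar.client.api.Producer` instances per schema, while the single-producer approach would use `Schema.AUTO_PRODUCE_BYTES()`.

```java
import java.util.HashMap;
import java.util.Map;

public class ProducerRouting {
    // Minimal stand-in for a Pulsar producer bound to one schema
    // (hypothetical stub, not the real Pulsar client API).
    public static class StubProducer {
        public final String schemaName;
        public int sent = 0;
        StubProducer(String schemaName) { this.schemaName = schemaName; }
        public void send(byte[] payload) { sent++; }
    }

    // Approach 1: one typed producer per schema, cached in a map;
    // every message needs a lookup to find the matching producer.
    public static final Map<String, StubProducer> producersBySchema = new HashMap<>();

    public static void sendTyped(String schemaName, byte[] payload) {
        producersBySchema
            .computeIfAbsent(schemaName, StubProducer::new)
            .send(payload);
    }

    // Approach 2: a single AUTO_PRODUCE-style producer that accepts raw
    // bytes for any schema; the topic's schema validates on publish.
    public static final StubProducer autoProducer = new StubProducer("AUTO_PRODUCE_BYTES");

    public static void main(String[] args) {
        byte[] msg = new byte[]{1, 2, 3};
        sendTyped("user-avro", msg);
        sendTyped("order-json", msg);
        sendTyped("user-avro", msg);
        // Two distinct schemas -> two cached producers.
        System.out.println(producersBySchema.size()); // 2

        autoProducer.send(msg);
        autoProducer.send(msg);
        System.out.println(autoProducer.sent); // 2
    }
}
```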