I have sent the second part of the patch about the Kafka Source
https://github.com/apache/pulsar/pull/10002

With this second patch we are able to support non String keys and also
we are applying Schema information to the Pulsar topic.

When the key deserializer is StringDeserializer we use the decoded key
as Pulsar key.
When the key is not StringDeserializer then we use the Pulsar KeyValue
data type, with SEPARATED key encoding .

This way on the topic we have a Schema for the key and a Schema for the value.
The key is encoded into the Pulsar key (SEPARATED) and so it is used
for routing and for compaction.

Please take a look if you are interested,
there is still much work to be done, but the results are promising

Enrico

Il giorno lun 1 mar 2021 alle ore 22:32 Enrico Olivelli
<eolive...@gmail.com> ha scritto:
>
>
>
> Il Lun 1 Mar 2021, 22:20 Sijie Guo <guosi...@gmail.com> ha scritto:
>>
>> Enrico - I have just reviewed the PR. I don't think you addressed your
>> comments. I still have the concern how this PR is implemented. I'd prefer
>> to keep the Kafka deserializer as simple as possible. We should keep the
>> schema cache and the logic to fetch confluent schema in the source
>> connector.
>
>
> Okay. I will update the patch accordingly.
> Thanks
>
> Enrico
>
>>
>> - Sijie
>>
>> On Mon, Mar 1, 2021 at 1:04 PM Sijie Guo <guosi...@gmail.com> wrote:
>>
>> > Apologized for the delay! Reviewing it now.
>> >
>> > - Sijie
>> >
>> > On Sun, Feb 28, 2021 at 11:29 PM Enrico Olivelli <eolive...@gmail.com>
>> > wrote:
>> >
>> >> Hello,
>> >> Please bear with me, I really want this work to go forward  :-)
>> >>
>> >> @Sijie, I know that you are super busy, so I would like to not put
>> >> pressure on you, and I thank you very much for your useful comments on
>> >> the PR.
>> >>
>> >> Our Pulsar community is big and it is still growing
>> >> IMHO it would be a very good thing that others in the community take a
>> >> look as well.
>> >>
>> >> The first patch is not so big work and it is hard to review.
>> >> As a general approach I prefer to send little patches, this way it is
>> >> easy to understand what's going on.
>> >>
>> >> Code is not written in the stone, and we can always make improvements.
>> >> My plan is to continue working on the Kafka connector and send more
>> >> patches until I have covered all of the use cases of my interest
>> >> (basically around enterprise features, like Schema, Multi topic...)
>> >>
>> >> I would like to work directly here within the project by sending pull
>> >> requests to the ASF repo and I am not willing to not create my own
>> >> Kafka Connector fork.
>> >> I believe this is the best approach for the community,
>> >> but I need some support from the group.
>> >>
>> >> Best regards
>> >> Enrico
>> >>
>> >>
>> >>
>> >> Il giorno gio 25 feb 2021 alle ore 08:56 Sijie Guo
>> >> <guosi...@gmail.com> ha scritto:
>> >> >
>> >> > Apologized for the delay! Will review it again today or tomorrow.
>> >> >
>> >> > - Sijie
>> >> >
>> >> > On Wed, Feb 24, 2021 at 3:49 AM Enrico Olivelli <eolive...@gmail.com>
>> >> wrote:
>> >> >
>> >> > > Hello community,
>> >> > > It looks like only Sijie started to review this work.
>> >> > > https://github.com/apache/pulsar/pull/9448
>> >> > >
>> >> > > I wonder if others that are interested in Kafka compatibility may
>> >> have
>> >> > > time to check it out
>> >> > >
>> >> > > As said, this is only the first part of a series of implementations
>> >> we want
>> >> > > to do about this Connector
>> >> > >
>> >> > > Enrico
>> >> > >
>> >> > > Il giorno mar 16 feb 2021 alle ore 05:31 Sijie Guo <
>> >> guosi...@gmail.com> ha
>> >> > > scritto:
>> >> > >
>> >> > > > Thanks, I will review the PR.
>> >> > > >
>> >> > > > - Sijie
>> >> > > >
>> >> > > > On Mon, Feb 15, 2021 at 2:47 AM Enrico Olivelli <
>> >> eolive...@gmail.com>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > Sijie,
>> >> > > > >
>> >> > > > > I managed to implement Avro support In KafkaBytesSource following
>> >> your
>> >> > > > > suggestions. Thanks.
>> >> > > > >
>> >> > > > > I would like to commit this initial patch and then add support
>> >> for all
>> >> > > of
>> >> > > > > the primitive Schemas as you did in (1) and for JSON.
>> >> > > > > If you prefer I can continue to enhance this patch.
>> >> > > > >
>> >> > > > > Enrico
>> >> > > > >
>> >> > > > > (1)
>> >> > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
>> >> > > > >
>> >> > > > > Il giorno lun 15 feb 2021 alle ore 06:01 Sijie Guo <
>> >> guosi...@gmail.com
>> >> > > >
>> >> > > > ha
>> >> > > > > scritto:
>> >> > > > >
>> >> > > > > > Hi Enrico,
>> >> > > > > >
>> >> > > > > > Thank you for working on this!
>> >> > > > > >
>> >> > > > > > But as I mentioned in the pull request, we should avoid using a
>> >> > > > > > one-connector-per-schema model. That model probably works with
>> >> other
>> >> > > > > > connectors that have a very limited number of schemas. If you
>> >> are
>> >> > > going
>> >> > > > > to
>> >> > > > > > implement a schema-aware Kafka connector, that model is
>> >> impossible to
>> >> > > > > > maintain, because it will introduce N * N connectors where N is
>> >> the
>> >> > > > > number
>> >> > > > > > of supported schemas.
>> >> > > > > >
>> >> > > > > > We should maintain one "bytes" connector and transfer the Kafka
>> >> > > schema
>> >> > > > to
>> >> > > > > > the Pulsar schema. I have written an enhanced Kafka connector
>> >> > > > > > <https://github.com/streamnative/pulsar-io-kafka> two years
>> >> ago.
>> >> > > > > >
>> >> > > > > > You just need to maintain one connector:
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L94
>> >> > > > > > Then convert Kafka SerDe to Pulsar schema:
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
>> >> > > > > >
>> >> > > > > > I am happy to submit a PR to merge those changes back.
>> >> > > > > >
>> >> > > > > > - Sijie
>> >> > > > > >
>> >> > > > > > On Thu, Feb 11, 2021 at 11:48 PM Enrico Olivelli <
>> >> > > eolive...@gmail.com>
>> >> > > > > > wrote:
>> >> > > > > >
>> >> > > > > > > Hello everyone,
>> >> > > > > > > here in our Pulsar repository we have a simple Kafka
>> >> Connector for
>> >> > > > > Pulsar
>> >> > > > > > > IO composed by a Sink and a Source.
>> >> > > > > > > https://github.com/apache/pulsar/tree/master/pulsar-io/kafka
>> >> > > > > > >
>> >> > > > > > > I have started to work on a set of enhancements to this
>> >> connector
>> >> > > in
>> >> > > > > > order
>> >> > > > > > > to make it more powerful and to better fit the needs of
>> >> enterprise
>> >> > > > > users.
>> >> > > > > > >
>> >> > > > > > > The first patch I have submitted is about supporting Avro
>> >> encoded
>> >> > > > > > messages
>> >> > > > > > > + Confluent Schema Registry in the KafkaSource
>> >> > > > > > > https://github.com/apache/pulsar/pull/9448
>> >> > > > > > >
>> >> > > > > > > The patch is only the first one of a bigger work that we have
>> >> to do
>> >> > > > in
>> >> > > > > > > order to have a fully usable Connector for non-trivial use
>> >> cases.
>> >> > > > > > >
>> >> > > > > > > I will be happy to follow up with other patches and
>> >> especially to
>> >> > > > draw
>> >> > > > > a
>> >> > > > > > > little roadmap about the features that we want to implement
>> >> and
>> >> > > > provide
>> >> > > > > > to
>> >> > > > > > > the community.
>> >> > > > > > >
>> >> > > > > > > Please take a look to the patch and share your thoughts
>> >> > > > > > >
>> >> > > > > > > Regards
>> >> > > > > > > Enrico
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >>
>> >

Reply via email to