Re: Kafka Connect sink

Jean-Baptiste Onofré Mon, 02 Oct 2023 23:03:42 -0700

From my standpoint, Kafka Connect is also interesting as a way to run
processing logic without a Spark or Flink runtime. It is definitely
interesting to have Kafka integration/processing (even if, for me, Kafka
and Kafka Connect are two different things ;)).


For the pure data ingestion part, I think it would make sense to have an
"ingestion layer" in Iceberg with pluggable IOs, where we can both
implement our own IOs (specific to Iceberg, as Apache Beam does with its
IOs for instance) and leverage existing integration frameworks (like
Apache Camel).
Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
integration? I think having such a layer would be very interesting for
the community and could bring us more users. It's what happened at Apache
Beam: the first IOs were only Google "centric" (bigtable, bigquery,
gfs, ...), then we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
benefit for adoption :).
DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)

I will investigate this and draft a proposal.

Regards
JB

On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <[email protected]> wrote:
>
> Hi Bryan,
>
> I am very happy to see this contribution.
> I have recently tested this project with the Nessie catalog and very much 
> liked it.
>
> However, I still don't know the benefits of using Kafka Connect instead of 
> consuming directly from Kafka, as Delta Lake's implementation does.
> https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>
> I am not an expert in the ingestion domain and only recently got started.
> I hope someone will chime in and we will have a detailed analysis of the 
> design.
>
> Looking forward to this feature.
>
> Thanks,
> Ajantha
>
> On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <[email protected]> 
> wrote:
>>
>> Hi Bryan
>>
>> That’s great news! Thanks a lot for the proposal.
>>
>> I will take a look at the PR and the existing connector.
>> I’m sure the Iceberg community will be very happy to see this, and we will 
>> be able to add new features and improvements thanks to community feedback.
>> I would be more than happy to help with the donation (I know that the 
>> connector is already under the Apache license, but we have to double-check 
>> the ICLAs for the initial contributors, etc., just to be sure we are good there).
>>
>> Thanks again!
>>
>> Let’s see what the others think.
>>
>> Regards
>> JB
>>
>> On Mon, Oct 2, 2023 at 19:39, Bryan Keller <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
>>> the Iceberg project. It would be great to give Iceberg users another option 
>>> for landing data from Kafka into Iceberg tables that is supported by the 
>>> Iceberg community. Kafka Connect is a part of systems from AWS, Confluent, 
>>> Redpanda, and so on, so it can make landing data from Kafka into Iceberg 
>>> much easier for those without a Flink or Spark infrastructure.
>>>
>>> There are a few Iceberg sink implementations out there for Kafka Connect, 
>>> but we feel this one covers most of the features users have requested, such 
>>> as exactly-once processing, schema evolution, and multi-table fanout. And 
>>> having the sink backed by the Iceberg community will help it to evolve and 
>>> improve over time.
>>>
>>> If this sounds like something everyone would like to see added to Iceberg, 
>>> I've opened a PR that includes some initial pieces of the sink. The thought 
>>> was to break up the submission into parts so each could be reviewed more 
>>> easily. Some design docs and notes can be found in the original repo here: 
>>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>>>
>>> We'd like feedback on whether others approve of moving forward with 
>>> this.
>>>
>>> Thanks,
>>> Bryan
>>>
