Re: Ingestion Layer?

Jean-Baptiste Onofré Thu, 09 Nov 2023 23:21:54 -0800

Hi Austin

I agree. The idea is to have an ingestion layer with a kind of mix of Apache
Camel (for EIPs), Apache Beam-like IOs, etc.


I started a PoC powered by an Apache Karaf Minho-like runtime.

I should be able to share a first, more concrete proposal by next week
(sorry, still no electricity at home after storm Ciaran, so I'm not moving
forward as fast as I would like).

Regards
JB

On Thu, Nov 9, 2023 at 18:53, Austin Bennett <aus...@apache.org> wrote:

> Just a little comment that I imagine there would be massive value from an
> ingestion layer.
>
> Making it easier to add more integrations will be a great benefit for the
> ecosystem and for adoption.
>
>
> Concretely, FWIW, I'm evaluating Iceberg [ and alternatives ] for an
> enterprise adoption, and existing integrations [ both for reading from and
> ingesting into Iceberg ] and the ease of contributing missing integrations
> are TOP of mind.
>
>
>
> On Mon, Oct 2, 2023 at 11:03 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> From my standpoint, Kafka Connect is also interesting for addressing
>> processing logic without a Spark or Flink runtime. It is definitely
>> interesting to have Kafka integration/processing (even if, for me, Kafka
>> and Kafka Connect are two different things ;)).
>>
>> For the pure data ingestion part, I think it would make sense to have an
>> "ingestion layer" in Iceberg where we can have pluggable IO and where
>> we can both implement our own IOs (specifically for Iceberg, like Apache
>> Beam IOs for instance) and leverage existing integration
>> frameworks (like Apache Camel).
>> Why not have a JMS/ActiveMQ integration in Iceberg via an IO, or a Pulsar
>> integration? I think having such a layer would be very interesting for
>> the community, and we could gain more users. That is what happened at Apache
>> Beam: the first IOs were only Google-"centric" (bigtable, bigquery,
>> gfs, ...), then we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
>> benefit for adoption :).
>> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
>>
>> I will do some investigation about that. I will draft a proposal.
>>
>> Regards
>> JB
>>
>> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>> >
>> > Hi Bryan,
>> >
>> > I am very happy to see this contribution.
>> > I have recently tested this project with the Nessie catalog and very much
>> liked it.
>> >
>> > However, I still don't know the benefits of using Kafka Connect instead
>> of consuming directly
>> > from Kafka, like Delta Lake's implementation.
>> > https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>> >
>> > I am not an expert in this ingestion domain and recently got started.
>> > I hope someone will chime in and we will have a detailed analysis of
>> the design.
>> >
>> > Looking forward to this feature.
>> >
>> > Thanks,
>> > Ajantha
>> >
>> > On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> >>
>> >> Hi Bryan
>> >>
>> >> That's great news! Thanks a lot for the proposal.
>> >>
>> >> I will take a look at the PR and the existing connector.
>> >> I'm sure the Iceberg community will be very happy to see this and we
>> will be able to add new features and improvements thanks to the community
>> feedback.
>> >> I would be more than happy to help with the donation (I know that the
>> connector is already under the Apache license, but we have to double-check the
>> ICLA for the initial contributors, etc., just to be sure we are good there).
>> >>
>> >> Thanks again!
>> >>
>> >> Let's see what the others think.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Mon, Oct 2, 2023 at 19:39, Bryan Keller <brya...@gmail.com> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We at Tabular would like to contribute our Kafka Connect Iceberg sink
>> to the Iceberg project. It would be great to give Iceberg users another
>> option for landing data from Kafka into Iceberg tables that is supported by
>> the Iceberg community. Kafka Connect is a part of systems from AWS,
>> Confluent, Redpanda, and so on, so it can make landing data from Kafka into
>> Iceberg much easier for those without a Flink or Spark infrastructure.
>> >>>
>> >>> There are a few Iceberg sink implementations out there for Kafka
>> Connect, but we feel this one covers most of the features users have
>> requested, such as exactly-once processing, schema evolution, and
>> multi-table fanout. And having the sink backed by the Iceberg community
>> will help it to evolve and improve over time.
>> >>>
>> >>> If this sounds like something everyone would like to see added to
>> Iceberg, I've opened a PR that includes some initial pieces of the sink.
>> The thought was to break up the submission into parts so each could be
>> reviewed more easily. Some design docs and notes can be found in the
>> original repo here:
>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>> >>>
>> >>> We'd like to get feedback on whether others approve of moving forward
>> with this.
>> >>>
>> >>> Thanks,
>> >>> Bryan
>> >>>
>>
>
