Hi Austin,

I agree. The idea is to have an ingestion layer that is a kind of mix of Apache Camel (for EIPs), Apache Beam-like IOs, etc.
I started a PoC powered by an Apache Karaf Minho-like runtime. I should be able to share a first, more concrete proposal by next week (sorry, still no electricity at home after the Ciaran storm, so I'm not moving forward as fast as I would like).

Regards
JB

On Thu, Nov 9, 2023 at 18:53, Austin Bennett <aus...@apache.org> wrote:

> Just a little comment that I imagine there would be massive value from an
> ingestion layer.
>
> Making it easier to add more integrations will be a great benefit for the
> ecosystem, adoption.
>
> Concretely, FWIW, I'm evaluating Iceberg [ and alternatives ] for an
> enterprise adoption, and existing integrations [ both for reads from and
> ingesting into iceberg ] and ease-of-contributing lacking integrations are
> TOP of mind.
>
> On Mon, Oct 2, 2023 at 11:03 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> From my standpoint, Kafka Connect is interesting to also address
>> processing logic without a Spark or Flink runtime. Definitely
>> interesting to have Kafka integration/processing (even if, for me, Kafka
>> and Kafka Connect are two different things ;)).
>>
>> For the pure data ingestion part, I think it would make sense to have an
>> "ingestion layer" in Iceberg where we can have pluggable IO and where
>> we can both implement our own IO (specifically for Iceberg, as Apache
>> Beam IOs for instance) and where we can leverage existing integration
>> frameworks (like Apache Camel).
>> Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
>> integration? I think having such a layer would be very interesting for
>> the community and we can have more users (it's what happened at Apache
>> Beam: the first IOs were only Google "centric" (bigtable, bigquery,
>> gfs, ...), we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
>> benefit for adoption :)).
>> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
>>
>> I will do some investigation about that. I will draft a proposal.
>>
>> Regards
>> JB
>>
>> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>> >
>> > Hi Bryan,
>> >
>> > I am very happy to see this contribution.
>> > I have recently tested this project with the Nessie catalog and very much liked it.
>> >
>> > However, I still don't know the benefits of using kafka-connect instead of directly consuming
>> > from Kafka, as Delta-lake's implementation does.
>> > https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>> >
>> > I am not an expert in this ingestion domain and recently got started.
>> > I hope someone will chime in and we will have a detailed analysis of the design.
>> >
>> > Looking forward to this feature.
>> >
>> > Thanks,
>> > Ajantha
>> >
>> > On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>
>> >> Hi Bryan
>> >>
>> >> That’s great news! Thanks a lot for the proposal.
>> >>
>> >> I will take a look at the PR and the existing connector.
>> >> I’m sure the Iceberg community will be very happy to see this and we will be able to add new features and improvements thanks to the community feedback.
>> >> I would be more than happy to help with the donation (I know that the connector is already under the Apache license but we have to double check the ICLA for the initial contributors etc., just to be sure we are good there).
>> >>
>> >> Thanks again!
>> >>
>> >> Let’s see what the others are thinking.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Mon, Oct 2, 2023 at 19:39, Bryan Keller <brya...@gmail.com> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to the Iceberg project. It would be great to give Iceberg users another option for landing data from Kafka into Iceberg tables that is supported by the Iceberg community.
>> >>> Kafka Connect is a part of systems from AWS, Confluent, Redpanda, and so on, so it can make landing data from Kafka into Iceberg much easier for those without a Flink or Spark infrastructure.
>> >>>
>> >>> There are a few Iceberg sink implementations out there for Kafka Connect, but we feel this one covers most of the features users have requested, such as exactly-once processing, schema evolution, and multi-table fanout. And having the sink backed by the Iceberg community will help it to evolve and improve over time.
>> >>>
>> >>> If this sounds like something everyone would like to see added to Iceberg, I've opened a PR that includes some initial pieces of the sink. The thought was to break up the submission into parts so each could be reviewed more easily. Some design docs and notes can be found in the original repo here:
>> >>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>> >>>
>> >>> We'd like to get feedback if others approve of moving forward with this or not.
>> >>>
>> >>> Thanks,
>> >>> Bryan
>> >>>
>> >