Hi Ryan

Yes, I agree: I started in the Iceberg repo to facilitate the discussion and inform the Iceberg community. As said in my previous email, I will move the camel-iceberg component directly to Camel.

"Classic" use cases for Apache Camel are IoT, system integration and event streaming. For instance, imagine IoT devices sending data via AMQP/MQTT protocols (through a broker like ActiveMQ), with a Camel route consuming these messages on the fly and inserting them directly into Iceberg tables. This would open up new use cases for Iceberg, such as geolocation analysis, predictive maintenance, etc.

Regards
JB

On Tue, May 21, 2024 at 7:16 PM Ryan Blue <b...@tabular.io> wrote:
>
> This is an interesting idea. What is the use case and where should this live?
> I'm unfamiliar with Camel and I'm not sure what the normal thing is. At least
> in the Iceberg community, we generally avoid adding connectors unless there
> is a clear use case and demand for them. We don't want to add code that needs
> to be maintained but isn't used.
>
> On Tue, May 21, 2024 at 10:15 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>> Hi JB,
>>
>> Thanks for sharing. Got a few questions:
>>
>> Does Apache Camel rely on other engines, e.g., Spark or Flink for any
>> processing, or is it fully self-contained?
>> What are the potential challenges or limitations you foresee? For example,
>> does it generate too many commits and/or small files considering its use
>> cases(IoT, Event streaming)? Can Camel cache ingestion data, and write it to
>> the Iceberg table as a batch?
>> How do you recommend handling schema evolution in Iceberg tables when
>> integrating with Camel routes?
>>
>> Yufei
>>
>>
>> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>>
>>> Hi folks,
>>>
>>> I'm working on a Iceberg component for Apache Camel:
>>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>>>
>>> Apache Camel is an integration framework, supporting a lot of
>>> components and EIPs (Enterprise Integration Patterns, like Content
>>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>>> Camel is very popular in a lot of use cases, like IoT, system
>>> integration, event streamings, ...
>>>
>>> This component provides a Camel component with:
>>> - a Camel consumer endpoint (from) to read data from Iceberg
>>> tables/views (scan) and create a Camel exchange
>>> - a Camel producer endpoint (to) to write data (from Camel exchange)
>>> to Iceberg tables/views
>>>
>>> For instance, you can write a Camel route like this (using the
>>> spring/blueprint DSL for instance):
>>>
>>> <from uri="jms:queue:foo"/>
>>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on
>>> the exchange message body -->
>>> <to uri="iceberg:my_table?catalog=#ref"/>
>>>
>>> This route is event driven, consuming messages from the foo JMS queue
>>> (from Apache ActiveMQ for instance), and writing a message body to
>>> my_table iceberg table (it's possible to use a router or multicast
>>> EIPs to send the exchange to different tables).
>>> NB: for the from (consumer endpoint), you can use any Camel component
>>> (https://camel.apache.org/components/4.4.x/).
>>>
>>> You can also consume (scan) data from an Iceberg table, and send the
>>> generated Exchange to any endpoint/route:
>>>
>>> <from uri="iceberg:my_table?catalog=#ref"/>
>>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on
>>> the next steps in the route -->
>>> <wireTap uri="direct:tap"/>
>>> <to uri="mongodb:myDB?database=mydb&collection=foo&operation=insert"/>
>>>
>>> This route generates exchanges from my_table Iceberg table, uses the
>>> wiretap EIP and stores the data into a mongoDB database/collection.
>>>
>>> If I started the component in the Iceberg repo, I think it would make
>>> more sense to have it at camel (as Apache Beam contains the Iceberg
>>> IO).
>>> Thoughts ?
>>>
>>> Comments are welcome !
>>>
>>> NB: on a related topic, I created
>>> https://github.com/apache/iceberg/pull/10365
>>>
>>> Regards
>>> JB
>
>
>
> --
> Ryan Blue
> Tabular
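
PS: as a rough illustration of the IoT use case (devices publishing over MQTT, a Camel route writing to Iceberg), here is a minimal sketch in the same Spring XML DSL as the routes in the original proposal. It assumes the existing Camel `paho` (MQTT) component as the consumer; the topic name, broker URL and the `iot_events` table name are hypothetical, and the `iceberg:` endpoint is the proposed component, not something available in Camel today.

```xml
<!-- Sketch only: topic, broker URL and table name are illustrative assumptions -->
<route>
  <!-- consume sensor readings published by IoT devices over MQTT -->
  <from uri="paho:sensors/telemetry?brokerUrl=tcp://localhost:1883"/>
  <!-- optional, depending on the message body: convert the payload to Iceberg records -->
  <process ref="#convertToIcebergRecords"/>
  <!-- write to an Iceberg table through the proposed camel-iceberg producer endpoint -->
  <to uri="iceberg:iot_events?catalog=#ref"/>
</route>
```

This sketch writes one exchange at a time, so it does not address the batching and small-files question raised in the thread.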