This is an interesting idea. What is the use case and where should this live? I'm unfamiliar with Camel, so I'm not sure where its components normally live. At least in the Iceberg community, we generally avoid adding connectors unless there is a clear use case and demand for them. We don't want to add code that needs to be maintained but isn't used.

On Tue, May 21, 2024 at 10:15 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi JB,
>
> Thanks for sharing. Got a few questions:
>
> 1. Does Apache Camel rely on other engines, e.g., Spark or Flink for
>    any processing, or is it fully self-contained?
> 2. What are the potential challenges or limitations you foresee? For
>    example, does it generate too many commits and/or small files
>    considering its use cases (IoT, Event streaming)? Can Camel cache ingestion
>    data, and write it to the Iceberg table as a batch?
> 3. How do you recommend handling schema evolution in Iceberg tables
>    when integrating with Camel routes?
>
> Yufei
>
>
> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi folks,
>>
>> I'm working on an Iceberg component for Apache Camel:
>>
>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>>
>> Apache Camel is an integration framework, supporting a lot of
>> components and EIPs (Enterprise Integration Patterns, like Content
>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>> Camel is very popular in a lot of use cases, like IoT, system
>> integration, event streaming, ...
>>
>> This component provides a Camel component with:
>> - a Camel consumer endpoint (from) to read data from Iceberg
>> tables/views (scan) and create a Camel exchange
>> - a Camel producer endpoint (to) to write data (from Camel exchange)
>> to Iceberg tables/views
>>
>> For instance, you can write a Camel route like this (using the
>> spring/blueprint DSL for instance):
>>
>> <from uri="jms:queue:foo"/>
>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on
>> the exchange message body -->
>> <to uri="iceberg:my_table?catalog=#ref"/>
>>
>> This route is event driven, consuming messages from the foo JMS queue
>> (from Apache ActiveMQ for instance), and writing a message body to
>> my_table Iceberg table (it's possible to use a router or multicast
>> EIPs to send the exchange to different tables).
>> NB: for the from (consumer endpoint), you can use any Camel component
>> (https://camel.apache.org/components/4.4.x/).
>>
>> You can also consume (scan) data from an Iceberg table, and send the
>> generated Exchange to any endpoint/route:
>>
>> <from uri="iceberg:my_table?catalog=#ref"/>
>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on
>> the next steps in the route -->
>> <wireTap uri="direct:tap"/>
>> <to uri="mongodb:myDB?database=mydb&collection=foo&operation=insert"/>
>>
>> This route generates exchanges from my_table Iceberg table, uses the
>> wiretap EIP and stores the data into a MongoDB database/collection.
>>
>> If I started the component in the Iceberg repo, I think it would make
>> more sense to have it at camel (as Apache Beam contains the Iceberg
>> IO).
>> Thoughts?
>>
>> Comments are welcome!
>>
>> NB: on a related topic, I created
>> https://github.com/apache/iceberg/pull/10365
>>
>> Regards
>> JB
>>
>

--
Ryan Blue
Tabular