Hi Yufei

1. No, Camel has its own routing engine; it is fully self-contained and message oriented.
2. The main potential limitation is the size of the Exchange (depending on the route, the Exchange can be offloaded to a store). And yes, depending on the route, the number of commits can be significant (especially with messaging/IoT). However, it's possible to use an aggregator to group several Exchanges and do a single commit per batch.
3. Schema evolution is something I will add in the component (comparing the schema in the Exchange with the table schema).
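For point 2, such a batching route could look something like this (a sketch using the aggregate EIP in the Spring DSL; the completion size/timeout values and the #groupedStrategy bean, e.g. Camel's GroupedExchangeAggregationStrategy, are illustrative, and attribute names can differ between Camel versions):

<from uri="jms:queue:foo"/>
<aggregate aggregationStrategy="#groupedStrategy" completionSize="500" completionTimeout="5000">
  <correlationExpression><constant>true</constant></correlationExpression>
  <to uri="iceberg:my_table?catalog=#ref"/>
</aggregate>

The idea is that the Iceberg producer endpoint then receives one grouped Exchange per batch and can perform a single commit for it.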
Regards
JB

On Tue, May 21, 2024 at 7:14 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
> Hi JB,
>
> Thanks for sharing. Got a few questions:
>
> Does Apache Camel rely on other engines, e.g., Spark or Flink for any
> processing, or is it fully self-contained?
> What are the potential challenges or limitations you foresee? For example,
> does it generate too many commits and/or small files considering its use
> cases (IoT, Event streaming)? Can Camel cache ingestion data, and write it to
> the Iceberg table as a batch?
> How do you recommend handling schema evolution in Iceberg tables when
> integrating with Camel routes?
>
> Yufei
>
>
> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>>
>> Hi folks,
>>
>> I'm working on an Iceberg component for Apache Camel:
>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>>
>> Apache Camel is an integration framework, supporting a lot of
>> components and EIPs (Enterprise Integration Patterns, like Content
>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>> Camel is very popular in a lot of use cases, like IoT, system
>> integration, event streaming, ...
>>
>> This component provides a Camel component with:
>> - a Camel consumer endpoint (from) to read data from Iceberg
>> tables/views (scan) and create a Camel exchange
>> - a Camel producer endpoint (to) to write data (from Camel exchange)
>> to Iceberg tables/views
>>
>> For instance, you can write a Camel route like this (using the
>> spring/blueprint DSL for instance):
>>
>> <from uri="jms:queue:foo"/>
>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on
>> the exchange message body -->
>> <to uri="iceberg:my_table?catalog=#ref"/>
>>
>> This route is event driven, consuming messages from the foo JMS queue
>> (from Apache ActiveMQ for instance), and writing a message body to
>> my_table iceberg table (it's possible to use a router or multicast
>> EIPs to send the exchange to different tables).
>> NB: for the from (consumer endpoint), you can use any Camel component
>> (https://camel.apache.org/components/4.4.x/).
>>
>> You can also consume (scan) data from an Iceberg table, and send the
>> generated Exchange to any endpoint/route:
>>
>> <from uri="iceberg:my_table?catalog=#ref"/>
>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on
>> the next steps in the route -->
>> <wireTap uri="direct:tap"/>
>> <to uri="mongodb:myDB?database=mydb&collection=foo&operation=insert"/>
>>
>> This route generates exchanges from my_table Iceberg table, uses the
>> wiretap EIP and stores the data into a mongoDB database/collection.
>>
>> If I started the component in the Iceberg repo, I think it would make
>> more sense to have it at camel (as Apache Beam contains the Iceberg
>> IO).
>> Thoughts ?
>>
>> Comments are welcome !
>>
>> NB: on a related topic, I created
>> https://github.com/apache/iceberg/pull/10365
>>
>> Regards
>> JB