Can Camel be used as an alternative to Flink?

On Tue, May 21, 2024 at 10:17 AM Ryan Blue <b...@tabular.io> wrote:

> This is an interesting idea. What is the use case and where should this
> live? I'm unfamiliar with Camel and I'm not sure what the normal thing is.
> At least in the Iceberg community, we generally avoid adding connectors
> unless there is a clear use case and demand for them. We don't want to add
> code that needs to be maintained but isn't used.
>
> On Tue, May 21, 2024 at 10:15 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi JB,
>>
>> Thanks for sharing. Got a few questions:
>>
>> 1. Does Apache Camel rely on other engines, e.g., Spark or Flink for
>>    any processing, or is it fully self-contained?
>> 2. What are the potential challenges or limitations you foresee? For
>>    example, does it generate too many commits and/or small files
>>    considering its use cases (IoT, event streaming)? Can Camel cache
>>    ingestion data, and write it to the Iceberg table as a batch?
>> 3. How do you recommend handling schema evolution in Iceberg tables
>>    when integrating with Camel routes?
>>
>> Yufei
>>
>>
>> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I'm working on an Iceberg component for Apache Camel:
>>>
>>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>>>
>>> Apache Camel is an integration framework, supporting a lot of
>>> components and EIPs (Enterprise Integration Patterns, like Content
>>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>>> Camel is very popular in a lot of use cases, like IoT, system
>>> integration, event streaming, ...
>>>
>>> This component provides:
>>> - a Camel consumer endpoint (from) to read data from Iceberg
>>> tables/views (scan) and create a Camel exchange
>>> - a Camel producer endpoint (to) to write data (from a Camel exchange)
>>> to Iceberg tables/views
>>>
>>> For instance, you can write a Camel route like this (using the
>>> spring/blueprint DSL for instance):
>>>
>>> <from uri="jms:queue:foo"/>
>>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on
>>> the exchange message body -->
>>> <to uri="iceberg:my_table?catalog=#ref"/>
>>>
>>> This route is event driven, consuming messages from the foo JMS queue
>>> (from Apache ActiveMQ for instance), and writing each message body to
>>> the my_table Iceberg table (it's possible to use the router or multicast
>>> EIPs to send the exchange to different tables).
>>> NB: for the from (consumer endpoint), you can use any Camel component
>>> (https://camel.apache.org/components/4.4.x/).
>>>
>>> You can also consume (scan) data from an Iceberg table, and send the
>>> generated Exchange to any endpoint/route:
>>>
>>> <from uri="iceberg:my_table?catalog=#ref"/>
>>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on
>>> the next steps in the route -->
>>> <wireTap uri="direct:tap"/>
>>> <to uri="mongodb:myDB?database=mydb&amp;collection=foo&amp;operation=insert"/>
>>>
>>> This route generates exchanges from the my_table Iceberg table, uses the
>>> wiretap EIP and stores the data into a MongoDB database/collection.
>>>
>>> Although I started the component in the Iceberg repo, I think it would
>>> make more sense to have it in Camel (as Apache Beam contains the Iceberg
>>> IO).
>>> Thoughts?
>>>
>>> Comments are welcome!
>>>
>>> NB: on a related topic, I created
>>> https://github.com/apache/iceberg/pull/10365
>>>
>>> Regards
>>> JB
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
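
On Yufei's question about writing to the table as a batch: below is a minimal sketch of how Camel's Aggregator EIP could buffer messages ahead of the Iceberg endpoint, so that one write covers many messages instead of one write per message. It is only an illustration under assumptions. The `iceberg:` endpoint is JB's proposed component, and whether it accepts a list of records as the message body is not confirmed in this thread. The `batchStrategy` bean name is hypothetical; it stands for any AggregationStrategy that collects message bodies into a list (Camel's GroupedBodyAggregationStrategy is one candidate). The size and timeout values are arbitrary.

```xml
<route>
  <from uri="jms:queue:foo"/>
  <!-- Group incoming messages; a batch is released when 1000 messages
       have been collected, or when no new message has arrived for
       30 seconds, whichever happens first. -->
  <aggregate aggregationStrategy="batchStrategy"
             completionSize="1000"
             completionTimeout="30000">
    <!-- A constant correlation key puts every message in the same group. -->
    <correlationExpression>
      <constant>true</constant>
    </correlationExpression>
    <!-- Assumption: the proposed Iceberg producer can write the
         aggregated list of records in a single operation. -->
    <to uri="iceberg:my_table?catalog=#ref"/>
  </aggregate>
</route>
```

Larger batches would mean fewer commits and larger files at the cost of higher latency. Note that the default aggregation repository is in-memory, so buffered messages would be lost on a crash unless a persistent repository is configured.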