Hi Yufei

1. No, Camel has its own routing engine; it is fully self-contained and message oriented.
2. The main potential limitation is the size of the Exchange (depending on the route, the Exchange can be offloaded to a store). And yes, depending on the route, the number of commits can be significant (especially with messaging/IoT). However, it's possible to use an aggregator to group several Exchanges and do a single commit per batch.
3. Schema evolution is something I will add in the component (comparing the schema in the Exchange with the table schema).
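For point 2, such a batching route could look something like this (a sketch using the aggregate EIP in the Spring DSL; the completion size/timeout values and the #groupedStrategy bean, e.g. Camel's GroupedExchangeAggregationStrategy, are illustrative, and attribute names can differ between Camel versions):

<from uri="jms:queue:foo"/>
<aggregate aggregationStrategy="#groupedStrategy" completionSize="500" completionTimeout="5000">
  <correlationExpression><constant>true</constant></correlationExpression>
  <to uri="iceberg:my_table?catalog=#ref"/>
</aggregate>

The idea is that the Iceberg producer endpoint then receives one grouped Exchange per batch and can perform a single commit for it.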
Regards
JB

On Tue, May 21, 2024 at 7:14 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
> Hi JB,
>
> Thanks for sharing. Got a few questions:
>
> Does Apache Camel rely on other engines, e.g., Spark or Flink for any
> processing, or is it fully self-contained?
> What are the potential challenges or limitations you foresee? For example,
> does it generate too many commits and/or small files considering its use
> cases (IoT, Event streaming)? Can Camel cache ingestion data, and write it to
> the Iceberg table as a batch?
> How do you recommend handling schema evolution in Iceberg tables when
> integrating with Camel routes?
>
> Yufei
>
>
> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>>
>> Hi folks,
>>
>> I'm working on an Iceberg component for Apache Camel:
>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>>
>> Apache Camel is an integration framework, supporting a lot of
>> components and EIPs (Enterprise Integration Patterns, like Content
>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>> Camel is very popular in a lot of use cases, like IoT, system
>> integration, event streaming, ...
>>
>> This component provides a Camel component with:
>> - a Camel consumer endpoint (from) to read data from Iceberg
>> tables/views (scan) and create a Camel exchange
>> - a Camel producer endpoint (to) to write data (from Camel exchange)
>> to Iceberg tables/views
>>
>> For instance, you can write a Camel route like this (using the
>> spring/blueprint DSL for instance):
>>
>> <from uri="jms:queue:foo"/>
>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on
>> the exchange message body -->
>> <to uri="iceberg:my_table?catalog=#ref"/>
>>
>> This route is event driven, consuming messages from the foo JMS queue
>> (from Apache ActiveMQ for instance), and writing a message body to
>> my_table iceberg table (it's possible to use a router or multicast
>> EIPs to send the exchange to different tables).
>> NB: for the from (consumer endpoint), you can use any Camel component
>> (https://camel.apache.org/components/4.4.x/).
>>
>> You can also consume (scan) data from an Iceberg table, and send the
>> generated Exchange to any endpoint/route:
>>
>> <from uri="iceberg:my_table?catalog=#ref"/>
>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on
>> the next steps in the route -->
>> <wireTap uri="direct:tap"/>
>> <to uri="mongodb:myDB?database=mydb&collection=foo&operation=insert"/>
>>
>> This route generates exchanges from my_table Iceberg table, uses the
>> wiretap EIP and stores the data into a mongoDB database/collection.
>>
>> If I started the component in the Iceberg repo, I think it would make
>> more sense to have it at camel (as Apache Beam contains the Iceberg
>> IO).
>> Thoughts ?
>>
>> Comments are welcome !
>>
>> NB: on a related topic, I created
>> https://github.com/apache/iceberg/pull/10365
>>
>> Regards
>> JB