Re: [DISCUSS] camel-iceberg component

Steven Wu Wed, 22 May 2024 14:54:54 -0700

It seems reasonable to keep camel-iceberg inside the Camel project, which
already has many integration components. +1 for that.


On Wed, May 22, 2024 at 8:58 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> +1,
>
> It is always good to have new ways to ingest data as an Iceberg table.
>
> - Ajantha
>
> On Wed, May 22, 2024 at 7:32 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Omar,
>>
>> That's the plan (see the last section in my previous email). I just
>> wanted to bring it to the attention of the Iceberg community :)
>>
>> Regards
>> JB
>>
>> On Wed, May 22, 2024 at 10:01 AM Omar Al-Safi <o...@oalsafi.com> wrote:
>> >
>> > IMO the Camel Iceberg component should live in the Camel repo. It can
>> > be part of the Camel components registry.
>> >
>> > On Wed, May 22, 2024 at 9:58 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> >>
>> >> Hi Manish
>> >>
>> >> No, Camel is not an alternative to Spark or Flink: Camel is not a
>> >> query engine. It's more a "complement" to Kafka Connect.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Wed, May 22, 2024 at 7:09 AM Manish Malhotra
>> >> <manish.malhotra.w...@gmail.com> wrote:
>> >> >
>> >> > Can Camel be used as an alternative to Flink?
>> >> >
>> >> >
>> >> > On Tue, May 21, 2024 at 10:17 AM Ryan Blue <b...@tabular.io> wrote:
>> >> >>
>> >> >> This is an interesting idea. What is the use case and where should
>> this live? I'm unfamiliar with Camel and I'm not sure what the normal thing
>> is. At least in the Iceberg community, we generally avoid adding connectors
>> unless there is a clear use case and demand for them. We don't want to add
>> code that needs to be maintained but isn't used.
>> >> >>
>> >> >> On Tue, May 21, 2024 at 10:15 AM Yufei Gu <flyrain...@gmail.com>
>> wrote:
>> >> >>>
>> >> >>> Hi JB,
>> >> >>>
>> >> >>> Thanks for sharing. Got a few questions:
>> >> >>>
>> >> >>> Does Apache Camel rely on other engines, e.g., Spark or Flink for
>> any processing, or is it fully self-contained?
>> >> >>> What are the potential challenges or limitations you foresee? For
>> >> >>> example, does it generate too many commits and/or small files, given
>> >> >>> its use cases (IoT, event streaming)? Can Camel cache ingested data
>> >> >>> and write it to the Iceberg table as a batch?
>> >> >>> How do you recommend handling schema evolution in Iceberg tables
>> when integrating with Camel routes?
>> >> >>>
>> >> >>> Yufei
>> >> >>>
>> >> >>>
>> >> >>> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <
>> j...@nanthrax.net> wrote:
>> >> >>>>
>> >> >>>> Hi folks,
>> >> >>>>
>> >> >>>> I'm working on an Iceberg component for Apache Camel:
>> >> >>>>
>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
>> >> >>>>
>> >> >>>> Apache Camel is an integration framework, supporting a lot of
>> >> >>>> components and EIPs (Enterprise Integration Patterns, like Content
>> >> >>>> Based Router, Splitter, Aggregator, Content Enricher, ...).
>> >> >>>> Camel is very popular for many use cases, such as IoT, system
>> >> >>>> integration, event streaming, ...
>> >> >>>>
>> >> >>>> This component provides a Camel component with:
>> >> >>>> - a Camel consumer endpoint (from) to read data from Iceberg
>> >> >>>> tables/views (scan) and create a Camel exchange
>> >> >>>> - a Camel producer endpoint (to) to write data (from Camel
>> exchange)
>> >> >>>> to Iceberg tables/views
>> >> >>>>
>> >> >>>> For instance, you can write a Camel route like this (using the
>> >> >>>> spring/blueprint DSL for instance):
>> >> >>>>
>> >> >>>> <from uri="jms:queue:foo"/>
>> >> >>>> <process ref="#convertToIcebergRecords"/> <!-- optional, depending on the exchange message body -->
>> >> >>>> <to uri="iceberg:my_table?catalog=#ref"/>
>> >> >>>>
>> >> >>>> This route is event driven, consuming messages from the foo JMS
>> queue
>> >> >>>> (from Apache ActiveMQ for instance), and writing a message body to
>> >> >>>> my_table iceberg table (it's possible to use a router or multicast
>> >> >>>> EIPs to send the exchange to different tables).
>> >> >>>> NB: for the from (consumer endpoint), you can use any Camel
>> component
>> >> >>>> (https://camel.apache.org/components/4.4.x/).
>> >> >>>>
>> >> >>>> You can also consume (scan) data from an Iceberg table, and send
>> the
>> >> >>>> generated Exchange to any endpoint/route:
>> >> >>>>
>> >> >>>> <from uri="iceberg:my_table?catalog=#ref"/>
>> >> >>>> <process ref="#convertFromIcebergRecords"/> <!-- optional, depending on the next steps in the route -->
>> >> >>>> <wireTap uri="direct:tap"/>
>> >> >>>> <to uri="mongodb:myDB?database=mydb&amp;collection=foo&amp;operation=insert"/>
>> >> >>>>
>> >> >>>> This route generates exchanges from my_table Iceberg table, uses
>> the
>> >> >>>> wiretap EIP and stores the data into a mongoDB
>> database/collection.
>> >> >>>>
>> >> >>>> Although I started the component in the Iceberg repo, I think it
>> >> >>>> would make more sense to have it in Camel (just as Apache Beam
>> >> >>>> contains the Iceberg IO).
>> >> >>>> Thoughts ?
>> >> >>>>
>> >> >>>> Comments are welcome !
>> >> >>>>
>> >> >>>> NB: on a related topic, I created
>> https://github.com/apache/iceberg/pull/10365
>> >> >>>>
>> >> >>>> Regards
>> >> >>>> JB
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Ryan Blue
>> >> >> Tabular
>>
>
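For reference, the JMS-to-Iceberg fragment quoted in the original proposal could sit in a complete Spring XML file roughly as follows. This is only a sketch under assumptions: the `iceberg:` endpoint is the component proposed in this thread, not something in a Camel release, and the bean class `com.example.ConvertToIcebergRecords`, the route id `jms-to-iceberg`, and the catalog bean named `ref` are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://camel.apache.org/schema/spring
         http://camel.apache.org/schema/spring/camel-spring.xsd">

  <!-- Hypothetical processor that turns the JMS message body into Iceberg records. -->
  <bean id="convertToIcebergRecords" class="com.example.ConvertToIcebergRecords"/>

  <!-- A bean with id "ref" providing the Iceberg catalog must also be registered;
       its class is not specified in the thread, so it is left out here. -->

  <camelContext xmlns="http://camel.apache.org/schema/spring">
    <route id="jms-to-iceberg">
      <!-- Event-driven consumer: one exchange per message on the foo queue. -->
      <from uri="jms:queue:foo"/>
      <!-- Optional, depending on the exchange message body. -->
      <process ref="convertToIcebergRecords"/>
      <!-- Proposed producer endpoint writing the body to the my_table Iceberg table. -->
      <to uri="iceberg:my_table?catalog=#ref"/>
    </route>
  </camelContext>
</beans>
```

Note that any `&` separating URI options inside an XML attribute has to be written as `&amp;`, which matters for endpoints with several options such as the mongodb one in the second route.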
