> On 01/24/2024 11:50 PM +08 Dharin Shah <dharinsha...@gmail.com> wrote:
>
> Hi Karsten,
>
> Before delving deeper into Kafka Streams, it's worth considering whether direct aggregation in the database might be the more straightforward solution, unless there's a compelling reason to avoid it. Aggregating data at the database level often leads to more efficient and maintainable systems. This approach could involve creating intermediate tables for aggregation, simplifying data management and reducing the complexity of downstream processing.
>
> If the data model allows, consider pre-aggregating or restructuring data during writes. This step can significantly streamline the data before it's ingested into Kafka, which can then be used primarily for CDC and notifications.
>
> However, if Kafka Streams aligns better with your infrastructure and requirements, a few more details would be helpful for a tailored suggestion. For instance, are you using RocksDB for state stores, and how are these states managed - are they backed up on local disks or in Kafka itself?
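[Editor's note: the pre-aggregation suggestion above can be sketched in plain Java. The types (Customer, Communication, Label, CustomerDocument) are hypothetical stand-ins for the tables in the question, not any real API; the point is simply that the composite document is assembled at write time, so only one denormalized record needs to flow through Kafka.]

```java
import java.util.List;

// Hypothetical flat rows mirroring the source tables from the question.
record Customer(long id, String name, String firstname) {}
record Communication(long id, long customerId, String value) {}
record Label(long id, long customerId, String value) {}

// The composite document that would be published to Kafka (and later Solr/OpenSearch).
record CustomerDocument(long id, String name, String firstname,
                        List<String> communications, List<String> labels) {}

public class PreAggregate {
    // Assemble the composite document at write time instead of joining downstream.
    static CustomerDocument assemble(Customer c,
                                     List<Communication> comms,
                                     List<Label> labels) {
        List<String> commValues = comms.stream()
                .filter(x -> x.customerId() == c.id())
                .map(Communication::value)
                .toList();
        List<String> labelValues = labels.stream()
                .filter(x -> x.customerId() == c.id())
                .map(Label::value)
                .toList();
        return new CustomerDocument(c.id(), c.name(), c.firstname(), commValues, labelValues);
    }

    public static void main(String[] args) {
        Customer c = new Customer(1, "Stoeckmann", "Karsten");
        List<Communication> comms = List.of(new Communication(10, 1, "mail"));
        List<Label> labels = List.of(); // customers may have no labels at all
        CustomerDocument doc = assemble(c, comms, labels);
        System.out.println(doc.communications()); // [mail]
        System.out.println(doc.labels());         // []
    }
}
```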
>
> In Kafka Streams, consider converting KTables to KStreams for communications or labels to minimize state management needs. Keep in mind, though, that this approach might not reflect customer updates immediately.
>
> Also, exploring the Processor API could offer more control over data flow and state management. It's a bit more complex but can be very effective, as discussed in this <https://medium.com/@dharinshah1234/the-hidden-complexity-of-kafka-streams-dsl-solving-scaling-issues-8620fd344a6d> article.
>
> Thanks,
> Dharin
>
> On Wed, Jan 24, 2024 at 12:55 PM Karsten Stöckmann <karsten.stoeckm...@gmail.com> wrote:
>
> > Hi,
> >
> > we're currently in the process of evaluating Debezium and Kafka as a CDC system for our Postgres database in order to build an indexing solution (i.e. Solr or OpenSearch).
> >
> > Debezium captures changes per table and propagates them into dedicated Kafka topics. The ingested tables originally feature multiple relationships of all kinds (1:1, 1:n, n:m) - aggregated data from those tables should eventually be reflected in composite documents.
> > To illustrate this, assume the following heavily simplified table layout:
> >
> > - customer(id, name, firstname),
> > - communication(id, customer_id, value),
> > - label(id, customer_id, value),
> > - (maybe more dependent tables related to customers 1:n or 1:1 or even n:m - not considered here for brevity).
> >
> > All tables are streamed into Kafka topics as pointed out above. Now, in order to aggregate an output customer with all associated communication and label values (i.e. aggregated as lists), what would be the most elegant solution leveraging Kafka Streams? Note that customers do not necessarily have any communication or label at all, thus non-key joins are out of the game as far as I understand.
> >
> > Our initial (naive) solution was to re-key the dependent KTables and then left join, but that involves a lot of steps and intermediate state stores, especially considering that the stated example is heavily simplified:
> >
> > KTable<Long, Customer> customers = streamBuilder.table(customerTopic,
> >     Consumed.with(<Serdes>));
> > KTable<Long, Comm> communications = streamBuilder.table(commTopic,
> >     Consumed.with(<Serdes>));
> > KTable<Long, Label> labels = streamBuilder.table(labelTopic,
> >     Consumed.with(<Serdes>));
> > // Re-keying by customer_id
> > KTable<Long, GroupedComm> groupedComm =
> >     communications.groupBy(...).aggregate(...);
> > KTable<Long, GroupedLabel> groupedLabels =
> >     labels.groupBy(...).aggregate(...);
> > // Join
> > KTable<Long, AggregateCustomer> aggregated = customers
> >     .leftJoin(groupedComm, ...)
> >     .leftJoin(groupedLabels, ...)
> >     .groupBy(...)
> >     .aggregate(...);
> >
> > Are there more efficient / less naive / more elegant / simpler solutions to this? Co-grouping the input streams did not yield the expected results...
> >
> > Best wishes,
> >
> > Karsten
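[Editor's note: for readers following along, the topology Karsten sketches computes the equivalent of the following plain-Java model (hypothetical types; this shows the join/aggregation semantics only, not the Kafka Streams API): re-key the dependent rows by customer_id, then left-join, so that customers with no communications or labels still produce an aggregate with empty lists.]

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the record types in the question.
record Customer(long id, String name, String firstname) {}
record Comm(long id, long customerId, String value) {}
record Label(long id, long customerId, String value) {}
record AggregateCustomer(Customer customer, List<String> comms, List<String> labels) {}

public class JoinModel {
    // Equivalent of groupBy(customer_id).aggregate(...) on the dependent tables,
    // followed by the two leftJoins against the customers table.
    static Map<Long, AggregateCustomer> aggregate(List<Customer> customers,
                                                  List<Comm> comms,
                                                  List<Label> labels) {
        Map<Long, List<String>> groupedComm = comms.stream()
                .collect(Collectors.groupingBy(Comm::customerId,
                        Collectors.mapping(Comm::value, Collectors.toList())));
        Map<Long, List<String>> groupedLabels = labels.stream()
                .collect(Collectors.groupingBy(Label::customerId,
                        Collectors.mapping(Label::value, Collectors.toList())));
        // Left join: every customer appears, with empty lists when nothing matches.
        return customers.stream().collect(Collectors.toMap(Customer::id,
                c -> new AggregateCustomer(c,
                        groupedComm.getOrDefault(c.id(), List.of()),
                        groupedLabels.getOrDefault(c.id(), List.of()))));
    }

    public static void main(String[] args) {
        var result = aggregate(
                List.of(new Customer(1, "Doe", "Jane"), new Customer(2, "Roe", "John")),
                List.of(new Comm(10, 1, "jane@example.org")),
                List.of());
        // Customer 2 has no communications or labels but still shows up (left join).
        System.out.println(result.get(1).comms());
        System.out.println(result.get(2).comms());
    }
}
```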