Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-12-02 Matthias J. Sax
...Sebastien Viale <sebastien.vi...@michelin.com> wrote: Hi, 106: Thanks for the clarification. Actually, this is not what I expected, but I better understand the performance issues regarding the state store iteration. If this is how it should be designed, it is fine for me as long...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-11-19 Ayoub Omari
...we could enforce a repartition if one chooses to use a de-duplication id other than the key...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-11-06 Lucas Brutschy
In this case, there is some internal complexity to this processor: I) We should repartition twice, before and after deduplication. The second map triggers repart...
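A minimal sketch of why a repartition is needed when the deduplication id differs from the key, under stated assumptions: `RepartitionSketch` and `partitionFor` are hypothetical, and the partitioner is simplified to `hashCode()` modulo the partition count (Kafka's default partitioner hashes the serialized key with murmur2 instead). Two records with the same id but different keys can be routed to different partitions, so a per-partition store would never see both copies; routing by the id co-locates them.

```java
import java.util.List;

// Hypothetical sketch: illustrates co-location only, not Kafka's real partitioner.
class RepartitionSketch {
    // A record with a message key and a payload id used for deduplication.
    static final class Event {
        final String key;
        final String id;

        Event(String key, String id) {
            this.key = key;
            this.id = id;
        }
    }

    // Simplified partitioner: non-negative hashCode modulo the partition count.
    // Kafka's default partitioner uses murmur2 over the serialized key instead.
    static int partitionFor(String routingKey, int numPartitions) {
        return Math.floorMod(routingKey.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 4;
        // Same id "order-42", but produced with two different message keys.
        List<Event> events = List.of(new Event("a", "order-42"), new Event("b", "order-42"));
        for (Event e : events) {
            System.out.printf("id=%s byKey=%d byId=%d%n", e.id,
                    partitionFor(e.key, partitions), partitionFor(e.id, partitions));
        }
    }
}
```

With 4 partitions, keys `"a"` and `"b"` map to different partitions while the shared id always maps to the same one, which is the co-location a repartition on the id would provide.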

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-11-05 Ayoub Omari
... 1) `deduplicateByKey()` // No repartitioning
2) `deduplicateByKey((k, v) -> v.id)` or `deduplicateByKeyAndId((k, v)`...
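The two variants listed above can be modeled in plain Java to show how they differ. This is a hypothetical in-memory sketch, not the proposed Kafka Streams API: it works on a `List` of `Map.Entry` pairs, and the deduplication interval is left out for brevity. Deduplicating by key is the special case where the id selector returns the key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical in-memory model of the two proposed variants; not the Kafka Streams API.
// The deduplication interval (expiry) is deliberately ignored to keep the sketch short.
class DedupVariants {
    // Variant 1: deduplicate on the record key itself (no repartitioning needed).
    static <K, V> List<Map.Entry<K, V>> deduplicateByKey(List<Map.Entry<K, V>> records) {
        return deduplicateByKeyAndId(records, (k, v) -> k);
    }

    // Variant 2: deduplicate on an id derived from key and value
    // (in a real topology this may require a repartition upstream).
    static <K, V, I> List<Map.Entry<K, V>> deduplicateByKeyAndId(
            List<Map.Entry<K, V>> records, BiFunction<K, V, I> idSelector) {
        Set<I> seen = new HashSet<>();
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Map.Entry<K, V> r : records) {
            // Set.add returns false when the id was already seen: drop the duplicate.
            if (seen.add(idSelector.apply(r.getKey(), r.getValue()))) {
                out.add(r);
            }
        }
        return out;
    }
}
```

For input `[("k1","a"), ("k2","a"), ("k3","a")]`, `deduplicateByKey` keeps all three records, while `deduplicateByKeyAndId(records, (k, v) -> v)` keeps only the first.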

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-10-03 Matthias J. Sax
...be good. Best, Ayoub. On Thu, 13 Jun 2024 at 09:03, Sebastien Viale <sebastien.vi...@michelin.com> wrote: Hi, 106: Thanks for the clarification. Actually, this is not what I expected, but I better understand the performance issues regarding the state store iteration. If this is...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-08-30 Matthias J. Sax
...the clarification. Actually, this is not what I expected, but I better understand the performance issues regarding the state store iteration. If this is how it should be designed, it is fine for me as long as it is clear that the repartition must be done before the deduplication. Sébastien

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-08-30 Ayoub Omari
...allows efficiently purging expired records, besides the keyValue store, makes sense. I've been looking into the code, and I think a similar idea was implemented for other pro...
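One way to picture a structure that purges expired records efficiently next to the key-value store is a second, timestamp-ordered index. The sketch below is an assumption for illustration only: `ExpiringIdStore` is hypothetical, and an in-memory `HashMap`/`TreeMap` pair stands in for the persistent state stores Kafka Streams would use. Purging walks only the head of the time index instead of iterating over every stored id.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: an id -> timestamp map plus a timestamp-ordered index,
// so expired entries can be purged without scanning the whole key-value map.
class ExpiringIdStore {
    private final Map<String, Long> byId = new HashMap<>();
    private final TreeMap<Long, Set<String>> byTime = new TreeMap<>();

    void put(String id, long timestamp) {
        Long previous = byId.put(id, timestamp);
        if (previous != null) {
            removeFromIndex(previous, id); // keep the time index consistent on overwrite
        }
        byTime.computeIfAbsent(timestamp, t -> new HashSet<>()).add(id);
    }

    Long get(String id) {
        return byId.get(id);
    }

    // Removes every id whose timestamp is strictly older than the cutoff;
    // only the head of the time index is visited.
    int purgeOlderThan(long cutoff) {
        int removed = 0;
        NavigableMap<Long, Set<String>> expired = byTime.headMap(cutoff, false);
        for (Set<String> ids : expired.values()) {
            for (String id : ids) {
                byId.remove(id);
                removed++;
            }
        }
        expired.clear(); // clearing the view removes those entries from byTime
        return removed;
    }

    int size() {
        return byId.size();
    }

    private void removeFromIndex(long timestamp, String id) {
        Set<String> ids = byTime.get(timestamp);
        if (ids != null) {
            ids.remove(id);
            if (ids.isEmpty()) {
                byTime.remove(timestamp);
            }
        }
    }
}
```

The trade-off is one extra index update per write in exchange for purges whose cost grows with the number of expired entries rather than with the store size.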

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-08-27 Bill Bejeck
...code here. KIP updated!
104. Updated the KIP to consider records' offsets.
105...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-08-26 Ayoub Omari
> ...simplest to not really have a window lookup, but just a plain key-lookup and drop if the key exists in the store?
KIP updated, we will be `.get()`ing from a keyValueStore instead of...
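The point-lookup approach described above can be sketched as follows, assuming a `HashMap` in place of a Kafka Streams `KeyValueStore` (whose `get(key)` performs the same single-key lookup) and ignoring purging. `PointLookupDeduplicator` and its `process` method are hypothetical names. A record is dropped only if its id was already stored and the earlier occurrence lies within the deduplication interval.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a point-lookup deduplication check (no window/range scan).
// A HashMap stands in for a Kafka Streams KeyValueStore; expiry/purging is omitted.
class PointLookupDeduplicator {
    private final Map<String, Long> store = new HashMap<>(); // id -> timestamp of stored occurrence
    private final long intervalMs;

    PointLookupDeduplicator(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Returns true if the record should be forwarded, false if it is a duplicate.
    boolean process(String id, long timestamp) {
        Long firstSeen = store.get(id); // single key lookup, analogous to KeyValueStore#get
        if (firstSeen != null && timestamp - firstSeen <= intervalMs) {
            return false; // same id already seen within the interval: drop
        }
        store.put(id, timestamp); // first occurrence, or the previous one is outside the interval
        return true;
    }
}
```

With a 100 ms interval, copies of the same id arriving at 0 ms and 50 ms yield one forwarded record, and a third copy at 200 ms is forwarded again.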

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-08-22 Bill Bejeck
> ...resurrect it (thus, building a workaround to change semantics is possible for users if we default to keep records, but not the other way around).
Makes total sense! I updated the KIP to forward late records...
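The "forward late records" default can be illustrated with a small decision function. This is a hypothetical sketch, not the KIP's implementation: `LateRecordPolicy` is an invented name, and "late" is assumed to mean a timestamp older than the observed stream time minus the deduplication interval. Such a record is forwarded because its earlier copy may already have been purged; keeping it leaves users free to filter it themselves, whereas a dropped record cannot be recovered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: records too late to be checked reliably are forwarded, not dropped.
// Assumption: "late" means timestamp < (observed stream time - deduplication interval).
class LateRecordPolicy {
    enum Decision { FORWARD, FORWARD_LATE, DROP_DUPLICATE }

    private final Map<String, Long> seen = new HashMap<>(); // stand-in for a state store
    private final long intervalMs;
    private long streamTime = Long.MIN_VALUE;

    LateRecordPolicy(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    Decision process(String id, long timestamp) {
        streamTime = Math.max(streamTime, timestamp);
        if (timestamp < streamTime - intervalMs) {
            // Its earlier copy may already be purged, so we cannot tell: keep the record.
            return Decision.FORWARD_LATE;
        }
        Long firstSeen = seen.get(id);
        if (firstSeen != null && Math.abs(timestamp - firstSeen) <= intervalMs) {
            return Decision.DROP_DUPLICATE;
        }
        seen.put(id, timestamp);
        return Decision.FORWARD;
    }
}
```

For example, with a 100 ms interval, a copy of `a` at 1020 ms is forwarded as late once stream time has reached 2000 ms, even though it falls within 100 ms of the first `a` at 1000 ms.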

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-07-13 Ayoub Omari
...cating by partition. If there is a better name to have this information in the name of the API itself, it would be good. Best, Ayoub. On Thu, 13 Jun 2024 at 09:03, Sebastien Viale <sebastien.vi...@michelin.c...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-07-10 Bill Bejeck
...better understand the performance issues regarding the state store iteration. If this is how it should be designed, it is fine for me as long as it is clear that the repartition must be done before the deduplication. Sébastien

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-25 Ayoub Omari
...-Matthias
On 6/11/24 2:30 AM, Sebastien Viale wrote: Hi, I am really interested in this KIP. 106: I hope I am not talking nonsense...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-13 Sebastien Viale
...deduplication. Sébastien
From: Matthias J. Sax
Sent: Thursday, 13 June 2024 02:51
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-12 Matthias J. Sax
...Sent: Tuesday, 11 June 2024 01:54
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams
Thanks for the update, Ayoub.

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-11 Ayoub Omari
...thanks, Sébastien
From: Matthias J. Sax
Sent: Tuesday, 11 June 2024 01:54
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-s...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-11 Matthias J. Sax
...Sent: Tuesday, 11 June 2024 01:54
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams
Thanks for the update, Ayoub.

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-11 Sebastien Viale
...2024 01:54
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams
Thanks for the update, Ayoub. 101: you say...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-10 Matthias J. Sax
Thanks for the update, Ayoub. 101: You say: "But I am not sure if we don't want to have them for this processor?" What is your reasoning for moving off the established pattern? It would be good to understand why the `Deduplicated` class needs a different "structure" compared to existing classes.
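For context on the "established pattern" mentioned above: existing Kafka Streams config classes such as `Grouped` are built through static factories and fluent `withX` methods, while accessors such as `keySerde()` and `valueSerde()` live in an internal subclass such as `GroupedInternal`. The sketch below applies that shape to a hypothetical `Deduplicated` class; it is not the class proposed in the KIP, and `Serde` is a local stand-in interface so the example compiles without Kafka on the classpath.

```java
// Stand-in for Kafka's Serde type, so the sketch compiles on its own (assumption).
interface Serde<T> {}

// Hypothetical public config class: built via static factories and withX methods,
// with no public getters (mirroring the shape of classes like Grouped).
class Deduplicated<K, V> {
    protected final String name;
    protected final Serde<K> keySerde;
    protected final Serde<V> valueSerde;

    protected Deduplicated(String name, Serde<K> keySerde, Serde<V> valueSerde) {
        this.name = name;
        this.keySerde = keySerde;
        this.valueSerde = valueSerde;
    }

    // Copy constructor used by the internal subclass.
    protected Deduplicated(Deduplicated<K, V> other) {
        this(other.name, other.keySerde, other.valueSerde);
    }

    public static <K, V> Deduplicated<K, V> as(String name) {
        return new Deduplicated<>(name, null, null);
    }

    public static <K, V> Deduplicated<K, V> with(Serde<K> keySerde, Serde<V> valueSerde) {
        return new Deduplicated<>(null, keySerde, valueSerde);
    }

    public Deduplicated<K, V> withKeySerde(Serde<K> keySerde) {
        return new Deduplicated<>(name, keySerde, valueSerde);
    }

    public Deduplicated<K, V> withValueSerde(Serde<V> valueSerde) {
        return new Deduplicated<>(name, keySerde, valueSerde);
    }
}

// Hypothetical internal view: only library code reads the configured values.
class DeduplicatedInternal<K, V> extends Deduplicated<K, V> {
    DeduplicatedInternal(Deduplicated<K, V> config) {
        super(config);
    }

    String name() { return name; }
    Serde<K> keySerde() { return keySerde; }
    Serde<V> valueSerde() { return valueSerde; }
}
```

Keeping the accessors off the public class means users only ever configure the object, while the processor reads the values through the internal view.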

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-06 Ayoub Omari
Hi Matthias, Thank you for your review! 100. I agree. I changed the name of the parameter to "idSelector". Because this id may be computed, it is better to call it "id" rather than "field" or "attribute". 101. The reason I added the methods `keySerde()` and `valueSerde()` was to have the same capab...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-04 Matthias J. Sax
Ayoub, thanks for resurrecting this KIP. I think a built-in de-duplication operator will be very useful. A couple of questions: 100: `deduplicationKeySelector`: is this the best name? It might indicate that we select a "key", which is an overloaded term... Maybe we could use `Field` or `Id` o...

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-05-29 Ayoub Omari
Hi everyone, I've just made a (small) change to this KIP about an implementation detail. Please let me know your thoughts. Thank you, Ayoub. On Mon, 20 May 2024 at 21:13, Ayoub wrote: Hello, Following a discussion on the community Slack channel, I would like to revive the discussion on th...

[DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-05-20 Ayoub
Hello, Following a discussion on the community Slack channel, I would like to revive the discussion on KIP-655, which is about adding a deduplication processor in kafka-streams. https://cwiki.apache.org/confluence/display/KAFKA/KIP-655%3A+Windowed+Distinct+Operation+for+Kafka+Streams+API Even th...