tien Viale <
sebastien.vi...@michelin.com>
wrote:
Hi,
106:
Thanks for the clarification. Actually, this is not what I expected, but I
better understand the performance issues regarding the state store iteration.
If this is how it should be designed, it is fine for me as long
we
> could enforce a repartition if one chooses to use a de-duplication id other
> than the key.
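
To illustrate why a repartition matters when the de-duplication id differs from the key, here is a minimal plain-Java sketch. It does not use the Kafka Streams API: `RepartitionSketch`, `Event`, `partitionFor` and `deduplicate` are hypothetical names, and the `hashCode`-based partitioner is only a stand-in for real partitioning. Each partition keeps its own set of seen ids, much like a task-local state store, so two records sharing an id are only detected as duplicates if they are routed to the same partition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RepartitionSketch {

    /** A record with a partitioning key and a separate de-duplication id. */
    static final class Event {
        final String key;
        final String id;

        Event(String key, String id) {
            this.key = key;
            this.id = id;
        }
    }

    // Stand-in partitioner (assumption: plain hashCode, not Kafka's real partitioner).
    static int partitionFor(String routingKey, int numPartitions) {
        return Math.floorMod(routingKey.hashCode(), numPartitions);
    }

    // De-duplicates by id using one local "seen ids" set per partition.
    // rekeyById = true simulates a repartition on the id before de-duplication.
    static List<Event> deduplicate(List<Event> input, int numPartitions, boolean rekeyById) {
        List<Set<String>> stores = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            stores.add(new HashSet<>());
        }
        List<Event> forwarded = new ArrayList<>();
        for (Event e : input) {
            String routingKey = rekeyById ? e.id : e.key;
            Set<String> localStore = stores.get(partitionFor(routingKey, numPartitions));
            if (localStore.add(e.id)) { // add() returns false if the id was already seen
                forwarded.add(e);
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Event> input = new ArrayList<>();
        input.add(new Event("a", "x")); // same id "x", different keys
        input.add(new Event("b", "x"));
        System.out.println(deduplicate(input, 2, false).size()); // 2: duplicate missed
        System.out.println(deduplicate(input, 2, true).size());  // 1: duplicate dropped
    }
}
```

In this sketch the keys "a" and "b" hash to different partitions, so the duplicate id is only caught once the records are routed by id.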
In this case, there is some internal complexity of this processor:
> I) We should repartition twice, before and after deduplication. The second
> map triggers repart
n
> 1) `deduplicateByKey()` *// No repartitioning*
> 2) `deduplicateByKey((k, v) -> v.id)` *or* `deduplicateByKeyAndId((k, v)
e good.
Best,
Ayoub
On Thu, 13 June 2024 at 09:03, Sebastien Viale <
sebastien.vi...@michelin.com>
wrote:
Hi,
106:
Thanks for the clarification. Actually, this is not what I expected, but I
better understand the performance issues regarding the state store iteration.
If this is
he clarification. Actually, this is not what I expected, but I
better understand the performance issues regarding the state store iteration.
If this is how it should be designed, it is fine for me as long as it is
clear that the repartition must be done before the deduplication.
Sébastien
_
t allows to efficiently purge expired records besides the keyValue store
> makes sense. I've been looking into the code, and I think a similar idea
> was implemented for other pro
ng code here.
> KIP updated!
>
> 104.
> Updated the KIP to consider records' offsets.
>
> 105
> > simplest to not really have a window lookup, but just a plain key-lookup
> > and drop if the key exists in the store?
>
> KIP updated, we will be `.get()`ing from a keyValueStore instead of
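
The plain key-lookup idea discussed above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the KIP's implementation and not Kafka Streams API: `KeyLookupDedupSketch`, `process`, `purgeOlderThan` and `retentionMs` are hypothetical names, a `HashMap` stands in for the key-value store, and a `TreeMap` ordered by timestamp stands in for a second structure that lets expired ids be purged without scanning the whole store.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class KeyLookupDedupSketch {

    private final long retentionMs; // hypothetical: how long an id is remembered
    private final Map<String, Long> seen = new HashMap<>(); // id -> first-seen timestamp
    private final NavigableMap<Long, Set<String>> idsByTime = new TreeMap<>(); // purge index

    KeyLookupDedupSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Returns true if the record should be forwarded, false if it is a duplicate. */
    boolean process(String id, long timestampMs) {
        purgeOlderThan(timestampMs - retentionMs);
        if (seen.get(id) != null) {
            return false; // plain key lookup: the id is in the store, so drop the record
        }
        seen.put(id, timestampMs);
        idsByTime.computeIfAbsent(timestampMs, t -> new HashSet<>()).add(id);
        return true;
    }

    // Removes every id first seen strictly before cutoffMs, using the time-ordered index.
    private void purgeOlderThan(long cutoffMs) {
        NavigableMap<Long, Set<String>> expired = idsByTime.headMap(cutoffMs, false);
        for (Set<String> ids : expired.values()) {
            seen.keySet().removeAll(ids);
        }
        expired.clear(); // clearing the view also removes the entries from idsByTime
    }

    public static void main(String[] args) {
        KeyLookupDedupSketch dedup = new KeyLookupDedupSketch(100);
        System.out.println(dedup.process("a", 0));   // true: first occurrence
        System.out.println(dedup.process("a", 50));  // false: duplicate within retention
        System.out.println(dedup.process("a", 200)); // true: earlier entry was purged
    }
}
```

The lookup itself is a single `get()` per record; the time-ordered index is touched only to drop entries that have aged out.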
> > resurrect it (thus, building a workaround to change semantics is
> > possible for users if we default to keep records, but not the other way
> > around).
>
> Makes total sense! I updated the KIP to forward late recor
cating by
> > partition. If there is a better name to have this information in the name
> > of the api itself it would be good.
> >
> >
> > Best,
> > Ayoub
> >
> >
> > On Thu, 13 June 2024 at 09:03, Sebastien Viale <
> sebastien.vi...@michelin.c
> > better understand the performance issues regarding the state store
> > iteration.
> > If this is how it should be designed, it is fine for me as long as it is
> > clear that the repartition must be done before the deduplication.
> > Sébastien
> >
> > _
> >>
> >>
> >> -Matthias
> >>
> >> On 6/11/24 2:30 AM, Sebastien Viale wrote:
> >>> Hi,
> >>>
> >>> I am really interested in this KIP.
> >>>
> >>> 106:
> >>> I hope I am not talking nonsense,
deduplication.
Sébastien
From: Matthias J. Sax
Sent: Thursday, 13 June 2024 02:51
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in
kafka-streams
Warning External sender Do not click on any links or open any attachments
unless
: Tuesday, 11 June 2024 01:54
To: dev@kafka.apache.org
Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in
kafka-streams
Warning External sender Do not click on any links or open any
attachments unless you trust the sender and know the content is safe.
Thanks for the update Ayoub.
>
> > thanks
> >
> > Sébastien
> > ____
> > From: Matthias J. Sax
> > Sent: Tuesday, 11 June 2024 01:54
> > To: dev@kafka.apache.org
> > Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in
> kafka-s
Thanks for the update Ayoub.
101: you say:
But I am not sure if we don't want to have them for this processor ?
What is your reasoning to move off the established pattern? Would be
good to understand, why `Deduplicated` class needs a different
"structure" compared to existing classes.
Hi Matthias,
Thank you for your review!
100.
I agree. I changed the name of the parameter to "idSelector".
Because this id may be computed, it is better to call it "id" rather than
field or attribute.
101.
The reason I added the methods `keySerde()` and `valueSerde()` was to
have the same capab
Ayoub,
thanks for resurrecting this KIP. I think a built-in de-duplication
operator will be very useful.
Couple of questions:
100: `deduplicationKeySelector`
Is this the best name? It might indicate that we select a "key", which is
an overloaded term... Maybe we could use `Field` or `Id` o
Hi everyone,
I've just made a (small) change to this KIP about an implementation detail.
Please let me know your thoughts.
Thank you,
Ayoub
On Mon, 20 May 2024 at 21:13, Ayoub wrote:
> Hello,
>
> Following a discussion on community slack channel, I would like to revive
> the discussion on th
Hello,
Following a discussion on community slack channel, I would like to revive
the discussion on the KIP-655, which is about adding a deduplication
processor in kafka-streams.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-655%3A+Windowed+Distinct+Operation+for+Kafka+Streams+API
Even th