Thanks a lot Alex and Matthias,
From Alex's answer, I understand that the record is written to the
compacted topic as part of the transaction, right?


On Tue, May 19, 2020 at 8:32 PM Matthias J. Sax <mj...@apache.org> wrote:

> What Alex says is correct.
>
> The changelog topic is only written into during processing -- in fact,
> you could consider this write a "side effect" of doing `store.put()`.
>
> The changelog topic is only read when recovering from an error and the
> store needs to be rebuilt from it.
>
>
> -Matthias
>
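Matthias's point about the changelog being a "side effect" of `store.put()` can be sketched with the Processor API. This is a hypothetical processor, not code from the thread; the store name "Counts" and the wiring around it are assumptions:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Hypothetical counting processor. Every store.put() below is mirrored by
// Kafka Streams to the store's changelog topic (change-logging is on by
// default); the changelog is only read back when the local store must be
// restored after a failure or rebalance.
public class CountingProcessor implements Processor<String, String, String, Long> {

    private KeyValueStore<String, Long> store;
    private ProcessorContext<String, Long> context;

    @Override
    public void init(final ProcessorContext<String, Long> context) {
        this.context = context;
        this.store = context.getStateStore("Counts");
    }

    @Override
    public void process(final Record<String, String> record) {
        final Long current = store.get(record.key());
        final long updated = (current == null ? 0L : current) + 1L;
        store.put(record.key(), updated); // side effect: append to changelog
        context.forward(record.withValue(updated)); // downstream gets the new count
    }
}
```

Note that `forward()` sends the updated count downstream directly; the processor never consumes its own changelog.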
> On 5/19/20 9:30 AM, Alex Craig wrote:
> > Hi Raffaele, hopefully others more knowledgeable will correct me if I'm
> > wrong, but I don't believe anything gets read from the changelog topic
> > (other than at startup, if the state store needs to be restored). So in
> > your Sub-topology 1, the only topic being consumed from is the
> > repartition topic. After it hits the aggregation (state store), the
> > record is emitted to the next step, which converts it back to a KStream
> > object, and then finally it's written to your output topic. If a failure
> > occurred somewhere in that sub-topology, the offset for the repartition
> > topic would not have been committed, which means the record would get
> > processed again on application startup.  Hope that helps,
> >
> > Alex Craig
> >
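For reference, the topology Raffaele posted below corresponds to a DSL program along these lines. This is a reconstruction, not the actual code from the thread; serde configuration is minimal and the class name is made up, but the topic and store names match the printed topology:

```java
import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;

public class WordCountTopology {

    public static Topology build() {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("word_count_input")
            // Sub-topology 0: split lines into words and re-key by word;
            // the key change forces a repartition (sink: Counts-repartition)
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .selectKey((ignoredKey, word) -> word)
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            // Sub-topology 1: aggregate into the "Counts" store; its backing
            // changelog is <application.id>-Counts-changelog
            .count(Materialized.as("Counts"))
            .toStream()
            .to("word_count_output");
        return builder.build();
    }
}
```

Printing `build().describe()` reproduces the two sub-topologies shown below, joined only by the `Counts-repartition` topic.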
> > On Tue, May 19, 2020 at 10:21 AM Raffaele Esposito <rafaelral...@gmail.com>
> > wrote:
> >
> >> This is the topology of a simple word count:
> >>
> >> Topologies:
> >>    Sub-topology: 0
> >>     Source: KSTREAM-SOURCE-0000000000 (topics: [word_count_input])
> >>       --> KSTREAM-FLATMAPVALUES-0000000001
> >>     Processor: KSTREAM-FLATMAPVALUES-0000000001 (stores: [])
> >>       --> KSTREAM-KEY-SELECT-0000000002
> >>       <-- KSTREAM-SOURCE-0000000000
> >>     Processor: KSTREAM-KEY-SELECT-0000000002 (stores: [])
> >>       --> KSTREAM-FILTER-0000000005
> >>       <-- KSTREAM-FLATMAPVALUES-0000000001
> >>     Processor: KSTREAM-FILTER-0000000005 (stores: [])
> >>       --> KSTREAM-SINK-0000000004
> >>       <-- KSTREAM-KEY-SELECT-0000000002
> >>     Sink: KSTREAM-SINK-0000000004 (topic: Counts-repartition)
> >>       <-- KSTREAM-FILTER-0000000005
> >>
> >>   Sub-topology: 1
> >>     Source: KSTREAM-SOURCE-0000000006 (topics: [Counts-repartition])
> >>       --> KSTREAM-AGGREGATE-0000000003
> >>     Processor: KSTREAM-AGGREGATE-0000000003 (stores: [Counts])
> >>       --> KTABLE-TOSTREAM-0000000007
> >>       <-- KSTREAM-SOURCE-0000000006
> >>     Processor: KTABLE-TOSTREAM-0000000007 (stores: [])
> >>       --> KSTREAM-SINK-0000000008
> >>       <-- KSTREAM-AGGREGATE-0000000003
> >>     Sink: KSTREAM-SINK-0000000008 (topic: word_count_output)
> >>       <-- KTABLE-TOSTREAM-0000000007
> >>
> >> I would like to understand better what (stores: [Counts]) means.
> >>
> >> The documentation says:
> >>
> >> For each state store, Kafka maintains a replicated changelog Kafka
> topic in
> >> which it tracks any state updates.
> >>
> >> As I understand it, a KTable in Kafka Streams is implemented with a local
> >> table (RocksDB by default, but in theory it could also be a HashMap) backed
> >> by a compacted log, which is one of the internal topics created by my
> >> sample application:
> >>
> >> wordcount-application-Counts-changelog
> >>
> >> If the above assumption is correct, it means that in that step Kafka
> >> Streams is writing back to Kafka and then starts reading from Kafka again.
> >>
> >> So basically the log-compacted topic is the sink for one
> >> read-process-write cycle and the source for the next:
> >>
> >>    - read from repartition topic -> write to KTable compacted topic
> >>    - read from KTable compacted topic -> process -> write to output topic
> >>
> >> Is this correct?
> >> As for fault tolerance and exactly-once semantics, I would also expect
> >> Kafka to apply transactions. Once again, are my assumptions correct?
> >> Where could I learn more about these concepts?
> >>
> >> Any help is welcome!
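On the exactly-once question raised above: in Kafka Streams this is a single configuration property rather than explicit transaction code. A minimal sketch of the relevant setting (value names per the Kafka Streams configuration docs):

```properties
# Enables exactly-once processing in Kafka Streams. With this set, the
# writes to the changelog topic, the repartition topic, and the output
# topic, plus the consumed-offset commits, are all part of one Kafka
# transaction and succeed or abort together.
# "exactly_once_v2" requires brokers 2.5+ (and Streams 3.0+);
# older versions use the value "exactly_once".
processing.guarantee=exactly_once_v2
```

This is what makes the changelog write part of the transaction, as discussed at the top of the thread.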
> >>
> >
>
>
