This is the topology of a simple word count:

Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [word_count_input])
      --> KSTREAM-FLATMAPVALUES-0000000001
    Processor: KSTREAM-FLATMAPVALUES-0000000001 (stores: [])
      --> KSTREAM-KEY-SELECT-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-KEY-SELECT-0000000002 (stores: [])
      --> KSTREAM-FILTER-0000000005
      <-- KSTREAM-FLATMAPVALUES-0000000001
    Processor: KSTREAM-FILTER-0000000005 (stores: [])
      --> KSTREAM-SINK-0000000004
      <-- KSTREAM-KEY-SELECT-0000000002
    Sink: KSTREAM-SINK-0000000004 (topic: Counts-repartition)
      <-- KSTREAM-FILTER-0000000005

  Sub-topology: 1
    Source: KSTREAM-SOURCE-0000000006 (topics: [Counts-repartition])
      --> KSTREAM-AGGREGATE-0000000003
    Processor: KSTREAM-AGGREGATE-0000000003 (stores: [Counts])
      --> KTABLE-TOSTREAM-0000000007
      <-- KSTREAM-SOURCE-0000000006
    Processor: KTABLE-TOSTREAM-0000000007 (stores: [])
      --> KSTREAM-SINK-0000000008
      <-- KSTREAM-AGGREGATE-0000000003
    Sink: KSTREAM-SINK-0000000008 (topic: word_count_output)
      <-- KTABLE-TOSTREAM-0000000007
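For context, this is roughly the DSL code that produces a topology like the one above (a sketch; topic names match the dump, but the exact lambdas are my reconstruction and the code needs the kafka-streams dependency to compile):

```java
import java.util.Arrays;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class WordCountTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> input = builder.stream("word_count_input");

        KTable<String, Long> counts = input
            // KSTREAM-FLATMAPVALUES: split each line into words
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\s+")))
            // KSTREAM-KEY-SELECT: re-key by word, which forces the repartition topic
            .selectKey((key, word) -> word)
            .groupByKey()
            // KSTREAM-AGGREGATE: state store "Counts", backed by a changelog topic
            .count(Materialized.as("Counts"));

        // KTABLE-TOSTREAM + KSTREAM-SINK: emit every count update downstream
        counts.toStream().to("word_count_output");

        // Prints the topology description shown above
        System.out.println(builder.build().describe());
    }
}
```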

I would like to understand better what (stores: [Counts]) means.

The documentation says:

For each state store, Kafka maintains a replicated changelog Kafka topic in
which it tracks any state updates.

As I understand it, a KTable in Kafka Streams is implemented with a local
state store (RocksDB by default, though in theory it could also be an
in-memory HashMap) backed by a compacted changelog topic, which is one of
the internal topics created by my sample application:

wordcount-application-Counts-changelog
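As I understand it, that topic name follows the pattern `<application.id>-<store name>-changelog`, so it is determined by the store name above plus the configured application id (a sketch; needs the kafka-streams dependency):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ChangelogName {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id prefixes every internal topic the application creates
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");

        // The store named "Counts" is therefore backed by:
        //   wordcount-application-Counts-changelog
        System.out.println(props.getProperty(StreamsConfig.APPLICATION_ID_CONFIG)
                + "-Counts-changelog");
    }
}
```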

If the above assumption is correct, it means that in that step Kafka Streams
writes back to Kafka and then starts reading from Kafka again.

So basically the compacted changelog topic is the sink for one
read-process-write cycle and the source for the next:

   - read from the repartition topic -> write to the KTable's compacted
     changelog topic
   - read from the changelog topic -> process -> write to the output topic

Is this correct?
As for fault tolerance and exactly-once semantics, I would also expect Kafka
to use transactions. Once again, are my assumptions correct? Where could I
learn more about these concepts?
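For what it's worth, my current understanding is that exactly-once processing is switched on with a single config value (a sketch, assuming Kafka Streams 2.8+ where `exactly_once_v2` is available; needs the kafka-streams dependency):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Wraps each read-process-write cycle (including changelog and
        // repartition writes, plus offset commits) in a Kafka transaction.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
                  StreamsConfig.EXACTLY_ONCE_V2);

        System.out.println(
                props.getProperty(StreamsConfig.PROCESSING_GUARANTEE_CONFIG));
    }
}
```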

Any help is welcome!