This seems to be related to the internal KTable caches: the cache deduplicates
consecutive updates to the same key, so downstream you see fewer records than
updates applied to the table. You can disable the caches by setting the cache
size to zero.

http://docs.confluent.io/current/streams/developer-guide.html#memory-management
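
A minimal sketch of what that looks like (property names are from the 0.10.x
StreamsConfig; the application id and bootstrap servers below are placeholders):

```
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");      // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
// disable the KTable/record caches so every update is forwarded downstream
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0L);
```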

-Matthias



On 6/14/17 4:08 PM, Caleb Welton wrote:
> Update: if I set `StreamsConfig.NUM_STREAM_THREADS_CONFIG=1` then the
> problem does not manifest; at `StreamsConfig.NUM_STREAM_THREADS_CONFIG=2`
> or higher the problem shows up.
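> 
> For reference, a minimal sketch of how the thread count is set (assuming a
> Properties object `props` that is passed to the StreamsConfig; everything
> else is unchanged):
> 
> ```
> // 1 thread: output is correct, but ingest slows to match the output rate
> // 2 or more threads: records go missing downstream
> props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
> ```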
> 
> When the number of threads is 1, the rate of data through the first part of
> the topology (before the KTable) slows down considerably, but it seems to
> slow down to match the rate of the output, which may be the key observation.
> 
> That said... changing the number of stream threads should not affect data
> correctness.  This seems like a bug somewhere in Kafka.
> 
> 
> 
> On Wed, Jun 14, 2017 at 2:53 PM, Caleb Welton <ca...@autonomic.ai> wrote:
> 
>> I have a topology of
>>     KStream -> KTable -> KStream
>>
>> ```
>> final KStreamBuilder builder = new KStreamBuilder();
>> final KStream<String, Metric> metricStream = builder.stream(ingestTopic);
>> final KTable<Windowed<String>, MyThing> myTable = metricStream
>>         .groupByKey(stringSerde, mySerde)
>>         .reduce(MyThing::merge,
>>                 TimeWindows.of(10000).advanceBy(10000)
>>                            .until(Duration.ofDays(retentionDays).toMillis()),
>>                 tableTopic);
>>
>> myTable.toStream()
>>         .map((key, value) -> KeyValue.pair(key.key(), value.finalize(key.window())))
>>         .to(stringSerde, mySerde, sinkTopic);
>>
>> ```
>>
>>
>> Normally, when sending data at 10 messages per second, I expect ~1 output
>> metric for every 100 metrics received, since each 10-second window collapses
>> about 100 inputs into a single output.
>>
>> When fed data in real time at that rate, it seems to do just that.
>>
>> However, when I either reprocess an input topic containing a large amount of
>> data or feed data in significantly faster, I see very different behavior.
>>
>> Over the course of 20 seconds I can see 1,440,000 messages being ingested
>> into the KTable, but only 633 emitted from it (I expected ~14,400, i.e.
>> 1,440,000 / 100).
>>
>> Over the next minute the output record count creeps up to 1,796, but then it
>> holds steady and never reaches the expected total of 14,400.
>>
>> A consumer reading from the sinkTopic ends up finding only about 1,264
>> records, which is even lower than the 1,796 I would have anticipated from
>> the number of calls into the final KStream map function.
>>
>> The precise number of emitted records varies from one run to the next.
>>
>> Where are the missing records going?  Is there some commit issue that causes
>> dropped messages when the KTable's producer isn't able to keep up?
>>
>> Any recommendations on where to focus the investigation of the issue?
>>
>> Running Kafka 0.10.2.1.
>>
>> Thanks,
>>   Caleb
>>
> 
