Re: Kafka Streams Internal Topic Retention not applied

2018-04-06 Thread Matthias J. Sax
Björn, broker configs are default config but can be overwritten when a topic is created. And this happens when Kafka Streams creates internal topics. Thus, you need to change the setting Kafka Streams applies when creating topics. Also note: if cleanup.policy = compact, the setting of `log.retent

Re: Kafka Streams Internal Topic Retention not applied

2018-04-06 Thread Björn Häuser
Hello Guozhang thanks. So after reading much more docs I still do not have the complete picture. These are our relevant settings from kafka broker configuration: log.cleanup.policy=delete # set log.retention.bytes to 15 gb log.retention.bytes=16106127360 # set log.retention.hours to 30 days log

Re: Kafka Mirrormaker issue

2018-04-06 Thread Jeff Field
I'm hitting the same problem, even with the new consumer, on MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a 0.11 Kafka cluster. On 3/30/18, 3:56 PM, "Andrew Otto" wrote: I’m currently stuck on MirrorMaker version 0.9, and I’m not sure when the new consumer client b

Re: What causes partitions to be revoked?

2018-04-06 Thread Scott Thibault
I've resolved this issue. In the end, it was caused by heartbeat expiration. My polling is correct, however, it seems to be suffering from thread starvation. I'm not sure what the fix is in general for that. Fortunately, increasing session.timeout.ms was enough for my situation. Thanks --Scott

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Dmitriy Vsekhvalnov
Thanks guys, ok, question then - is it possible to use state store with .aggregate()? Here are some details on counting, we basically looking for TopN + Remaining calculation. Example: - incoming data: api url -> hit count - we want output: Top 20 urls per each domain per hour + remaining coun

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Guozhang Wang
Hello Dmitriy, You can "simulate" an lower-level processor API by 1) adding the stores you need via the builder#addStore(); 2) do a manual "through" call after "selectKey" (the selected key will be the same as your original groupBy call), and then from the repartitioned stream add the `transform()

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Matthias J. Sax
KGroupedStream and TimeWindowedKStream are only logical representations at DSL level. They don't really "do" anything. Thus, you can mimic them as follows: builder.addStore(...) in.selectKey().through(...).transform(..., "storeName"). selectKey() set's the new key for the grouping and the throug

Re: Kafka Consumer rebalancing frequently

2018-04-06 Thread Reitzig, Jochen, DSE Extern
Hi, Great explanation. However, I keep experiencing situations in which the "rebalancing state" just does not disappear. To my understanding, latest by the time of the session timer having expired across all consumers, the rebalancing state should go away. The only way, I manage to get rid of

Kafka-streams: mix Processor API with windowed grouping

2018-04-06 Thread Dmitriy Vsekhvalnov
Hey, good day everyone, another kafka-streams friday question. We hit the wall with DSL implementation and would like to try low-level Processor API. What we looking for is to: - repartition incoming source stream via grouping records by some fields + windowed (hourly, daily, e.t.c). - and

Batch.max.rows question

2018-04-06 Thread Mike MBS
I have a table of about 100 million rows, lets call it tbl_large. I am exposing this data as a kafka stream using a confluent connector to poll the table every second or so. For (no) good reason, i need to do a clean load of this data each weekend in our lower environment (dev and qa) from data i

Re: What causes partitions to be revoked?

2018-04-06 Thread Scott Thibault
I'm following the approach of the AdvancedConsumer example, which is to pause the partition and continue to invoke poll (several times a second) while the actual processing is done on another thread. --Scott On Fri, Apr 6, 2018 at 8:53 AM, Gabriel Giussi wrote: > Ok, the other thing that come

Re: What causes partitions to be revoked?

2018-04-06 Thread Gabriel Giussi
Ok, the other thing that come to my mind is to check the max.poll.interval.ms configuration. You said that "poll is invoked regularly" but this isn't very specific. The default value for max.poll.interval.ms is 5 minutes (30 millis) so if you are executing a poll regularly each 6 minutes, you w