kafka-streams / window expiration because of shuffling

2020-11-20 Thread Mathieu D
Hello there, We're processing IOT device data, each device sending several metrics. When we upgrade our streams app, we set a brand new 'application.id' to reprocess a bit of past data, to warm up the state stores and aggregations to make sure all outputs will be valid. Downstream is designed for

Reading all messages from a Kafka topic for a state

2020-11-20 Thread Tomer Cohen
Hello everyone I have the following use case: I have two Kafka topics, one is meant to be used as a stream of incoming messages to be processed, the other is meant as a store of records that is meant to be used as a bootstrap to the initial state of the application. Is there a way to do the foll

Re: Kafka Streams: SessionWindowedSerde Vs TimeWindowedSerde. Ambiguous implicit values

2020-11-20 Thread Daniel Hinojosa
By the way, it was even cleaner than that. I decided on a hunch this morning that there has to be something simpler and cleaner. Checked the documentation and there it was. You can just import these two statements and just do the stream without implicit bindings: import org.apache.kafka.streams.s

Re: Kafka Streams: SessionWindowedSerde Vs TimeWindowedSerde. Ambiguous implicit values

2020-11-20 Thread Eric Beabes
Wow. This is amazing Daniel. THANKS A LOT! On Fri, Nov 20, 2020 at 9:44 AM Daniel Hinojosa wrote: > By the way. It was even cleaner than that. I decided on a hunch this > morning that there has to be something simpler and cleaner Checked the > documentation and there it was. > > You can just im

Re: Kafka Streams: SessionWindowedSerde Vs TimeWindowedSerde. Ambiguous implicit values

2020-11-20 Thread Eric Beabes
Daniel, Thanks again. Accidentally sent the previous message before finishing typing the whole thing. Anyway, I've so much to learn about these 'implicits' but I generally get the idea. It's compiling fine now. It's sending out "blank" messages but that's the next challenge. Thanks again. On

Kafka Streams: Unexpected data loss

2020-11-20 Thread Jeffrey Goderie
Hi all, We recently started using Kafka Streams and we encountered an unexpected issue with our Streams application. Using the following topology we ran into data loss: Topologies: Sub-topology: 0 Source: meteringpoints-source (topics: [serving.meteringpoints.mad-meteringpoint]) -->

Re: kafka-streams / window expiration because of shuffling

2020-11-20 Thread John Roesler
Hi Mathieu, Ah, that is unfortunate. I believe your analysis is correct. In general, we have no good solution to the problem of upstream tasks moving ahead of each other and causing disorder in the repartition topics. Guozhang has done a substantial amount of thinking on this subject, though, a

Re: Reading all messages from a Kafka topic for a state

2020-11-20 Thread Matthias J. Sax
Messages are processed in timestamp order across topics. Thus, if your messages in the `KTable` topic have smaller timestamps than the records in the other topics, they will be processed first and thus effectively bootstrap your KTable state store. You may want to increase `max.task.idle.ms` par
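
A minimal sketch of the configuration side of this advice, using only `java.util.Properties` so that it runs without the Kafka client libraries. `max.task.idle.ms` is the Kafka Streams setting for how long a task waits for data on all of its input partitions before processing what it already has buffered. The application id, broker address, and the 5000 ms value are illustrative assumptions, not values from the thread.

```java
import java.util.Properties;

public class BootstrapConfigSketch {

    // Builds the settings that would normally be handed to a KafkaStreams instance.
    static Properties buildConfig() {
        Properties props = new Properties();
        // Hypothetical application id and broker address, for illustration only.
        props.put("application.id", "bootstrap-example");
        props.put("bootstrap.servers", "localhost:9092");
        // Let a task wait up to 5 seconds (assumed value) for records on every
        // input partition, so the lower-timestamped table records can be picked first.
        props.put("max.task.idle.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("max.task.idle.ms=" + props.getProperty("max.task.idle.ms"));
    }
}
```

In a real application the same `Properties` object would be passed to the Kafka Streams constructor together with the topology. A suitable idle time depends on how far the table topic's consumption can lag behind the stream topic's.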

Re: KIP to Gracefully handle timeout exception on kafka streams

2020-11-20 Thread Matthias J. Sax
Yes, if brokers are upgraded via rolling bounce, and the embedded clients are configured with large enough timeouts and retries, they should just fail over to running brokers if a single broker is bounced. If you get a timeout exception, then KafkaStreams dies atm -- we have KIP-572 in-flight that

Re: Kafka Streams: Unexpected data loss

2020-11-20 Thread John Roesler
Hello Jeffrey, I’m sorry for the trouble. I appreciate your diligence in tracking this down. In reading your description, nothing jumps out to me as problematic. I’m a bit at a loss as to what may have been the problem. >- Is there a realistic scenario (e.g. crash, rebalance) which you ca

Re: KIP to Gracefully handle timeout exception on kafka streams

2020-11-20 Thread Pushkar Deole
Thanks Matthias... We are already on Kafka 2.5.0, and https://issues.apache.org/jira/browse/KAFKA-8803 mentions that these types of issues are fixed in 2.5.0. Is KIP-572 planned for 2.7.0? Also, for timeouts and retries, can you tell us which parameters we should configure to higher values for now