Re: KStream Close Processor

2016-04-11 Thread Guozhang Wang
Yeah, we can definitely do better in documentation. Regarding the API changes, I would prefer to hold off and think through whether such use cases are a common pattern, and whether we can even re-order the closing process to get around the issue I mentioned above if required. Guozhang On Mon, A

Re: KStream Close Processor

2016-04-11 Thread Matthias J. Sax
What about extending the API with a method beforeClose() that enables the user to flush buffered data? Maybe we can also rename close() to afterClose(), to make the difference clear. At least, we should document when close() is called -- from a user point of view, I would expect that close() allow
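
For illustration only, a rough sketch of the API extension being proposed here; beforeClose() and afterClose() are not part of Kafka Streams, and the interface below is purely hypothetical:

    // Hypothetical sketch of the proposed lifecycle split -- NOT the Kafka Streams API.
    public interface ProcessorLifecycleSketch<K, V> {

        void init(Object context);

        void process(K key, V value);

        // Proposed: invoked while the producer/consumer clients are still open,
        // so the user can still flush buffered data downstream.
        void beforeClose();

        // Proposed rename of close(): invoked after the clients are closed,
        // intended only for releasing local resources.
        void afterClose();
    }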

Re: KStream Close Processor

2016-04-10 Thread Guozhang Wang
Re 1), Kafka Streams intentionally closes all underlying clients before closing the processors, since closing some of the processors requires shutting down their state managers; for example, we need to make sure the producer's message sends have all been acked before the state manager records cha
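
A simplified illustration of the ordering constraint described here: the producer must be flushed (all sends acked) before the state manager checkpoints. The class and method names below are placeholders, not Kafka Streams internals:

    import org.apache.kafka.clients.producer.KafkaProducer;

    public class FlushBeforeCheckpointSketch {

        // Hypothetical helper showing why the flush has to happen before checkpointing:
        public static void checkpoint(KafkaProducer<byte[], byte[]> changelogProducer,
                                      Runnable writeCheckpointFile) {
            // 1. Wait until every changelog send has been acked by the broker...
            changelogProducer.flush();
            // 2. ...and only then persist the checkpointed offsets for the state store.
            writeCheckpointFile.run();
        }
    }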

Re: KStream Close Processor

2016-04-10 Thread Jay Kreps
Also, I wonder if this issue is related: https://issues.apache.org/jira/browse/KAFKA-3135 -Jay

Re: KStream Close Processor

2016-04-10 Thread Jay Kreps
Two things: 1. Caching data in the processor is a bit dangerous since it will be lost on failure. Nonetheless, I think you have a point that we should ideally close the processors first, then commit in case they send any messages on close. 2. The issue you describe shouldn't happen for the reason y
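
One common way to address the durability concern in point 1 (not something prescribed in this thread) is to back the buffer with a Kafka Streams state store, so pending data is restored from its changelog after a failure. A minimal sketch against the 0.10-era Processor API, with a hypothetical store name:

    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class StoreBackedBufferProcessor implements Processor<String, String> {

        private KeyValueStore<String, String> buffer;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            // "buffer-store" must be registered on the topology; it is backed by a
            // changelog topic, so buffered data survives a crash and restart.
            buffer = (KeyValueStore<String, String>) context.getStateStore("buffer-store");
        }

        @Override
        public void process(String key, String value) {
            buffer.put(key, value); // durable, unlike an in-memory list
        }

        @Override
        public void punctuate(long timestamp) { }

        @Override
        public void close() { }
    }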

Re: KStream Close Processor

2016-04-10 Thread Michael D. Coon
Guozhang, Yes, I'm merging message contents into larger messages before sending to the producer. We have demonstrated that many tiny messages of < 1K cause a tremendous slowdown on the downstream consumers. Not because of memory contention but because of the broker filling up the max fetch r
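
The fetch-response sizing referred to here is bounded by standard consumer configs; a minimal sketch of the relevant knobs (broker address, group id, and the chosen values are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Properties;

    public class ConsumerFetchSizingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-sizing-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            // Upper bound on data returned per partition per fetch: with ~1 KB messages,
            // a 1 MB cap means roughly a thousand messages' worth of per-message
            // overhead for every fetch response the consumer handles.
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024);
            // Have the broker wait for at least this many bytes before answering a fetch.
            props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // subscribe/poll loop omitted in this sketch
            }
        }
    }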

Re: KStream Close Processor

2016-04-09 Thread Guozhang Wang
Mike, Not clear what you mean by "buffering up the contents". The producer itself already does some buffering and batching when sending to Kafka. Did you actually "merge" multiple small messages into one large message before giving it to the producer in the app code? In either case, I am not sure ho
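
The producer-side buffering and batching mentioned here is controlled by standard producer configs; a minimal sketch (broker address, topic, and values are placeholders):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class ProducerBatchingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            // Accumulate up to 64 KB per partition before sending a batch...
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
            // ...and wait up to 50 ms for a batch to fill before sending anyway.
            props.put(ProducerConfig.LINGER_MS_CONFIG, 50);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("output-topic", "key", "small message"));
            } // close() flushes whatever is still buffered in the producer
        }
    }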

Re: KStream Close Processor

2016-04-09 Thread Michael D. Coon
Guozhang, In my processor, I'm buffering up the contents of the final messages in order to make them larger. This is to optimize throughput and avoid injecting tiny messages downstream. So nothing is pushed to the producer until my configured thresholds are met in the buffering me

Re: KStream Close Processor

2016-04-08 Thread Guozhang Wang
Hi Michael, When you call KafkaStreams.close(), it will first trigger a commitAll() function, which will 1) flush the local state store if necessary; 2) flush messages buffered in the producer; 3) commit offsets on the consumer. Then it will close the producer/consumer clients and shut down the tasks. So whe
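
A simplified, runnable illustration of the shutdown order described above; the method names are placeholder stubs, not the actual Kafka Streams internals:

    public class ShutdownOrderSketch {

        public static void main(String[] args) {
            // 1. commitAll(): flush state stores, flush the producer, commit consumer offsets.
            flushStateStores();
            flushProducer();
            commitConsumerOffsets();

            // 2. Close the producer / consumer clients.
            closeClients();

            // 3. Shut down the tasks, which is where Processor.close() runs --
            //    i.e. after the producer has already been closed.
            closeProcessors();
        }

        private static void flushStateStores()      { System.out.println("flush state stores"); }
        private static void flushProducer()         { System.out.println("flush producer"); }
        private static void commitConsumerOffsets() { System.out.println("commit consumer offsets"); }
        private static void closeClients()          { System.out.println("close producer/consumer clients"); }
        private static void closeProcessors()       { System.out.println("close processors (Processor.close())"); }
    }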

KStream Close Processor

2016-04-08 Thread Michael D. Coon
All, I'm seeing my processor's "close" method being called AFTER my downstream producer has been closed. I had assumed that on close I would be able to flush whatever I had been buffering up to send to the Kafka topic. In other words, we've seen significant performance differences in building flo
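
A minimal sketch of the buffering pattern being described, written against the 0.10-era Processor API; the class name, threshold, and merge format are hypothetical:

    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    import java.util.ArrayList;
    import java.util.List;

    public class BufferingProcessor implements Processor<String, String> {

        private static final int FLUSH_THRESHOLD = 1000; // assumed batch size

        private ProcessorContext context;
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(String key, String value) {
            buffer.add(value);
            if (buffer.size() >= FLUSH_THRESHOLD) {
                flushBuffer();
            }
        }

        // Merge the buffered values into one larger message and forward it downstream.
        private void flushBuffer() {
            context.forward(null, String.join("\n", buffer));
            buffer.clear();
        }

        @Override
        public void punctuate(long timestamp) { }

        @Override
        public void close() {
            // The assumption discussed in this thread: flush the remainder here.
            // With the shutdown order described above, the producer is already closed
            // when close() runs, so this final batch never reaches the output topic.
            if (!buffer.isEmpty()) {
                flushBuffer();
            }
        }
    }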