Re: Streams reprocessing whole topic when deployed but not locally

2019-07-10 Thread Alessandro Tagliapietra
Thanks Bruno and Patrik, my fault was that since on confluent cloud auto creation is set to false, I've manually created the topics, without taking care of changing the cleanup policy. Doing so the store changelogs kept the whole changes, in fact after enabling the compacting policy the storage us

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-10 Thread Patrik Kleindl
Hi Regarding the I/O, RocksDB has something called write amplification which writes the data to multiple levels internally to enable better optimization at the cost of storage and I/O. This is also the reason the stores can get larger than the topics themselves. This can be modified by RocksDB se

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-10 Thread Bruno Cadonna
Hi Alessandro, > - how do I specify the retention period of the data? Just by setting the > max retention time for the changelog topic? For window and session stores, you can set retention time on a local state store by using Materialized.withRetention(...). Consult the javadocs for details. If

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-09 Thread Alessandro Tagliapietra
I've just noticed that the store topic created automatically by our streams app have different cleanup.policy. I think that's the main reason I'm seeing that big read/write IO, having a compact policy instead of delete would make the topic much smaller. I'll try that to also see the impact on our s

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-09 Thread Alessandro Tagliapietra
Hi Bruno, Oh I see, I'll try to add a persistent disk where the local stores are. I've other questions then: - why is it also writing that much? - how do I specify the retention period of the data? Just by setting the max retention time for the changelog topic? - wouldn't be possible, for examp

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-09 Thread Bruno Cadonna
Hi Alessandro, I am not sure I understand your issue completely. If you start your streams app in a new container without any existing local state, then it is expected that the changelog topics are read from the beginning to restore the local state stores. Am I misunderstanding you? Best, Bruno

Re: Streams reprocessing whole topic when deployed but not locally

2019-07-08 Thread Alessandro Tagliapietra
I think I'm having again this issue, this time though it only happens on some state stores. Here you can find the code and the logs https://gist.github.com/alex88/1d60f63f0eee9f1568d89d5e1900fffc We first seen that our confluent cloud bill went up 10x, then seen that our streams processor was rest

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Alessandro Tagliapietra
I'm not sure, one thing I know for sure is that on the cloud control panel, in the consumer lag page, the offset didn't reset on the input topic, so it was probably something after that. Anyway, thanks a lot for helping, if we experience that again I'll try to add more verbose logging to better un

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Guozhang Wang
Honestly I cannot think of an issue that fixed in 2.2.1 but not in 2.2.0 which could be correlated to your observations: https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.2.1%20AND%20component%20%3D%20streams%20

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Alessandro Tagliapietra
Yes that's right, could that be the problem? Anyway, so far after upgrading to 2.2.1 from 2.2.0 we didn't experience that problem anymore. Regards -- Alessandro Tagliapietra On Thu, Jun 6, 2019 at 10:50 AM Guozhang Wang wrote: > That's right, but local state is used as a "materialized view"

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Parthasarathy, Mohan
that if you can live with the extra delay at the beginning of the app restart, stateful set is not required From: Guozhang Wang Sent: Thursday, June 6, 2019 10:13 AM To: users@kafka.apache.org Subject: Re: Streams reprocessing whole topic when deployed but not

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Guozhang Wang
That's right, but local state is used as a "materialized view" of your changelog topics: if you have nothing locally, then it has to bootstrap from the beginning of your changelog topic. But I think your question was about the source "sensors-input" topic, not the changelog topic. I looked at the

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Alessandro Tagliapietra
Isn't the windowing state stored in the additional state store topics that I had to additionally create? Like these I have here: sensors-pipeline-KSTREAM-AGGREGATE-STATE-STORE-01-changelog sensors-pipeline-KTABLE-SUPPRESS-STATE-STORE-04-changelog Thank you -- Alessandro Tagliapi

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-06 Thread Guozhang Wang
If you deploy your streams app into a docker container, you'd need to make sure local state directories are preserved, since otherwise whenever you restart all the state would be lost and Streams then has to bootstrap from scratch. E.g. if you are using K8s for cluster management, you'd better use

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-05 Thread Alessandro Tagliapietra
Hi Guozhang, sorry, by "app" i mean the stream processor app, the one shown in pipeline.kt. The app reads a topic of data sent by a sensor each second and generates a 20 second window output to another topic. My "problem" is that when running locally with my local kafka setup, let's say I stop it

Re: Streams reprocessing whole topic when deployed but not locally

2019-06-05 Thread Guozhang Wang
Hello Alessandro, What did you do for `restarting the app online`? I'm not sure I follow the difference between "restart the streams app" and "restart the app online" from your description. Guozhang On Wed, Jun 5, 2019 at 10:42 AM Alessandro Tagliapietra < tagliapietra.alessan...@gmail.com> wr

Streams reprocessing whole topic when deployed but not locally

2019-06-05 Thread Alessandro Tagliapietra
Hello everyone, I've a small streams app, the configuration and part of the code I'm using can be found here https://gist.github.com/alex88/6b7b31c2b008817a24f63246557099bc There's also the log when the app is started locally and when the app is started on our servers connecting to the confluent c