Any thoughts on this? On Fri, Apr 21, 2023 at 4:10 PM Sumanta Majumdar <feel.suma...@gmail.com> wrote:
> Hi, > > Currently we have a streaming use case where we have a flink application > which runs on a session cluster which is responsible for reading data from > Kafka source which is basically table transaction events getting ingested > into the upstream kafka topic which is converted to a row and then > deduplicated to extract distinct rows and persist via a sink function to an > external warehouse such as vertica or snowflake schema. > > Now initially what we have observed by using TumblingWindow windows > assigner implementation is that the state sizes are growing unconditionally > even when we have tuned rocksdb options and provided a good chunk of > managed memory. > > We are able to read more than 150000 records within a period of 4 mins > which is our time window set based on our requirements. > > Now one optimization which I see is suggested through the flink docs in > order to reduce the state size is to use incremental aggregation using > reduce or aggregate functions available. > > Now I did use aggregate function along with window in order implement the > same but now I am observing that the consumption rate has reduced > drastically post this implementation which has increased the overall > throughput of the pipeline. > > Any thoughts as to why this can happen? > > -- > Thanks and Regards, > Sumanta Majumdar > -- Thanks and Regards, Sumanta Majumdar