Hi,

We currently have a streaming use case: a Flink application running on a session cluster reads table-transaction events from an upstream Kafka topic, converts each event to a row, deduplicates the rows to extract distinct records, and persists them via a sink function to an external warehouse such as Vertica or Snowflake.
With the TumblingWindow window-assigner implementation we initially observed that state size grows unbounded, even after tuning the RocksDB options and allotting a generous chunk of managed memory. We read more than 150,000 records within our 4-minute window, which is sized per our requirements. One optimization the Flink docs suggest for reducing state size is incremental aggregation via the available reduce or aggregate functions. I implemented this using an aggregate function together with the window, but I am now observing that the consumption rate has dropped drastically, which has reduced the overall throughput of the pipeline. Any thoughts as to why this can happen?

--
Thanks and Regards,
Sumanta Majumdar
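For context, the incremental-aggregation idea described above can be sketched in plain Java (no Flink APIs; all class and field names here are hypothetical): the accumulator holds only the distinct keys seen so far, so window state is bounded by the number of distinct rows rather than the total number of incoming records.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch (no Flink dependencies) of incremental aggregation for
// deduplication: instead of buffering every record in window state, the
// accumulator keeps only the set of distinct row keys, so state size is
// bounded by the number of distinct rows, not the record count.
public class DedupAccumulatorSketch {

    // Accumulator: holds only distinct row keys, updated incrementally.
    static final class DedupAccumulator {
        final Set<String> distinctKeys = new HashSet<>();

        // Called once per incoming record (analogous to AggregateFunction.add).
        void add(String rowKey) {
            distinctKeys.add(rowKey);
        }

        // Emitted when the window fires (analogous to getResult).
        int distinctCount() {
            return distinctKeys.size();
        }
    }

    public static void main(String[] args) {
        DedupAccumulator acc = new DedupAccumulator();
        // 150,000 records but only 1,000 distinct keys: state stays small.
        for (int i = 0; i < 150_000; i++) {
            acc.add("key-" + (i % 1_000));
        }
        System.out.println(acc.distinctCount()); // 1000
    }
}
```

One relevant detail for the throughput question: with the RocksDB state backend, an incremental aggregate invokes the accumulator update on every single record, which involves deserializing and reserializing the accumulator state per record rather than once per window firing.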