Hi,

Currently we have a streaming use case: a Flink application running on a
session cluster reads table-transaction events from an upstream Kafka
topic, converts each event to a row, deduplicates to extract distinct
rows, and persists them via a sink function to an external warehouse such
as Vertica or Snowflake.

Initially, using the TumblingWindow window-assigner implementation, we
observed that the state size grows unbounded, even after tuning the
RocksDB options and allocating a good chunk of managed memory.

We read more than 150,000 records within our 4-minute time window, which
is set based on our requirements.

One optimization suggested in the Flink docs for reducing state size is
incremental aggregation, using the available reduce or aggregate
functions.
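For context, this is a minimal self-contained sketch of the idea behind
incremental aggregation for deduplication. It mirrors the shape of Flink's
AggregateFunction contract (createAccumulator / add / getResult / merge) but
is plain Java, not the real Flink interface; the class and method bodies here
are illustrative assumptions, and the record keys are hypothetical:

```java
import java.util.HashSet;

// Sketch of an incremental distinct-keys aggregate. With incremental
// aggregation, the window state holds only the accumulator (here, the
// set of distinct keys seen so far) instead of buffering every raw
// record until the window fires.
public class DistinctKeysAggregate {

    // The per-window accumulator: only distinct keys are retained.
    public HashSet<String> createAccumulator() {
        return new HashSet<>();
    }

    // Invoked once per incoming record; state grows only when a new
    // distinct key arrives, not with every record.
    public HashSet<String> add(String key, HashSet<String> acc) {
        acc.add(key);
        return acc;
    }

    // Emitted when the window fires.
    public HashSet<String> getResult(HashSet<String> acc) {
        return acc;
    }

    // Combines two partial accumulators (e.g. for merging windows).
    public HashSet<String> merge(HashSet<String> a, HashSet<String> b) {
        a.addAll(b);
        return a;
    }
}
```

The point of this pattern is that window state becomes proportional to the
number of distinct keys rather than the total number of records received, which
is why the docs recommend it when windowed state grows too large.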

I implemented this using an aggregate function along with the window, but
I am now observing that the consumption rate has dropped drastically,
which has reduced the overall throughput of the pipeline.

Any thoughts as to why this can happen?

-- 
Thanks and Regards,
Sumanta Majumdar
