I'm running my Structured Streaming jobs on EMR. As a worst-case recovery
plan, we were thinking of spinning up another cluster and setting
startingOffsets to earliest (our Kafka cluster has a retention policy of
7 days).
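A minimal sketch of the Kafka source options such a recovery query might use. It assumes the standard Spark Kafka source options (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`, `maxOffsetsPerTrigger`, `minPartitions`). The helper name `kafka_recovery_options` and the broker/topic values are hypothetical placeholders, not part of the original setup.

```python
from typing import Dict, Optional


def kafka_recovery_options(
    bootstrap_servers: str,
    topic: str,
    max_offsets_per_trigger: Optional[int] = None,
    min_partitions: Optional[int] = None,
) -> Dict[str, str]:
    """Build options for a Spark Kafka source that replays from the earliest offsets.

    Hypothetical helper: returns a plain dict so it can be passed to
    spark.readStream.format("kafka").options(**opts).
    """
    opts = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only honored when the query starts without an existing checkpoint.
        "startingOffsets": "earliest",
    }
    if max_offsets_per_trigger is not None:
        # Caps the total number of offsets read per micro-batch (across all partitions).
        opts["maxOffsetsPerTrigger"] = str(max_offsets_per_trigger)
    if min_partitions is not None:
        # Desired minimum number of Spark partitions to read from Kafka;
        # Spark may split Kafka partitions to reach it.
        opts["minPartitions"] = str(min_partitions)
    return opts


# Hypothetical values for illustration only.
opts = kafka_recovery_options("broker1:9092", "events", max_offsets_per_trigger=50000)
print(opts)
```

On a fresh cluster this would be used as `spark.readStream.format("kafka").options(**opts).load()` with a new checkpoint location. If the query resumes from an existing checkpoint, the checkpointed offsets take precedence over `startingOffsets`.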

What I observe is that the job never catches up to the latest offsets, which
is not acceptable. I've set the number of partitions for the topic to 6, and
I've tried using a 4-node cluster in EMR.

The producer rate for this topic is 4 events/second. Does anyone have
suggestions for making my consumer catch up to latest faster?
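A rough back-of-envelope estimate of the backlog and catch-up time, assuming the topic holds a full 7 days of data at a steady 4 events/second. The consumer throughput of 100 events/second is a hypothetical placeholder, not a measured value. The key point: the backlog shrinks only at the rate by which consumption exceeds production, so a job that never catches up is effectively processing at or below 4 events/second.

```python
SECONDS_PER_DAY = 86_400


def backlog_events(produce_rate_per_s: float, retention_days: float) -> float:
    """Events waiting when starting from earliest with a full retention window."""
    return produce_rate_per_s * retention_days * SECONDS_PER_DAY


def catch_up_seconds(backlog: float, consume_rate_per_s: float, produce_rate_per_s: float) -> float:
    """Time to drain the backlog while new events keep arriving."""
    net_rate = consume_rate_per_s - produce_rate_per_s
    if net_rate <= 0:
        return float("inf")  # consumer never catches up
    return backlog / net_rate


backlog = backlog_events(4, 7)
print(backlog)  # 2419200 events
# Hypothetical consumer throughput of 100 events/s.
print(catch_up_seconds(backlog, 100, 4) / 3600)  # 7.0 hours
```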
