I'm running my Structured Streaming jobs on EMR. Our worst-case recovery plan was to spin up another cluster and set startingOffsets to earliest (our Kafka cluster has a 7-day retention policy).
What I observe is that the job never catches up to the latest offsets, which is not acceptable. The topic has 6 partitions, I've tried a 4-node EMR cluster, and the producer rate for this topic is 4 events/second. Does anyone have suggestions for getting my consumer to catch up to the latest offsets faster?
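For scale, here is a rough back-of-the-envelope sketch using the numbers above (4 events/second, 7-day retention, 6 partitions). It assumes a constant producer rate and a topic holding the full 7 days of data. The consumer throughput used in the usage note is a hypothetical value, not a measurement. The key point is that the backlog only shrinks at the net rate (consume rate minus produce rate).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def backlog_events(produce_rate_per_s: float, retention_days: float) -> float:
    """Events waiting in the topic when a new consumer starts from 'earliest'.

    Assumes a constant producer rate over the whole retention window.
    """
    return produce_rate_per_s * retention_days * SECONDS_PER_DAY


def catch_up_seconds(backlog: float, consume_rate_per_s: float,
                     produce_rate_per_s: float) -> float:
    """Time to drain the backlog while new events keep arriving.

    Returns infinity when the consumer is not faster than the producer,
    i.e. the job can never reach the latest offsets.
    """
    net_rate = consume_rate_per_s - produce_rate_per_s
    if net_rate <= 0:
        return float("inf")
    return backlog / net_rate


backlog = backlog_events(4, 7)   # total events retained across the topic
per_partition = backlog / 6      # evenly spread over 6 partitions (assumption)
print(f"backlog: {backlog:,} events, ~{per_partition:,.0f} per partition")
```

With a hypothetical end-to-end consumer throughput of 100 events/second, catch_up_seconds(backlog, 100, 4) gives 25,200 seconds, about 7 hours. If the measured throughput is close to 4 events/second, the function returns infinity, which matches a job that never catches up.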