I am running a stateful Spark Structured Streaming job with 'maxOffsetsPerTrigger' set to 10 million. I've noticed that messages are processed faster when I use a large value for this property.
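For context, a setup of this kind might look roughly like the PySpark sketch below. It is only an illustration: the broker address, topic names, checkpoint path, and the helper names `kafka_source_options` and `start_query` are placeholders assumed for the example, and the stateful step is left out.

```python
# Sketch only: broker address, topic names, and checkpoint path are placeholders.

def kafka_source_options(max_offsets_per_trigger):
    """Build options for Spark's Kafka source (hypothetical helper).

    Option values are passed as strings.
    """
    return {
        "kafka.bootstrap.servers": "localhost:9092",  # placeholder broker
        "subscribe": "input-topic",                   # placeholder topic
        # Upper bound on the number of offsets read per micro-batch.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def start_query(spark, max_offsets_per_trigger=10_000_000):
    """Start a Kafka-to-Kafka streaming query (hypothetical helper).

    `spark` is an existing SparkSession.
    """
    source = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(max_offsets_per_trigger))
        .load()
    )
    # The stateful processing (with its 10-minute state timeout) would go
    # here; it is omitted from this sketch.
    output = source.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    return (
        output.writeStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
        .option("topic", "output-topic")                      # placeholder
        .option("checkpointLocation", "/tmp/checkpoints/example")  # placeholder
        .start()
    )
```

With an existing session, `start_query(spark)` would start the query with the 10 million limit, and `start_query(spark, 1_000_000)` would try a smaller batch size for comparison.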
I've also noticed that no messages are written to the output Kafka topic until the batch has been completely processed. The state timeout is set to 10 minutes, so I expected to see at least some messages after about 10 minutes, but nothing is written until processing of the next batch starts. Is there a property I can use to 'flush' the messages that are ready to be written? Please let me know. Thanks.