Since Spark processes streams in micro-batches, you have to tune the batch
size to get the balance of throughput and latency you want. In particular,
Spark uses a global watermark that does not advance while a micro-batch is
running, so you'd want to keep the batches relatively small to make the
watermark move forward faster.
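
To illustrate why smaller batches help, here is a rough sketch in plain
Python. It is only a simplified model of the behavior described above, not
Spark's actual implementation: the function name, the event times, and the
delay value are all made up for the example, and the watermark is assumed to
be recomputed only between batches as (max event time seen) - delay.

```python
def watermarks_per_batch(event_times, batch_size, delay):
    """Return the watermark in effect while each micro-batch is processed.

    Simplified model (an assumption, not Spark internals): the watermark is
    fixed for the whole batch and is recomputed only after the batch ends.
    """
    watermark = 0   # watermark in effect for the first batch
    max_seen = 0    # largest event time observed so far
    in_effect = []
    for start in range(0, len(event_times), batch_size):
        batch = event_times[start:start + batch_size]
        in_effect.append(watermark)  # constant for the entire batch
        max_seen = max(max_seen, max(batch))
        # The watermark only moves here, between batches, and never backwards.
        watermark = max(watermark, max_seen - delay)
    return in_effect


events = list(range(1, 13))  # hypothetical event times 1..12, already ordered

print(watermarks_per_batch(events, batch_size=6, delay=2))  # [0, 4]
print(watermarks_per_batch(events, batch_size=3, delay=2))  # [0, 1, 4, 7]
```

With the same input, the smaller batch size gives the watermark four chances
to move instead of two, so anything waiting on the watermark can be emitted
sooner. For the Kafka source, the batch size is what 'maxOffsetsPerTrigger'
caps.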

On Wed, Jul 1, 2020 at 2:54 AM Eric Beabes <[email protected]> wrote:

> While running my Spark (Stateful) Structured Streaming job I am setting
> 'maxOffsetsPerTrigger' value to 10 Million. I've noticed that messages are
> processed faster if I use a large value for this property.
>
> What I am also noticing is that until the batch is completely processed,
> no messages are getting written to the output Kafka topic. The 'State
> timeout' is set to 10 minutes so I am expecting to see at least some of the
> messages after 10 minutes or so BUT messages are not getting written until
> processing of the next batch is started.
>
> Is there any property I can use to kinda 'flush' the messages that are
> ready to be written? Please let me know. Thanks.
>
>
