Your observation is correct. The Processor#process() and punctuation callback are executed on a single thread. It's by design to avoid the issue of concurrency (writing thread safe code is hard and we want to avoid putting this burden onto the user). There is currently no plans to make process() and punctuation concurrent, and it would require a larger change inside the runtime code.

Spanning your own thread and calling context.forward() is _not_ safe, and there is currently no way for you to make is safe. The runtime code makes certain assumptions about being single threaded which would break if you call context.forward() from a different thread. (The runtime _always_ assume that context.forward() is called inside process() or inside the punctuation callback.)

The only way forward I can see, would be trying to make the punctuation call shorter, eg, not scanning the full store but only a small part of it, such that the thread can go back to execute process() quicker (it's of course an additional challenge to keep track where you stopped the scan and to resume it...), and to make the punctuation interval shorter.

Hope this helps.

-Matthias

On 11/3/22 11:30 AM, Joshua Suskalo wrote:
I have data that I am storing in a state store which I would like to
periodically run some code over, and the way I have decided to do this is
via a punctuator from inside a Transformer, which gets an iterator over the
state store, performs actions, and forwards events on.

So far, everything works fine, but I do have one issue: messages coming
into the transformer are intended to act as updates to individual values
stored in the state store, and should be incorporated as immediately as
possible, but whenever the punctuator is running the stream thread is tied
up with traversing the iterator and cannot process new messages.

I have written the code in such a way that the transformation function for
these entities is thread safe with respect to the code in the punctuator,
such that the punctuator could fire from one thread while another thread
processes new events without issue, however this was done under a mistaken
understanding of how stream threads are allocated, as I had believed that
punctuators were fired from a different thread as compared to the transform
method.

Inside the punctuator I use the ProcessorContext to forward additional
messages on, as well as in the transform method, and I would like to know
about the thread safety of having those two things happening concurrently
from separate threads, as might occur if I were to have the punctuator
start a thread to perform the iteration, rather than doing the iteration
itself. Is this a safe thing to do, or is this going to be prone to bugs,
even assuming that I have written code to ensure that no individual key in
the state store will be manipulated concurrently from both threads at once?

I've looked carefully through the documentation and looked for others doing
similar things online and have come up empty handed, and while I am happy
to look through the code of kafka streams to find out for myself, that will
take some time as I'm mostly unfamiliar with the codebase, and I was hoping
that I might be able to get an answer more quickly here.

Joshua

Reply via email to