Hey Navinder,

I think what's happening is a little different. Let's see if my
worldview also explains your experiences.

There is no such thing as "mark for deletion". When a thread loses a
task, it simply releases its lock on the task's state directory. If no
other thread on the instance claims that lock within
`state.cleanup.delay.ms` milliseconds, the state cleaner grabs the
lock itself and deletes the directory. On the other hand, if another
thread (or the same thread) gets the task back and claims the lock
before the cleaner does, it can re-open the store and use it.
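
To make that concrete, here is a toy model of the behavior I just
described. It is only a sketch of my mental model, not the actual
Streams code: the class `TaskDirLockSketch` and all of its methods are
made up for illustration, and it tracks "time since the lock was
released" with a plain timestamp, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these names exist in Kafka Streams.
public class TaskDirLockSketch {
    private final long cleanupDelayMs;
    private final Map<String, String> owners = new HashMap<>();   // taskId -> lock owner
    private final Map<String, Long> releasedAt = new HashMap<>(); // taskId -> release time

    public TaskDirLockSketch(long cleanupDelayMs) {
        this.cleanupDelayMs = cleanupDelayMs;
    }

    // A thread claims the task directory; fails if another owner holds it.
    public synchronized boolean tryLock(String taskId, String owner) {
        String current = owners.get(taskId);
        if (current != null && !current.equals(owner)) {
            return false;
        }
        owners.put(taskId, owner);
        releasedAt.remove(taskId); // claimed again, so the cleaner must leave it alone
        return true;
    }

    // A thread that lost the task just releases the lock; nothing is "marked".
    public synchronized void unlock(String taskId, String owner, long nowMs) {
        if (owner.equals(owners.get(taskId))) {
            owners.remove(taskId);
            releasedAt.put(taskId, nowMs);
        }
    }

    // One cleaner pass: "deletes" the directory only if it is unlocked and
    // has stayed unclaimed for at least the cleanup delay.
    public synchronized boolean cleanIfObsolete(String taskId, long nowMs) {
        Long released = releasedAt.get(taskId);
        if (released == null || owners.containsKey(taskId)) {
            return false;
        }
        if (nowMs - released < cleanupDelayMs) {
            return false;
        }
        releasedAt.remove(taskId);
        return true;
    }

    public static void main(String[] args) {
        TaskDirLockSketch dirs = new TaskDirLockSketch(600_000L);
        dirs.tryLock("0_0", "StreamThread-1");
        dirs.unlock("0_0", "StreamThread-1", 0L);
        // Thread-2 claims the directory before the delay elapses.
        System.out.println(dirs.tryLock("0_0", "StreamThread-2"));
        // The cleaner now has nothing to delete.
        System.out.println(dirs.cleanIfObsolete("0_0", 700_000L));
    }
}
```

In this model, a thread that picks the task back up before the delay
elapses always wins, and the cleaner only ever removes directories
that nobody has claimed.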

The default for `state.cleanup.delay.ms` is 10 minutes (600000 ms),
which is actually short enough that it could elapse during a single
rebalance (if Streams starts recovering a lot of state). I recommend
you increase `state.cleanup.delay.ms` substantially, for example to
one hour.
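
For what it's worth, here is a minimal sketch of setting that value.
It uses only `java.util.Properties` and the config name as a plain
string; the class and method names (`CleanupDelayConfig`,
`withOneHourCleanupDelay`) are made up for this example, and I'm
assuming you build your Streams configuration from a `Properties`
object.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the config key comes from Kafka Streams.
public class CleanupDelayConfig {
    static final String STATE_CLEANUP_DELAY_MS = "state.cleanup.delay.ms";

    // Returns a copy of the given settings with the cleanup delay raised to one hour.
    public static Properties withOneHourCleanupDelay(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        long oneHourMs = 60L * 60L * 1000L; // 3,600,000 ms
        props.setProperty(STATE_CLEANUP_DELAY_MS, Long.toString(oneHourMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = withOneHourCleanupDelay(new Properties());
        System.out.println(props.getProperty(STATE_CLEANUP_DELAY_MS)); // prints 3600000
    }
}
```

You would merge this into the same properties that carry your
application id, bootstrap servers, and so on, before constructing the
`KafkaStreams` instance.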

One thing I'm curious about... You didn't mention whether Thread-2 is
eventually able to re-create the state directory (after the cleaner
is done) and transition to RUNNING. This should be the case. If not, I
would consider it a bug.

Thanks,
-John

On Thu, Nov 14, 2019 at 3:02 PM Navinder Brar
<navinder_b...@yahoo.com.invalid> wrote:
>
> Hi,
> We are facing a peculiar situation in the 2.3 version of Kafka Streams. First 
> of all, I want to clarify if it is possible that a Stream Thread (say Stream 
> Thread-1) which had got an assignment for a standby task (say 0_0) can change 
> to Stream Thread-2 on the same host post rebalancing. The issue we are facing 
> is this is happening for us and post rebalancing since the Stream Thread-1 
> had 0_0 and is not assigned back to it, it closes that task and marks it for 
> deletion(after cleanup delay time), and meanwhile, the task gets assigned to 
> Stream Thread-2. When the Stream Thread-2 tries to transition this task to 
> Running, it gets a LockException which is caught in 
> AssignedTasks#initializeNewTasks(). This makes 0_0 stay in Created state on 
> Stream Thread-2 and after the cleanup delay is over the task directories for 
> 0_0 get deleted.
> Can someone please comment on this behavior.
> Thanks,
> Navinder
