Hi John,

Can you please point us to the code where Thread-2 would recreate the
state directory once the cleaner is done?

Also, we see that the retries around locks were removed in
https://issues.apache.org/jira/browse/KAFKA-6122. Could you let us know
why the retry mechanism was removed?

Also, can you please explain the comment below in the
AssignedTasks.java#initializeNewTasks function?
catch (final LockException e) {
    // made this trace as it will spam the logs in the poll loop.
    log.trace("Could not create {} {} due to {}; will retry", taskTypeName, entry.getKey(), e.getMessage());
}
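
To make the question concrete, this is how we currently read that catch
block, written as a stand-alone sketch. Everything in it (the class
LockRetrySketch, the Task type, the lock counter) is our own hypothetical
simplification and not the Kafka source: we assume the exception is
swallowed, the task stays in the "created" set, and the next pass of the
poll loop simply tries again.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LockRetrySketch {
    // Hypothetical stand-in for Kafka's LockException.
    static class LockException extends RuntimeException {
        LockException(String message) { super(message); }
    }

    // Hypothetical task: initialization fails while someone else still
    // holds the lock on its state directory.
    static class Task {
        private int passesUntilLockFree;

        Task(int passesUntilLockFree) { this.passesUntilLockFree = passesUntilLockFree; }

        void initialize() {
            if (passesUntilLockFree > 0) {
                passesUntilLockFree--;
                throw new LockException("state directory is locked");
            }
        }
    }

    final Map<String, Task> created = new LinkedHashMap<>();
    final List<String> running = new ArrayList<>();

    // One pass over the created tasks, as we assume happens once per poll-loop iteration.
    void initializeNewTasks() {
        for (Iterator<Map.Entry<String, Task>> it = created.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Task> entry = it.next();
            try {
                entry.getValue().initialize();
                running.add(entry.getKey());
                it.remove(); // only a successfully initialized task leaves "created"
            } catch (LockException e) {
                // Swallowed on purpose: the task stays in "created" and the
                // next pass is the retry. No retry loop exists inside this method.
            }
        }
    }

    // Runs passes until task 0_0 is running and returns how many passes it took.
    static int passesUntilRunning(int lockedPasses) {
        LockRetrySketch tasks = new LockRetrySketch();
        tasks.created.put("0_0", new Task(lockedPasses));
        int passes = 0;
        while (!tasks.created.isEmpty()) {
            tasks.initializeNewTasks();
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(passesUntilRunning(2));
    }
}
```

If this reading is right, "will retry" refers to the next poll-loop pass
rather than to an explicit retry inside the method. Please correct us if
that is wrong.
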
Thanks,
Giridhar.

On 2019/11/14 21:25:28, John Roesler <j...@confluent.io> wrote: 
> Hey Navinder,
> 
> I think what's happening is a little different. Let's see if my
> worldview also explains your experiences.
> 
> There is no such thing as "mark for deletion". When a thread loses a
> task, it simply releases its lock on the directory. If no one else on
> the instance claims that lock within `state.cleanup.delay.ms` amount
> of milliseconds, then the state cleaner will itself grab the lock and
> delete the directory. On the other hand, if another thread (or the
> same thread) gets the task back and claims the lock before the
> cleaner, it will be able to re-open the store and use it.
> 
> The default for `state.cleanup.delay.ms` is 10 minutes, which is
> actually short enough that it could pass during a single rebalance (if
> Streams starts recovering a lot of state). I recommend you increase
> `state.cleanup.delay.ms` by a lot, like maybe set it to one hour.
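
A minimal sketch of the setting suggested above, assuming the application
builds its Streams configuration from a plain java.util.Properties object.
The class name, application id and bootstrap address are placeholders and
not values from this thread; only the `state.cleanup.delay.ms` key and the
one-hour value come from the suggestion.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

class CleanupDelayConfigSketch {
    // Builds Streams properties with a longer state cleanup delay.
    static Properties buildProps() {
        Properties props = new Properties();
        // Placeholder values, not taken from this thread.
        props.put("application.id", "example-app");
        props.put("bootstrap.servers", "localhost:9092");
        // One hour, in milliseconds, instead of the 10-minute default.
        props.put("state.cleanup.delay.ms", TimeUnit.HOURS.toMillis(1));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().get("state.cleanup.delay.ms"));
    }
}
```

The resulting Properties object would then be passed to the KafkaStreams
constructor together with the topology.
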
> 
> One thing I'm curious about... You didn't mention if Thread-2
> eventually is able to re-create the state directory (after the cleaner
> is done) and transition to RUNNING. This should be the case. If not, I
> would consider it a bug.
> 
> Thanks,
> -John
> 
> On Thu, Nov 14, 2019 at 3:02 PM Navinder Brar
> <navinder_b...@yahoo.com.invalid> wrote:
> >
> > Hi,
> > We are facing a peculiar situation in the 2.3 version of Kafka Streams.
> > First of all, I want to clarify whether a standby task (say 0_0) that was
> > assigned to one Stream Thread (say Stream Thread-1) can move to Stream
> > Thread-2 on the same host after a rebalance. This is what we are seeing:
> > after the rebalance, Stream Thread-1, which had 0_0, is not assigned it
> > again, so it closes the task and marks it for deletion (after the cleanup
> > delay time), and meanwhile the task gets assigned to Stream Thread-2. When
> > Stream Thread-2 tries to transition this task to Running, it gets a
> > LockException, which is caught in AssignedTasks#initializeNewTasks(). This
> > makes 0_0 stay in the Created state on Stream Thread-2, and after the
> > cleanup delay is over the task directories for 0_0 get deleted.
> > Can someone please comment on this behavior?
> > Thanks,
> > Navinder
> 
