Hi John, Can you please point us to code where Thread-2 will be able to recreate the state directory once cleaner is done ?
Also, we see that in https://issues.apache.org/jira/browse/KAFKA-6122, retries around locks is removed. Please let us know why retry mechanism is removed? Also can you please explain below comment in AssignedTasks.java#initializeNewTasks function catch (final LockException e) { // made this trace as it will spam the logs in the poll loop. log.trace("Could not create {} {} due to {}; will retry", taskTypeName, entry.getKey(), e.getMessage()); } Thanks, Giridhar. On 2019/11/14 21:25:28, John Roesler <j...@confluent.io> wrote: > Hey Navinder, > > I think what's happening is a little different. Let's see if my > worldview also explains your experiences. > > There is no such thing as "mark for deletion". When a thread loses a > task, it simply releases its lock on the directory. If no one else on > the instance claims that lock within `state.cleanup.delay.ms` amount > of milliseconds, then the state cleaner will itself grab the lock and > delete the directory. On the other hand, if another thread (or the > same thread) gets the task back and claims the lock before the > cleaner, it will be able to re-open the store and use it. > > The default for `state.cleanup.delay.ms` is 10 minutes, which is > actually short enough that it could pass during a single rebalance (if > Streams starts recovering a lot of state). I recommend you increase > `state.cleanup.delay.ms` by a lot, like maybe set it to one hour. > > One thing I'm curious about... You didn't mention if Thread-2 > eventually is able to re-create the state directory (after the cleaner > is done) and transition to RUNNING. This should be the case. If not, I > would consider it a bug. > > Thanks, > -John > > On Thu, Nov 14, 2019 at 3:02 PM Navinder Brar > <navinder_b...@yahoo.com.invalid> wrote: > > > > Hi, > > We are facing a peculiar situation in the 2.3 version of Kafka Streams. > > First of all, I want to clarify if it is possible that a Stream Thread (say > > Stream Thread-1) which had got an assignment for a standby task (say 0_0) > > can change to Stream Thread-2 on the same host post rebalancing. The issue > > we are facing is this is happening for us and post rebalancing since the > > Stream Thread-1 had 0_0 and is not assigned back to it, it closes that task > > and marks it for deletion(after cleanup delay time), and meanwhile, the > > task gets assigned to Stream Thread-2. When the Stream Thread-2 tries to > > transition this task to Running, it gets a LockException which is caught in > > AssignedTasks#initializeNewTasks(). This makes 0_0 stay in Created state on > > Stream Thread-2 and after the cleanup delay is over the task directories > > for 0_0 get deleted. > > Can someone please comment on this behavior. > > Thanks,Navinder >