[ https://issues.apache.org/jira/browse/KAFKA-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035452#comment-16035452 ]
Kevin Chen commented on KAFKA-5070: ----------------------------------- we saw the similar exceptions in our stream applications too, we are using 10.2 client/server. We have 6 threads per instance. we have 12 partitions on the incoming topic. so total 12 tasks, we are running total 2 instances on 2 nodes(one instance each). We got the following exceptions in different situations ! org.apache.kafka.streams.errors.LockException: task [0_1] Failed to lock the state directory: at one time, the root cause is rocskdb.flush() exception(we are not using custom implementation), restart fix the problem. only saw that once. But most time it happens when to re-balance, like we need bring down a node. In this case, no apparent root cause, here is all I got from the stack trace WARN [2017-06-02 21:23:47,591] org.apache.kafka.streams.processor.internals.StreamThread: Could not create task 0_6. Will retry. ! org.apache.kafka.streams.errors.LockException: task [0_6] Failed to lock the state directory: /tmp/aggregator-one/kafka-streams-state/aggregator-one-1/0_6 ! at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102) ! at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73) ! at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108) ! at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834) ! at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207) ! at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180) ! at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937) ! at org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69) ! at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236) ! at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255) ! at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339) ! at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303) ! at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286) ! at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) ! at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) ! at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582) ! at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368) > org.apache.kafka.streams.errors.LockException: task [0_18] Failed to lock the > state directory: /opt/rocksdb/pulse10/0_18 > ------------------------------------------------------------------------------------------------------------------------ > > Key: KAFKA-5070 > URL: https://issues.apache.org/jira/browse/KAFKA-5070 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.2.0 > Environment: Linux Version > Reporter: Dhana > Assignee: Matthias J. Sax > Attachments: RocksDB_LockStateDirec.7z > > > Notes: we run two instance of consumer in two difference machines/nodes. > we have 400 partitions. 200 stream threads/consumer, with 2 consumer. > We perform HA test(on rebalance - shutdown of one of the consumer/broker), we > see this happening > Error: > 2017-04-05 11:36:09.352 WARN StreamThread:1184 StreamThread-66 - Could not > create task 0_115. Will retry. > org.apache.kafka.streams.errors.LockException: task [0_115] Failed to lock > the state directory: /opt/rocksdb/pulse10/0_115 > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102) > at > org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73) > at > org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834) > at > org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207) > at > org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368) -- This message was sent by Atlassian JIRA (v6.3.15#6346)