[ https://issues.apache.org/jira/browse/KAFKA-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798610#comment-15798610 ]
Manuel Romero commented on KAFKA-4582: -------------------------------------- Hi again. The error continues happening using the release 0.10.1.1. The problem appears after a rebalance triggered by the session timeout: Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. check_box After that, the tasks start to commit, and the subsequent error occurs again: Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-4] Failed to rebalance Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:410) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task [0_5] Error while creating the state manager Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:89) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011... 1 more Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: Caused by: java.io.IOException: task [0_5] Failed to lock the state directory: /tmp/kafka-streams/dougal_text_pipeline_id/0_5 Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69) Jan 4 15:35:42 ip-172-31-133-151 docker/e59184ec5215[799]: #011... 13 more This error is dumped into the log 15 times, having 6 threads. > KStream job fails in multi-thread mode > --------------------------------------- > > Key: KAFKA-4582 > URL: https://issues.apache.org/jira/browse/KAFKA-4582 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.0.1, 0.10.1.1 > Environment: t2.large EC2 AWS instance > Reporter: Manuel Romero > > We are experimenting an issue with KStream when using more than one thread. > Currently, we are using 2 threads in a t2.large EC2 AWS instance. According > to the searches we made, this error should be solved in the version we are > currently using. It seems the job fails to rebalance due to an error on the > lock on the state directory. This is the full log. > 16/12/19 16:36:36 INFO internals.AbstractCoordinator: (Re-)joining group > dougal_text_pipeline_id > 16/12/19 16:36:36 INFO assignment.TaskAssignor: stream-thread > [StreamThread-6] Assigning tasks to clients: > {a2a7324e-43da-4b17-878b-53f62a0fd870=[activeTasks: ([]) assignedTasks: ([]) > prevActiveTasks: ([0_32, 0_2, 0_68, 0_38, 0_8, 0_74, 0_44, 0_14, 0_80, 0_50, > 0_20, 0_86, 0_56, 0_26, 0_92, 0_62]) prevAssignedTasks: ([0_32, 0_2, 0_68, > 0_38, 0_8, 0_74, 0_44, 0_14, 0_80, 0_50, 0_20, 0_86, 0_56, 0_26, 0_92, 0_62]) > capacity: 1.0 cost: 0.0]}, prevAssignmentBalanced: false, > prevClientsUnchanged: false, tasks: [0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, > 0_8, 0_9, 0_10, 0_11, 0_12, 0_13, 0_14, 0_15, 0_16, 0_17, 0_18, 0_19, 0_20, > 0_21, 0_22, 0_23, 0_24, 0_25, 0_26, 0_27, 0_28, 0_29, 0_30, 0_31, 0_32, 0_33, > 0_34, 0_35, 0_36, 0_37, 0_38, 0_39, 0_40, 0_41, 0_42, 0_43, 0_44, 0_45, 0_46, > 0_47, 0_48, 0_49, 0_50, 0_51, 0_52, 0_53, 0_54, 0_55, 0_56, 0_57, 0_58, 0_59, > 0_60, 0_61, 0_62, 0_63, 0_64, 0_65, 0_66, 0_67, 0_68, 0_69, 0_70, 0_71, 0_72, > 0_73, 0_74, 0_75, 0_76, 0_77, 0_78, 0_79, 0_80, 0_81, 0_82, 0_83, 0_84, 0_85, > 0_86, 0_87, 0_88, 0_89, 0_90, 0_91, 0_92, 0_93, 0_94, 0_95], replicas: 0 > 16/12/19 16:36:36 INFO assignment.TaskAssignor: stream-thread > [StreamThread-6] Assigned with: > {a2a7324e-43da-4b17-878b-53f62a0fd870=[activeTasks: ([0_0, 0_1, 0_2, 0_3, > 0_4, 0_5, 0_6, 0_7, 0_8, 0_9, 0_10, 0_11, 0_12, 0_13, 0_14, 0_15, 0_16, 0_17, > 0_18, 0_19, 0_20, 0_21, 0_22, 0_23, 0_24, 0_25, 0_26, 0_27, 0_28, 0_29, 0_30, > 0_31, 0_32, 0_33, 0_34, 0_35, 0_36, 0_37, 0_38, 0_39, 0_40, 0_41, 0_42, 0_43, > 0_44, 0_45, 0_46, 0_47, 0_48, 0_49, 0_50, 0_51, 0_52, 0_53, 0_54, 0_55, 0_56, > 0_57, 0_58, 0_59, 0_60, 0_61, 0_62, 0_63, 0_64, 0_65, 0_66, 0_67, 0_68, 0_69, > 0_70, 0_71, 0_72, 0_73, 0_74, 0_75, 0_76, 0_77, 0_78, 0_79, 0_80, 0_81, 0_82, > 0_83, 0_84, 0_85, 0_86, 0_87, 0_88, 0_89, 0_90, 0_91, 0_92, 0_93, 0_94, > 0_95]) assignedTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, 0_8, 0_9, > 0_10, 0_11, 0_12, 0_13, 0_14, 0_15, 0_16, 0_17, 0_18, 0_19, 0_20, 0_21, 0_22, > 0_23, 0_24, 0_25, 0_26, 0_27, 0_28, 0_29, 0_30, 0_31, 0_32, 0_33, 0_34, 0_35, > 0_36, 0_37, 0_38, 0_39, 0_40, 0_41, 0_42, 0_43, 0_44, 0_45, 0_46, 0_47, 0_48, > 0_49, 0_50, 0_51, 0_52, 0_53, 0_54, 0_55, 0_56, 0_57, 0_58, 0_59, 0_60, 0_61, > 0_62, 0_63, 0_64, 0_65, 0_66, 0_67, 0_68, 0_69, 0_70, 0_71, 0_72, 0_73, 0_74, > 0_75, 0_76, 0_77, 0_78, 0_79, 0_80, 0_81, 0_82, 0_83, 0_84, 0_85, 0_86, 0_87, > 0_88, 0_89, 0_90, 0_91, 0_92, 0_93, 0_94, 0_95]) prevActiveTasks: ([]) > prevAssignedTasks: ([]) capacity: 1.0 cost: 41.600000000000016]} > 16/12/19 16:36:36 INFO internals.AbstractCoordinator: Successfully joined > group dougal_text_pipeline_id with generation 21 > 16/12/19 16:36:36 INFO internals.ConsumerCoordinator: Setting newly assigned > partitions [crawler-fashion-content-83, crawler-fashion-content-17, > crawler-fashion-content-50, crawler-fashion-content-91, > crawler-fashion-content-25, crawler-fashion-content-58, > crawler-fashion-content-33, crawler-fashion-content-66, > crawler-fashion-content-0, crawler-fashion-content-41, > crawler-fashion-content-74, crawler-fashion-content-8, > crawler-fashion-content-49, crawler-fashion-content-82, > crawler-fashion-content-16, crawler-fashion-content-57, > crawler-fashion-content-90, crawler-fashion-content-24, > crawler-fashion-content-65, crawler-fashion-content-32, > crawler-fashion-content-73, crawler-fashion-content-7, > crawler-fashion-content-40, crawler-fashion-content-81, > crawler-fashion-content-15, crawler-fashion-content-48, > crawler-fashion-content-89, crawler-fashion-content-23, > crawler-fashion-content-56, crawler-fashion-content-31, > crawler-fashion-content-64, crawler-fashion-content-39, > crawler-fashion-content-72, crawler-fashion-content-6, > crawler-fashion-content-47, crawler-fashion-content-80, > crawler-fashion-content-14, crawler-fashion-content-55, > crawler-fashion-content-88, crawler-fashion-content-22, > crawler-fashion-content-63, crawler-fashion-content-30, > crawler-fashion-content-71, crawler-fashion-content-5, > crawler-fashion-content-38, crawler-fashion-content-79, > crawler-fashion-content-13, crawler-fashion-content-46, > crawler-fashion-content-87, crawler-fashion-content-21, > crawler-fashion-content-54, crawler-fashion-content-4, > crawler-fashion-content-95, crawler-fashion-content-29, > crawler-fashion-content-62, crawler-fashion-content-12, > crawler-fashion-content-37, crawler-fashion-content-70, > crawler-fashion-content-20, crawler-fashion-content-45, > crawler-fashion-content-78, crawler-fashion-content-28, > crawler-fashion-content-53, crawler-fashion-content-86, > crawler-fashion-content-3, crawler-fashion-content-36, > crawler-fashion-content-61, crawler-fashion-content-94, > crawler-fashion-content-11, crawler-fashion-content-44, > crawler-fashion-content-69, crawler-fashion-content-19, > crawler-fashion-content-52, crawler-fashion-content-77, > crawler-fashion-content-27, crawler-fashion-content-60, > crawler-fashion-content-85, crawler-fashion-content-35, > crawler-fashion-content-68, crawler-fashion-content-2, > crawler-fashion-content-93, crawler-fashion-content-43, > crawler-fashion-content-76, crawler-fashion-content-10, > crawler-fashion-content-51, crawler-fashion-content-84, > crawler-fashion-content-18, crawler-fashion-content-59, > crawler-fashion-content-92, crawler-fashion-content-26, > crawler-fashion-content-67, crawler-fashion-content-1, > crawler-fashion-content-34, crawler-fashion-content-75, > crawler-fashion-content-9, crawler-fashion-content-42] for group > dougal_text_pipeline_id > 16/12/19 16:36:36 INFO internals.StreamThread: stream-thread [StreamThread-6] > New partitions [[crawler-fashion-content-83, crawler-fashion-content-17, > crawler-fashion-content-50, crawler-fashion-content-91, > crawler-fashion-content-25, crawler-fashion-content-58, > crawler-fashion-content-33, crawler-fashion-content-66, > crawler-fashion-content-0, crawler-fashion-content-41, > crawler-fashion-content-74, crawler-fashion-content-8, > crawler-fashion-content-49, crawler-fashion-content-82, > crawler-fashion-content-16, crawler-fashion-content-57, > crawler-fashion-content-90, crawler-fashion-content-24, > crawler-fashion-content-65, crawler-fashion-content-32, > crawler-fashion-content-73, crawler-fashion-content-7, > crawler-fashion-content-40, crawler-fashion-content-81, > crawler-fashion-content-15, crawler-fashion-content-48, > crawler-fashion-content-89, crawler-fashion-content-23, > crawler-fashion-content-56, crawler-fashion-content-31, > crawler-fashion-content-64, crawler-fashion-content-39, > crawler-fashion-content-72, crawler-fashion-content-6, > crawler-fashion-content-47, crawler-fashion-content-80, > crawler-fashion-content-14, crawler-fashion-content-55, > crawler-fashion-content-88, crawler-fashion-content-22, > crawler-fashion-content-63, crawler-fashion-content-30, > crawler-fashion-content-71, crawler-fashion-content-5, > crawler-fashion-content-38, crawler-fashion-content-79, > crawler-fashion-content-13, crawler-fashion-content-46, > crawler-fashion-content-87, crawler-fashion-content-21, > crawler-fashion-content-54, crawler-fashion-content-4, > crawler-fashion-content-95, crawler-fashion-content-29, > crawler-fashion-content-62, crawler-fashion-content-12, > crawler-fashion-content-37, crawler-fashion-content-70, > crawler-fashion-content-20, crawler-fashion-content-45, > crawler-fashion-content-78, crawler-fashion-content-28, > crawler-fashion-content-53, crawler-fashion-content-86, > crawler-fashion-content-3, crawler-fashion-content-36, > crawler-fashion-content-61, crawler-fashion-content-94, > crawler-fashion-content-11, crawler-fashion-content-44, > crawler-fashion-content-69, crawler-fashion-content-19, > crawler-fashion-content-52, crawler-fashion-content-77, > crawler-fashion-content-27, crawler-fashion-content-60, > crawler-fashion-content-85, crawler-fashion-content-35, > crawler-fashion-content-68, crawler-fashion-content-2, > crawler-fashion-content-93, crawler-fashion-content-43, > crawler-fashion-content-76, crawler-fashion-content-10, > crawler-fashion-content-51, crawler-fashion-content-84, > crawler-fashion-content-18, crawler-fashion-content-59, > crawler-fashion-content-92, crawler-fashion-content-26, > crawler-fashion-content-67, crawler-fashion-content-1, > crawler-fashion-content-34, crawler-fashion-content-75, > crawler-fashion-content-9, crawler-fashion-content-42]] assigned at the end > of consumer rebalance. > 16/12/19 16:36:36 INFO internals.StreamThread: stream-thread [StreamThread-6] > Creating active task 0_0 with assigned partitions > [[crawler-fashion-content-0]] > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneCoordinator: get() called > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_0] Initializing state > stores > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_0] Initializing > processor nodes of the topology > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneValueTransformer: Init > called with > org.apache.kafka.streams.processor.internals.ProcessorContextImpl@43350b2e > 0_0 > 16/12/19 16:36:36 INFO internals.StreamThread: stream-thread [StreamThread-6] > Creating active task 0_1 with assigned partitions > [[crawler-fashion-content-1]] > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneCoordinator: get() called > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_1] Initializing state > stores > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_1] Initializing > processor nodes of the topology > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneValueTransformer: Init > called with > org.apache.kafka.streams.processor.internals.ProcessorContextImpl@5d045471 > 0_1 > 16/12/19 16:36:36 INFO internals.StreamThread: stream-thread [StreamThread-6] > Creating active task 0_2 with assigned partitions > [[crawler-fashion-content-2]] > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneCoordinator: get() called > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_2] Initializing state > stores > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_2] Initializing > processor nodes of the topology > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneValueTransformer: Init > called with > org.apache.kafka.streams.processor.internals.ProcessorContextImpl@215e155a > 0_2 > 16/12/19 16:36:36 INFO internals.StreamThread: stream-thread [StreamThread-6] > Creating active task 0_3 with assigned partitions > [[crawler-fashion-content-3]] > 16/12/19 16:36:36 INFO kafka.KafkaStreamsBackboneCoordinator: get() called > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_3] Initializing state > stores > 16/12/19 16:36:36 INFO internals.StreamTask: task [0_3] Initializing > processor nodes of the topology > #011at > org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:90) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124) > > #011at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228) > > #011at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313) > > #011at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277) > > #011at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259) > > #011at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013) > > #011at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979) > #011at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242) > > Caused by: java.io.IOException: task [0_6] Failed to lock the state > directory: /tmp/kafka-streams/dougal_text_pipeline_id/0_6 > #011at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101) > > #011at > org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69) > > #011... 13 more > 16/12/19 16:36:37 ERROR internals.ConsumerCoordinator: User provided listener > org.apache.kafka.streams.processor.internals.StreamThread$1 for group > dougal_text_pipeline_id failed on partition assignment > org.apache.kafka.streams.errors.ProcessorStateException: task [0_6] Error > while creating the state manager > #011at > org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72) > #011at > org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:90) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124) > > #011at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228) > > #011at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313) > > #011at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277) > > #011at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259) > > #011at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013) > > #011at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979) > #011at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407) > > #011at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242) > > Caused by: java.io.IOException: task [0_6] Failed to lock the state > directory: /tmp/kafka-streams/dougal_text_pipeline_id/0_6 > #011at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101) > > #011at > org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69) > > #011... 13 more > 16/12/19 16:36:37 INFO internals.StreamThread: stream-thread [StreamThread-6] > Shutting down -- This message was sent by Atlassian JIRA (v6.3.4#6332)