Live-lock between consumer thread and heartbeat thread

je.ik Thu, 02 Feb 2017 02:46:28 -0800

Hi all,

I have a question about a very suspicious behavior I see while consuming messages using manual synchronous commit with Kafka 0.10.1.0. The code looks something like this:


try (KafkaConsumer<...> consumer = ...) {

  Map<TopicPartition, OffsetAndMetadata> commitMap = Collections.synchronizedMap(...);

  while (!Thread.currentThread().isInterrupted()) {
    ConsumerRecords records = consumer.poll(..);
    for (...) {
      // queue records for asynchronous processing in different thread.
      // when the asynchronous processing finishes, it updates the
      // `commitMap', so it has to be synchronized somehow
    }
    synchronized (commitMap) {
      // commit if we have anything to commit
      if (!commitMap.isEmpty()) {
        consumer.commitSync(commitMap);
        commitMap.clear();
      }
    }
  }
}

Now, what happens from time to time in my case is that the consumer thread gets stuck in the call to `commitSync`. By stracing the PID I found out that it periodically epolls on an *empty* list of file descriptors. By further investigation I found out that the response to the `commitSync` is being handled by the kafka-coordinator-heartbeat-thread, which, while handling the response, needs to access the `commitMap` and therefore blocks, because the lock is being held by the application main thread. Therefore, the whole consumption stops and ends in live-lock. The solution in my case was to clone the map and move the call to `commitSync` out of the synchronized block, like this:
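To make the suspected interaction concrete, here is a minimal Kafka-free sketch of it. It assumes, as the analysis above suggests, that the code completing the commit reads the very map that was passed to `commitSync`. Everything in it is hypothetical: `fakeCommit` stands in for the response path, a single-thread executor plays the heartbeat thread, and a timeout replaces the indefinite wait so the program terminates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LockHeldDemo {

    // Hypothetical stand-in for commitSync's response path: another thread
    // (playing the heartbeat thread) reads the offsets map it was handed
    // before it marks the request as complete.
    static CompletableFuture<Void> fakeCommit(Map<String, Long> offsets,
                                              ExecutorService network) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        network.submit(() -> {
            offsets.get("topic-0"); // on a synchronizedMap this needs the map's monitor
            done.complete(null);
        });
        return done;
    }

    // Mirrors the original loop: the caller waits for the commit while still
    // holding the map's lock. Returns true if the commit finished in time.
    static boolean commitWhileHoldingLock(long timeoutMs) {
        Map<String, Long> commitMap = Collections.synchronizedMap(new HashMap<>());
        commitMap.put("topic-0", 42L);
        ExecutorService network = Executors.newSingleThreadExecutor();
        try {
            synchronized (commitMap) {
                try {
                    fakeCommit(commitMap, network).get(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // the "network" thread is blocked on our monitor
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            // The monitor was released when the synchronized block exited,
            // so the pending task can now finish before the executor stops.
            network.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // prints: completed within timeout: false
        System.out.println("completed within timeout: " + commitWhileHoldingLock(500));
    }
}
```

Because `Collections.synchronizedMap` uses the returned map itself as its lock, `synchronized (commitMap)` and `commitMap.get(...)` contend for the same monitor, so the waiting thread and the completing thread block each other until the wait gives up.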


  final Map<TopicPartition, OffsetAndMetadata> clone;
  synchronized (commitMap) {
    if (!commitMap.isEmpty()) {
      clone = new HashMap<>(commitMap);
      commitMap.clear();
    } else {
      clone = null;
    }
  }
  if (clone != null) {
    consumer.commitSync(clone);
  }

which seems to work fine. My question is whether my interpretation of the problem is correct and, if so, whether anything should be done to avoid this. I see two possibilities: either the call to `commitSync` should clone the map itself, or it should somehow be guaranteed that the same thread that issues synchronous requests receives the response. Am I right?
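The clone-then-commit workaround can be sketched the same way, again without Kafka. `drain` is a hypothetical helper name for the copy-and-clear step, and `fakeCommit` is a hypothetical stand-in for the response path running on a separate thread. The copy stays inside the synchronized block because `new HashMap<>(map)` iterates the source, and the `Collections.synchronizedMap` documentation requires iteration to happen while holding the map's lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SnapshotCommitDemo {

    // Hypothetical helper: copy and clear under the map's lock.
    // Returns null when there is nothing to commit.
    static <K, V> Map<K, V> drain(Map<K, V> syncMap) {
        synchronized (syncMap) {
            if (syncMap.isEmpty()) {
                return null;
            }
            Map<K, V> snapshot = new HashMap<>(syncMap);
            syncMap.clear();
            return snapshot;
        }
    }

    // Hypothetical stand-in for commitSync's response path: another thread
    // reads the map it was handed before completing the "commit".
    static CompletableFuture<Void> fakeCommit(Map<String, Long> offsets,
                                              ExecutorService network) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        network.submit(() -> {
            offsets.get("topic-0");
            done.complete(null);
        });
        return done;
    }

    // Returns true if the simulated commit finished within timeoutMs.
    static boolean commitWithSnapshot(long timeoutMs) {
        Map<String, Long> commitMap = Collections.synchronizedMap(new HashMap<>());
        commitMap.put("topic-0", 42L);
        ExecutorService network = Executors.newSingleThreadExecutor();
        try {
            Map<String, Long> snapshot = drain(commitMap);
            if (snapshot == null) {
                return true; // nothing to commit
            }
            // No lock is held while waiting, so the other thread never blocks on us.
            fakeCommit(snapshot, network).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            network.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed within timeout: " + commitWithSnapshot(500));
    }
}
```

The worker threads can keep adding offsets to `commitMap` while the snapshot is being committed; anything they add simply goes into the next snapshot.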


Thanks for comments,
 best,
  Jan
