[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729835#comment-13729835 ]
Jay Kreps commented on KAFKA-615:
---------------------------------

Okay, it looks like the fetch requests are only stopped in Partition.makeFollower, so although my reasoning may be right, the assumption is wrong. Here is my proposal:

1. I change this patch to checkpoint the recovery point in Partition.makeFollower. This will be inefficient, since we will do the checkpoint once per partition that we become a follower for.
2. I open a second JIRA to optimize the checkpoint logic. That proposal is:
   1. We add ReplicaManager.addFetchers and .removeFetchers, which add or remove a batch of fetchers all at once.
   2. We add LogManager.truncateTo(m: Map[TopicAndPartition, Long]). This method will first checkpoint the recovery point and then do the truncates. This is better because it moves the recovery-point handling out of ReplicaManager. (A rough sketch is at the end of this message.)
   3. We change ReplicaManager.becomeLeaderOrFollower to stop the fetchers for all replicas undergoing the change at once, do the truncates, then restart all of those fetchers. This may be faster than what we currently have, because we no longer have to contend for the fetcher lock many times while I/O and network activity is happening.

> Avoid fsync on log segment roll
> -------------------------------
>
>                 Key: KAFKA-615
>                 URL: https://issues.apache.org/jira/browse/KAFKA-615
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>            Assignee: Neha Narkhede
>         Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, KAFKA-615-v3.patch, KAFKA-615-v4.patch, KAFKA-615-v5.patch, KAFKA-615-v6.patch
>
> It still isn't feasible to run without an application-level fsync policy. This is a problem because fsync locks the file, and tuning such a policy so that the flushes aren't so frequent that seeks reduce throughput, yet not so infrequent that each fsync writes so much data that there is a noticeable jump in latency, is very challenging.
>
> The remaining problem is the way log recovery works. Our current policy is that if a clean shutdown occurs we do no recovery; if an unclean shutdown occurs we recover the last segment of every log. To make this correct we need to ensure that each segment is fsync'd before we create a new segment, hence the fsync during roll.
>
> Obviously, if the fsync during roll is the only time fsync occurs, it will potentially write out the entire segment, which for a 1GB segment at 50MB/sec could take many seconds. The goal of this JIRA is to eliminate this and make it possible to run with no application-level fsyncs at all, depending entirely on replication and background writeback for durability.
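
For concreteness, here is a rough, hypothetical sketch of what the proposed LogManager.truncateTo could look like. TopicAndPartition, Log, and checkpointRecoveryPointOffsets below are simplified stand-ins, not the code in the attached patches; the only point being illustrated is the ordering the comment describes: checkpoint the recovery points once, then do all the truncates.

{code:scala}
import scala.collection.mutable

// Simplified stand-ins for the real Kafka classes; details elided.
case class TopicAndPartition(topic: String, partition: Int)

class Log(val topicAndPartition: TopicAndPartition) {
  @volatile var recoveryPoint: Long = 0L

  def truncateTo(targetOffset: Long): Unit = {
    // Drop all messages at or above targetOffset (segment handling elided).
  }
}

class LogManager(logs: mutable.Map[TopicAndPartition, Log]) {

  // Persist the current recovery point of every log to the checkpoint file,
  // so an unclean shutdown after this point does not force full-segment recovery.
  def checkpointRecoveryPointOffsets(): Unit = {
    // Write logs.map { case (tp, log) => tp -> log.recoveryPoint } atomically (elided).
  }

  // Proposed API: checkpoint the recovery points first, then truncate each
  // partition's log to the requested offset, as described in the comment above.
  def truncateTo(partitionOffsets: Map[TopicAndPartition, Long]): Unit = {
    checkpointRecoveryPointOffsets()
    for ((tp, offset) <- partitionOffsets)
      logs.get(tp).foreach(_.truncateTo(offset))
  }
}
{code}

Because the method takes a map of partitions to offsets, one checkpoint write covers every truncation in the batch, which is what makes the per-partition checkpointing in step 1 unnecessary once the second JIRA lands.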