[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:
----------------------------

    Attachment: KAFKA-615-v1.patch

Attached a draft patch for a first version of this for early feedback. A few 
details remain to be worked out.

This patch removes the per-data-directory .kafka_cleanshutdown file as well as 
the concept of a "clean shutdown". The concept of clean shutdown is replaced 
with the concept of "recovery point". The recovery point is the offset from 
which the log must be recovered. Recovery points are checkpointed in a 
per-data-directory file called recovery-point-offset-checkpoint, which uses 
the normal offset checkpoint file format.
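
For concreteness, the offset checkpoint format is a small plain-text file: a 
version line, an entry-count line, and then one "topic partition offset" line 
per partition. Here is a minimal sketch of reading and writing such a file; 
the class and method names are illustrative, not the ones in the patch.

    import java.io.{BufferedWriter, File, FileWriter}
    import scala.io.Source

    // Sketch of the plain-text offset checkpoint format: a version line,
    // an entry-count line, then one "topic partition offset" line per
    // partition. Illustrative names, not the patch's actual classes.
    class RecoveryPointCheckpoint(file: File) {

      def write(recoveryPoints: Map[(String, Int), Long]): Unit = {
        val writer = new BufferedWriter(new FileWriter(file))
        try {
          writer.write("0"); writer.newLine()            // format version
          writer.write(recoveryPoints.size.toString)     // number of entries
          writer.newLine()
          for (((topic, partition), offset) <- recoveryPoints) {
            writer.write(topic + " " + partition + " " + offset)
            writer.newLine()
          }
        } finally {
          writer.close()
        }
      }

      def read(): Map[(String, Int), Long] = {
        val lines = Source.fromFile(file).getLines().toList
        lines.drop(2).map { line =>  // skip the version and count lines
          val Array(topic, partition, offset) = line.split(" ")
          (topic, partition.toInt) -> offset.toLong
        }.toMap
      }
    }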

Previously we always recovered the last log segment unless a clean shutdown was 
recorded. Now we recover from the recovery point--which may mean recovering 
many segments. We do not, however, recover partial segments: if the recovery 
point falls in the middle of a segment we recover that segment from the 
beginning.
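
To make the whole-segment rule concrete, segment selection for recovery can 
be sketched like this (the types are stand-ins, not the real log classes):

    // Illustrative stand-in for a log segment's offset range.
    case class SegmentRange(baseOffset: Long, nextOffset: Long)

    // A segment is skipped only if every offset in it lies below the
    // recovery point; the segment containing the recovery point is
    // recovered in full, from its base offset.
    def segmentsToRecover(segments: Seq[SegmentRange],
                          recoveryPoint: Long): Seq[SegmentRange] =
      segments.filter(_.nextOffset > recoveryPoint)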

On shutdown we force a flush and a checkpoint, which has the same effect the 
.kafka_cleanshutdown file had before.

Deleting the recovery-point-offset-checkpoint file will cause a full recovery 
of the log on restart, which is a nice feature if you suspect any kind of 
corruption in the log.

Log.flush now takes an offset argument and flushes from the recovery point up 
to the given offset. This allows more granular control to avoid syncing (and 
hence locking) the active segment.
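
Here is a hedged sketch of what an offset-bounded flush can look like, again 
with stand-in types rather than the patch's actual Log class:

    // Illustrative stand-ins, not the actual classes from the patch.
    trait FlushableSegment {
      def baseOffset: Long  // first offset in this segment
      def nextOffset: Long  // one past the last offset in this segment
      def fsync(): Unit     // force this segment's file to disk
    }

    class SketchLog(@volatile var recoveryPoint: Long,
                    val segments: Vector[FlushableSegment]) {

      // Flush only the data in [recoveryPoint, offset), then advance the
      // recovery point. If `offset` is at or below the active segment's
      // base offset, the active segment is never synced (or locked).
      def flush(offset: Long): Unit = {
        if (offset > recoveryPoint) {
          segments
            .filter(s => s.nextOffset > recoveryPoint && s.baseOffset < offset)
            .foreach(_.fsync())
          recoveryPoint = offset
        }
      }
    }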

Log.roll() now uses the scheduler to make its flush asynchronous. This flush 
now only covers up to the segment that is just completed, not the newly created 
segment, so there should be no locking of the active segment any more.
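
Sketched with a plain Java executor standing in for Kafka's scheduler, and 
reusing the SketchLog stand-in from the previous sketch:

    import java.util.concurrent.{Executors, TimeUnit}

    val scheduler = Executors.newSingleThreadScheduledExecutor()

    // On roll, schedule an asynchronous flush covering only offsets below
    // the new segment's base offset, so the freshly created active segment
    // is never synced or locked by this flush.
    def scheduleRollFlush(log: SketchLog, newSegmentBaseOffset: Long): Unit =
      scheduler.schedule(new Runnable {
        def run(): Unit = log.flush(newSegmentBaseOffset)
      }, 0L, TimeUnit.MILLISECONDS)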

The per-topic flush policy based on message count and time still remains, but 
it now defaults to off, so by default we rely only on replication and 
background writeback for durability.

I did some preliminary performance testing, and we can indeed run with no 
application-level flush policy at reasonable latency, which is both convenient 
(no tuning to do) and yields much better throughput. I will do more testing 
and report results.
                
> Avoid fsync on log segment roll
> -------------------------------
>
>                 Key: KAFKA-615
>                 URL: https://issues.apache.org/jira/browse/KAFKA-615
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>            Assignee: Neha Narkhede
>         Attachments: KAFKA-615-v1.patch
>
>
> It still isn't feasible to run without an application-level fsync policy. 
> This is a problem because fsync locks the file, and tuning such a policy is 
> very challenging: the flushes must not be so frequent that seeks reduce 
> throughput, yet not so infrequent that each fsync writes so much data that 
> there is a noticeable jump in latency.
> The remaining problem is the way that log recovery works. Our current policy 
> is that if a clean shutdown occurs we do no recovery. If an unclean shutdown 
> occurs we recover the last segment of all logs. To make this correct we need 
> to ensure that each segment is fsync'd before we create a new segment. Hence 
> the fsync during roll.
> Obviously, if the fsync during roll is the only fsync that occurs, it will 
> potentially write out the entire segment, which for a 1GB segment at 
> 50MB/sec would take roughly 20 seconds. The goal of this JIRA is to 
> eliminate this and make it possible to run with no application-level fsyncs 
> at all, depending entirely on replication and background writeback for 
> durability.
