+1. This is an important performance fix for Windows-based clusters. -Jakob
On 22 April 2015 at 03:25, Honghai Chen <honghai.c...@microsoft.com> wrote:

> Fixed the issue Sriram mentioned. Code review and JIRA/KIP updated.
>
> Below is a detailed description of the scenarios:
>
> 1. On a clean shutdown, the last log file is truncated to its real size,
>    since the close() function of FileMessageSet calls trim().
> 2. After a crash, restart goes through recover(), and the last log file
>    is truncated to its real size (and the position is moved to the end
>    of the file).
> 3. When the service starts and opens an existing file:
>    a. The LogSegment constructor that has NO "preallocate" parameter runs.
>    b. In FileMessageSet, "end" will be Int.MaxValue, so
>       "channel.position(math.min(channel.size().toInt, end))" puts the
>       position at the end of the file.
>    c. If recovery is needed, the recover function truncates the file to
>       the end of valid data and also moves the position there.
> 4. When the service is running and needs to create a new log segment and
>    a new FileMessageSet:
>    a. If preallocate = true: "end" in FileMessageSet will be 0, the file
>       size will be "initFileSize", and
>       "channel.position(math.min(channel.size().toInt, end))" makes the
>       position 0.
>    b. Else, if preallocate = false (backward compatible): "end" in
>       FileMessageSet will be Int.MaxValue, the file size will be 0, and
>       "channel.position(math.min(channel.size().toInt, end))" makes the
>       position 0.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-20+-+Enable+log+preallocate+to+improve+consume+performance+under+windows+and+some+old+Linux+file+system
> https://issues.apache.org/jira/browse/KAFKA-1646
> https://reviews.apache.org/r/33204/diff/2/
>
> Thanks, Honghai Chen
> http://aka.ms/kafka
> http://aka.ms/manifold
>
> -----Original Message-----
> From: Honghai Chen
> Sent: Wednesday, April 22, 2015 11:12 AM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP 20 Enable log preallocate to
> improve consume performance under windows and some old Linux file system
>
> Hi Sriram,
> One line of code was missing; I will update the code review board and KIP
> soon.
> For LogSegment and FileMessageSet, we must use different constructors
> for an existing file and a new file; then the code
> "channel.position(math.min(channel.size().toInt, end))" will make sure the
> position is at the end of an existing file.
>
> Thanks, Honghai Chen
>
> -----Original Message-----
> From: Jay Kreps [mailto:jay.kr...@gmail.com]
> Sent: Wednesday, April 22, 2015 5:22 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP 20 Enable log preallocate to improve consume
> performance under windows and some old Linux file system
>
> My understanding of the patch is that clean shutdown truncates the file back
> to its true size (and reallocates it on startup). Hard crash is handled by
> the normal recovery, which should truncate off the empty portion of the file.
>
> On Tue, Apr 21, 2015 at 10:52 AM, Sriram Subramanian <
> srsubraman...@linkedin.com.invalid> wrote:
>
>> Could you describe how recovery works in this mode? Say we had a 250
>> MB preallocated segment and we wrote till 50 MB and crashed. Till what
>> point do we recover? Also, on startup, how is the append end pointer
>> set even on a clean shutdown? How does the FileChannel end position
>> get set to 50 MB instead of 250 MB? The existing code might just work
>> for it but explaining that would be useful.
>>
>> On 4/21/15 9:40 AM, "Neha Narkhede" <n...@confluent.io> wrote:
>>
>> >+1. I've tried this on Linux and it helps reduce the spikes in append (and
>> >hence producer) latency for high throughput writes. I am not entirely
>> >sure why but my suspicion is that in the absence of preallocation,
>> >you see spikes when writes need to happen faster than the time it takes
>> >Linux to allocate the next block to the file.
>> >
>> >It will be great to see some performance test results too.
>> >
>> >On Tue, Apr 21, 2015 at 9:23 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> >
>> >> I'm also +1 on this. The change is quite small and may actually
>> >> help perf on Linux as well (we've never tried this).
>> >>
>> >> I have a lot of concerns on testing the various failure conditions
>> >> but I think since it will be off by default the risk is not too high.
>> >>
>> >> -Jay
>> >>
>> >> On Mon, Apr 20, 2015 at 6:58 PM, Honghai Chen
>> >> <honghai.c...@microsoft.com> wrote:
>> >>
>> >> > I wrote a KIP for this after some discussion on KAFKA-1646.
>> >> > https://issues.apache.org/jira/browse/KAFKA-1646
>> >> >
>> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-20+-+Enable+log+preallocate+to+improve+consume+performance+under+windows+and+some+old+Linux+file+system
>> >> >
>> >> > The RB is here: https://reviews.apache.org/r/33204/diff/
>> >> >
>> >> > Thanks, Honghai
>> >
>> >--
>> >Thanks,
>> >Neha
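
The scenarios in Honghai's message (clean shutdown, crash recovery, reopening an existing file, creating a new segment) can be illustrated with a minimal Scala sketch on top of java.nio. This is not the code from the patch: the object PreallocateSketch, the helpers openSegment, trimAndClose and recover, and the parameters isNew and validBytes are hypothetical names introduced only for illustration. The one expression taken from the thread is channel.position(math.min(channel.size().toInt, end)).

```scala
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Illustrative only: hypothetical helpers, not Kafka's LogSegment/FileMessageSet code.
object PreallocateSketch {

  // Open a segment file and position the channel where appends should start.
  def openSegment(file: File, isNew: Boolean, preallocate: Boolean, initFileSize: Int): FileChannel = {
    val raf = new RandomAccessFile(file, "rw")
    val fresh = isNew && preallocate
    // Only a brand-new segment is grown to initFileSize; existing files keep their size.
    if (fresh) raf.setLength(initFileSize.toLong)
    // "end" is 0 for a new preallocated file and Int.MaxValue otherwise.
    val end = if (fresh) 0 else Int.MaxValue
    val channel = raf.getChannel
    channel.position(math.min(channel.size().toInt, end).toLong)
    channel
  }

  // Clean shutdown: cut the file back to the bytes actually written, then close.
  def trimAndClose(channel: FileChannel): Unit = {
    channel.truncate(channel.position())
    channel.close()
  }

  // Crash recovery stand-in: validBytes is assumed to come from scanning the log.
  // truncate() also pulls the position back when it lies beyond the new size.
  def recover(channel: FileChannel, validBytes: Long): Unit = {
    channel.truncate(validBytes)
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("segment", ".log")
    file.delete() // behave like a segment that does not exist yet

    // Scenario 4a: a new preallocated segment starts at position 0.
    val fresh = openSegment(file, isNew = true, preallocate = true, initFileSize = 1024)
    println(s"new: size=${fresh.size()} position=${fresh.position()}")
    fresh.write(ByteBuffer.wrap(new Array[Byte](100)))

    // Scenario 1: clean shutdown trims the unused tail.
    trimAndClose(fresh)
    println(s"after trim: length=${file.length()}")

    // Scenario 3: reopening an existing file lands at its end.
    val reopened = openSegment(file, isNew = false, preallocate = true, initFileSize = 1024)
    println(s"reopened: position=${reopened.position()}")
    reopened.close()
    file.delete()
  }
}
```

In this sketch, a file left at its preallocated size by a crash would reopen with the position at the preallocated end, and only the recover step brings both size and position back to the end of valid data. That corresponds to steps 3b and 3c above and to Sriram's 250 MB / 50 MB question.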