I'm getting ready to try out this configuration (use multiple disks, no RAID, per broker). One concern is the procedure for recovering if there is a disk failure.
If a disk fails, will the broker go offline, or will it continue serving partitions on its remaining good disks? And if it keeps serving, is there a procedure for moving only the partitions that were on the failed disk, rather than everything on that broker?

Jason


On Thu, Jun 20, 2013 at 3:15 PM, Jason Rosenberg <j...@squareup.com> wrote:

> yeah, that would work!
>
>
> On Thu, Jun 20, 2013 at 1:20 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Yeah we didn't go as far as adding weighting or anything like that--I
>> think we'd be open to a patch that did that as long as it was
>> optional. In the short term you can obviously add multiple directories
>> on the same disk to increase its share.
>>
>> -Jay
>>
>> On Thu, Jun 20, 2013 at 12:59 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>
>>> This sounds like a great idea, to just use disks as "just a bunch of
>>> disks" or JBOD. HDFS works well this way.
>>>
>>> Do all the disks need to be the same size, to use them evenly? Since it
>>> will allocate partitions randomly?
>>>
>>> It would be nice if you had 2 disks, with one twice as large as the
>>> other, if the larger would be twice as likely to receive partitions as
>>> the smaller one, etc.
>>>
>>> I suppose this goes into my earlier question to the list, vis-a-vis
>>> heterogeneous brokers (e.g. utilize brokers with different sized
>>> storage, using some sort of weighting scheme, etc.).
>>>
>>> Jason
>>>
>>>
>>> On Thu, Jun 20, 2013 at 11:07 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>
>>>> The intention is to allow the use of multiple disks without RAID or
>>>> logical volume management. We have found that there are a lot of
>>>> downsides to RAID--in particular a huge throughput hit. Since we
>>>> already have a parallelism model due to partitioning and a fault
>>>> tolerance model with replication RAID doesn't actually buy much. With
>>>> this feature you can directly mount multiple disks as their own
>>>> directory and the server will randomly assign partitions to them.
>>>>
>>>> Obviously this will only work well if there are enough high-throughput
>>>> partitions to make the load balance evenly (e.g. if you have only one
>>>> big partition per server then this isn't going to work).
>>>>
>>>> -Jay
>>>>
>>>> On Wed, Jun 19, 2013 at 11:01 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>>>
>>>>> is it possible for a partition to have multiple replicas on different
>>>>> directories on the same broker? (hopefully no!)
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 10:47 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>
>>>>>> It takes a comma separated list and partition replicas are randomly
>>>>>> distributed to the list.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jun
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 10:25 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>>>>>
>>>>>>> In the 0.8 config, log.dir is now log.dirs. It looks like the
>>>>>>> singular log.dir is still supported, but under the covers the
>>>>>>> property is log.dirs.
>>>>>>>
>>>>>>> I'm curious, does this take a comma separated list of directories?
>>>>>>> The new config page just says:
>>>>>>> "The directories in which the log data is kept"
>>>>>>>
>>>>>>> Also, how does kafka handle multiple directories? Does it treat each
>>>>>>> directory as a separate replica partition, or what?
>>>>>>>
>>>>>>> Jason
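
For anyone following along, the workaround mentioned in the thread (listing several directories on the larger disk so it gets a bigger share) can be illustrated with a small simulation. This is only a sketch, assuming partitions are placed uniformly at random across the comma-separated log.dirs entries, as described in the thread. It is not Kafka's code. The mount paths, the LOG_DIRS value, and the function names are all made up for illustration.

```python
import random
from collections import Counter

# Hypothetical log.dirs value: the larger disk is listed twice (two
# directories on the same disk), the smaller disk once.
LOG_DIRS = "/mnt/big/kafka-a,/mnt/big/kafka-b,/mnt/small/kafka"


def disk_of(log_dir):
    """Return the disk a directory lives on.

    Assumption for this sketch only: the disk is the second path
    component, so "/mnt/big/kafka-a" is on disk "big".
    """
    return log_dir.split("/")[2]


def assign_partitions(log_dirs, num_partitions, seed=0):
    """Place each partition in a uniformly random directory from the
    comma-separated list and count how many land on each disk."""
    rng = random.Random(seed)
    dirs = [d.strip() for d in log_dirs.split(",") if d.strip()]
    return Counter(disk_of(rng.choice(dirs)) for _ in range(num_partitions))


if __name__ == "__main__":
    counts = assign_partitions(LOG_DIRS, 30000)
    for disk, n in sorted(counts.items()):
        print(disk, n)
```

Under that assumption, the disk listed twice should end up with roughly twice as many partitions as the disk listed once. Note this balances partition counts only; it says nothing about partition sizes or throughput.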