Todd,

Did you load-test using SSDs?
Got numbers to share?

On Fri, Oct 24, 2014 at 10:40 AM, Todd Palino <tpal...@gmail.com> wrote:
> Hmm, I haven't read the design doc lately, but I'm surprised that there's
> even a discussion of sequential disk access. I suppose for small subsets of
> the writes you can write larger blocks of sequential data, but that's about
> the extent of it. Maybe one of the developers can speak more to that aspect.
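>
> That said, the "sequential" claim is per partition: each message batch
> is appended to the tail of a partition's active segment file, so every
> individual write is sequential within that one file. A minimal sketch
> of the idea, assuming a hypothetical append loop (illustrative only,
> not Kafka's actual code):
>
>     # append_sketch.py -- illustrative only
>     def append_batch(segment_path, batch_bytes):
>         # append mode puts every write at the tail of the file,
>         # i.e. sequential within this one segment
>         with open(segment_path, "ab") as segment:
>             segment.write(batch_bytes)
>
> With thousands of active segments, those tails sit at different disk
> offsets, which is where the random access below comes from.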
>
> As far as the number of files goes, it really doesn't matter that much
> whether you have a few or a lot. Once you have more than one, the disk
> access is random, so the performance is more like a cliff than a gentle
> slope. As I said, we've found issues once we go above 4000 partitions, and
> that's probably a combination of what the software can handle and the
> number of open files.
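>
> If you want to see where a broker stands, counting its descriptors is
> enough. A quick sketch, assuming a Linux host and a known broker pid
> (run it as the broker's user or root):
>
>     # fd_count.py -- illustrative only
>     import os
>
>     def open_fd_count(pid):
>         # each entry in /proc/<pid>/fd is one open file descriptor
>         return len(os.listdir("/proc/%d/fd" % pid))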
>
> -Todd
>
>
> On Thu, Oct 23, 2014 at 11:19 PM, Xiaobin She <xiaobin...@gmail.com> wrote:
>
>> Todd,
>>
>> Thank you very much for your reply. My understanding of RAID 10 is wrong.
>>
>> I understand that one cannot get absolutely sequential disk access even
>> on a single disk. The reason I'm interested in this question is that the
>> design document of Kafka emphasizes that Kafka takes advantage of
>> sequential disk access to improve disk performance, and I can't
>> understand how to achieve this with thousands of open files.
>>
>> I thought that, compared to one or a few files, thousands of open files
>> would make the disk access much more random, and make the disk
>> performance much weaker.
>>
>> You mentioned that to increase overall IO capacity, one has to use
>> multiple spindles with sufficiently fast disk speeds, but would the disk
>> be more effective with fewer files? Or is the number of files not an
>> important factor in the overall performance of Kafka?
>>
>> Thanks again.
>>
>> xiaobinshe
>>
>>
>>
>> 2014-10-23 22:01 GMT+08:00 Todd Palino <tpal...@gmail.com>:
>>
>> > Your understanding of RAID 10 is slightly off. Because it is a
>> > combination of striping and mirroring, trying to say that there are
>> > 4000 open files per pair of disks is not accurate. The disk, as far as
>> > the system is concerned, is the entire RAID. Files are striped across
>> > all mirrors, so any open file will cross all 7 mirror sets.
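>> >
>> > To put numbers on it, a rough sketch of the layout (values from this
>> > thread):
>> >
>> >     disks = 14
>> >     mirror_sets = disks // 2  # RAID 10 mirrors pairs first: 7 sets
>> >     # striping then spreads every file across all 7 mirror sets,
>> >     # so each set sees a piece of every open file, not 1/7 of them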
>> >
>> > Even if you were to operate on a single disk, you're never going to be
>> > able to ensure sequential disk access with Kafka. Even if you have a
>> > single partition on a disk, there will be multiple log files for that
>> > partition and you will have to seek to read older data. What you have
>> > to do is use multiple spindles, with sufficiently fast disk speeds, to
>> > increase your overall IO capacity. You can also tune to get a little
>> > more. For example, we use a 120-second commit on that mount point to
>> > reduce the frequency of flushing to disk.
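>> >
>> > For reference, that commit interval is a filesystem mount option
>> > (assuming ext4 here); an illustrative fstab entry, with a placeholder
>> > device and mount point, would look like:
>> >
>> >     /dev/sdb1  /export/kafka  ext4  defaults,noatime,commit=120  0 2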
>> >
>> > -Todd
>> >
>> >
>> > On Wed, Oct 22, 2014 at 10:09 PM, Xiaobin She <xiaobin...@gmail.com>
>> > wrote:
>> >
>> > > Todd,
>> > >
>> > > Thank you for the information.
>> > >
>> > > With 28,000+ files and 14 disks, that means there are on average
>> > > about 4000 open files per two disks (which are treated as one single
>> > > disk), am I right?
>> > >
>> > > How do you manage to make all the write operations to these 4000
>> > > open files sequential on the disk?
>> > >
>> > > As far as I know, write operations to different files on the same
>> > > disk will cause random writes, which is not good for performance.
>> > >
>> > > xiaobinshe
>> > >
>> > >
>> > >
>> > >
>> > > 2014-10-23 1:00 GMT+08:00 Todd Palino <tpal...@gmail.com>:
>> > >
>> > > > In fact there are many more than 4000 open files. Many of our
>> > > > brokers run with 28,000+ open files (regular file handles, not
>> > > > network connections). In our case, we're beefing up the disk
>> > > > performance as much as we can by running in a RAID-10 configuration
>> > > > with 14 disks.
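>> > > >
>> > > > Running with 28,000+ open files also means raising the file
>> > > > descriptor limit well above the usual default of 1024. An
>> > > > illustrative /etc/security/limits.conf entry (user name and values
>> > > > are placeholders, not our exact settings):
>> > > >
>> > > >     kafka  soft  nofile  100000
>> > > >     kafka  hard  nofile  100000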
>> > > >
>> > > > -Todd
>> > > >
>> > > > On Tue, Oct 21, 2014 at 7:58 PM, Xiaobin She <xiaobin...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Todd,
>> > > > >
>> > > > > Actually I'm wondering how Kafka handles so many partitions. With
>> > > > > one partition there is at least one file on disk, and with 4000
>> > > > > partitions there will be at least 4000 files.
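>> > > > >
>> > > > > If I understand the log layout correctly, it is likely even more
>> > > > > than that, since each partition keeps several segment files, each
>> > > > > with a log and an index. With made-up numbers:
>> > > > >
>> > > > >     partitions = 4000
>> > > > >     segments_per_partition = 5  # depends on retention/segment size
>> > > > >     files = partitions * segments_per_partition * 2  # .log + .index
>> > > > >     # -> 40,000 files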
>> > > > >
>> > > > > When all these partitions have write requests, how does Kafka
>> > > > > make the write operations on the disk sequential (which is
>> > > > > emphasized in the design document of Kafka) and make sure the
>> > > > > disk access is efficient?
>> > > > >
>> > > > > Thank you for your reply.
>> > > > >
>> > > > > xiaobinshe
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2014-10-22 5:10 GMT+08:00 Todd Palino <tpal...@gmail.com>:
>> > > > >
>> > > > > > As far as the number of partitions a single broker can handle,
>> > > > > > we've set our cap at 4000 partitions (including replicas).
>> > > > > > Above that we've seen some performance and stability issues.
>> > > > > >
>> > > > > > -Todd
>> > > > > >
>> > > > > > On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She
>> > > > > > <xiaobin...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hello, everyone.
>> > > > > > >
>> > > > > > > I'm new to Kafka, and I'm wondering: what is the maximum
>> > > > > > > number of partitions a single machine can handle in Kafka?
>> > > > > > >
>> > > > > > > Is there a suggested number?
>> > > > > > >
>> > > > > > > Thanks.
>> > > > > > >
>> > > > > > > xiaobinshe
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
