We haven't done any testing of Kafka on SSDs, mostly because our storage
density needs are too high. Since our IO load has been fine on the current
model, we haven't pushed in that direction yet. Additionally, I haven't
done any real load testing since I got here, which is part of why we're
going to reevaluate our storage soon.

That said, we are using SSDs for the transaction log volume on our
Zookeeper nodes, with great success. We detailed some of that in the
presentation that Jonathan linked (effectively no latency and no
outstanding requests). It helps that we use very high-quality SSDs.
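For anyone who wants to try the same layout, the only ZooKeeper setting
involved is pointing the transaction log at the SSD volume. A minimal
sketch (the paths here are illustrative, not our actual mount points):

```
# zoo.cfg -- illustrative paths, not an exact production config
dataDir=/data/zookeeper           # snapshots can stay on spinning disk
dataLogDir=/ssd/zookeeper-txlog   # transaction log on the dedicated SSD volume
```

The transaction log is fsynced on every write, which is why it benefits
from the SSD far more than the snapshots do.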

-Todd


On Fri, Oct 24, 2014 at 10:44 AM, Gwen Shapira <gshap...@cloudera.com>
wrote:

> Todd,
>
> Did you load-test using SSDs?
> Got numbers to share?
>
> On Fri, Oct 24, 2014 at 10:40 AM, Todd Palino <tpal...@gmail.com> wrote:
> > Hmm, I haven't read the design doc lately, but I'm surprised that there's
> > even a discussion of sequential disk access. I suppose for small subsets
> > of the writes you can write larger blocks of sequential data, but that's
> > about the extent of it. Maybe one of the developers can speak more to
> > that aspect.
> >
> > As far as the number of files goes, it really doesn't matter that much
> > whether you have a few or a lot. Once you have more than one, the disk
> > access is random, so the performance is more like a cliff than a gentle
> > slope. As I said, we've found issues once we go above 4000 partitions,
> > and that's probably a combination of what the software can handle and
> > the number of open files.
> >
> > -Todd
> >
> >
> > On Thu, Oct 23, 2014 at 11:19 PM, Xiaobin She <xiaobin...@gmail.com>
> > wrote:
> >
> >> Todd,
> >>
> >> Thank you very much for your reply. My understanding of RAID 10 was
> >> wrong.
> >>
> >> I understand that one cannot get absolutely sequential disk access even
> >> on one single disk. The reason I'm interested in this question is that
> >> the design document of Kafka emphasizes that Kafka takes advantage of
> >> sequential disk access to improve disk performance, and I can't
> >> understand how to achieve this with thousands of open files.
> >>
> >> I thought that, compared to one or a few files, thousands of open files
> >> would make the disk access much more random, and make the disk
> >> performance much weaker.
> >>
> >> You mentioned that to increase overall IO capacity, one will have to
> >> use multiple spindles with sufficiently fast disk speeds, but will it
> >> be more effective for a disk with fewer files? Or is the number of
> >> files not an important factor for the overall performance of Kafka?
> >>
> >> Thanks again.
> >>
> >> xiaobinshe
> >>
> >>
> >>
> >> 2014-10-23 22:01 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >>
> >> > Your understanding of RAID 10 is slightly off. Because it is a
> >> > combination of striping and mirroring, trying to say that there are
> >> > 4000 open files per pair of disks is not accurate. The disk, as far
> >> > as the system is concerned, is the entire RAID. Files are striped
> >> > across all mirrors, so any open file will cross all 7 mirror sets.
> >> >
> >> > Even if you were to operate on a single disk, you're never going to
> >> > be able to ensure sequential disk access with Kafka. Even if you have
> >> > a single partition on a disk, there will be multiple log files for
> >> > that partition and you will have to seek to read older data. What you
> >> > have to do is use multiple spindles, with sufficiently fast disk
> >> > speeds, to increase your overall IO capacity. You can also tune to
> >> > get a little more. For example, we use a 120 second commit on that
> >> > mount point to reduce the frequency of flushing to disk.
> >> >
> >> > -Todd
> >> >
> >> >
> >> > On Wed, Oct 22, 2014 at 10:09 PM, Xiaobin She <xiaobin...@gmail.com>
> >> > wrote:
> >> >
> >> > > Todd,
> >> > >
> >> > > Thank you for the information.
> >> > >
> >> > > With 28,000+ files and 14 disks, that means there are on average
> >> > > about 4000 open files per pair of disks (which is treated as one
> >> > > single disk), am I right?
> >> > >
> >> > > How do you manage to make all the write operations to these 4000
> >> > > open files sequential on the disk?
> >> > >
> >> > > As far as I know, writes to different files on the same disk will
> >> > > cause random writes, which are not good for performance.
> >> > >
> >> > > xiaobinshe
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > 2014-10-23 1:00 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >> > >
> >> > > > In fact there are many more than 4000 open files. Many of our
> >> > > > brokers run with 28,000+ open files (regular file handles, not
> >> > > > network connections). In our case, we're beefing up the disk
> >> > > > performance as much as we can by running in a RAID-10
> >> > > > configuration with 14 disks.
> >> > > >
> >> > > > -Todd
> >> > > >
> >> > > > On Tue, Oct 21, 2014 at 7:58 PM, Xiaobin She <xiaobin...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Todd,
> >> > > > >
> >> > > > > Actually I'm wondering how Kafka handles so many partitions.
> >> > > > > With one partition there is at least one file on disk, and with
> >> > > > > 4000 partitions there will be at least 4000 files.
> >> > > > >
> >> > > > > When all these partitions have write requests, how does Kafka
> >> > > > > make the write operations on the disk sequential (which is
> >> > > > > emphasized in the design document of Kafka) and make sure the
> >> > > > > disk access is efficient?
> >> > > > >
> >> > > > > Thank you for your reply.
> >> > > > >
> >> > > > > xiaobinshe
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > 2014-10-22 5:10 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >> > > > >
> >> > > > > > As far as the number of partitions a single broker can
> >> > > > > > handle, we've set our cap at 4000 partitions (including
> >> > > > > > replicas). Above that we've seen some performance and
> >> > > > > > stability issues.
> >> > > > > >
> >> > > > > > -Todd
> >> > > > > >
> >> > > > > > On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She <xiaobin...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > hello, everyone
> >> > > > > > >
> >> > > > > > > I'm new to Kafka, and I'm wondering what's the maximum
> >> > > > > > > number of partitions one single machine can handle in Kafka?
> >> > > > > > >
> >> > > > > > > Is there a suggested number?
> >> > > > > > >
> >> > > > > > > Thanks.
> >> > > > > > >
> >> > > > > > > xiaobinshe
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
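As a footnote to the thread: the "120 second commit" Todd mentions is the
ext3/ext4 `commit=` mount option, which controls how often the journal (and
with it, dirty data) is flushed to disk. A sketch of what such an entry
could look like in /etc/fstab (device name and mount point are made up for
illustration):

```
# /etc/fstab -- hypothetical entry; device and mount point are examples only
# commit=120 lets the filesystem batch up to 120 seconds of writes per flush,
# trading a larger window of data at risk on power loss for fewer random IOs.
/dev/md0  /export/kafka  ext4  defaults,noatime,commit=120  0 0
```

Kafka tolerates this because durability comes from replication across
brokers rather than from fsync on any single disk.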
