Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
Thanks Jay, that's exactly what I was looking for.

On 25 June 2014 04:18, Jay Kreps wrote:
> The primary relevant factor here is the fsync interval. Kafka's replication guarantees do not require fsyncing every message, so the reason for doing so is to handle correlated power loss (a pretty

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Jay Kreps
The primary relevant factor here is the fsync interval. Kafka's replication guarantees do not require fsyncing every message, so the reason for doing so is to handle correlated power loss (a pretty uncommon failure in a real data center). Replication will handle most other failure modes with much m
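
To make the fsync-interval point concrete, here is a minimal Python sketch of an append-only log that calls fsync once every N messages instead of after each one. It is an illustration only, not Kafka's implementation: `AppendLog`, `fsync_every`, and `count_fsyncs` are hypothetical names invented for this example. The real broker exposes the same trade-off through settings such as `log.flush.interval.messages` and `log.flush.interval.ms`.

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log (hypothetical, not Kafka code).

    Writes go to the OS page cache immediately; fsync is only
    issued once every `fsync_every` appended messages.
    """

    def __init__(self, path, fsync_every):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.fsync_every = fsync_every
        self.unsynced = 0      # messages written since the last fsync
        self.fsync_calls = 0   # how many times we forced data to disk

    def append(self, payload):
        os.write(self.fd, payload)
        self.unsynced += 1
        if self.unsynced >= self.fsync_every:
            self.flush()

    def flush(self):
        # Only pay for an fsync when there is something unsynced.
        if self.unsynced:
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.unsynced = 0

    def close(self):
        self.flush()
        os.close(self.fd)


def count_fsyncs(num_messages, fsync_every):
    """Append `num_messages` messages and return the number of fsync calls."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendLog(os.path.join(tmp, "partition-0.log"), fsync_every)
        for i in range(num_messages):
            log.append(b"message %d\n" % i)
        log.close()
        return log.fsync_calls
```

With an interval of 1, every message costs a forced flush; with a larger interval, the same messages need far fewer fsync calls, and the messages written since the last flush are the ones exposed to a power loss on that machine.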

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Paul Mackles
It's probably best to run some tests that simulate your usage patterns. I think a lot of it will be determined by how effectively you are able to utilize the OS file cache, in which case you could have many more partitions. It's a delicate balance, but you definitely want to err on the side of having m

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
Good point. We've only got two disks per node and two topics, so I was planning to have one partition per disk. Our workload is very write-heavy, so I'm mostly concerned about write throughput. Will we get write speed improvements by sticking to 1 partition/disk or will the difference between 1 and

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Paul Mackles
You'll want to account for the number of disks per node. Normally, partitions are spread across multiple disks. Even more important, the OS file cache reduces the amount of seeking provided that you are reading mostly sequentially and your consumers are keeping up. On 6/24/14 3:58 AM, "Daniel Comp
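
As a rough illustration of partitions being spread across a node's disks, the Python sketch below places each new partition in the data directory that currently holds the fewest partitions. The least-loaded rule and the names `assign_partitions`, `/disk1`, and `/disk2` are assumptions made for this example; the thread does not spell out the broker's actual placement logic.

```python
def assign_partitions(partitions, log_dirs):
    """Place each partition in the directory with the fewest partitions so far.

    Hypothetical helper for illustration; ties go to the directory
    listed first in `log_dirs`.
    """
    counts = {d: 0 for d in log_dirs}
    placement = {}
    for partition in partitions:
        # min() returns the first directory with the lowest count.
        target = min(log_dirs, key=lambda d: counts[d])
        placement[partition] = target
        counts[target] += 1
    return placement


# Two topics with two partitions each, on a node with two disks.
placement = assign_partitions(
    ["topicA-0", "topicA-1", "topicB-0", "topicB-1"],
    ["/disk1", "/disk2"],
)
```

Under this assumed rule, each disk ends up with two partition logs, so each disk serves two interleaved append streams instead of one.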

How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
I’ve been reading the Kafka docs and one thing that I’m having trouble understanding is how partitions affect sequential disk IO. One of the reasons Kafka is so fast is that you can do lots of sequential IO with read-ahead cache and all of that goodness. However, if your broker is responsible fo