Thanks Jay, that's exactly what I was looking for.
On 25 June 2014 04:18, Jay Kreps wrote:
The primary relevant factor here is the fsync interval. Kafka's replication
guarantees do not require fsyncing every message, so the reason for doing
so is to handle correlated power loss (a pretty uncommon failure in a real
data center). Replication will handle most other failure modes.
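A rough sketch of what "lean on replication rather than fsync" looks like from
the client side, using the Java producer. The broker address, topic name and
serializers are placeholders; the broker-side flush settings named in the
comment (log.flush.interval.messages, log.flush.interval.ms) are the real
config keys, left at their defaults here so the OS decides when pages get
written out.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReplicationDurabilityExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Durability comes from the replicas acknowledging the write, not from
        // fsyncing every message on the leader's disk. Broker-side, the flush
        // settings (log.flush.interval.messages / log.flush.interval.ms) stay
        // at their defaults.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"));
        }
    }
}

With acks=all (plus a topic-level min.insync.replicas, if set), an acknowledged
write has already been copied to multiple brokers, which is what covers the
non-power-loss failure modes discussed above.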
It's probably best to run some tests that simulate your usage patterns. I
think a lot of it will be determined by how effectively you are able to
utilize the OS file cache, in which case you could have many more
partitions. It's a delicate balance, but you definitely want to err on the
side of having more partitions.
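A minimal sketch of such a test, assuming the Java producer and a hypothetical
topic named "write-test": time a burst of fixed-size records and report the
rate. The record size, record count and broker address below are made-up
values to replace with something that mirrors the real workload.

import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WriteThroughputTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder address
        props.put("acks", "all");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        int numRecords = 1_000_000;       // made-up volume
        byte[] payload = new byte[1024];  // made-up record size (1 KB)

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            long start = System.nanoTime();
            Future<?> last = null;
            for (int i = 0; i < numRecords; i++) {
                last = producer.send(new ProducerRecord<>("write-test", payload));
            }
            producer.flush();
            if (last != null) last.get();  // make sure the tail of the run landed
            double seconds = (System.nanoTime() - start) / 1e9;
            double mb = (double) numRecords * payload.length / (1024 * 1024);
            System.out.printf("%.1f MB/s (%.0f records/s)%n",
                              mb / seconds, numRecords / seconds);
        }
    }
}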
Good point. We've only got two disks per node and two topics, so I was
planning to have one partition per disk.
Our workload is very write heavy, so I'm mostly concerned about write
throughput. Will we get write speed improvements by sticking to 1
partition/disk, or will the difference between 1 and a few partitions per
disk be negligible?
You'll want to account for the number of disks per node. Normally,
partitions are spread across multiple disks. Even more important, the OS
file cache reduces the amount of seeking provided that you are reading
mostly sequentially and your consumers are keeping up.
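To make that concrete (the numbers here are assumptions for illustration, not
a recommendation): with log.dirs listing one directory per physical disk, the
broker spreads partitions across those directories, so picking a partition
count that is a multiple of brokers × disks keeps every disk in play. A sketch
using the Java AdminClient:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSizedToDisks {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder address

        int brokers = 3;         // assumption: cluster size
        int disksPerBroker = 2;  // assumption: two log.dirs entries per broker
        int partitions = brokers * disksPerBroker;
        short replicationFactor = 2;

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("events", partitions, replicationFactor);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}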
On 6/24/14 3:58 AM, "Daniel Compton" wrote:
I’ve been reading the Kafka docs and one thing that I’m having trouble
understanding is how partitions affect sequential disk IO. One of the reasons
Kafka is so fast is that you can do lots of sequential IO with read-ahead cache
and all of that goodness. However, if your broker is responsible for many
partitions on the same disk, won't the IO end up looking a lot more random
than sequential?