Re: Sequential writes make Kafka fast, or so they say

2020-03-13 Thread Eugen Dueck
ved that in a second. There are other emerging messaging systems that, for this reason, write to a single file... like BookKeeper Peter On Thu, 12 Mar 2020 at 03:30, Eugen Dueck wrote: > A question about something that was always in the back of my mind. > > According to Jay Kreps >

Sequential writes make Kafka fast, or so they say

2020-03-11 Thread Eugen Dueck
A question about something that was always in the back of my mind. According to Jay Kreps > The first [reason that Kafka is so fast despite writing to disk] is that > Kafka does only sequential file I/O. I wonder how true this statement is, because Kafka uses 3 segments per partition. so even

Re: log.dirs and SSDs

2020-03-11 Thread Eugen Dueck
interested to see your results. > On Mar 11, 2020, at 5:45 PM, Eugen Dueck wrote: > > I'm asking the questions here! 🙂 > So is that the way to tune the broker if it does not achieve disk throughput? > > > From: Peter Bukowinski > Sent:

Re: log.dirs and SSDs

2020-03-11 Thread Eugen Dueck
questions here! 🙂 So is that the way to tune the broker if it does not achieve disk throughput? From: Peter Bukowinski Sent: March 12, 2020 9:38 Couldn’t the same be accomplished by increasing the num.io.threads broker setting? > On Mar 11, 2020, at 5:15 PM, Eugen

Re: log.dirs and SSDs

2020-03-11 Thread Eugen Dueck
2020, at 5:15 PM, Eugen Dueck wrote: > > So there is not e.g. a single thread responsible per directory in log.dirs > that could become a bottleneck relative to SSD throughput of GB/s? > > This is in fact the case for Apache Pulsar, and the openmessaging benchmark > uses 4 dire

Re: log.dirs and SSDs

2020-03-11 Thread Eugen Dueck
. From: Peter Bukowinski Sent: March 12, 2020 8:51 To: users@kafka.apache.org Subject: Re: log.dirs and SSDs > On Mar 11, 2020, at 4:28 PM, Eugen Dueck wrote: > > So log.dirs should contain only one entry per HDD disk, to avoid random seeks. > What about SSDs? Can throughput be

log.dirs and SSDs

2020-03-11 Thread Eugen Dueck
So log.dirs should contain only one entry per HDD disk, to avoid random seeks. What about SSDs? Can throughput be increased by specifying multiple directories on the same SSD?

Re: synchronously flushing messages to disk

2020-03-07 Thread Eugen Dueck
ather than physical disk writes. Also, note that even an fsync(2) does not guarantee that data has been physically written, since data also has to go through other layers such as disk caches. On Sat, 7 Mar 2020 at 10:56, Eugen Dueck wrote: > > I was under the impression that the

Re: synchronously flushing messages to disk

2020-03-07 Thread Eugen Dueck
I have heard other opinions on that, I'd like to know if someone in the Kafka community can clarify. Best regards Eugen From: Eugen Dueck Sent: February 26, 2020 13:28 To: users@kafka.apache.org Subject: synchronously flushing messages to disk Hi I want to benchmark

synchronously flushing messages to disk

2020-02-25 Thread Eugen Dueck
Hi I want to benchmark Kafka, configured such that a message that has been acked by the broker to the producer is guaranteed to have been persisted to disk. I changed the broker settings: log.flush.interval.messages=1 log.flush.interval.ms=0 (Is this the proper way to do it?) The impact is ve
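
For readers following the thread, here is a minimal sketch of why flushing after every message is expensive. It is a toy append-only log, not Kafka's implementation: the class AppendLog, the helper count_fsyncs, and the file name are hypothetical names made up for illustration. The flush_interval_messages parameter only mimics the idea behind log.flush.interval.messages (flush once that many messages have accumulated). As noted elsewhere in the thread, even fsync does not guarantee that data has reached the physical medium.

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log (hypothetical, NOT Kafka's implementation)."""

    def __init__(self, path, flush_interval_messages):
        self._file = open(path, "ab")
        self._interval = flush_interval_messages
        self._unflushed = 0
        self.fsync_calls = 0  # counted so the cost of a setting is visible

    def append(self, payload: bytes) -> None:
        # Sequential append to the end of the file.
        self._file.write(payload + b"\n")
        self._unflushed += 1
        if self._unflushed >= self._interval:
            self.flush()

    def flush(self) -> None:
        # Push Python's buffer to the OS, then ask the OS to flush its
        # page cache for this file to the storage device.
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_calls += 1
        self._unflushed = 0

    def close(self) -> None:
        self._file.close()


def count_fsyncs(num_messages, flush_interval_messages):
    """Append num_messages messages and return how many fsyncs were issued."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendLog(os.path.join(tmp, "00000000.log"), flush_interval_messages)
        try:
            for i in range(num_messages):
                log.append(b"message-%d" % i)
            return log.fsync_calls
        finally:
            log.close()
```

With an interval of 1, every append pays for one fsync, so count_fsyncs(6, 1) issues six fsyncs while count_fsyncs(6, 3) issues two. Timing the two cases on a given disk would be one way to estimate how large the impact of a flush-per-message setting is.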