Hi, yes, if you write to one partition only, it will be sequential. But that is unlikely, so in practice it won't be sequential overall. I used AWS EC2 instances with st1 EBS disks, that is, the old rotational HDD type. They struggled to deliver any kind of performance for our 6000+ partitions. Switching to gp2 SSD solved that in a second.
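To illustrate the point, here is a toy sketch (plain Java NIO, not Kafka code): each partition's log file only ever receives appends, but with many partitions on one disk the appends interleave across files, so the device never sees one long sequential stream.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterleavedAppends {
    public static void main(String[] args) throws IOException {
        int partitions = 100;  // scaled down here; a real broker may host thousands
        Path dir = Files.createTempDirectory("segments");
        FileChannel[] logs = new FileChannel[partitions];
        for (int p = 0; p < partitions; p++) {
            logs[p] = FileChannel.open(
                    dir.resolve("partition-" + p + ".log"),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        ByteBuffer record = ByteBuffer.allocate(1024); // dummy 1 KiB record
        for (int i = 0; i < 10_000; i++) {
            record.clear();
            // Each file individually only gets appends, but consecutive writes
            // land in different files, so the device-level pattern is interleaved.
            logs[i % partitions].write(record);
        }
        for (FileChannel log : logs) {
            log.close();
        }
    }
}

On a rotational disk every switch between files can cost a seek; on an SSD it essentially doesn't, which is consistent with the st1 vs. gp2 observation above.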
There are other emerging messaging systems that for this reason write to a single file... like BookKeeper.

Peter

On Thu, 12 Mar 2020 at 03:30, Eugen Dueck <eu...@tworks.co.jp> wrote:

> A question about something that was always in the back of my mind.
>
> According to Jay Kreps:
>
> > The first [reason that Kafka is so fast despite writing to disk] is that
> > Kafka does only sequential file I/O.
>
> I wonder how true this statement is, because Kafka uses 3 segments per
> partition. So even with a single topic and partition per broker and disk,
> it would not be sequential. Now say we have 1000 partitions per
> broker/disk, i.e. 3000 files. How can concurrent/interleaved writes to
> thousands of files on a single disk be considered sequential file I/O?
>
> Isn't the reason Kafka is so fast despite writing to disk the fact that it
> does not fsync to disk, leaving that to the OS? The OS would, I assume, be
> smart enough to order the writes when it flushes its caches to disk in a
> way that minimizes random seeks. But then, wouldn't the manner in which
> Kafka writes to files be more or less irrelevant? Or put differently: if
> Kafka was synchronously flushing to disk, wouldn't it have to limit itself
> to writing all partitions for a broker/disk to a single file, if it wanted
> to do sequential file I/O?
>
> For reading (historical, non-realtime) data that is not in the OS cache,
> keeping it in append-only files, the statement of course makes sense.
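On the fsync question above: the broker does expose log.flush.interval.messages and log.flush.interval.ms for forcing explicit flushes, but the defaults effectively leave flushing to the OS page cache. The following is a minimal sketch of the two modes being contrasted (plain NIO, not Kafka's actual code): a write() returns once the bytes are in the page cache, and only an explicit force() waits for stable storage.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushModes {
    // If syncEveryWrite is false the call returns once the bytes are in the
    // OS page cache; the kernel decides when to write them out, batching and
    // reordering dirty pages in a way that reduces seeks.
    static void append(FileChannel log, ByteBuffer record, boolean syncEveryWrite)
            throws IOException {
        log.write(record);
        if (syncEveryWrite) {
            log.force(false); // fsync: block until the data is on stable storage
        }
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Path.of("demo.log"),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            ByteBuffer record = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            append(log, record, false); // leave flushing to the OS
        }
    }
}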