On Thu, Jun 18, 2020 at 08:50:49AM +0300, Reco wrote:
> Hi.
> On Wed, Jun 17, 2020 at 05:54:51PM -0400, Michael Stone wrote:
> > On Wed, Jun 17, 2020 at 11:45:53PM +0300, Reco wrote:
> > > Long story short, if you need a primitive I/O benchmark, you're better
> > > with both dsync and nocache.
> > Not unless that's your actual workload, IMO. Almost nothing does sync i/o;
> Almost everything does (see my previous e-mails). Not everything does it
> with O_DSYNC, that's true.
You're not using the words the way most people use them, which
certainly confuses the conversation. At a certain point it doesn't
matter whether your interpretation is right or wrong--if you're arguing
against the common usage that people will find in the man pages and
program arguments, you're just making it harder to communicate. It
doesn't help that you've latched on to one particular API without
clearly considering/communicating the entire stack and how different
pieces of the stack may have overlapping terminology depending on their
perspective.
> Although if it uses sqlite - chances are it
> does it with O_DSYNC.
That may be true in some modes, but it wasn't in my quick testing and
wouldn't be what I'd expect. As open(2) says:
O_DSYNC
       Write operations on the file will complete according to the
       requirements of synchronized I/O data integrity completion.

       By the time write(2) (and similar) return, the output data has
       been transferred to the underlying hardware, along with any
       file metadata that would be required to retrieve that data
       (i.e., as though each write(2) was followed by a call to
       fdatasync(2)).  See NOTES below.
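To put that in code terms, here's a rough sketch (file name and buffer
size are just placeholders): opening with O_DSYNC makes every write()
behave as if it were immediately followed by fdatasync().

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[65536] = {0};

    /* O_DSYNC: write() returns only after the data (and the metadata
       needed to retrieve it) has reached the device */
    int fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    write(fd, buf, sizeof buf);
    close(fd);

    /* same effect without O_DSYNC: a plain write followed by fdatasync() */
    fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    write(fd, buf, sizeof buf);
    fdatasync(fd);
    close(fd);
    return 0;
}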
Writing one block at a time that way is *really* *really* bad for
performance. Most applications for which I/O performance is important
allow writes to buffer, then flush the buffers as needed for data
integrity.
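That buffered pattern, which is also roughly what dd's conv=fdatasync
measures, looks more like this sketch (again, the name and sizes are
only illustrative; 16000 x 64k matches the dd run further down):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[65536] = {0};
    int fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* let the page cache absorb the individual writes... */
    for (int i = 0; i < 16000; i++)
        write(fd, buf, sizeof buf);

    /* ...and pay the synchronization cost once, when data integrity
       actually requires it */
    fdatasync(fd);
    close(fd);
    return 0;
}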
Also note the subtlety of "synchronized" I/O vs "synchronous" I/O which
is another thing that's really important in some contexts, but will just
make what should be a simple answer more confusing if you follow the
rabbit hole.
> > simply using conv=fdatasync to make sure that the cache is flushed
> > before exiting is going to be more representative.
> If you're answering the question "how fast my programs are going to
> write there" - sure. If you're answering the question "how fast my
> drive(s) actually is(are)" - nope, you need O_DSYNC.
While OF COURSE the question people want answered is "how fast my
programs are going to write there" rather than some other number like
"how fast is some mode of writing that I won't use", using dd's dsync
flag isn't even a better answer to the question "how fast my drive(s)
actually is(are)", because it's probably going to be far slower than it
needs to be unless you're using an unrealistically large block size.
With the fdatasync option you'll find out how much time it takes for
the data to be committed to the drive, which certainly isn't *faster*
than the drive "actually" is...so you'll need to be a lot more clear on
what's wrong with that number.
On old spinning disks none of this mattered as much, because the slow
drive performance dominated the equation and a smaller (but still
overly large) block size like 1M could paper over things. But on faster
media the differences are more obvious:
dd if=/dev/zero of=testfil bs=1M oflag=dsync,nocache count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.71833 s, 222 MB/s

dd if=/dev/zero of=testfil bs=128M oflag=dsync,nocache count=8
8+0 records in
8+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.45332 s, 739 MB/s

dd if=/dev/zero of=testfil bs=1M conv=fdatasync count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.13529 s, 924 MB/s

dd if=/dev/zero of=testfil bs=64k conv=fdatasync count=16000
16000+0 records in
16000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.914076 s, 1.1 GB/s
See how the dsync,nocache version doesn't perform reasonably well
without the ridiculously oversized 128M blocks, and even then is still
relatively slow. Now see how the fdatasync version actually comes close
to the correct speed for this SSD. It's also very important to note
that the block size doesn't really matter, and this 64k run is actually
faster. Mostly they're statistically the same, with multiple runs
averaging about the same, and they would get closer with a longer
test--the key here is that the user doesn't have to worry about block
sizes. (Also, I arranged them in an orderly way for discussion; they
don't actually just get faster over time.) Applications trying to
squeeze every last bit of
performance out of their storage need to very carefully tune their
access patterns for the characteristics of a particular device (and how
they're using that device). I've done that, it isn't fun, and for most
applications it's a complete waste of time. If you're in that category
you probably know it or have been told to do it (and just saying bs=1M
because it makes the numbers easy would *not* be the right answer). For
everyone else a simple dd with conv=fdatasync and a realistic block size
will give them a decent idea of how fast they can copy stuff on a disk
and other things they actually care about.
time sh -c 'cp testfil testfil1 ; sync'
0.000u 0.477s 0:01.05 44.7% 0+0k 0+2048000io 0pf+0w
Note that the time to copy the file (from memory; it's cached) and sync
the filesystem is much closer to the fdatasync dd, even with the extra
overhead, than it is to the dsync,nocache version...
Also worth noting that stuff like filesystem type & options will
dramatically affect all of the differences above.