On Thu, Jun 18, 2020 at 08:50:49AM +0300, Reco wrote:
> 	Hi.
>
> On Wed, Jun 17, 2020 at 05:54:51PM -0400, Michael Stone wrote:
>> On Wed, Jun 17, 2020 at 11:45:53PM +0300, Reco wrote:
>>> Long story short, if you need a primitive I/O benchmark, you're better
>>> with both dsync and nocache.
>>
>> Not unless that's your actual workload, IMO. Almost nothing does sync i/o;
>
> Almost everything does (see my previous e-mails). Not everything does it
> with O_DSYNC, that's true.

You're not using the words the way most people use them, which certainly confuses the conversation. At a certain point it doesn't matter whether your interpretation is right or wrong--if you're arguing against the common usage that people will find in the man pages and program arguments, you're just making it harder to communicate. It doesn't help that you've latched on to one particular API without clearly considering (or communicating) the entire stack and how different pieces of the stack may have overlapping terminology depending on their perspective.
> Although if it uses sqlite - chances are it
> does it with O_DSYNC.

That may be true in some modes, but it's not what I saw in my quick testing and it isn't what I'd expect. As open(2) says:
      O_DSYNC
             Write operations on the file will complete according to
             the requirements of synchronized I/O data integrity
             completion.

             By the time write(2) (and similar) return, the output
             data has been transferred to the underlying hardware,
             along with any file metadata that would be required to
             retrieve that data (i.e., as though each write(2) was
             followed by a call to fdatasync(2)).  See NOTES below.

Writing one block at a time is *really* *really* bad for performance. Most applications for which I/O performance is important allow writes to buffer, then flush the buffers as needed for data integrity. Also note the subtlety of "synchronized" I/O vs "synchronous" I/O, which is another thing that's really important in some contexts, but will just make what should be a simple answer more confusing if you follow the rabbit hole.
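
If you want to check what a particular program actually does, one rough way (assuming strace and the sqlite3 shell are installed, and using a throwaway database path) is to trace the relevant syscalls and look for O_DSYNC in the open flags or for explicit fsync/fdatasync calls:

strace -f -e trace=open,openat,fsync,fdatasync,sync_file_range sqlite3 /tmp/scratch.db 'create table t(x); insert into t values(1)'

That's not exhaustive, since a program can also sync via O_SYNC, msync, and so on, but it's a quick way to see which of the calls above are actually in play.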

>> simply using conv=fdatasync to make sure that the cache is flushed before
>> exiting is going to be more representative.

> If you're answering the question "how fast my programs are going to
> write there" - sure. If you're answering the question "how fast my
> drive(s) actually is(are)" - nope, you need O_DSYNC.

While OF COURSE the question people want answered is "how fast my programs are going to write there" rather than some other number like "how fast is some mode of writing that I won't use", using dd's dsync flag isn't even a better answer to the question "how fast my drive(s) actually is(are)", because it's probably going to be far slower than it needs to be unless you're using an unrealistically large block size. With the fdatasync option you'll find out how much time it takes the data to be committed to the drive, which certainly isn't *faster* than the drive "actually" is...so you'll need to be a lot more clear on what's wrong with that number.

On old spinning disks none of this mattered as much, since the slow drive performance dominated the equation and a smaller (but still overly large) block size like 1M could paper over things. But on faster media the differences are more obvious:

dd if=/dev/zero of=testfil bs=1M oflag=dsync,nocache count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.71833 s, 222 MB/s
dd if=/dev/zero of=testfil bs=128M oflag=dsync,nocache count=8
8+0 records in
8+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.45332 s, 739 MB/s
dd if=/dev/zero of=testfil bs=1M conv=fdatasync count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.13529 s, 924 MB/s
dd if=/dev/zero of=testfil bs=64k conv=fdatasync count=16000
16000+0 records in
16000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.914076 s, 1.1 GB/s

See how the dsync,nocache version can't reach reasonable performance without the ridiculously oversized 128M blocks, and even then is still relatively slow. Now see how the fdatasync version actually comes close to the correct speed for this SSD. It's also very important to note that the block size doesn't really matter here, and the 64k run is actually faster. Mostly they're statistically the same: multiple runs average about the same and would get closer with a longer test. The key here is that the user doesn't have to worry about block sizes. (Also, I ordered them this way for the discussion; they don't actually just get faster over time.)

Applications trying to squeeze every last bit of performance out of their storage need to very carefully tune their access patterns for the characteristics of a particular device (and how they're using that device). I've done that, it isn't fun, and for most applications it's a complete waste of time. If you're in that category you probably know it or have been told to do it (and just saying bs=1M because it makes the numbers easy would *not* be the right answer). For everyone else, a simple dd with conv=fdatasync and a realistic block size will give them a decent idea of how fast they can copy stuff on a disk and other things they actually care about.
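
A quick and dirty way to convince yourself of that (same dd invocation as above; the block sizes and repeat count are picked arbitrarily) is just to loop over a few block sizes and repeat each run a few times:

for spec in "64k 16000" "1M 1000" "4M 250"; do
  set -- $spec
  echo "bs=$1"
  for i in 1 2 3; do
    dd if=/dev/zero of=testfil bs=$1 count=$2 conv=fdatasync 2>&1 | tail -n1
  done
done

Each pass writes the same 1000 MiB, so the summary lines are directly comparable.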

time sh -c 'cp testfil testfil1 ; sync'
0.000u 0.477s 0:01.05 44.7%     0+0k 0+2048000io 0pf+0w

Note that the time to copy the file (from memory, since it's cached) and sync the filesystem is, even with that extra overhead, much closer to the fdatasync dd than it is to the dsync,nocache version...
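
If you want the cold-cache version of that number, a minimal sketch (Linux-specific, needs root, reusing the files from above) is to drop the page cache before the copy:

sync; echo 3 > /proc/sys/vm/drop_caches
time sh -c 'cp testfil testfil1 ; sync'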

Also worth noting that stuff like filesystem type & options will dramatically affect all of the differences above.
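
For instance, a quick way to see what you're actually benchmarking (the path is just a placeholder):

findmnt -T /path/to/testdir -o TARGET,FSTYPE,OPTIONS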
