On Thu, Jun 18, 2020 at 08:50:49AM +0300, Reco wrote:
> Hi.
> On Wed, Jun 17, 2020 at 05:54:51PM -0400, Michael Stone wrote:
> > On Wed, Jun 17, 2020 at 11:45:53PM +0300, Reco wrote:
> > > Long story short, if you need a primitive I/O benchmark, you're better
> > > with both dsync and nocache.
> > Not unless that's your actual workload, IMO. Almost nothing does sync i/o;
> Almost everything does (see my previous e-mails). Not everything does it
> with O_DSYNC, that's true.
You're not using the words the way most people use them, which
certainly confuses the conversation. At a certain point it doesn't
matter whether your interpretation is right or wrong--if you're arguing
against the common usage that people will find in the man pages and
program arguments, you're just making it harder to communicate. It
doesn't help that you've latched on to one particular API without
clearly considering/communicating the entire stack and how different
pieces of the stack may have overlapping terminology depending on their
perspective.
> Although if it uses sqlite - chances are it
> does it with O_DSYNC.
That may be true in some modes, but it wasn't in my quick testing and
wouldn't be what I'd expect. As open(2) says:
O_DSYNC
       Write operations on the file will complete according to the
       requirements of synchronized I/O data integrity completion.

       By the time write(2) (and similar) return, the output data has
       been transferred to the underlying hardware, along with any
       file metadata that would be required to retrieve that data
       (i.e., as though each write(2) was followed by a call to
       fdatasync(2)).  See NOTES below.
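To put that in code terms, here's a rough sketch (file name and buffer
size are just placeholders): opening with O_DSYNC makes every write()
behave as if it were immediately followed by fdatasync().

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[65536] = {0};

    /* O_DSYNC: write() returns only after the data (and the metadata
       needed to retrieve it) has reached the device */
    int fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    write(fd, buf, sizeof buf);
    close(fd);

    /* same effect without O_DSYNC: a plain write followed by fdatasync() */
    fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    write(fd, buf, sizeof buf);
    fdatasync(fd);
    close(fd);
    return 0;
}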
Writing one block at a time that way is *really* *really* bad for
performance. Most applications for which I/O performance is important
allow writes to buffer, then flush the buffers as needed for data
integrity.
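That buffered pattern, which is also roughly what dd's conv=fdatasync
measures, looks more like this sketch (again, the name and sizes are
only illustrative; 16000 x 64k matches the dd run further down):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[65536] = {0};
    int fd = open("testfil", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* let the page cache absorb the individual writes... */
    for (int i = 0; i < 16000; i++)
        write(fd, buf, sizeof buf);

    /* ...and pay the synchronization cost once, when data integrity
       actually requires it */
    fdatasync(fd);
    close(fd);
    return 0;
}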
Also note the subtlety of "synchronized" I/O vs "synchronous" I/O which
is another thing that's really important in some contexts, but will just
make what should be a simple answer more confusing if you follow the
rabbit hole.
> > simply using conv=fdatasync to make sure that the cache is flushed
> > before exiting is going to be more representative.
> If you're answering the question "how fast my programs are going to
> write there" - sure. If you're answering the question "how fast my
> drive(s) actually is(are)" - nope, you need O_DSYNC.
While OF COURSE the question people want answered is "how fast my
programs are going to write there" rather than some other number like
"how fast is some mode of writing that I won't use", using dd's dsync
flag isn't even a better answer to the question "how fast my drive(s)
actually is(are)", because it's probably going to be far slower than it
needs to be unless you're using an unrealistically large block size.
With the fdatasync option you'll find out how much time it takes for
the data to be committed to the drive, which certainly isn't *faster*
than the drive "actually" is...so you'll need to be a lot more clear on
what's wrong with that number.
On old spinning disks none of this mattered as much, because the slow
drive performance dominated the equation and a smaller (but still
overly large) block size like 1M could paper over things. But on faster
media the differences are more obvious:
dd if=/dev/zero of=testfil bs=1M oflag=dsync,nocache count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.71833 s, 222 MB/s

dd if=/dev/zero of=testfil bs=128M oflag=dsync,nocache count=8
8+0 records in
8+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.45332 s, 739 MB/s

dd if=/dev/zero of=testfil bs=1M conv=fdatasync count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.13529 s, 924 MB/s

dd if=/dev/zero of=testfil bs=64k conv=fdatasync count=16000
16000+0 records in
16000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.914076 s, 1.1 GB/s
See how the dsync,nocache version doesn't perform reasonably well
without the ridiculously oversized 128M blocks, and even then is still
relatively slow. Now see how the fdatasync version actually comes close
to the correct speed for this SSD. It's also very important to note
that the block size doesn't really matter, and this 64k run is actually
faster. Mostly they're statistically the same, with multiple runs
averaging about the same, and they would get closer with a longer
test--the key here is that the user doesn't have to worry about block
sizes. (Also, I arranged them in an orderly way for discussion; they
don't actually just get faster over time.) Applications trying to
squeeze every last bit of
performance out of their storage need to very carefully tune their
access patterns for the characteristics of a particular device (and how
they're using that device). I've done that, it isn't fun, and for most
applications it's a complete waste of time. If you're in that category
you probably know it or have been told to do it (and just saying bs=1M
because it makes the numbers easy would *not* be the right answer). For
everyone else a simple dd with conv=fdatasync and a realistic block size
will give them a decent idea of how fast they can copy stuff on a disk
and other things they actually care about.
time sh -c 'cp testfil testfil1 ; sync'
0.000u 0.477s 0:01.05 44.7% 0+0k 0+2048000io 0pf+0w
Note that the time to copy the file (from memory; it's cached) and sync
the filesystem is much closer to the fdatasync dd, even with the extra
overhead, than it is to the dsync,nocache version...
Also worth noting that stuff like filesystem type & options will
dramatically affect all of the differences above.