> 5) DMA straight from user buffer to disk avoiding a copy.

This is what the "direct" in "direct I/O" has historically meant.  :-)

> line has been that 5) won't help latency much, and
> latency is where I think the game is currently played.  Now the
> disconnect might be because people feel that the game
> is not latency but CPU efficiency: "how many CPU cycles do I
> burn to get data from disk to user buffer".

Actually, in many cases it's less about CPU cycles than memory cycles.

For many databases, most of the I/O is writes (reads wind up
cached in memory).  What's the cost of a write?

With direct I/O: the CPU writes to memory (spread out over many
transactions), and the disk DMAs from memory.  We write LPS (log page
size) bytes of data from CPU to memory, and the DMA reads LPS bytes
back out of memory.  On processors without a cache-line-zero
instruction, we probably also read the LPS bytes from memory as part
of the write (to fill the cache lines being written).  Total cost =
W:LPS, R:2*LPS.

Without direct I/O: the cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS).  We then copy the data from the user
buffer to the system buffer (W:LPS, R:LPS), and push it out to disk
(the DMA reads another LPS bytes).  Total cost = W:2*LPS, R:3*LPS.
We've nearly doubled the cost, not counting any TLB effects.
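
To put concrete numbers on that, here's a back-of-the-envelope sketch
in C.  The 8K log page size is just an assumed example, and the tallies
simply restate the accounting above:

    #include <stdio.h>

    #define LPS 8192UL      /* assumed log page size for the example */

    int
    main(void)
    {
            /* Direct I/O: CPU write, cache-line fill read, DMA read. */
            unsigned long direct_w = LPS;
            unsigned long direct_r = 2 * LPS;

            /*
             * Buffered I/O: user buffer (W+R), copy to the system
             * buffer (W+R), DMA read of the system buffer (R).
             */
            unsigned long buffered_w = 2 * LPS;
            unsigned long buffered_r = 3 * LPS;

            printf("direct:   W=%lu R=%lu total=%lu\n",
                direct_w, direct_r, direct_w + direct_r);
            printf("buffered: W=%lu R=%lu total=%lu\n",
                buffered_w, buffered_r, buffered_w + buffered_r);
            printf("traffic ratio: %.2fx\n",
                (double)(buffered_w + buffered_r) /
                (direct_w + direct_r));
            return (0);
    }

For an 8K page that's 24576 bytes of memory traffic per log write with
direct I/O versus 40960 without: the writes double, and total traffic
goes up by roughly 1.67x.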

On a memory-bandwidth-starved system (which should be nearly all
modern designs, especially with multi-threaded chips like Niagara),
replacing buffered I/O with direct I/O should give you nearly a 2x
improvement in log write bandwidth.  That's without considering
cache effects (which shouldn't be too significant, really, since LPS
should be << the size of L2).
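
For reference, here's roughly what asking for direct I/O looks like
from an application on Solaris today, via the advisory directio(3C)
call (UFS honors it; ZFS currently does not, which is of course what
this thread is about).  The file name, page size, and alignment below
are just placeholders:

    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define LPS 8192        /* assumed log page size for the example */

    int
    main(void)
    {
            int fd = open("/var/tmp/logfile",
                O_WRONLY | O_CREAT | O_DSYNC, 0644);
            if (fd < 0) {
                    perror("open");
                    return (1);
            }

            /* Advisory; harmless if the filesystem ignores it. */
            if (directio(fd, DIRECTIO_ON) < 0)
                    perror("directio");

            /* Direct I/O generally wants a suitably aligned buffer. */
            char *buf = memalign(8192, LPS);
            if (buf == NULL) {
                    perror("memalign");
                    return (1);
            }
            memset(buf, 0, LPS);    /* the CPU's W:LPS */

            /*
             * The DMA's R:LPS: with direct I/O the device reads the
             * user buffer; without it, a kernel copy of it is read.
             */
            if (pwrite(fd, buf, LPS, 0) != LPS)
                    perror("pwrite");

            free(buf);
            (void) close(fd);
            return (0);
    }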

How significant is this?  We'd have to measure; and it will likely
vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat
less.  That's because you still need to read the page to compute
its checksum.  So for direct I/O with ZFS (with checksums enabled),
the cost is W:LPS, R:3*LPS; compared with buffered I/O, we save only
the extra page of writes.  Is saving one page of writes enough to
make a difference?  Possibly not.
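
To make the checksum point concrete: computing the checksum means
pulling every byte of the page through the CPU, which is the extra
read in the accounting above.  A simplified sum in the spirit of ZFS's
fletcher checksums (not the actual implementation) looks like this:

    #include <stdint.h>
    #include <stddef.h>

    /*
     * Simplified, fletcher4-flavored checksum: four running sums over
     * the buffer viewed as 32-bit words.  Not ZFS's real code; the
     * point is only that the whole page has to be read to produce the
     * checksum.
     */
    static void
    fletcher4_like(const void *buf, size_t size, uint64_t cksum[4])
    {
            const uint32_t *ip = buf;
            const uint32_t *end = ip + size / sizeof (uint32_t);
            uint64_t a = 0, b = 0, c = 0, d = 0;

            for (; ip < end; ip++) {
                    a += *ip;
                    b += a;
                    c += b;
                    d += c;
            }
            cksum[0] = a;
            cksum[1] = b;
            cksum[2] = c;
            cksum[3] = d;
    }

ZFS also offers SHA-256 as a checksum; the data-movement argument is
the same regardless of which algorithm is used.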

Anton
 
 