On Jun 20, 2012, at 6:19 PM, Shaun Thomas wrote:

> I ran some tests on our (crappy) RAID0, comprised of two 300GB SAS drives. 
> Here's what I got for varying block sizes:
> 
> DRBD connected:
> 8K - 26MB/s
> [...]
> DRBD disconnected:
> 8K - 57MB/s
> [...]
> Those disconnected numbers look not too far off from raw disk performance in 
> a simple 2-disk RAID0.
> [...]
> And we could do all that because we have capacitor-backed RAID controllers.
> 
> We're seeing pretty much *exactly* what you should expect. This is with the 
> 3.2.0 kernel as well. To make sure this wasn't fake, I monitored iostat and 
> watched Dirty and Writeback from /proc/meminfo. These are legit numbers 
> obtained directly from dd and oflag=sync for all tests. 
> 
> So, I'm not sure about your own setup, but I can confirm that DRBD does honor 
> sync in our case.

I think this demonstrates that O_SYNC causes writes to happen immediately 
rather than accumulating in the pagecache (I assume you observed Dirty stayed 
very low), but I don't think it demonstrates anything about DRBD issuing a sync 
to the underlying device when it receives a sync itself. We already know O_SYNC 
or fsync will flush the pagecache; this is a function of the Linux VFS and is 
not visible from DRBD's perspective. What's important is that when DRBD 
receives a sync operation, it passes it through to the lower layer, but as long 
as the BBU remains enabled, your RAID controller will treat all sync operations 
as no-ops because there's no volatile cache on the RAID device to be synced, so 
there's no change in behavior that we could observe.

To test my hypothesis, you'd have to disable the write-back cache. What you 
should see is a drop in performance of one or two orders of magnitude, going 
from your measured 7296 IOPS (57MB/s*1024/8k), an impossibility for any 
spinning media on this planet, to something limited by the rotation speed. If 
you had, for example, a 10000 RPM drive, anything faster than 10000/60 or 166 
IOPS or 6 ms per IO means the IO syscalls must not be blocking until the data 
has reached nonvolatile storage (as requested by O_SYNC). You might see as 
much as double the IOPS with RAID-0 if you are performing small sequential 
writes rather than re-writing the same block between each sync, but even so 
this is an order of magnitude slower than what you just measured.
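
To make the arithmetic above easy to re-check, here is a rough sketch in 
Python. The function names are my own, and the model (at most one synchronous 
write per platter rotation per drive) is a simplifying assumption, not a 
measurement:

```python
def iops_from_throughput(mb_per_s, block_kb):
    """Convert throughput in MB/s to IOPS for a given block size in KB."""
    return mb_per_s * 1024 / block_kb


def rotation_limited_iops(rpm, drives=1):
    """Rough ceiling: one synchronous write per rotation, per striped drive.

    This is an assumed model; real drives add seek and settle time on top.
    """
    return rpm / 60 * drives


measured = iops_from_throughput(57, 8)        # the "DRBD disconnected" 8K figure
single = rotation_limited_iops(10000)         # one 10000 RPM drive
striped = rotation_limited_iops(10000, 2)     # best case for 2-disk RAID-0

print("measured IOPS:        %.0f" % measured)
print("single drive ceiling: %.0f IOPS (%.1f ms per IO)" % (single, 1000 / single))
print("2-disk RAID-0 ceiling: %.0f IOPS" % striped)
print("measured / ceiling:   %.1fx" % (measured / striped))
```

Even against the doubled RAID-0 ceiling, the measured figure comes out more 
than twenty times too fast for writes that actually wait on the platters.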

If you don't see a huge drop in performance after disabling the battery-backed 
write-back cache, then we can conclude that DRBD or something else is eating 
the sync operations between userland (fsync(), open(O_SYNC), etc) and the 
underlying device. Not a problem if you have money to spend on battery-backed 
cache, and can tolerate the added risk of data loss if power fails while the 
battery has failed, is reconditioning, or is being replaced, or if the outage 
lasts longer than the battery hold time, but for everyone else, it's a big 
problem.
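
If it helps with the before/after comparison, something like the following 
Python sketch could stand in for dd with oflag=sync. It is only an 
illustration: the function name and defaults are mine, it assumes Linux 
(os.O_SYNC), and the path you pass in would need to be a file on the 
DRBD-backed filesystem.

```python
import os
import time


def sync_write_iops(path, block_size=8192, count=200):
    """Write `count` sequential blocks with O_SYNC and return writes/second.

    Each os.write() should not return until the kernel believes the data is
    on stable storage, so the rate reflects what the stack below reports.
    """
    buf = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed
```

Run it once with the write-back cache enabled and once with it disabled. If 
the second number is not dramatically lower (somewhere near the rotation 
limit), the sync is being dropped somewhere in the stack.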
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
