On Mon, Feb 7, 2011 at 12:25 AM, Richard Elling
<richard.ell...@gmail.com> wrote:
> On Feb 5, 2011, at 8:10 AM, Yi Zhang wrote:
>
>> Hi all,
>>
>> I'm trying to achieve the same effect as UFS directio on ZFS, and here
>> is what I did:
>
> Solaris UFS directio has three functions:
>        1. improved async code path
>        2. multiple concurrent writers
>        3. no buffering
>
Thanks for the comments, Richard. All I want is to achieve #3 on ZFS.
But as I said, apparently 2.a) below didn't give me that. Do you have
any suggestions?

> Of the three, #1 and #2 were designed into ZFS from day 1, so there is nothing
> to set or change to take advantage of the feature.
>
>>
>> 1. Set the primarycache of zfs to metadata and secondarycache to none,
>> recordsize to 8K (to match the unit size of writes)
>> 2. Run my test program (code below) with different options and measure
>> the running time.
>> a) open the file without O_DSYNC flag: 0.11s.
>> This doesn't seem like directio is in effect, because I tried on UFS
>> and time was 2s. So I went on with more experiments with the O_DSYNC
>> flag set. I know that directio and O_DSYNC are two different things,
>> but I thought the flag would force synchronous writes and achieve what
>> directio does (and more).
>
> Directio and O_DSYNC are two different features.
>
>> b) open the file with O_DSYNC flag: 147.26s
>
> ouch
>
>> c) same as b) but also enabled zfs_nocacheflush: 5.87s
>
> Is your pool created from a single HDD?
Yes, it is. Do you have an explanation for the b) case? I also tried
O_DSYNC together with directio on UFS. The time is on the same order as
directio without O_DSYNC on UFS (see below). This dramatic difference
between UFS and ZFS is puzzling me...

                      no O_DSYNC    O_DSYNC
UFS (directio=on)     2s            5s
ZFS (no caching)      0.11s         147s
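
To put the four totals quoted above on a common scale, each can be divided by the 10000 writes the test program issues. The following is a minimal sketch of that arithmetic in C; the helper names per_write_ms and print_per_write_costs are made up for this illustration and are not part of the original test program.

```c
#include <stdio.h>

/* Hypothetical helper: average cost of one write in milliseconds,
 * given the total run time in seconds and the number of writes. */
double per_write_ms(double total_s, int writes)
{
    return total_s * 1000.0 / writes;
}

/* Print the per-write cost for the four totals reported in this thread.
 * The test program issues 10000 pwrite() calls of 8192 bytes each. */
void print_per_write_costs(void)
{
    const int writes = 10000;

    printf("UFS directio, no O_DSYNC: %.3f ms\n", per_write_ms(2.0, writes));
    printf("UFS directio, O_DSYNC:    %.3f ms\n", per_write_ms(5.0, writes));
    printf("ZFS no cache, no O_DSYNC: %.3f ms\n", per_write_ms(0.11, writes));
    printf("ZFS no cache, O_DSYNC:    %.3f ms\n", per_write_ms(147.26, writes));
}
```

With the thread's numbers this works out to roughly 0.011 ms per write on ZFS without O_DSYNC and roughly 14.7 ms per write with it. The latter is on the order of a seek plus rotational delay on a single hard disk, which would be consistent with every write waiting for the disk, although nothing in the thread confirms that as the cause.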

>
>> My questions are:
>> 1. With my primarycache and secondarycache settings, the FS shouldn't
>> buffer reads and writes anymore. Wouldn't that be equivalent to
>> O_DSYNC? Why are a) and b) so different?
>
> No. O_DSYNC deals with when the I/O is committed to media.
>
>> 2. My understanding is that zfs_nocacheflush essentially removes the
>> sync command sent to the device, which cancels the O_DSYNC flag. Why
>> are b) and c) so different?
>
> No. Disabling the cache flush means that the volatile write buffer in the
> disk is not flushed. In other words, disabling the cache flush is in direct
> conflict with the semantics of O_DSYNC.
>
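The distinction drawn in the answers above can be illustrated with a small sketch: with O_DSYNC, each pwrite() has to reach stable storage before it returns, whereas without it the data may stay in memory until an explicit fsync(). The function name time_writes and its parameters are assumptions made for this illustration; it is a sketch in the spirit of the test program in this thread, not the program itself.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `count` 8 KB blocks to `path` and return the
 * elapsed wall-clock time in seconds, or -1.0 on error.
 *   extra_flags: 0 for ordinary writes, O_DSYNC to commit every write.
 *   sync_at_end: nonzero to call fsync() once after the last write. */
double time_writes(const char *path, int extra_flags, int count, int sync_at_end)
{
    char buf[8192];
    struct timeval t1, t2;
    int i, fd;

    memset(buf, 0, sizeof(buf));
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | extra_flags, 0660);
    if (fd < 0)
        return -1.0;

    gettimeofday(&t1, NULL);
    for (i = 0; i < count; ++i) {
        /* Consecutive, non-overlapping 8 KB writes. */
        if (pwrite(fd, buf, sizeof(buf), (off_t)i * (off_t)sizeof(buf))
                != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1.0;
        }
    }
    /* Optionally commit everything in one step instead of per write. */
    if (sync_at_end && fsync(fd) != 0) {
        close(fd);
        return -1.0;
    }
    gettimeofday(&t2, NULL);
    close(fd);

    return (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) / 1000000.0;
}
```

Comparing time_writes(path, O_DSYNC, 10000, 0) with time_writes(path, 0, 10000, 1) on the same dataset separates the cost of committing every write from the cost of committing once at the end.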
>> 3. Does ZIL have anything to do with these results?
>
> Yes. The ZIL is used for meeting the O_DSYNC requirements. This has
> nothing to do with buffering. More details are in the ZFS Best Practices
> Guide.
>  -- richard
>
>>
>> Thanks in advance for any suggestion/insight!
>> Yi
>>
>>
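>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <sys/time.h>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> /* Usage: prog <file> <0|1>; pass 1 to request directio on the file. */
>> int main(int argc, char **argv)
>> {
>>   if (argc < 3) {
>>       fprintf(stderr, "usage: %s <file> <0|1>\n", argv[0]);
>>       return 1;
>>   }
>>   struct timeval tim;
>>   gettimeofday(&tim, NULL);
>>   double t1 = tim.tv_sec + tim.tv_usec/1000000.0;
>>   char a[8192];
>>   memset(a, 0, sizeof(a));  /* don't write uninitialized stack data */
>>   int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0660);
>>   //int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC|O_DSYNC, 0660);
>>   if (fd < 0) {
>>       perror("open");
>>       return 1;
>>   }
>>   if (argv[2][0] == '1')
>>       directio(fd, DIRECTIO_ON);  /* Solaris-specific advisory call */
>>   int i;
>>   for (i=0; i<10000; ++i)
>>       pwrite(fd, a, sizeof(a), (off_t)i*8192);  /* 8K writes at consecutive offsets */
>>   close(fd);
>>   gettimeofday(&tim, NULL);
>>   double t2 = tim.tv_sec + tim.tv_usec/1000000.0;
>>   printf("%f\n", t2-t1);
>>   return 0;
>> }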
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>