On Mon, Feb 7, 2011 at 10:26 AM, Roch <roch.bourbonn...@oracle.com> wrote:
>
> On Feb 7, 2011, at 06:25, Richard Elling wrote:
>
>> On Feb 5, 2011, at 8:10 AM, Yi Zhang wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to achieve the same effect of UFS directio on ZFS and here
>>> is what I did:
>>
>> Solaris UFS directio has three functions:
>>       1. improved async code path
>>       2. multiple concurrent writers
>>       3. no buffering
>>
>> Of the three, #1 and #2 were designed into ZFS from day 1, so there is
>> nothing to set or change to take advantage of them.
>>
>>>
>>> 1. Set the primarycache of zfs to metadata and secondarycache to none,
>>> recordsize to 8K (to match the unit size of writes)
>>> 2. Run my test program (code below) with different options and measure
>>> the running time.
>>> a) open the file without the O_DSYNC flag: 0.11s.
>>> This doesn't look like directio is in effect, because when I tried the
>>> same test on UFS the time was 2s. So I went on to more experiments with
>>> the O_DSYNC flag set. I know that directio and O_DSYNC are two different
>>> things, but I thought the flag would force synchronous writes and
>>> achieve what directio does (and more).
>>
>> Directio and O_DSYNC are two different features.
>>
>>> b) open the file with O_DSYNC flag: 147.26s
>>
>> ouch
>
> how big a file ?
> Does the result hold if you don't truncate?
>
> -r
>
The file is 8K * 10000, about 80M. I removed the O_TRUNC flag and the
results stayed the same...
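
For comparison, here is a minimal sketch (untested) of a middle ground
between a) and b): the same write pattern, but buffered, with a single
fdatasync() at the end instead of a media commit per pwrite(). I'd
expect it to land somewhere between the 0.11s and 147.26s cases:

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  char a[8192] = {0};
  /* no O_DSYNC: writes take the normal buffered path */
  int fd = open(argv[1], O_RDWR|O_CREAT, 0660);
  if (fd < 0) { perror("open"); return 1; }
  int i;
  for (i = 0; i < 10000; ++i)
      pwrite(fd, a, sizeof(a), (off_t)i * 8192);
  /* one commit to stable storage instead of 10000 */
  if (fdatasync(fd) != 0)
      perror("fdatasync");
  close(fd);
  return 0;
}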

>>
>>> c) same as b) but also enabled zfs_nocacheflush: 5.87s
>>
>> Is your pool created from a single HDD?
>>
>>> My questions are:
>>> 1. With my primarycache and secondarycache settings, the FS shouldn't
>>> buffer reads and writes anymore. Wouldn't that be equivalent to
>>> O_DSYNC? Why are a) and b) so different?
>>
>> No. O_DSYNC deals with when the I/O is committed to media.
>>
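(If I read that right: the cache properties only control whether ZFS
keeps copies of data in memory for later reads; O_DSYNC controls when
the write call itself is allowed to return. So a) and b) are measuring
different things.)
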
>>> 2. My understanding is that zfs_nocacheflush essentially removes the
>>> sync command sent to the device, which would cancel the O_DSYNC flag.
>>> Why are b) and c) so different?
>>
>> No. Disabling the cache flush means that the volatile write buffer in the
>> disk is not flushed. In other words, disabling the cache flush is in direct
>> conflict with the semantics of O_DSYNC.
>>
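A back-of-envelope check on the numbers above: 147.26s / 10000 writes
is about 14.7ms per write, roughly a full cache flush plus rotational
latency per O_DSYNC write on a single disk; 5.87s / 10000 is about
0.59ms per write, i.e. just the round trip into the drive's volatile
write cache. That fits: disabling the flush removes exactly the part
of O_DSYNC that makes the data durable.
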
>>> 3. Does ZIL have anything to do with these results?
>>
>> Yes. The ZIL is used for meeting the O_DSYNC requirements. This has
>> nothing to do with buffering. More details are in the ZFS Best Practices
>> Guide.
>> -- richard
>>
>>>
>>> Thanks in advance for any suggestion/insight!
>>> Yi
>>>
>>>
>>> #include <sys/types.h>
>>> #include <sys/fcntl.h>   /* directio, DIRECTIO_ON (Solaris) */
>>> #include <fcntl.h>       /* open, O_DSYNC */
>>> #include <stdio.h>       /* printf */
>>> #include <unistd.h>      /* pwrite, close */
>>> #include <sys/time.h>    /* gettimeofday */
>>>
>>> int main(int argc, char **argv)
>>> {
>>>  struct timeval tim;
>>>  gettimeofday(&tim, NULL);
>>>  double t1 = tim.tv_sec + tim.tv_usec/1000000.0;
>>>  char a[8192] = {0};
>>>  int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0660);
>>>  //int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC|O_DSYNC, 0660);
>>>  if (argv[2][0] == '1')
>>>      directio(fd, DIRECTIO_ON);  /* Solaris-only advisory call */
>>>  int i;
>>>  /* write 10000 x 8K = ~80M sequentially */
>>>  for (i=0; i<10000; ++i)
>>>      pwrite(fd, a, sizeof(a), (off_t)i*8192);
>>>  close(fd);
>>>  gettimeofday(&tim, NULL);
>>>  double t2 = tim.tv_sec + tim.tv_usec/1000000.0;
>>>  printf("%f\n", t2-t1);  /* elapsed seconds */
>>>  return 0;
>>> }
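
(For reference, the test should build with something like
"cc -o synctest synctest.c" on Solaris, where synctest is just a
placeholder name, and run as "./synctest /pool/fs/testfile 0"; the
second argument turns directio on ('1') or off. The path is only an
example.)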
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
