On Mon, Feb 7, 2011 at 12:25 AM, Richard Elling <richard.ell...@gmail.com> wrote:
> On Feb 5, 2011, at 8:10 AM, Yi Zhang wrote:
>
>> Hi all,
>>
>> I'm trying to achieve the same effect of UFS directio on ZFS and here
>> is what I did:
>
> Solaris UFS directio has three functions:
>   1. improved async code path
>   2. multiple concurrent writers
>   3. no buffering
>

Thanks for the comments, Richard. All I wanted is to achieve 3 on ZFS.
But as I said, apparently 2.a) below didn't give me that. Do you have
any suggestion?

> Of the three, #1 and #2 were designed into ZFS from day 1, so there is nothing
> to set or change to take advantage of the feature.
>
>>
>> 1. Set the primarycache of zfs to metadata and secondarycache to none,
>> recordsize to 8K (to match the unit size of writes)
>> 2. Run my test program (code below) with different options and measure
>> the running time.
>> a) open the file without O_DSYNC flag: 0.11s.
>> This doesn't seem like directio is in effect, because I tried on UFS
>> and time was 2s. So I went on with more experiments with the O_DSYNC
>> flag set. I know that directio and O_DSYNC are two different things,
>> but I thought the flag would force synchronous writes and achieve what
>> directio does (and more).
>
> Directio and O_DSYNC are two different features.
>
>> b) open the file with O_DSYNC flag: 147.26s
>
> ouch
>
>> c) same as b) but also enabled zfs_nocacheflush: 5.87s
>
> Is your pool created from a single HDD?

Yes, it is. Do you have an explanation for the b) case? I also tried
O_DSYNC AND directio on UFS; the time is on the same order as directio
without O_DSYNC on UFS (see below). This dramatic difference between UFS
and ZFS is puzzling me...

UFS: directio=on, no O_DSYNC -> 2s
     directio=on, O_DSYNC    -> 5s
ZFS: no caching, no O_DSYNC  -> 0.11s
     no caching, O_DSYNC     -> 147s

>
>> My questions are:
>> 1. With my primarycache and secondarycache settings, the FS shouldn't
>> buffer reads and writes anymore. Wouldn't that be equivalent to
>> O_DSYNC? Why are a) and b) so different?
>
> No. O_DSYNC deals with when the I/O is committed to media.
>
>> 2. My understanding is that zfs_nocacheflush essentially removes the
>> sync command sent to the device, which cancels the O_DSYNC flag. Why
>> are b) and c) so different?
>
> No. Disabling the cache flush means that the volatile write buffer in the
> disk is not flushed. In other words, disabling the cache flush is in direct
> conflict with the semantics of O_DSYNC.
>
>> 3. Does ZIL have anything to do with these results?
>
> Yes. The ZIL is used for meeting the O_DSYNC requirements. This has
> nothing to do with buffering. More details are in the ZFS Best Practices
> Guide.
> -- richard
>
>>
>> Thanks in advance for any suggestion/insight!
>> Yi
>>
>>
>> #include <fcntl.h>
>> #include <stdio.h>      /* printf */
>> #include <sys/time.h>
>> #include <sys/types.h>
>> #include <unistd.h>     /* pwrite, close */
>>
>> int main(int argc, char **argv)
>> {
>>     if (argc < 3) {
>>         fprintf(stderr, "usage: %s <file> <0|1>\n", argv[0]);
>>         return 1;
>>     }
>>     struct timeval tim;
>>     gettimeofday(&tim, NULL);
>>     double t1 = tim.tv_sec + tim.tv_usec/1000000.0;
>>     char a[8192] = {0};   /* one 8K record per write */
>>     int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0660);
>>     //int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC|O_DSYNC, 0660);
>>     if (fd < 0) {
>>         perror("open");
>>         return 1;
>>     }
>>     if (argv[2][0] == '1')
>>         directio(fd, DIRECTIO_ON);   /* Solaris-specific */
>>     int i;
>>     for (i=0; i<10000; ++i)
>>         pwrite(fd, a, sizeof(a), (off_t)i*8192);
>>     close(fd);
>>     gettimeofday(&tim, NULL);
>>     double t2 = tim.tv_sec + tim.tv_usec/1000000.0;
>>     printf("%f\n", t2-t1);
>>     return 0;
>> }
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
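
A note on the numbers in this thread: dividing each total by the 10000 writes gives roughly 11 microseconds per write without O_DSYNC (0.11s), about 14.7 ms per write with O_DSYNC (147.26s), and about 0.59 ms per write with O_DSYNC plus zfs_nocacheflush (5.87s). The 14.7 ms figure is plausibly what you would see if every write waits for a single hard disk to commit it, which fits Richard's point that O_DSYNC is about when the I/O reaches media rather than about buffering. The sketch below is one way to reproduce the comparison using only portable POSIX calls. The helper names time_writes and per_write_ms are made up for illustration and are not part of the original test program, and directio() is left out because it is Solaris-specific.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192  /* matches the 8K recordsize used in the thread */

/* Wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

/*
 * Hypothetical helper: write `count` blocks of BLOCK_SIZE bytes to `path`,
 * opened with `extra_flags` (for example 0 or O_DSYNC) on top of
 * O_RDWR|O_CREAT|O_TRUNC. Returns elapsed seconds, or -1.0 on any error.
 */
double time_writes(const char *path, int extra_flags, int count)
{
    char buf[BLOCK_SIZE];
    memset(buf, 0, sizeof(buf));

    double start = now_seconds();
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | extra_flags, 0660);
    if (fd < 0)
        return -1.0;

    for (int i = 0; i < count; ++i) {
        ssize_t n = pwrite(fd, buf, sizeof(buf), (off_t)i * BLOCK_SIZE);
        if (n != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1.0;
        }
    }
    if (close(fd) != 0)
        return -1.0;
    return now_seconds() - start;
}

/* Hypothetical helper: average latency per write, in milliseconds. */
double per_write_ms(double total_seconds, int count)
{
    return total_seconds * 1000.0 / count;
}
```

Calling time_writes(path, 0, 10000) and then time_writes(path, O_DSYNC, 10000) on the same dataset, and passing each result through per_write_ms, gives the two per-write latencies side by side. The absolute values will depend on the pool layout, the disk, and whether a separate log device is present.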