[perf-discuss] Webrev posted for Filebench fixes / enhancements

Andrew Wilson Fri, 16 Nov 2007 09:36:28 -0800

FileBench aficionados,

I have just uploaded a webrev of a set of bug fixes and other 
modifications to FileBench to the OpenSolaris.org webrev site. This set 
of changes started out as a modification to essentially merge "files" 
into "filesets", but ended up addressing a number of related issues. 
Here are the details:


This started out as just a fix for bug:

6601818 Turn FileBench "files" into filesets with 1 entry.

However, to do this properly, I found it helped to fix / implement five 
other outstanding bugs / feature requests:

6601341 flowop_endop() needs to have the actual number of bytes of I/O 
done passed to it

6581691 pre-allocation is molasses

6568378 Flowop reads and writes should be consistent about memory buffer 
usage

6595374 "tf_memsize smaller than IO size" error when reading/writing 
large file

6564960 filebench should handle larger iosize's or gracefully error out 
with a nicer message

To make reviewing easier, here are some comments about the individual 
bugs that were fixed and the files that were changed as part of each 
bug's specific fix.

6601818 Turn FileBench "files" into filesets with 1 entry:

Basically, this involves changes to parser_gram.y so that the "define 
files" command creates a fileset with a single entry. It is implemented 
by creating a new subroutine, "parser_fileset_define_common", which 
allocates a fileset and fills in all attributes that are common between 
files and filesets. The old parser_file_define() calls that routine, 
then sets default values for the fileset attributes that don't apply 
when only a single entry (i.e. "files") is involved. Similarly, 
parser_fileset_define() first uses the common routine, then sets the 
attributes specific to filesets.

Also, fileset.c and fileset.h were modified to support raw devices as a 
special case for filesets (replacing the raw device mode for files, 
which, incidentally, was not enabled).

During testing I also discovered a subtle difference between files and 
filesets for "non allocated" files. Files actually create a file of 
size 0 at the specified path in that case, while filesets don't create a 
file at all. After consultation with other Sun engineers, I decided to 
adopt the fileset behavior ("no file") for "files" as well, and to 
modify the two workloads that depend on having a file of size 0 so that 
they specifically request the "alloc" of a 0 length file. Thus for both 
filesets (as was the case before) and the "new" files, leaving off the 
"alloc" attribute will result in no initial file creation on the disk.

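To make the shape of the parser change concrete, here is a rough sketch 
in C. It is not the code from the webrev: the struct, its fields and the 
function names (fileset_sketch_t, define_common, define_file, 
define_fileset) are made up for illustration and only mirror the roles 
of parser_fileset_define_common(), parser_file_define() and 
parser_fileset_define() described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified fileset description (not FileBench's struct). */
typedef struct {
    char name[64];
    char path[256];
    long entries;   /* number of files in the set */
    int  prealloc;  /* nonzero only if "alloc" was requested */
} fileset_sketch_t;

/* Common part: allocate and fill in what files and filesets share. */
static fileset_sketch_t *
define_common(const char *name, const char *path, int prealloc)
{
    assert(name != NULL && path != NULL);
    fileset_sketch_t *fs = calloc(1, sizeof (*fs));
    if (fs == NULL)
        return NULL;
    snprintf(fs->name, sizeof (fs->name), "%s", name);
    snprintf(fs->path, sizeof (fs->path), "%s", path);
    fs->prealloc = prealloc;
    return fs;
}

/* "define file": a fileset that always has exactly one entry. */
static fileset_sketch_t *
define_file(const char *name, const char *path, int prealloc)
{
    fileset_sketch_t *fs = define_common(name, path, prealloc);
    if (fs != NULL)
        fs->entries = 1;
    return fs;
}

/* "define fileset": same common part, plus fileset-only attributes. */
static fileset_sketch_t *
define_fileset(const char *name, const char *path, long entries,
    int prealloc)
{
    fileset_sketch_t *fs = define_common(name, path, prealloc);
    if (fs != NULL)
        fs->entries = entries;
    return fs;
}
```

In this sketch a "file" defined without prealloc simply carries 
prealloc == 0, which corresponds to "no initial file creation on the 
disk" for both files and filesets.
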
As part of this fix, I needed to preserve the parallel allocation 
feature of files, so filesets can now be allocated in parallel, with up 
to 32 allocation threads running across all filesets and their 
constituent files, addressing:

6581691 pre-allocation is molasses

The changes to turn files into filesets also involved removing the 
"files" code path from the seven flowops that do I/O, and replacing it 
with some special code to handle raw devices. A few changes to the 
create / open / close / delete flowops were needed as well. There was a 
lot of (almost) duplicate code here which I was planning to eliminate as 
part of:

6568378 Flowop reads and writes should be consistent about memory buffer 
usage

so I decided to tackle that one too. I created a common routine to 
select filesetentries and to determine memory buffer pointers and file 
offsets. As mentioned, the "files" portion disappeared as part of that. 
Also, the selection of memory locations to read or write to was unified. 
If thread memory has been specified (tf_memsize > 0), then a random 
offset into tf_mem is calculated and passed back to the calling routine. 
Otherwise, a private fo_buf is allocated or reused, with its location 
passed back. The private buffer is created large enough to hold 
fo_iosize bytes, which may be much more than the fixed 1 MB that the old 
method allocated. We therefore keep track of the size of the buffer, and 
free() it and malloc() a new one if the existing one is too small. As a 
result, this change also addresses:

6564960 filebench should handle larger iosize's or gracefully error out 
with a nicer message

If tf_mem is in use, an error message is already reported when iosize 
is larger than the tf_mem size, and that check now applies to all seven 
I/O flowops (read, write, aiowrite, readwholefile, writewholefile, 
appendfile, appendfilerand). If tf_mem is not specified or is set to 0 
size, then private buffers (fo_buf) of iosize bytes will be allocated.
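
The buffer selection described above can be sketched roughly as follows. 
This is a simplified illustration, not the routine in the webrev: 
tf_mem, tf_memsize and fo_buf are the names used above, while the struct 
types, the fo_buf_size field, the pick_iobuf() name and the 
caller-supplied random number are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the thread and flowop state. */
typedef struct {
    char   *tf_mem;      /* shared thread memory, if any */
    size_t  tf_memsize;  /* 0 means no thread memory was specified */
} threadflow_sketch_t;

typedef struct {
    char   *fo_buf;      /* private per-flowop buffer */
    size_t  fo_buf_size; /* current capacity of fo_buf (assumed field) */
} flowop_sketch_t;

/*
 * Return the memory that an I/O of iosize bytes should use, or NULL if
 * iosize does not fit in thread memory or the allocation fails.
 * rnd is a random number supplied by the caller.
 */
static char *
pick_iobuf(threadflow_sketch_t *tf, flowop_sketch_t *fo, size_t iosize,
    size_t rnd)
{
    assert(iosize > 0);

    if (tf->tf_memsize > 0) {
        if (iosize > tf->tf_memsize)
            return NULL;  /* the "tf_memsize smaller than IO size" case */
        /* Random offset that still leaves room for iosize bytes. */
        return tf->tf_mem + rnd % (tf->tf_memsize - iosize + 1);
    }

    /* No thread memory: reuse the private buffer, growing it if needed. */
    if (fo->fo_buf == NULL || fo->fo_buf_size < iosize) {
        free(fo->fo_buf);
        fo->fo_buf = malloc(iosize);
        fo->fo_buf_size = (fo->fo_buf != NULL) ? iosize : 0;
    }
    return fo->fo_buf;
}
```

The point of the single routine is that every I/O flowop gets the same 
size check and the same growth policy, instead of each one carrying its 
own slightly different copy.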

The old readwholefile and writewholefile ignored any supplied iosize 
(actually set it to the file size AFTER the first execution of the 
flowop), and arbitrarily broke the request into 1 megabyte chunks. This 
size gives full performance with many current disk drives and file 
systems, but not all, and can be too small for full performance with 
some RAIDed systems. So, for backwards compatibility, if iosize is 0 
(what the legacy workloads use for those two flowops), they will read or 
write the entire file in one I/O. Note that the thread memory, if it 
exists, must be at least as large as the largest size the file(s) can 
be. If iosize is set, the whole file will be read or written in "iosize" 
increments (or whatever is left of the file on the last I/O). So, if you 
have thread memory of 10 MB, you can set iosize to 10 MB (or less) to 
prevent "tf_mem too small" errors.
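
As an illustration of the chunking rule (a sketch only; next_chunk() and 
count_ios() are hypothetical helpers, not functions in FileBench), the 
size of each transfer can be computed like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size of the next transfer for a whole-file read or write.
 * iosize == 0 means "one I/O for the entire file" (legacy behavior);
 * otherwise transfers are iosize bytes, except for a shorter last one.
 * done is the number of bytes already transferred.
 */
static size_t
next_chunk(size_t filesize, size_t iosize, size_t done)
{
    assert(done <= filesize);
    size_t left = filesize - done;

    if (iosize == 0 || left < iosize)
        return left;
    return iosize;
}

/* Number of I/Os needed to cover the whole file under that rule. */
static size_t
count_ios(size_t filesize, size_t iosize)
{
    size_t done = 0, n = 0, chunk;

    while ((chunk = next_chunk(filesize, iosize, done)) > 0) {
        done += chunk;
        n++;
    }
    return n;
}
```

For example, a 25 MB file with iosize set to 10 MB is covered by 
transfers of 10 MB, 10 MB and 5 MB, so thread memory of 10 MB is 
sufficient; with iosize 0 the same file needs a single 25 MB transfer.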

Changing readwholefile and writewholefile to do multiple iosize 
transfers (instead of multiple 1 MB transfers) until the whole file is 
read or written necessitated fixing:

6601341 flowop_endop() needs to have the actual number of bytes of I/O 
done passed to it

so that transferred bytes are accounted for correctly. This also makes 
the accounting more accurate for sequential reads, which can read less 
than iosize when they hit the end of the file. Finally, it fixes:

6595374 "tf_memsize smaller than IO size" error when reading/writing 
large file

which was caused by the way readwholefile and writewholefile manipulated 
iosize so that the old flowop_endop() would (almost) do correct bytes 
transferred accounting.
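
A small sketch of why passing the actual byte count matters. The names 
here (stats_sketch_t, endop_sketch, seq_read_sketch) are hypothetical 
and the code only simulates the accounting; it is not the real 
flowop_endop() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-flowop statistics. */
typedef struct {
    unsigned long long ops;
    unsigned long long bytes;
} stats_sketch_t;

/* The caller reports how many bytes the I/O really moved. */
static void
endop_sketch(stats_sketch_t *st, size_t bytes_done)
{
    st->ops++;
    st->bytes += bytes_done;
}

/* Simulate reading a file sequentially in iosize requests. */
static void
seq_read_sketch(stats_sketch_t *st, size_t filesize, size_t iosize)
{
    assert(iosize > 0);
    size_t off = 0;

    while (off < filesize) {
        size_t got = filesize - off;
        if (got > iosize)
            got = iosize;       /* full-sized read */
        /* otherwise this is a short read at the end of the file */
        endop_sketch(st, got);
        off += got;
    }
}
```

Reading a 20000 byte file with an iosize of 8192 takes three operations. 
Crediting iosize per operation would report 24576 bytes, while crediting 
the bytes actually read reports 20000.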

The webrev for all of this has been posted to OpenSolaris.org at:
http://cr.opensolaris.org/~dreww/filebench_files2filesets 
<http://cr.opensolaris.org/%7Edreww/filebench_files2filesets>

Looking forward to your comments.

Drew Wilson

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
