> On Sat, 7 Nov 2009, Dennis Clarke wrote:
>>
>> Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2
>> directories named [a-z][a-z] where each file is 64K of random
>> non-compressible data and then some english text.
>
> What method did you use to produce this "random" data?

I'm using the tt800 generator from Makoto Matsumoto, described here:

    http://random.mat.sbg.ac.at/generators/

and this is how I fill the buffer:

    /*
     * Generate the random text before we need it and also
     * outside of the area that measures the IO time.
     * We could have just read bytes from /dev/urandom but
     * you would be *amazed* how slow that is.
     */
    random_buffer_start_hrt = gethrtime();
    if ( random_buffer_start_hrt == -1 ) {
        perror("Could not get random_buffer high res start time");
        exit(EXIT_FAILURE);
        }
    for ( char_count = 0; char_count < 65535; ++char_count ) {
        k_index = (int) ( genrand() * (double) 62 );
        /* the reference tt800.c genrand() can return exactly 1.0 */
        if ( k_index > 61 )
            k_index = 61;
        buffer_64k_rand_text[char_count]=alph[k_index];
        }
    /* break the text into 0x40 (64) char lines */
    for ( p = 0x03fu; p < 65535; p = p + 0x040u )
        buffer_64k_rand_text[p]='\n';
    /* the buffer needs 65537 bytes: 64K of text plus the NUL */
    buffer_64k_rand_text[65535]='\n';
    buffer_64k_rand_text[65536]='\0';
    random_buffer_end_hrt = gethrtime();

That works well.

You know what ... I'm a schmuck.  I didn't grab a time-based seed first,
so every run starts the generator from the same state.  All those files
with random text .. have identical twins on the filesystem somewhere.
:-P  damn

I'll go fix that.
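
A minimal sketch of what that fix could look like, under some loud
assumptions: xorshift32 stands in for tt800 here (it is NOT the tt800
generator), the names make_seed, fill_random_text and RAND_TEXT_LEN are
made up for illustration, and the contents of alph are a guess at a
62 character alphabet.  The state is carried across calls so that files
written within the same second of the same run still differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define RAND_TEXT_LEN 65536u   /* 64K of text; buffer needs one more byte */

/* Assumed alphabet: 26 + 26 + 10 = 62 characters. */
static const char alph[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/* xorshift32, a stand-in generator (not tt800). State must be non-zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Mix wall-clock time and PID so that separate runs start differently. */
uint32_t make_seed(void)
{
    uint32_t seed = (uint32_t) time(NULL) ^ ((uint32_t) getpid() << 16);
    return seed ? seed : 1u;
}

/*
 * Fill buf (RAND_TEXT_LEN + 1 bytes) with 64 character lines of random
 * text.  *state is advanced, so the next call continues the sequence.
 * The modulo introduces a tiny bias, which is fine for test data.
 */
void fill_random_text(char *buf, uint32_t *state)
{
    size_t i;

    assert(buf != NULL && state != NULL);
    if (*state == 0)
        *state = 1u;
    for (i = 0; i < RAND_TEXT_LEN; ++i) {
        if ((i & 0x3fu) == 0x3fu)
            buf[i] = '\n';              /* every 64th byte ends a line */
        else
            buf[i] = alph[xorshift32(state) % 62u];
    }
    buf[RAND_TEXT_LEN] = '\0';
}
```

The idea would be to call make_seed() once at startup, keep the result
in a variable, and pass its address to fill_random_text() for every
file.  Re-seeding from time() per file would hand out the same seed to
every file written in the same second.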

>> The dedupe ratio has climbed to 1.95x with all those unique files that
>> are less than %recordsize% bytes.
>
> Perhaps there are other types of blocks besides user data blocks (e.g.
> metadata blocks) which become subject to deduplication?  Presumably
> 'dedupratio' is based on a count of blocks rather than percentage of
> total data.

I have no idea .. yet.  I figure I'll try a few more experiments to see
what it does and maybe, dare I say it, look at the source :-)
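
In the meantime, a toy model of the block-count reading suggested
above.  This is only an assumption about what 'dedupratio' might mean,
not taken from the ZFS source: it treats the ratio as blocks referenced
divided by distinct blocks, with each block represented by a checksum.
The names cmp_u64 and toy_dedup_ratio are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 64-bit checksums. */
int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *) a;
    uint64_t y = *(const uint64_t *) b;
    return (x > y) - (x < y);
}

/*
 * Toy model only: ratio = blocks referenced / distinct blocks.
 * Sorts sums in place, then counts the distinct values.
 */
double toy_dedup_ratio(uint64_t *sums, size_t n)
{
    size_t i, unique = 1;

    assert(sums != NULL && n > 0);
    qsort(sums, n, sizeof sums[0], cmp_u64);
    for (i = 1; i < n; ++i)
        if (sums[i] != sums[i - 1])
            ++unique;
    return (double) n / (double) unique;
}
```

Under this model, a set of blocks in which every block has exactly one
identical twin gives 2.0, and a set with no duplicates gives 1.0.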

-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
