On Tue, 9 May 2006, Matthew Ahrens wrote:

> On Sun, May 07, 2006 at 11:38:52PM -0700, Darren Dunham wrote:
> > I was doing some tests with creating and removing subdirectories and
> > watching the time that takes.  The directory retains the size and
> > performance issues after the files are removed.
> >
> > /rootz/test> ls -la .
> > total 42372
> > drwxr-xr-x   2 add      root           2 May  7 23:20 .
> > drwxr-xr-x   3 root     sys            3 May  7 00:34 ..
> > /rootz/test> time du -ak .
> > 21184   .
>
> ZFS doesn't recover *quite* all the space that large directories use,
> even after deleting all the entries.  From my own tests, I thought we
> would be recovering about 5x more space than we are.  We rely on
> compression to help recover the space, but it appears that the "empty"
> directory blocks are not as compressible as I thought they would be.
> Here's my experience:
>
>       # ls -ls
>       154075 drwxr-xr-x   2 root     root        1.0M May  9 11:14 dir
>
> So I have a directory with 1 million entries (all links to the same file),
> which takes up about 76MB (~75 bytes per entry).
>
>       # ptime du -h dir
>         77M   dir
>
> Yep, 77MB.
>
>       real       41.780
>       user        1.971
>       sys        39.793
>
> Took about 0.04 milliseconds per directory entry to readdir() and stat() it.
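
(If anyone wants to repeat a similar experiment, here is a minimal sketch,
in ksh, of how such a directory can be populated.  This is my own
approximation, not the exact commands used above; the dataset, the file
names and the one-million count are just placeholders.)

      # zfs create tank/dirtest
      # cd /tank/dirtest
      # mkdir dir && touch file
      # i=0; while [ $i -lt 1000000 ]; do ln file dir/f$i; i=$((i + 1)); done

Each iteration forks an ln, so populating the directory this way takes a
while, but it produces the same "one million links to the same file" layout.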
>
> After removing all the dirents:
>
>       # ptime du -h dir
>         41M   dir
>
> Now we're down to about 40 bytes per (now deleted) entry, i.e. 41MB
> spread over the same 1,000,000 entries.  If the compression were working
> optimally, this would be about 8 bytes per (deleted) entry.
>
>       real        0.383
>       user        0.000
>       sys         0.379
>
> And the 'du' was quick.
>
> I've filed the following bug to track this issue:
>   6423695 empty ZAP leaf blocks are not compressed down to minimum size
>
> > real    0m28.497s
> > user    0m0.002s
> > sys     0m1.174s
>
> Probably your directory is not cached, so it takes some time to read in
> all the (now empty) blocks of the directory.
>
> In the future, we may improve this by "joining" less-than-full ZAP
> blocks, or by simply special-casing "empty again" or "small again"
> directories and essentially automatically re-writing them in the most
> compact form.  So far we haven't found this to be a pressing performance
> issue, so it isn't high on our priority list.

Does this mean that if I have a zfs filesystem that is
creating/writing/reading/deleting millions of short-lived files in a day,
the directory area would keep growing?  Or am I missing something?

Surely the ideal case is that, when a file is deleted, the corresponding
directory space (and the time needed to scan it) drops back to zero.  On
busy production systems it would be great to have some sort of zfs
directory 'purge' function that, *if necessary*, could be run when the
system is known to be idle (in the wee hours of the A.M.).
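
(In the meantime, and this is only a sketch of a manual workaround rather
than an existing ZFS feature, an "empty again" directory can be compacted
by simply re-creating it during those idle hours, for example from cron.
The path, owner and mode below are hypothetical, and any ACLs would also
need to be restored:)

      # rmdir /rootz/test/spool && mkdir /rootz/test/spool
      # chown app:staff /rootz/test/spool && chmod 755 /rootz/test/spool

scheduled with a crontab entry along the lines of the following, where the
made-up script simply wraps the two commands above:

      0 3 * * * /usr/local/bin/recreate-spool-dir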

PS: I'm also thinking of a zfs filesystem layered on top of a RAM disk,
whether that RAM disk is a chunk of the server's memory (ramdiskadm) or
something similar to a Gigabyte i-RAM (aka Gigabyte GC-RAMDISK), which
looks like a SATA drive to the OS.
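
(For the ramdiskadm variant, here is a quick sketch of how one could try
it; the device name, size and pool name are arbitrary, and of course the
pool contents are lost at reboot:)

      # ramdiskadm -a rd0 512m
      # zpool create ramtank /dev/ramdisk/rd0
      # zfs create ramtank/scratch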

PPS: I see a trend towards lower-cost RAM disks that will improve rapidly
in terms of capacity and cost/performance over the next few years.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
