On Tue, Sep 30, 2008 at 6:30 PM, Nathan Kroenert <[EMAIL PROTECTED]> wrote:
> Actually, the one that'll hurt most is ironically the most closely
> related to bad database schema design... With a zillion files in the one
> directory, if someone does an 'ls' in that directory, it'll not only
> take ages, but steal a whole heap of memory and compute power...
>
> Provided the only things that'll be doing *anything* in that directory
> are using indexed methods, there is no real problem from a ZFS
> perspective, but if something decides to list (or worse, list and sort)
> that directory, it won't be that pleasant.
>
> Oh - That's of course assuming you have sufficient memory in the system
> to cache all that metadata somewhere... If you don't, then that's another
> zillion I/Os you need to deal with each time you list the entire directory.
>
> an ls -1rt on a directory with about 1.2 million files with names like
> afile1202899 takes minutes to complete on my box, and we see 'ls' get to
  ^^^^^^^^^^^^
Here's your problem!

> in excess of 700MB rss... (and that's not including the memory zfs is
> using to cache whatever it can.)
>
> My box has the ARC limited to about 1GB, so it's obviously undersized
> for such a workload, but still gives you an indication...
>
> I generally look to keep directories to a size that allows the utilities
> that work on and in it to perform at a reasonable rate... which for the
> most part is around the 100K files or less...
>
> Perhaps you are using larger hardware than I am for some of this stuff? :)

I've seen this problem where Solaris has issues with many files created
with this type of file naming pattern: for example, the naming pattern
produced by tmpfile(3C). I saw it originally on a tmpfs, and it can be
easily reproduced as follows. [Note: I'm writing this from memory, so
don't beat me up over specific details.]

1) Pick the number of files you want to test with (try different
   numbers; start with 1,500 and then increase it). Call this test#.
2) cd /tmp
3) IMPORTANT: make a test directory for this experiment; let's call it temp.
4) cd /tmp/temp (your playground)
5) Using your favorite language, generate your test# of files with a
   pattern similar to the one above by (ultimately) calling tmpfile().
6) ptime ls -al     - it will be quick the first time.
7) ptime rm *       - it will be quick the first time.
8) Repeat steps 5, 6 and 7. Your ptimes will be a little slower.
9) Repeat steps 5, 6 and 7. Your ptimes will be much slower.
10) Repeat steps 5, 6 and 7. Your ptimes will be *really* slow. Now
    you'll understand that you have a problem.
11) Repeat steps 5, 6 and 7 a couple more times. Notice how bad your
    ptimes are now!
12) Look at the size of /tmp/temp using ls -ald /tmp/temp, and you'll
    notice that it has grown substantially.

The larger this directory grows, the slower the filesystem operations
will get. This behavior is common to tmpfs and UFS, and I tested it on
early ZFS releases.
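The steps above can be sketched as a single script. This is a rough
approximation, not a faithful repro: it is written for a generic POSIX
shell with time(1) (substitute ptime on Solaris), and since tmpfile(3C)
unlinks its file on close, it just mimics the numeric name pattern
directly instead of calling tmpfile().

```shell
#!/bin/sh
# Hedged sketch of the repro steps above; paths and counts are the
# hypothetical values from the post (adjust testnum upward to taste).
testnum=1500                 # step 1: file count per pass
dir=/tmp/temp                # step 3: a dedicated playground, NOT /tmp itself
mkdir -p "$dir" && cd "$dir" || exit 1

pass=1
while [ "$pass" -le 5 ]; do  # steps 8-11: repeat and watch the times grow
    i=0
    while [ "$i" -lt "$testnum" ]; do   # step 5: create tmpfile-style names
        : > "afile$((1200000 + i))"
        i=$((i + 1))
    done
    echo "=== pass $pass ==="
    time ls -al > /dev/null  # step 6: list the directory
    time rm -f afile*        # step 7: remove the files
    pass=$((pass + 1))
done

ls -ald "$dir"               # step 12: the directory itself keeps growing
```

If the behavior described above still exists, the reported times should
climb from pass to pass even though each pass does the same amount of work.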
I have no idea why; I have not made the time to figure it out. What I
have observed is that all operations on your (victim) test directory
will max out (100% utilization) one CPU or one CPU core, and all
directory operations become single-threaded and limited by the
performance of one CPU (or core).

Now for the weird part: the *only* way to return everything to normal
performance levels (that I've found) is to rmdir the (victim) directory.
This is why I recommend you perform this experiment in a subdirectory.
If you do it in /tmp, you'll have to reboot the box to get reasonable
performance back. And you don't want to do it in your home directory
either!!

I'll try to set aside some time tomorrow to re-run this experiment. But
I'm nearly sure this is why your directory-related file ops are so slow,
and *dramatically* slower than they should be.

This problem/bug is insidious, because using tmpfile() in /tmp is a very
common practice, and the application(s) using /tmp will slow down
dramatically while maxing out (100% utilization) one CPU (or core). And
if your system only has a single CPU... :(

Let me know what you find out. I know that the file name pattern is what
causes this bug to bite bigtime, and not so much the number of files you
use to test it. I *suspect* that the root cause is something like a hash
table degenerating into a singly linked list. But this is only my WAG.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX  [EMAIL PROTECTED]
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
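[Editor's note: the recovery step Al describes (removing and recreating
the directory, rather than merely emptying it) can be sketched as below.
The /tmp/temp path is the hypothetical playground from the repro steps;
whether the directory's on-disk size actually shrinks only via rmdir is
Al's observation, not something this sketch proves.]

```shell
#!/bin/sh
# Sketch: per the post, emptying the bloated directory does not restore
# performance; only removing and recreating it does.
dir=/tmp/temp
mkdir -p "$dir"
ls -ald "$dir"          # note the directory's own size after the experiment
rm -f "$dir"/afile*     # rmdir requires the directory to be empty first
cd /tmp && rmdir "$dir"
mkdir "$dir"            # a fresh directory, back to its minimal size
ls -ald "$dir"
```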