On 08/09/2012 10:01 AM, Matt Brookings wrote:
On 8/9/2012 9:50 AM, John Simpson wrote:
On 2012-08-08, at 2132, Eric Shubert wrote:
#define MAX_USERS_PER_LEVEL 100
...
In an ext3 environment, it could be set (by the admin) to 30000 (ext3 supports
32000 subdirectories), and with ext4 it could be set to 60000 (ext4 supports
64000). These settings would for the most part disable hashed directories,
while still allowing hashes should the filesystem limits be approached. Of
course, a default value in dir_control could still be 100, which would maintain
former behavior. If this were done, the --disable-users-big-dir option should
probably be changed to --allow-single-digit-users as well. ;)
Please let me know what the prospects of such changes are. If it doesn't look
like anything that might ever happen in this area, I just may patch the vauth.h
file to be 30000 and call it done.
The filesystem's limit on how many entries can exist in a directory is not the
only issue... the other issue is performance.
On most filesystems (including ext2/3/4), in order to find a particular file
within a directory, the kernel has to do a linear search on the contents. It
can take longer to do a linear search across 30K items than it does to search
through 100 entries, open a new directory, and do a second search through 100
entries. This isn't an issue for filesystems which implement directories as
binary trees instead of linear lists.
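(To put rough numbers on that, assuming no filesystem-level index: a flat directory of 30,000 entries averages about 30000 / 2 = 15,000 name comparisons per lookup, while the hashed case averages roughly 100 / 2 + 100 / 2 = 100 comparisons plus the cost of opening one extra directory.)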
Recent versions of ext2/3/4 have an option to create a hashed index
for directories (tune2fs -O dir_index), as I imagine you are aware. I
sincerely doubt that having vpopmail hashing provides any significant
benefit beyond that.
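Enabling it is a one-liner (I'm going from memory here, so double-check the e2fsprogs man pages; /dev/sdXN is just a placeholder):

tune2fs -O dir_index /dev/sdXN
e2fsck -fD /dev/sdXN

The second command rebuilds indexes for directories that already exist, and needs to run on an unmounted filesystem.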
From an architectural point of view, I also expect you'd agree that
directory hashing belongs at the filesystem level rather than in the
application code.
Perhaps it's desirable to provide application-level hashing for
filesystems (none of which I'm aware of, offhand) that don't offer
directory hashing on their own.
That's fine. I'm not suggesting that the capability be removed, only
that it be able to be managed more effectively.
There is presently an option to turn this feature off at the user level,
which is great (imo). I suppose that the likelihood of having hashed
users is greater than that of having hashed domains in most situations,
but if the option is appropriate for users, why would it not also be
appropriate for domains? I initially wondered whether the
--disable-users-big-dir option turned off hashing at both levels, which
would have seemed reasonable to me, but examining the code showed that
it does not; it affects only the user level. So at least the option's
name is accurate.
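For reference, that's the configure-time switch:

./configure --disable-users-big-dir

(alongside whatever other options a given build normally uses).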
The scripts that I write which access the mailboxes all use "vdominfo" or
"vuserinfo" (or the qmail virtualdomains and users/assign files, and the domain's
vpasswd.cdb file) to locate the directories, rather than making assumptions about where a
particular domain or mailbox might be on the disk. This way I'm using the same exact method that
qmail uses to deliver mail, so I know I'm ending up in the right place.
Thanks for this tip. This is the proper way to access this data. I'll
look for any qmailtoaster-plus scripts that should be changed and fix them.
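For example (the address is made up, and I'm going from memory on the -d
switch, which should print just the directory path):

vuserinfo -d postmaster@example.com
vdominfo -d example.com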
If I'm not mistaken, the limitation on single-character mailbox names has something to do with how
the hashing is implemented. The hash directories all have single-digit or single-letter names, and
if a mailbox exists with the same name, it causes problems (or at least confusion.) Personally, I
always thought they should have given the hash directories names which aren't used in SMTP addresses,
like ",0" or ",a", but that's not how it was originally written.
I agree.
John has basically said everything I was going to :) The only thing I would
mention is that 5.4.32 and 5.4.33 both include changes that re-populate old
hash directories that have been made lighter by user deletion. It's the
"backfill" feature.
I think you both missed a significant part of my post. Let me make my
question as direct as I can. Why in the world is
#define MAX_USERS_PER_LEVEL 100
hard-coded in the source (something nearly always best avoided), while
its brethren data:
level_cur 0
level_max 3
level_start0 0
level_start1 0
level_start2 0
level_end0 61
level_end1 61
level_end2 61
level_mod0 0
level_mod1 2
level_mod2 4
level_index0 0
level_index1 0
level_index2 0
all live in the dom_89 record of the dir_control table?
I think it's great, however dangerous, that the directory hashing has
parameters which are so easily changed. I'm simply wondering, what would
be the problem (if any) with making max_users_per_level a field in the
same record, instead of hard-coding it in a header file?
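Just to sketch what I have in mind (field names here are illustrative,
not copied from the actual vpopmail sources):

/* Sketch only: a per-domain limit carried alongside the existing
 * dir_control hashing state, rather than a compile-time constant. */
struct dir_control_entry {
    int level_cur;
    int level_max;
    int level_start[3];
    int level_end[3];
    int level_mod[3];
    int level_index[3];
    int max_users_per_level;  /* proposed field, defaulting to 100 so
                                 existing installs keep former behavior */
};

The hashing code would then consult this field wherever it currently
uses MAX_USERS_PER_LEVEL.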
The benefit of doing so is substantial (certainly more so than
application hashing on top of filesystem hashing). Most significantly,
it allows the administrator to tune the point at which application
hashing kicks in. Being able to tune things in this way is precisely
what I'm looking to be able to do. To be honest, I'm aiming at turning
vpopmail's directory hashing off, but I think that less drastic measures
would have value as well (IOW, letting it kick in at a higher level).
Another benefit is that, in theory, the limit of the number of domains
and users could be increased astronomically, perhaps to the actual
limits of any filesystem. In other words, instead of
100 + (62 * 100) + (62 * 62 * 100) + (62 * 62 * 62 * 100) = over 24
million domains (or users per domain), there could be over
30000 + (62 * 30000) + (62 * 62 * 30000) + (62 * 62 * 62 * 30000) of
either/both on ext3. With ext4, substitute 60000 for the 30000 I've used
for ext3. The totals get large enough that scientific notation comes in
handy.
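A trivial program does the arithmetic (just a sketch: 62 hash characters
per level and the same four levels as the formula above); it comes out to
roughly 2.4e7 at 100 per level, 7.3e9 at 30000, and 1.45e10 at 60000:

#include <stdio.h>

int main(void)
{
    /* per-level capacities: current default, ext3-sized, ext4-sized */
    const long per_level[] = { 100, 30000, 60000 };
    for (int i = 0; i < 3; i++) {
        double total = 0.0, width = 1.0;   /* width = 62^level */
        for (int level = 0; level <= 3; level++) {
            total += width * per_level[i];
            width *= 62.0;                 /* 0-9, a-z, A-Z */
        }
        printf("%6ld per level -> %.3e entries\n", per_level[i], total);
    }
    return 0;
}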
Just one more question. If there's no problem or objection with doing
this, which version of vpopmail would you suggest I use to write a patch
to accomplish this? 5.4.33 (which I would prefer) or a 5.5 version?
Thanks for your understanding and attention.
--
-Eric 'shubes'