On 08/09/2012 10:01 AM, Matt Brookings wrote:
On 8/9/2012 9:50 AM, John Simpson wrote:
On 2012-08-08, at 2132, Eric Shubert wrote:
#define MAX_USERS_PER_LEVEL 100
...
In an ext3 environment, it could be set (by the admin) to 30000 (ext3 supports
32000 subdirectories), and with ext4 it could be set to 60000 (ext4 supports
64000). These settings would for the most part disable hashed directories,
while still allowing hashes should the filesystem limits be approached. Of
course, a default value in dir_control could still be 100, which would maintain
former behavior. If this were done, the --disable-users-big-dir option should
probably be changed to --allow-single-digit-users as well. ;)
Please let me know what the prospects of such changes are. If it doesn't look
like anything that might ever happen in this area, I just may patch the vauth.h
file to be 30000 and call it done.
The filesystem's limit on how many entries can exist in a directory is not the
only issue... the other issue is performance.
On most filesystems (including ext2/3/4), in order to find a particular file
within a directory, the kernel has to do a linear search on the contents. It
can take longer to do a linear search across 30K items than it does to search
through 100 entries, open a new directory, and do a second search through 100
entries. This isn't an issue for filesystems which implement directories as
binary trees instead of linear lists.
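(To put rough numbers on that, assuming no filesystem-level index: a flat directory of 30,000 entries averages about 30000 / 2 = 15,000 name comparisons per lookup, while the hashed case averages roughly 100 / 2 + 100 / 2 = 100 comparisons plus the cost of opening one extra directory.)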
Recent versions of ext2/3/4 have an option to create a hashed index
for directories (tune2fs -O dir_index), as I imagine you are aware. I
sincerely doubt that having vpopmail hashing provides any significant
benefit beyond that.
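Enabling it is a one-liner (I'm going from memory here, so double-check the e2fsprogs man pages; /dev/sdXN is just a placeholder):

tune2fs -O dir_index /dev/sdXN
e2fsck -fD /dev/sdXN

The second command rebuilds indexes for directories that already exist, and needs to run on an unmounted filesystem.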
From an architectural point of view, I also expect you'd agree that
directory hashing belongs at the filesystem level rather than in the
application code.
Perhaps it's desirable to provide application-level hashing for
filesystems (none of which I'm aware of, offhand) that don't offer
directory hashing on their own.
That's fine. I'm not suggesting that the capability be removed, only
that it be able to be managed more effectively.
There is presently an option to turn this feature off at the user level,
which is great (imo). I suppose that the likelihood of having hashed
users is greater than that of having hashed domains in most situations,
but if the option is appropriate for users, why would it not also be
appropriate for domains? I initially wondered whether the
--disable-users-big-dir option turned off hashing at both levels, which
would have seemed reasonable to me, but examining the code showed that
it does not; it affects only the user level. So at least the option's
name is accurate.
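For reference, that's the configure-time switch:

./configure --disable-users-big-dir

(alongside whatever other options a given build normally uses).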
The scripts that I write which access the mailboxes all use "vdominfo" or
"vuserinfo" (or the qmail virtualdomains and users/assign files, and the domain's
vpasswd.cdb file) to locate the directories, rather than making assumptions about where a
particular domain or mailbox might be on the disk. This way I'm using the same exact method that
qmail uses to deliver mail, so I know I'm ending up in the right place.
Thanks for this tip. This is the proper way to access this data. I'll
look for any qmailtoaster-plus scripts that should be changed and fix them.
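For example (the address is made up, and I'm going from memory on the -d
switch, which should print just the directory path):

vuserinfo -d postmaster@example.com
vdominfo -d example.com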
If I'm not mistaken, the limitation on single-character mailbox names has something to do with how
the hashing is implemented. The hash directories all have single-digit or single-letter names, and
if a mailbox exists with the same name, it causes problems (or at least confusion.) Personally, I
always thought they should have given the hash directories names which aren't used in SMTP addresses,
like ",0" or ",a", but that's not how it was originally written.
I agree.
John has basically said everything I was going to :) The only thing I would
mention is that 5.4.32 and 5.4.33 both include changes that re-populate old
hash directories that have been made lighter by user deletion. It's the
"backfill" feature.
I think you both missed a significant part of my post. Let me make my
question as direct as I can. Why in the world is
#define MAX_USERS_PER_LEVEL 100
hard-coded in the source (something nearly always best avoided), while
its brethren data:
level_cur 0
level_max 3
level_start0 0
level_start1 0
level_start2 0
level_end0 61
level_end1 61
level_end2 61
level_mod0 0
level_mod1 2
level_mod2 4
level_index0 0
level_index1 0
level_index2 0
all live in the dom_89 record of the dir_control table?
I think it's great, however dangerous, that the directory hashing has
parameters which are so easily changed. I'm simply wondering, what would
be the problem (if any) with making max_users_per_level a field in the
same record, instead of hard-coding it in a header file?
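Just to sketch what I have in mind (field names here are illustrative,
not copied from the actual vpopmail sources):

/* Sketch only: a per-domain limit carried alongside the existing
 * dir_control hashing state, rather than a compile-time constant. */
struct dir_control_entry {
    int level_cur;
    int level_max;
    int level_start[3];
    int level_end[3];
    int level_mod[3];
    int level_index[3];
    int max_users_per_level;  /* proposed field, defaulting to 100 so
                                 existing installs keep former behavior */
};

The hashing code would then consult this field wherever it currently
uses MAX_USERS_PER_LEVEL.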
The benefit of doing so is substantial (certainly more so than
application hashing on top of filesystem hashing). Most significantly,
it allows the administrator to tune the point at which application
hashing kicks in. Being able to tune things in this way is precisely
what I'm looking to be able to do. To be honest, I'm aiming at turning
vpopmail's directory hashing off, but I think that less drastic measures
would have value as well (IOW, letting it kick in at a higher level).
Another benefit is that, in theory, the limit of the number of domains
and users could be increased astronomically, perhaps to the actual
limits of any filesystem. In other words, instead of
100 + (62 * 100) + (62 * 62 * 100) + (62 * 62 * 62 * 100) = over 24
million domains (or users per domain), there could be over
30000 + (62 * 30000) + (62 * 62 * 30000) + (62 * 62 * 62 * 30000) of
either/both on ext3. With ext4, substitute 60000 for the 30000 I've used
for ext3. The totals get large enough that scientific notation comes in
handy.
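A trivial program does the arithmetic (just a sketch: 62 hash characters
per level and the same four levels as the formula above); it comes out to
roughly 2.4e7 at 100 per level, 7.3e9 at 30000, and 1.45e10 at 60000:

#include <stdio.h>

int main(void)
{
    /* per-level capacities: current default, ext3-sized, ext4-sized */
    const long per_level[] = { 100, 30000, 60000 };
    for (int i = 0; i < 3; i++) {
        double total = 0.0, width = 1.0;   /* width = 62^level */
        for (int level = 0; level <= 3; level++) {
            total += width * per_level[i];
            width *= 62.0;                 /* 0-9, a-z, A-Z */
        }
        printf("%6ld per level -> %.3e entries\n", per_level[i], total);
    }
    return 0;
}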
Just one more question. If there's no problem or objection with doing
this, which version of vpopmail would you suggest I use to write a patch
to accomplish this? 5.4.33 (which I would prefer) or a 5.5 version?
Thanks for your understanding and attention.
--
-Eric 'shubes'