On Tuesday 06 May 2008 09:48:30 Roman Divacky wrote:
> On Tue, May 06, 2008 at 02:25:56AM -0400, David Schultz wrote:
> > On Mon, May 05, 2008, Roman Divacky wrote:
> > > hi
> > >
> > > when we want to use a hash table in the kernel we call "hashinit",
> > > which initializes a hash table with a power-of-2 size. There's also
> > > "phashinit", which creates a hash table whose size is a prime number.
> > > This was added in 1995 by davidg@ but it is not used anywhere in
> > > the kernel.
> > >
> > > phk@ committed rev. 1.30 of vfs_cache.c replacing phashinit with
> > > hashinit, stating that it's better because it replaces a division
> > > with a logical AND. Is this reason still valid today? (the commit
> > > was done almost 11 years ago)
> > >
> > > is there still any reason not to use phashinit instead of
> > > hashinit? I believe using a prime-sized hash table might have
> > > a positive performance impact...
> >
> > There's a tradeoff.
> >
> > The argument for using powers of 2 is that division takes many
> > times longer than a logical AND.
> >
> > The argument for using primes is that if your hash function isn't
> > as good as you thought it was, or if the data has some regularity
> > you weren't expecting, you can get screwed a lot more easily with
> > a power-of-2 hash table. With a prime-sized hash table, you only
> > get screwed if lots of your data is congruent modulo the prime,
> > which is very rare.
> >
> > Most general-purpose hash implementations I've used (e.g., GNU
> > libstdc++, Sun JDK, Microsoft .NET) use prime table sizes,
> > probably to make it less likely that programmers will shoot
> > themselves in the foot with pathological data or bad hash functions.
>
> yes... a division takes roughly 40 cycles on modern i386 hardware,
> while an AND is just 1, but does it matter compared to the access
> times of memory these days?
>
> the ratio between CPU speed and memory speed has changed a lot. I am
> not arguing whether it's still true that avoiding the division helps
> performance these days...
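For reference, the difference being discussed is just how the bucket index
is derived from the hash value. The sketch below is illustrative only -- it
is not the actual hashinit()/phashinit() code from the kernel -- but it shows
the one-instruction mask versus the integer division:

#include <stdint.h>

/*
 * Illustrative only; not the real hashinit()/phashinit() implementation.
 * hashinit() hands its callers a mask (hashsize - 1), phashinit() a prime
 * table size.
 */

/* Power-of-2 table: one AND selects the bucket. */
static inline uint32_t
bucket_pow2(uint32_t hash, uint32_t hashmask)	/* hashmask == hashsize - 1 */
{
	return (hash & hashmask);
}

/* Prime-sized table: one integer division (modulo) selects the bucket. */
static inline uint32_t
bucket_prime(uint32_t hash, uint32_t hashsize)	/* hashsize is prime */
{
	return (hash % hashsize);
}

/*
 * The mask only looks at the low-order bits of the hash, so input with
 * regular low bits (e.g. pointers aligned to 2^k) can crowd into a few
 * chains; reducing modulo a prime folds in the high-order bits as well.
 */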
Roughly, the change in cost from switching to a prime-sized table is:

  requests/s * div_overhead - [avoided_]collisions/s * collision_overhead
    ~= -([avoided_]collisions/s * collision_overhead)

assuming the collision_overhead (which involves memory operations) greatly
dominates the div_overhead. So if there is a high collision rate, and you
can reasonably assert that you will lower it significantly by using a
prime-sized hash table, the div_overhead doesn't matter.

At least that's what I've come up with off the top of my head now.

-- 
/"\  Best regards,                      | [EMAIL PROTECTED]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
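To put rough numbers on that break-even point, here is a back-of-envelope
sketch (plain C, nothing kernel-specific). The ~40-cycle division cost is
the figure quoted in the thread; the ~200-cycle collision cost is purely an
assumed stand-in for one extra cache-missing hop down a hash chain, not a
measurement from any FreeBSD workload:

#include <stdio.h>

int
main(void)
{
	/*
	 * Back-of-envelope only.  The division cost is the ~40-cycle
	 * figure quoted above; the collision cost is an assumed value
	 * for one extra cache-missing chain traversal.
	 */
	double div_overhead = 40.0;		/* cycles paid on every request */
	double collision_overhead = 200.0;	/* cycles saved per avoided collision */

	/*
	 * Net change per request when moving to a prime-sized table:
	 *     div_overhead - avoided_per_request * collision_overhead
	 * Break-even is where this is zero.
	 */
	double breakeven = div_overhead / collision_overhead;

	printf("prime sizing wins once it avoids more than %.2f "
	    "collisions per request\n", breakeven);
	return (0);
}

With those assumed numbers, the prime-sized table only has to avoid about
one collision per five requests before the division pays for itself.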