Andries,
> > int hash_fn (char * p)
> > {
> >     int hash = 0;
> >     while (*p) {
> >         hash = hash + *p++;
> >         // Rotate a 31 bit field 7 bits:
> >         hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
> >     }
> >     return hash;
> > }
>
> Hmm. This one goes in the "catastrophic" category.
Andries,
> is very little interaction to start with. And the final AND that
> truncates to the final size of the hash chain kills the high order
> bits. No good.
I didn't realise you were bit-masking down to the required size.
Yes, it would be pretty useless in that case.
Ralph.
>
> Andries
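To make the point about the final AND concrete, here is a minimal sketch
(mine, not from the thread) of three ways to reduce the 31-bit rotating
hash to a bucket index. Masking keeps only the low bits and throws away
everything the rotate accumulated up high; folding the high bits down, or
reducing modulo a prime, lets them influence the bucket. The table size
and file name below are illustrative only.

#include <stdio.h>

static unsigned hash_fn(const char *p)
{
        unsigned hash = 0;
        while (*p) {
                hash += *p++;
                /* rotate a 31 bit field 7 bits */
                hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
        }
        return hash;
}

int main(void)
{
        const unsigned nbuckets = 1024;         /* example chain table size */
        unsigned h = hash_fn("example.txt");
        unsigned masked = h & (nbuckets - 1);   /* discards bits 10..30 */
        unsigned folded = (h ^ (h >> 10) ^ (h >> 20)) & (nbuckets - 1);
        unsigned modded = h % 1021;             /* 1021 is prime */

        printf("masked %u, folded %u, modded %u\n", masked, folded, modded);
        return 0;
}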
On Sat, Feb 24, 2001 at 10:43:16AM +1300, Ralph Loader wrote:
> A while ago I did some experimentation with simple bit-op based string
> hash functions. I.e., no multiplications / divides in the hash loop.
>
> The best I found was:
>
> int hash_fn (char * p)
> {
>     int hash = 0;
>     while (*p) {
>         hash = hash + *p++;
>         // Rotate a 31 bit field 7 bits:
>         hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
>     }
>     return hash;
> }
Hi,
A while ago I did some experimentation with simple bit-op based string
hash functions. I.e., no multiplications / divides in the hash loop.
The best I found was:
int hash_fn (char * p)
{
    int hash = 0;
    while (*p) {
        hash = hash + *p++;
        // Rotate a 31 bit field 7 bits:
        hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
    }
    return hash;
}
Jonathan Morton writes:
> Meanwhile, let's go back to Linus' comment on compatibility and so on. He
> has a *very* good point, which I'll expand on slightly here:
>
> Suppose some stone-age Linux user, running 2.0.35 or something equally old
> (which runs ext2), decides to finally bite the bullet
From [EMAIL PROTECTED] Fri Feb 23 04:43:23 2001
> Now that you provide source for r5 and dx_hack_hash, let me feed my
> collections to them.
> r5: catastrophic
> dx_hack_hash: not bad, but the linear hash is better.
I never expected dx_hack_hash to be particularly good at
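For reference, since the sources were posted separately and are not shown
here: dx_hack_hash looked roughly like the following. This is
reconstructed from memory of the ext2 directory-index patch, so treat the
exact constants as approximate rather than authoritative.

/* Sketch of dx_hack_hash; constants reconstructed from memory. */
unsigned dx_hack_hash(const char *name, int len)
{
        unsigned hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;

        while (len--) {
                unsigned hash = hash1 + (hash0 ^ (*name++ * 7152373));

                if (hash & 0x80000000)          /* keep it in 31 bits */
                        hash -= 0x7fffffff;
                hash1 = hash0;
                hash0 = hash;
        }
        return hash0;
}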
>Now that you provide source for r5 and dx_hack_hash, let me feed my
>collections to them.
>r5: catastrophic
>dx_hack_hash: not bad, but the linear hash is better.
So, not only does the linear hash normally provide a shorter worst-case
chain, its results are actually more consistent than the others.
[EMAIL PROTECTED] wrote:
>
> Just looking at R5 I knew it wasn't going to do well in this application
> because it's similar to a number of hash functions I tried with the same
> idea in mind: to place similar names together in the same leaf block.
> That turned out to be not very
Just looking at R5 I knew it wasn't going to do well in this application
because it's similar to a number of hash functions I tried with the same
idea in mind: to place similar names together in the same leaf block.
That turned out to be not very important compared to achieving a
>> idea of using crc32 as the default hash function
> I once liked those things, too - but I've learned better since.
> A universal class of hashing functions is a class with the property that
> given any input, the average performance of all the functions is good.
> For example, h(k) = ((a * k + b) mod p) mod m.
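The family being quoted is the classic Carter-Wegman construction. Below
is a minimal sketch of picking one member at random, assuming 32-bit
keys; the struct and function names are mine, and rand() is only a
stand-in for a real entropy source.

#include <stdint.h>
#include <stdlib.h>

#define UHASH_P 4294967291u             /* prime just below 2^32 */

struct uhash { uint64_t a, b; uint32_t m; };

/* Draw a random member h(k) = ((a*k + b) mod p) mod m of the family. */
static void uhash_init(struct uhash *u, uint32_t m)
{
        u->a = 1 + (uint64_t)rand() % (UHASH_P - 1);    /* a in [1, p-1] */
        u->b = (uint64_t)rand() % UHASH_P;              /* b in [0, p-1] */
        u->m = m;
}

static uint32_t uhash(const struct uhash *u, uint32_t k)
{
        /* a*k + b < 2^64, so the 64-bit intermediate cannot overflow */
        return (uint32_t)(((u->a * k + u->b) % UHASH_P) % u->m);
}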
Thus spake Alan Cox ([EMAIL PROTECTED]):
> > > There will be a lot fewer metadata index
> > > blocks in your directory file, for one thing.
> > Oh yes, another thing: a B-tree directory structure does not need
> > metadata index blocks.
> Before people get excited about complex tree directory indexes, remember
Linus Torvalds wrote:
>
> On Thu, 22 Feb 2001, Daniel Phillips wrote:
> >
> > In the first heat of hash races - creating 20,000 files in one directory
> > - dentry::hash lost out to my original hack::dx_hash, causing a high
> > percentage of leaf blocks to remain exactly half full and slowing down
> > the whole thing by about 5%.
[EMAIL PROTECTED] (Martin Mares) wrote on 22.02.01 in
<[EMAIL PROTECTED]>:
> One could avoid this, but it would mean designing the whole filesystem in a
> completely different way -- merge all directories to a single gigantic
> hash table and use (directory ID,file name) as a key, but we were o
[EMAIL PROTECTED] (Daniel Phillips) wrote on 20.02.01 in
<01022020011905.18944@gimli>:
> But the current hash function is just a place holder, waiting for
> an better version based on some solid theory. I currently favor the
> idea of using crc32 as the default hash function, but I welcome
> suggestions.
On Thu, 22 Feb 2001, Ingo Oeser wrote:
>
> On Wed, Feb 21, 2001 at 09:19:45PM -0800, Linus Torvalds wrote:
> > In article <97230a$16k$[EMAIL PROTECTED]>,
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > >allocate blocks one at a time. Make the blocksize something nice and
> > >big, not just 4kB or 8kB or something.
On Wednesday, February 21, 2001 07:30:47 PM -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:
> On Thu, 22 Feb 2001, Daniel Phillips wrote:
>>
>
> I'd love to hear the results from R5, as that seems to be the reiserfs
> favourite, and I'm trying it out in 2.4.2 because it was so easy to plug
> in.
Hi Linus,
Hi LKML people,
On Wed, Feb 21, 2001 at 09:19:45PM -0800, Linus Torvalds wrote:
> In article <97230a$16k$[EMAIL PROTECTED]>,
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >allocate blocks one at a time. Make the blocksize something nice and
> >big, not just 4kB or 8kB or something.
>
>
> Daniel Phillips wrote:
> >
> > There will be a lot fewer metadata index
> > blocks in your directory file, for one thing.
> >
>
> Oh yes, another thing: a B-tree directory structure does not need
> metadata index blocks.
Before people get excited about complex tree directory indexes, remember
H. Peter Anvin wrote:
> Martin Mares wrote:
> >
> > Hello!
> >
> > > True. Note too, though, that on a filesystem (which we are, after all,
> > > talking about), if you assume a large linear space you have to create a
> > > file, which means you need to multiply the cost of all random-access
>
Also sprach H. Peter Anvin:
} Martin Mares wrote:
} >
} > Hello!
} >
} > > True. Note too, though, that on a filesystem (which we are, after all,
} > > talking about), if you assume a large linear space you have to create a
} > > file, which means you need to multiply the cost of all random-access
} > > operations with O(log n).
HPA writes:
> Daniel Phillips wrote:
> > I mentioned this earlier but it's worth repeating: the desire to use a
> > small block size is purely an artifact of the fact that ext2 has no
> > handling for tail block fragmentation. That's a temporary situation -
> > once we've dealt with it your 2,000
"H. Peter Anvin" wrote:
>
> Daniel Phillips wrote:
> >
> > Have you looked at the structure and algorithms I'm using? I would not
> > call this a hash table, nor is it a btree. It's a 'hash-keyed
> > uniform-depth tree'. It never needs to be rehashed (though it might be
> > worthwhile compacting it at some point).
Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> >The default in reiserfs is now the R5 hash, but you are right that lots of
> > efforts went into finding this hash. This includes testing various
> > hashes on real directory structures to see which one worked best. R5 won.
In article <[EMAIL PROTECTED]>,
Daniel Phillips <[EMAIL PROTECTED]> wrote:
>
>I mentioned this earlier but it's worth repeating: the desire to use a
>small block size is purely an artifact of the fact that ext2 has no
>handling for tail block fragmentation. That's a temporary situation -
>once we've dealt with it
In article <[EMAIL PROTECTED]>,
Davide Libenzi <[EMAIL PROTECTED]> wrote:
>
>Yep, 4 is not good as a shifting factor. Prime numbers are the better
>choice for this stuff.
Oh, absolutely.
It looks like the hash function was done rather early on in the dcache
lifetime (one of the first things), ba
Andreas Dilger wrote:
>
> Daniel Phillips writes:
> > Easy, with average dirent reclen of 16 bytes each directory leaf block
> > can hold up to 256 entries. Each index block indexes 512 directory
> > blocks and the root indexes 511 index blocks. Assuming the leaves are
> > on average 75% full
Daniel Phillips writes:
> Easy, with average dirent reclen of 16 bytes each directory leaf block
> can hold up to 256 entries. Each index block indexes 512 directory
> blocks and the root indexes 511 index blocks. Assuming the leaves are
> on average 75% full this gives:
>
> (4096 / 16)
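The arithmetic presumably continues along these lines (a reconstruction
from the figures quoted above, not the original text): 4096 / 16 = 256
entries per leaf block, or 192 at 75% full, so one index level gives
192 * 512 = 98,304 entries and the full two-level tree gives
192 * 512 * 511 = 50,233,344 -- roughly 50 million entries per directory.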
Daniel Phillips wrote:
>
> "H. Peter Anvin" wrote:
> >
> > Andreas Dilger wrote:
> > >
> > > Basically (IMHO) we will not really get any noticable benefit with 1 level
> > > index blocks for a 1k filesystem - my estimates at least are that the break
> > > even point is about 5k files. We _should_ be OK with 780k files in a
> > > single directory for a while.
Daniel Phillips wrote:
>
> There will be a lot fewer metadata index
> blocks in your directory file, for one thing.
>
Oh yes, another thing: a B-tree directory structure does not need
metadata index blocks.
-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gi
Linus Torvalds wrote:
>
> On Tue, 20 Feb 2001, Daniel Phillips wrote:
> >
> > You mean full_name_hash? I will un-static it and try it. I should have
> > some statistics tomorrow. I have a couple of simple metrics for
> > measuring the effectiveness of the hash function: the uniformity of
> > the hash space splitting (which in
In article <97230a$16k$[EMAIL PROTECTED]>,
Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>Another way of saying this: if you go to the complexity of no longer
>being a purely block-based filesystem, please go the whole way. Make the
>thing be extent-based, and get away from the notion that you have to
>allocate blocks one at a time. Make the blocksize something nice and
>big, not just 4kB or 8kB or something.
"H. Peter Anvin" wrote:
>
> Andreas Dilger wrote:
> >
> > Basically (IMHO) we will not really get any noticable benefit with 1 level
> > index blocks for a 1k filesystem - my estimates at least are that the break
> > even point is about 5k files. We _should_ be OK with 780k files in a single
> > directory for a while.
Andreas Dilger wrote:
>
> Basically (IMHO) we will not really get any noticable benefit with 1 level
> index blocks for a 1k filesystem - my estimates at least are that the break
> even point is about 5k files. We _should_ be OK with 780k files in a single
> directory for a while.
>
I've had a
Daniel Phillips wrote:
>
> "H. Peter Anvin" wrote:
> >
> > Daniel Phillips wrote:
> > >
> > > Have you looked at the structure and algorithms I'm using? I would not
> > > call this a hash table, nor is it a btree. It's a 'hash-keyed
> > > uniform-depth tree'. It never needs to be rehashed (though it might
> > > be worthwhile compacting it at some point).
Followup to: <971i36$180$[EMAIL PROTECTED]>
By author:[EMAIL PROTECTED] (Linus Torvalds)
In newsgroup: linux.dev.kernel
>
> (The current VFS name hash is probably _really_ stupid - I think it's
> still my original one, and nobody probably ever even tried to run it
> through any testing. For
On Thu, 22 Feb 2001, Daniel Phillips wrote:
>
> In the first heat of hash races - creating 20,000 files in one directory
> - dentry::hash lost out to my original hack::dx_hash, causing a high
> percentage of leaf blocks to remain exactly half full and slowing down
> the whole thing by about 5%.
"H. Peter Anvin" wrote:
>
> Martin Mares wrote:
> >
> > > True. Note too, though, that on a filesystem (which we are, after all,
> > > talking about), if you assume a large linear space you have to create a
> > > file, which means you need to multiply the cost of all random-access
> > > operations with O(log n).
Daniel Phillips wrote:
>
> Have you looked at the structure and algorithms I'm using? I would not
> call this a hash table, nor is it a btree. It's a 'hash-keyed
> uniform-depth tree'. It never needs to be rehashed (though it might be
> worthwhile compacting it at some point). It also never n
On 21-Feb-2001 Daniel Phillips wrote:
> "H. Peter Anvin" wrote:
>>
>> Martin Mares wrote:
>> >
>> > > True. Note too, though, that on a filesystem (which we are, after all,
>> > > talking about), if you assume a large linear space you have to create a
>> > > file, which means you need to multiply the cost of all
>> > > random-access operations with O(log n).
On 21-Feb-2001 Linus Torvalds wrote:
> In article <[EMAIL PROTECTED]>,
> Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>>
>>The default in reiserfs is now the R5 hash, but you are right that lots of
>>efforts went into finding this hash. This includes testing various hashes
>>on real directory structures to see which one worked best. R5 won.
Martin Mares wrote:
> Hello!
>
> > True. Note too, though, that on a filesystem (which we are, after all,
> > talking about), if you assume a large linear space you have to create a
> > file, which means you need to multiply the cost of all random-access
> > operations with O(log n).
>
> One could avoid this, but it would mean designing the whole filesystem
> in a completely different way -- merge all directories to a single
> gigantic hash table and use (directory ID,file name) as a key.
Martin Mares wrote:
>
> Hello!
>
> > True. Note too, though, that on a filesystem (which we are, after all,
> > talking about), if you assume a large linear space you have to create a
> > file, which means you need to multiply the cost of all random-access
> > operations with O(log n).
>
> One could avoid this, but it would mean designing the whole filesystem
> in a completely different way -- merge all directories to a single
> gigantic hash table and use (directory ID,file name) as a key.
In article <[EMAIL PROTECTED]>,
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
>The default in reiserfs is now the R5 hash, but you are right that lots of
>efforts went into finding this hash. This includes testing various hashes
>on real directory structures to see which one worked best. R5 won.
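For reference, since R5 keeps coming up: it looks roughly like the
following. This is reconstructed from memory of the reiserfs sources, so
verify against a real tree before relying on the details.

/* Sketch of reiserfs's r5 hash, reconstructed from memory. */
unsigned r5_hash(const char *msg)
{
        unsigned a = 0;

        while (*msg) {
                a += *msg << 4;
                a += *msg >> 4;
                a *= 11;
                msg++;
        }
        return a;
}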
Martin Mares wrote:
>
> Hello!
>
> > Not true. The rehashing is O(n) and it has to be performed O(log n)
> > times during insertion. Therefore, insertion is O(log n).
>
> Rehashing is O(n), but the "n" is the _current_ number of items, not the
> maximum one after all the insertions.
>
> Let's assume you start with a single-entry hash table
Hello!
> Not true. The rehashing is O(n) and it has to be performed O(log n)
> times during insertion. Therefore, insertion is O(log n).
Rehashing is O(n), but the "n" is the _current_ number of items, not the
maximum one after all the insertions.
Let's assume you start with a single-entry hash table
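The argument being cut off runs roughly as follows: double the table
whenever it fills, and the rehash costs over n insertions telescope to
1 + 2 + 4 + ... + n/2 + n < 2n, i.e. O(1) amortized per insertion even
though any single rehash is O(n). A toy sketch of the doubling (mine,
not from the thread; collisions and allocation failure are ignored, and
key 0 is treated as an empty slot):

#include <stdlib.h>

struct table { unsigned *slots; size_t size, used; };

static void grow(struct table *t)
{
        size_t newsize = t->size ? 2 * t->size : 1;
        unsigned *old = t->slots;
        size_t i;

        t->slots = calloc(newsize, sizeof *t->slots);
        /* re-hash everything: O(current n), amortized away by doubling */
        for (i = 0; i < t->size; i++)
                if (old[i])
                        t->slots[old[i] % newsize] = old[i];
        t->size = newsize;
        free(old);
}

static void insert(struct table *t, unsigned key)
{
        if (t->used >= t->size)         /* load factor hit 1: double */
                grow(t);
        t->slots[key % t->size] = key;
        t->used++;
}

Starting from a zero-initialized struct table, insert() keeps the load
factor at or below 1, which is what makes the expected lookup cost O(1).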
Followup to: <[EMAIL PROTECTED]>
By author:Martin Mares <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> Hello!
>
> > To have O(1) you've to have the number of hash entries > number of files and a
> > really good hashing function.
>
> No, if you enlarge the hash table twice (and re-hash everything) every
> time the table fills up, the load factor of the table keeps small and
> everything is O(1) amortized.
Hello!
> True. Note too, though, that on a filesystem (which we are, after all,
> talking about), if you assume a large linear space you have to create a
> file, which means you need to multiply the cost of all random-access
> operations with O(log n).
One could avoid this, but it would mean designing the whole filesystem in
a completely different way -- merge all directories to a single gigantic
hash table and use (directory ID,file name) as a key.
Hello!
> You're right. However, for each hash table operation to be O(1) the size
> of the hash table must be >> n.
If we are talking about average case complexity (which is the only possibility
with fixed hash function and arbitrary input keys), it suffices to have
hash table size >= c*n for some constant c.
Mark Hahn wrote:
>
> > You're right. However, for each hash table operation to be O(1) the size
> > of the hash table must be >> n.
>
> there's at least one kind of HT where the table starts small
> and gets bigger, but at trivial cost (memcpy). while those
> memcpy's are O(n) each time, it's still O(1) amortized.
Hello!
> My personal preference goes to skiplist coz it doesn't have fixed (or
> growing) tables to handle. You've simply a stub of data together with
> FS data in each direntry.
Another problem with skip lists is that they require variable sized nodes,
so you either need to keep free chunk lists
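Martin's variable-size point in concrete terms: a node that participates
in k levels carries k forward pointers, so every allocation can be a
different size. A sketch (mine, not from the thread), using a C99
flexible array member:

#include <stdlib.h>

struct sl_node {
        const char *key;
        int level;                      /* number of forward pointers */
        struct sl_node *forward[];      /* one per level: variable size */
};

/* level is normally drawn from a geometric distribution per node */
static struct sl_node *sl_node_new(const char *key, int level)
{
        struct sl_node *n = calloc(1, sizeof *n +
                                   level * sizeof n->forward[0]);

        if (n) {
                n->key = key;
                n->level = level;
        }
        return n;
}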
On 21-Feb-2001 Martin Mares wrote:
> Hello!
>
>> My personal preference goes to skiplist coz it doesn't have fixed (or
>> growing) tables to handle. You've simply a stub of data together with
>> FS data in each direntry.
>
> Another problem with skip lists is that they require variable sized nodes,
Martin Mares wrote:
>
> Hello!
>
> > You're right. However, for each hash table operation to be O(1) the size
> > of the hash table must be >> n.
>
> If we are talking about average case complexity (which is the only possibility
> with fixed hash function and arbitrary input keys), it suffices to have
> hash table size >= c*n for some constant c.
On 21-Feb-2001 Martin Mares wrote:
> Hello!
>
>> To have O(1) you've to have the number of hash entries > number of
>> files and a really good hashing function.
>
> No, if you enlarge the hash table twice (and re-hash everything) every
> time the table fills up, the load factor of the table keeps small and
> everything is O(1) amortized.
Hello!
> To have O(1) you've to have the number of hash entries > number of files and a
really good hashing function.
No, if you enlarge the hash table twice (and re-hash everything) every time the
table fills up, the load factor of the table keeps small and everything is O(1)
amortized, of course.
Hello!
> Have You tried to use skiplists ?
> In 93 I've coded a skiplist based directory access for Minix and it gave very
> interesting performances.
> Skiplists have a link-list like performance when linear scanned, and overall
> good performance in insertion/seek/delete.
Skip list search/insert
On 21-Feb-2001 Martin Mares wrote:
> Hello!
>
>> Have You tried to use skiplists ?
>> In 93 I've coded a skiplist based directory access for Minix and it
>> gave very interesting performances.
>> Skiplists have a link-list like performance when linear scanned, and
>> overall good performance in insertion/seek/delete.
On Wed, 21 Feb 2001, Bernd Eckenfels wrote:
> In article <01022100361408.18944@gimli> you wrote:
> > But actually, rm is not problem, it's open and create. To do a
> > create you have to make sure the file doesn't already exist, and
> > without an index you have to scan on average half the directory file.
On 20-Feb-2001 Daniel Phillips wrote:
> Earlier this month a runaway installation script decided to mail all its
> problems to root. After a couple of hours the script aborted, having
> created 65535 entries in Postfix's maildrop directory. Removing those
> files took an awfully long time. The
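To put rough numbers on it (my reconstruction, not from the original
message): with a linear directory, creating file number i first has to
scan on average i/2 existing entries to check for a duplicate name, so
creating all n = 65535 files costs about n^2/4, on the order of a billion
entry comparisons. Deleting them one by one has the same quadratic shape,
since each unlink rescans the shrinking directory.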
Alan Cox wrote:
>> probably a bad idea to use it, because in theory at least the VFS layer
>> might decide to switch the hash function around. I'm more interested in
>> hearing whether it's a good hash, and maybe we could improve the VFS hash
>> enough that there's no reason to use anything else.
In article <01022100361408.18944@gimli> you wrote:
> But actually, rm is not problem, it's open and create. To do a
> create you have to make sure the file doesn't already exist, and
> without an index you have to scan on average half the directory file.
Unless you use a File System which is be
Linus writes:
> On Tue, 20 Feb 2001, Daniel Phillips wrote:
> > You mean full_name_hash? I will un-static it and try it. I should have
> > some statistics tomorrow.
>
> I was more thinking about just using "dentry->d_name->hash" directly, and
> not worrying about how that hash was computed. Yes
> probably a bad idea to use it, because in theory at least the VFS layer
> might decide to switch the hash function around. I'm more interested in
> hearing whether it's a good hash, and maybe we could improve the VFS hash
> enough that there's no reason to use anything else..
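For reference, the dcache hash in question looked roughly like this in
2.4-era <linux/dcache.h> -- quoted from memory, so check a real tree
before relying on the details:

static inline unsigned long
partial_name_hash(unsigned long c, unsigned long prevhash)
{
        return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static inline unsigned long
full_name_hash(const unsigned char *name, unsigned int len)
{
        unsigned long hash = 0;         /* init_name_hash() in the tree */

        while (len--)
                hash = partial_name_hash(*name++, hash);
        return hash;                    /* end_name_hash() folds this */
}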
> probably a bad idea to use it, because in theory at least the VFS layer
> might decide to switch the hash function around. I'm more interested in
> hearing whether it's a good hash, and maybe we could improve the VFS hash
> enough that there's no reason to use anything else..
Reiserfs seems to
On Tue, 20 Feb 2001, Daniel Phillips wrote:
>
> You mean full_name_hash? I will un-static it and try it. I should have
> some statistics tomorrow. I have a couple of simple metrics for
> measuring the effectiveness of the hash function: the uniformity of
> the hash space splitting (which in
On Tue, 20 Feb 2001, Jeremy Jackson wrote:
> Mike Dresser wrote:
>
> > the way i'm reading this, the problem is there's 65535 files in the directory
> > /where/postfix/lives. rm * or what have you, is going to take forever and
> > ever, and bog the machine down while it's doing it. My understanding is
> > you could do the rm *, and instead of it reading the tree over and over
>Perhaps rm -rf . would be faster? Let rm do glob expansion,
>without the sort. Care to recreate those 65535 files and try it?
Perhaps, but I think that form is still fairly slow. It takes an
"uncomfortable" amount of time to remove a complex directory structure
using, eg. "rm -rf /usr/src/lin
Mike Dresser wrote:
> the way i'm reading this, the problem is there's 65535 files in the directory
> /where/postfix/lives. rm * or what have you, is going to take forever and
ever, and bog the machine down while it's doing it. My understanding is you
could do the rm *, and instead of it reading the tree over and over
On Tue, 20 Feb 2001, Linus Torvalds wrote:
> In article <01022020011905.18944@gimli>,
> Daniel Phillips <[EMAIL PROTECTED]> wrote:
> >Earlier this month a runaway installation script decided to mail all its
> >problems to root. After a couple of hours the script aborted, having
>created 65535 entries in Postfix's maildrop directory.
the way i'm reading this, the problem is there's 65535 files in the directory
/where/postfix/lives. rm * or what have you, is going to take forever and
ever, and bog the machine down while it's doing it. My understanding is you
could do the rm *, and instead of it reading the tree over and over f
> In article <01022020011905.18944@gimli>,
> Daniel Phillips <[EMAIL PROTECTED]> wrote:
> >Earlier this month a runaway installation script decided to mail all its
> >problems to root. After a couple of hours the script aborted, having
> >created 65535 entries in Postfix's maildrop directory. Removing those
> >files took an awfully long time.
In article <01022020011905.18944@gimli>,
Daniel Phillips <[EMAIL PROTECTED]> wrote:
>Earlier this month a runaway installation script decided to mail all its
>problems to root. After a couple of hours the script aborted, having
>created 65535 entries in Postfix's maildrop directory. Removing those
>files took an awfully long time.