Re: [Bacula-users] [Bacula-devel] Project for strong incremental backup assurance

Martin Simmons Wed, 13 Jun 2007 12:23:16 -0700

>>>>> On Mon, 11 Jun 2007 14:45:38 +0200, Kern Sibbald said:
> 
> Hello,
> 
> Thanks for your comments.  Please see my notes below ...
> 
> On Monday 11 June 2007 10:24, Andre Noll wrote:
> > On 15:35, Andre Noll wrote:
> > 
> > > > - Reviewing my hash table code (particularly the hash function)  
> > > > src/lib/htable.h src/lib/htable.c
> > > 
> > > Will do.
> > 
> > First some general remarks:
> > 
> >     - check the return value of malloc()
> 
> Not necessary, malloc() is #defined to be bmalloc(), which aborts if it is 
> out 
> of memory.
> 
> >     - keys are restricted to strings. it would be easy to extend this
> >       to arbitrary data by adding a "size" field.
> 
> Yes, good idea, though I don't immediately see a need to handle anything 
> other 
> than strings within Bacula.
> 
> >     - IMHO "%p" is prefered for printing pointer variables.
> 
> Yes, but it was not portable.  We now have our own Bacula printf code which 
> is 
> portable and supports %p, so I've been converting the old stuff when I am in 
> the file.  Clearly this is a good file to attack.
> 
> > 
> > --------------------------------------- htable.h
> > 
> > 
> > > struct hlink {
> > >    void *next;                        /* next hash item */
> > 
> > why not struct hlink *next?
> 
> Good question -- I think you are right.  I'll try it and see if anything 
> complains ...
> 
> > 
> > >    char *key;                         /* key this item */
> > >    uint32_t hash;                     /* hash for this key */
> > > };
> > > 
> > > class htable : public SMARTALLOC {
> > >    hlink **table;                     /* hash table */
> > >    int loffset;                       /* link offset in item */
> > 
> > see below.
> > 
> > >    uint32_t num_items;                /* current number of items */
> > >    uint32_t max_items;                /* maximum items before growing */
> > 
> > see below.
> > 
> > >    uint32_t buckets;                  /* size of hash table */
> > >    uint32_t hash;                     /* temp storage */
> > >    uint32_t index;                    /* temp storage */
> > >    uint32_t mask;                     /* "remainder" mask */
> > >    uint32_t rshift;                   /* amount to shift down */
> > >    hlink *walkptr;                    /* table walk pointer */
> > 
> > This is only used in first() and next(). IMHO the code would be
> > more readable if this were local to these two methods and
> > mext() would take a pointer to the predecessor.
> 
> I think what you are suggesting is possible, and I will take a look at it. 
> However, it leads to several problems:
> 1. walkptr and walk_index are then no longer treated the same, and IMO that 
> will make it harder to read the code.  I.e. there are 2 pieces of information 
> that would have to be returned and then passed into next().
> 2. There will be additional code necessary to adjust the pointer on entry, 
> which I currently avoid by keeping walkptr in the class.
> 3. It complicates the "user" code in that it must pass the old pointer to the 
> next() routine and if it is symmetric also walk_index.
> 
> > 
> > >    uint32_t walk_index;               /* table walk index */
> > 
> > same here.
> 
> As with the above, that will complicate the problem by returning it to the 
> higher level program.  The "user" program really shouldn't need to know 
> anything about walkptr or walk_index as they are simply the means to traverse 
> a hash list in linear order without using recursion (I have never seen such 
> code anywhere), so IMO, returning them to the higher level routine breaks the 
> encapsulation of the code.
> 
> > 
> > >    void hash_index(char *key);        /* produce hash key,index */
> > >    void grow_table();                 /* grow the table */
> > > public:
> > >    htable(void *item, void *link, int tsize = 31);
> > >    ~htable() { destroy(); }
> > >    void init(void *item, void *link, int tsize = 31);
> > >    bool  insert(char *key, void *item);
> > >    void *lookup(char *key);
> > >    void *first();                     /* get first item in table */
> > >    void *next();                      /* get next item in table */
> > >    void destroy();
> > >    void stats();                      /* print stats about the table */
> > >    uint32_t size();                   /* return size of table */
> > > };
> > 
> > ----------------------------------------htable.c
> > 
> > 
> > > /*
> > >  * Create hash of key, stored in hash then
> > >  *  create and return the pseudo random bucket index
> > >  */
> > > void htable::hash_index(char *key)
> > > {
> > >    hash = 0;
> > >    for (char *p=key; *p; p++) {
> > >       hash += (hash << 3) + (uint32_t)*p;
> > >    }
> > 
> > This ist just shifting and multiplication with the bytes in key. I
> > would expect this hash to have quite some unneccessary collisions
> > when used on strings, though I'm by no means an expert in this area.
> 
> If I am not mistaken, this is a *very* common technique.  The most common 
> algorithm shifts by 5 rather than 3, but if I remember right my tests showed 
> that 3 was better -- at least with the filenames on my system.


I think I know why 3 is better -- it manages to encode the final 32/3 chars
instead of the final 32/5 :-)  Shifting is not sufficient -- you need
rotation, otherwise the hash value will only depend on the tail of the string.


> 
> > 
> > >    /* Multiply by large prime number, take top bits, mask for remainder */
> > >    index = ((hash * 1103515249) >> rshift) & mask;
> > >    Dmsg2(100, "Leave hash_index hash=0x%x index=%d\n", hash, index);
> > >    return;
> > 
> > The return statement is unneccesary.
> 
> Yes, I have removed it as it could be confusing.

OTOH, it is good documentation to help the reader understand why the hash
function doesn't return anything :-)


> 
> 
> 
> > 
> > > void htable::init(void *item, void *link, int tsize)
> > > {
> > >    int pwr;
> > >    tsize >>= 2;
> > >    for (pwr=0; tsize; pwr++) {
> > >       tsize >>= 1;
> > >    }
> > >    loffset = (char *)link - (char *)item;
> > 
> > This is a constant, known at compile time, so make it a preprocessor
> > define using offsetof.
> 
> Uh, what is "This is a constant"?   I don't see any constant. loffset depends 
> on how one calls the routines.  Rather than using a C++ template, which 
> duplicates the code for each different structure that contains a hash link, 
> Bacula computes the offset of the link in the structure, and thus with the 
> same source code can handle any number of different user structures that have 
> hash links.  I have done the same for alist (allocated list) and dlist 
> (doubly linked lists).  The downside of this technique is that the higher 
> level code must cast because it passes void pointers to the structures.
> 
> It is a bit of a tricky point, so I'm not sure the above is clear.
> 
> > 
> > >    mask = ~((~0)<<pwr);               /* 3 bits => table size = 8 */
> > >    rshift = 30 - pwr;                 /* start using bits 28, 29, 30 */
> > >    num_items = 0;                     /* number of entries in table */
> > >    buckets = 1<<pwr;                  /* hash table size -- power of two 
> */
> > >    max_items = buckets * 4;           /* allow average 4 entries per 
> > > chain 
> */
> > 
> > As max_items is always four times the number of buckets, we could
> > get rid of max_items. 
> 
> Yes, good. I agree.
> 
> > More importantly, I think it is not optimal 
> > to store that many keys in the table because that means we'll have
> > in average four hits per bucket. I'd rather choose max_items _smaller_
> > than the number of buckets. For instance the git code which uses hash
> > tables extensively, grows its table whenever it is filled 66%.
> 
> 
> I'd like to see some tests on this.  I'm not convinced that the % full is a 
> good criterion because you could then have one bucket that has 100 items in 
> it which could be catastrophic.

If you have 100 items in a bucket then the hash function is very broken!
Also, I suspect that the acceptable % filled for chained tables can generally
be larger than for open tables because the clashes only affect one bucket.
Moreover, the current implementation stores the hash value (not the bucket
index) in each hlink in the chain, so the scan should quickly filter out
non-matching elements without calling strcmp.

__Martin



> 
> Once I implement a bloom filter, only 15% (with the parameters that I have 
> chosen) of the collisions will require a search of the buckets.
> 
> > 
> > >    table = (hlink **)malloc(buckets * sizeof(hlink *));
> > 
> > unneccessary cast.
> 
> In C++ code as opposed to C the cast is *required* or you get a compiler 
> warning (or perhaps an error -- I forget).
> 
> > 
> > >    memset(table, 0, buckets * sizeof(hlink *));
> > 
> > how about calloc() instead of malloc + memset?
> 
> I'm not sure that smartall (our underlying malloc routine) implements 
> calloc().  I'll look at it.
> 
> > 
> > > /*
> > >  * Take each hash link and walk down the chain of items
> > >  *  that hash there counting them (i.e. the hits), 
> > >  *  then report that number.
> > 
> > Open addressing might be better than chaining to deal with
> > hash collisions. In particular if items are never (or rarely)
> > removed from the table.
> 
> Open addressing has the big advantage that it keeps entries together which 
> improves CPU speed since everything is in the cache, but unless I am missing 
> something, open addressing is extremely expensive in terms of memory 
> requirements since the table has to always be the maximum size.  This could 
> be reasonable if we know how many items are going to be added, but if we do 
> not, I'm not sure.
> 
> I would like to see some real performance studies before changing.
> 
> > 
> > >  *  Obiously, the more hits in a chain, the more time
> > >  *  it takes to reference them. Empty chains are not so
> > >  *  hot either -- as it means unused or wasted space.
> > >  */
> > > #define MAX_COUNT 20
> > 
> > Unless I'm mistaken, the code will fail badly if there are more than
> > MAX_COUNT keys that hash to the same value.
> 
> Quite possibly, but I think there are some mitigating factors.  1. If I 
> remember right the max should be 4.  2. this code is really used for debug. 
> 3. if it fails the stats will be broken, but there will be no serious 
> consequences.  For just obtaining stats, I didn't want to spend too much time 
> worrying about covering all possibilities.
> 
> > 
> > > void htable::stats()
> > > {
> > >    int hits[MAX_COUNT];
> > >    int max = 0;
> > >    int i, j;
> > >    hlink *p;
> > >    printf("\n\nNumItems=%d\nTotal buckets=%d\n", num_items, buckets);
> > >    printf("Hits/bucket: buckets\n");
> > 
> > Dmesg?
> 
> Yes, that would clearly be better.
> 
> > 
> > >    for (i=0; i < MAX_COUNT; i++) {
> > >       hits[i] = 0;
> > >    }
> > 
> > memset(hits, 0 MAXCOUNT * sizeof(int));
> > 
> > >    for (i=0; i<(int)buckets; i++) {
> > >       p = table[i];
> > >       j = 0;
> > >       while (p) {
> > >          p = (hlink *)(p->next);
> > 
> > cast would become unneccessary if next were struct hlink*.
> 
> Yup :-)
> 
> > 
> > > void htable::grow_table()
> > > {
> > >    Dmsg1(100, "Grow called old size = %d\n", buckets);
> > >    /* Setup a bigger table */
> > >    htable *big = (htable *)malloc(sizeof(htable));
> > 
> > unnecessary cast
> 
> See above.  Actually, if I were doing this right, I would probably use one of 
> those new fangled C++ casts, but I find them so ugly and the syntax so hard 
> to remember that I avoid them.
> 
> > 
> > >    big->loffset = loffset;
> > >    big->mask = mask<<1 | 1;
> > >    big->rshift = rshift - 1;
> > >    big->num_items = 0;
> > >    big->buckets = buckets * 2;
> > >    big->max_items = big->buckets * 4;
> > 
> > DRY ;)
> > 
> > >    big->table = (hlink **)malloc(big->buckets * sizeof(hlink *));
> > >    memset(big->table, 0, big->buckets * sizeof(hlink *));
> > 
> > unnecessary cast, calloc().
> > 
> > >    /*
> > >     * We walk through the old smaller tree getting items,
> > >     * but since we are overwriting the colision links, we must
> > >     * explicitly save the item->next pointer and walk each
> > >     * colision chain ourselves.  We do use next() for getting
> > >     * to the next bucket.
> > >     */
> > >    for (void *item=first(); item; ) {
> > >       void *ni = ((hlink *)((char *)item+loffset))->next;  /* save link 
> overwritten by insert */
> > 
> > It might be worth to improve readability by defining
> > 
> > #define next_item(item) ((hlink *)((char *)((item)+(loffset))
> 
> Yes, good idea. I was planning to do that as I did in dlist.c.
> 
> > 
> > as this calculation is needed several times.
> > 
> > >       Dmsg1(100, "Grow insert: %s\n", ((hlink *)((char 
> *)item+loffset))->key);
> > >       big->insert(((hlink *)((char *)item+loffset))->key, item);
> > >       if (ni) {
> > >          item = (void *)((char *)ni-loffset);
> > >       } else {
> > >          walkptr = NULL;
> > >          item = next();
> > >       }
> > >    }
> > 
> > This rather ugly code would become unneccessary when using open
> > addressing instead of chaining.
> 
> Quite possibly, but see my remarks about open addressing above.
> 
> > 
> > > void *htable::next()
> > > {
> > >    Dmsg1(100, "Enter next: walkptr=0x%x\n", (unsigned)walkptr);
> > 
> > 
> > >    if (walkptr) {
> > >       walkptr = (hlink *)(walkptr->next);
> > 
> > unnecessary cast
> 
> C++  :-)
> 
> Thanks for taking the time to look at this -- you are the first person who 
> has 
> been interested in this kind of code. I'll go over each of your points in 
> detail -- especially since I will be going through this code in the near 
> future to bring it up to date (#defines to simplify the pointer offsets), ...
> 
> Don't hesitate to respond if some of my remarks above are incorrect.
> 
> What always amuses me when I look back at code like this is that I wrote it 4 
> years ago to be able to implement the base file project (de-duplication) and 
> the project to track files added/removed from the system, but I've been so 
> busy with other priority projects that I have not had time to get back and 
> actually use these routines.
> 
> The same happened for my red/black tree (src/lib/rblist.c/h), except in that 
> case, I have now actually put the code to use in the restore code giving a 
> 513 times speed up in version 2.1.x compared to 2.0.x for 800,000 files.
> 
> Best regards,
> 
> Kern
> 
> > 
> > 
> > Regards
> > Andre
> > -- 
> > The only person who always got his work done by Friday was Robinson Crusoe
> > 
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
> 

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] [Bacula-devel] Project for strong incremental backup assurance

Reply via email to