Kern Sibbald wrote (2007/01/30):
> These are interesting results, but I'm slightly confused.  It seems that you 
> have either misinterpreted or changed the original subject (no problem), 
> which was referring to a large FileSet exclusion list for backup.

I'm sorry, I just replied to the fact, that there is just double linked
list (dlist) implemented in Bacula. I met this problem, when I tried to
restore a backup with over 3000000 milions files on a server with some
special purposes, where there are several directories with over
100000 files. I found that dlist::binary_insert() is very optimistic
in its name, so I started to dig into an implementation with more stable
time results, however I'm not sure, if it is not better to simply get
sorted results from database, which should help in my case too. However
I started to implement a balanced tree (unbalanced would be inadequate
in Bacula's case), because I believed, that there have to be some other
places, where a tree instead of a dlist would be much better - and it
seems that exclusion list would be the case.

> If I understand correctly, you have implemented an AA (never heard of it) 
> balanced tree for building the restore in memory tree.  Is that correct?

Yes. Fortunately, there is very good interface border between data type
usage, and its implementation, so it was very easy to switch from dlist
to my tree implementation (however still unfinished yet), thanks ;o)

> On Tuesday 30 January 2007 08:57, Rudolf Cejka wrote:
> Yes, it is probably also kernel caching on the Bacula side as well.  I always 
> run the test 10 times sequentially, throw out the first two results and then 
> discard any subsequent test more than 10% from the average, then finally make 
> an average.  That usually gives reasonably consistent results.

However in the real life, the first values are very important too, because
it is often the real case - I do not have to repeat tree restoration more
than once ;o)

> Yes, if you are talking about the restore memory tree, the big problem is 
> always directories containing large numbers (more than 10000) files.

Since now, it almost "was" ;o)

> Have you looked at using the Bacula red-black tree code?  According to the 
> research that I did red-black trees are the fastest of all tree routines. 

Yes, AA tree is RB tree with one additional condition, so implementation
is easier and more transparent. When aa_insert() and aa_remove() are done,
it is easy to add {rb,avl}_{insert,remove} too. AA tree tends to perform
more rotations, but its resulting tree happens to be more balanced, so it
is believed, that AA and RB performances are balanced. AVL trees are even
more balanced, but not so popular, however this is the only balanced tree,
which I implemented in the past.

> The second question is have you examined the amount of memory required in the 
> two cases to hold the whole tree (current dlink code and balanced tree code)?

There are additional 8 bytes per file, parent and level items. Parent
can be omited, but then next() has to be recursive, or has to have some
internal stack (probably next direction in my tests). Level can be smaller
in its size, but I do not want to strike alignment problems. I saw memory
usage around 10 % bigger, 280 MB instead of 250 MB.

> The structures needed to setup and maintain a tree (balanced or not) are 
> rather more expensive in terms of bytes than a linked list.  It might be 
> interesting to be able to disable any code that balances the tree.

Balanced parts are distinguishable, so I hope, that it would be possible.

> A balanced tree is primarily optimized for searches rather than inserts,
> and the big time consumer for Bacula is the inserts.

Yes, but every insert means tree search first, so inserts are
significantly impacted too. In case of insertion of a sorted list
of items even very significantly.

-- 
Rudolf Cejka <cejkar at fit.vutbr.cz> http://www.fit.vutbr.cz/~cejkar
Brno University of Technology, Faculty of Information Technology
Bozetechova 2, 612 66  Brno, Czech Republic

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to