Von: Mattias Gaertner <nc-gaert...@netcologne.de>
Gesendet: Fr 03.04.2009 18:07
An: fpc-pascal@lists.freepascal.org; 
Betreff: Re: [fpc-pascal] "Generics" Red Black Tree for fpc

> On Fri, 3 Apr 2009 17:16:50 +0200
> Helmut Hartl <helmut.ha...@firmos.at> wrote:
> 
> > Von: Mattias Gaertner <nc-gaert...@netcologne.de>
> > Gesendet: Fr 03.04.2009 16:51
> > An: fpc-pascal@lists.freepascal.org; 
> > Betreff: Re: [fpc-pascal] "Generics" Red Black Tree for fpc
> > 
> > > How much work do you think is it to extend it to accept duplicate
> > > keys? Mattias
> > 
> > How probable are duplicate keys in your usecase? / what is the use
> > case ? It would be easy to add them as list per key node natively,
> > but that would be possible with the current version too if you use a
> > listtype as the storage type ...
> 
> Of course duplicates are very unlikely, but chance is not 0. Approx 80%
> of my trees must therefore support duplicates.
> Using lists in the nodes would cost speed and memory (duplicates are
> seldom), so I would prefer a more direct support.
> 
> Mattias

I can think of two possible ways of allowing duplicates:

a) the internal node structure gets another pointer for the next duplicated 
storage value. 
(single linked list). The tree structure does not change. If duplicates are not 
very
probable the linked list is sufficiently small to not give a huge performance 
penalty.
The order of access would then be log(n)+k, where k ist the amount of 
duplicates.
We have one pointer more per node. The time to scan the whole tree linear would 
be n*log(n).
The tree semantics for a random access with duplicates would not be very clean: 
1) first scan for a key
2) check if duplicates exist
3) scan the duplicate list
Implementation would be easy, memory cost just one additional pointer, maybe 
the duplicate count in the node.

b)
We add 2 pointers in double linked list fashion to the tree node structure and 
link each node against its pred and next node. If we insert a duplicate we do 
it only in the list, not in the tree structure.
We could then lineary scan the tree faster in the order of n, by traversing the 
list and rembering the last element - and in log(n) for a random access. 
Semantics for the access would be cleaner.
1) Search the key
2) call next until the key changes


Background Info:
This tree + semantic is designed for heavy multiprocessor/multithreading loads, 
therefor the
interface must be kept as simple as possible. If you have multiple threads 
reading and writing concurrently on the tree, one thread(A) can search the tree 
by only rembering the active key, 
while two other thread (C,D) may delete or insert nodes. If thread A gets 
interupted in the scan then he can continoue with his last key value and finds 
the next node, making progress.
If thread A would remember a pointer to the node, he could search the tree in 
order n (not the described n*log(n) but has to deal with deleted nodes, 
ABA-Problems and much headache - or the tree (as directory) must be locked as a 
whole while processing a linear scan. (which is practically two inefficient and 
slow).

Which means as conclusion that the speed benefit of solution b) will vanish.
A solution to the whole cause could be a STM (software transactional memory) 
which i currently 
work on - but thats far away from production quality ...

So i would propose solution A) with the following semantics for a random acess 
- if the
tree would be intended for multithreaded usage - for single threaded usage i 
would propose solution B).

A)
1) Search the key with a function:
function Find(const key:_TKey;out node:_TStore;const 
DuplicateIterator:TIteratorFunc):Boolean;
where you pass an iteratorfunction which gets called for every extra duplicate.

The multithreaded access could then be:
1) Lock the directory (r/b tree) 
2) Find the key and fetch the objects and duplicates, lock them
3) unlock the tree
4) process the objects/data

i hope i was somewhat clear :-) 

If it is of value i would implement a solution.

 helmut
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Reply via email to