----- On Mar 27, 2018, at 5:55 AM, Denis Yevgenyevich <denis.yevgenyev...@gmail.com> wrote:

>> The "accounting" option is meant to be enabled, otherwise you're using the 
>> hash
> > table only with
> > the bucket-length resize threshold, which has the imperfections you noticed
> Well, given that my software has to maintain precise counts of items inserted
> into some of hashtables,
> I'd like to be able to avoid any performance overhead due to accounting.

>> only for small tables. Beyond a certain size, the number of elements in the
>> hash table is used as a criterion for resize.
>> Why is it unexpected that the table stops growing beyond 4096 after 900
>> insertions ?
> Ah, thanks. Turns out that the table stopped growing because of the check
> if (count >= (1UL << (COUNT_COMMIT_ORDER + split_count_order)))

If both (CDS_LFHT_ACCOUNTING | CDS_LFHT_AUTO_RESIZE) flags are set, the table
will still grow after it has gone beyond this condition.

When both flags are set, check_resize() only implements the bucket-local chain
length algorithm, which deals with very small tables. Once the table size goes
above that threshold, split-counters are used to adjust the required table
size. We don't use the split-counter approximation of the number of elements
for small tables because it is not accurate enough at that scale. It's a
tradeoff between accuracy and accounting overhead.

Notice that ht_count_add() and ht_count_del() both invoke
cds_lfht_resize_lazy_count(). That function is responsible for the
split-counter based resize algorithm.
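
To make the two regimes concrete, here is a rough, self-contained sketch of
the decision described above. This is an illustration only, not the actual
rculfhash.c code: the helper names are made up, and the chain-length threshold
of 3 is an assumption based on the "more than 2 items" behaviour Denis
observed.

#include <stdio.h>

#define COUNT_COMMIT_ORDER		10	/* as in rculfhash.c */
#define CHAIN_LEN_RESIZE_THRESHOLD	3	/* assumed bucket-chain limit */

/* Hypothetical stand-ins for the lazy resize requests. */
static void resize_for_chain_length(unsigned long chain_len)
{
	printf("grow table: bucket chain length %lu\n", chain_len);
}

static void resize_for_element_count(unsigned long count)
{
	printf("grow table: ~%lu elements (split-counter estimate)\n", count);
}

static void on_insert(unsigned long approx_count, unsigned long chain_len,
		      unsigned long split_count_order)
{
	unsigned long threshold =
		1UL << (COUNT_COMMIT_ORDER + split_count_order);

	if (approx_count < threshold) {
		/* Small table: only the per-bucket chain length matters. */
		if (chain_len >= CHAIN_LEN_RESIZE_THRESHOLD)
			resize_for_chain_length(chain_len);
	} else {
		/* Large table: the split-counter approximation of the element
		 * count drives the resize target. */
		resize_for_element_count(approx_count);
	}
}

int main(void)
{
	/* 64-core machine: split_count_order = log2(64) = 6. */
	on_insert(900, 3, 6);		/* small table: chain length triggers */
	on_insert(70000, 1, 6);		/* past 65536: split-counter path */
	return 0;
}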

> I am a bit disappointed about that. Why can't we have tables that resize
> 1. automatically
> 2. from 10s to 100,000s of buckets
> 3. without becoming too large too soon?

The only thing here is that the bucket-length algorithm used for small tables
can grow the table more quickly than you would expect if you have hash
collisions. But it should not matter that much, because it only applies to
small tables. Do you experience "too aggressive" hash table growth even beyond
the threshold selecting the split-counter resize algorithm ?

That threshold is calculated as:

1UL << (COUNT_COMMIT_ORDER + split_count_order)

Where COUNT_COMMIT_ORDER = 10
and split_count_order = log2(nr_cpus ceiled to the next power of 2)

Assuming a 64-core machine: 

log2(64) = 6 
2 ^ (10 + 6) = 65536 
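
In code form, the same calculation looks like this (a small standalone
example; the loop simply ceils nr_cpus to the next power of 2 and takes its
order):

#include <stdio.h>

#define COUNT_COMMIT_ORDER	10

/* log2 of nr_cpus ceiled to the next power of 2 (illustrative helper). */
static unsigned int split_count_order(unsigned long nr_cpus)
{
	unsigned int order = 0;

	while ((1UL << order) < nr_cpus)
		order++;
	return order;
}

int main(void)
{
	unsigned long nr_cpus = 64;	/* assumed core count */

	printf("threshold = %lu\n",
	       1UL << (COUNT_COMMIT_ORDER + split_count_order(nr_cpus)));
	/* prints: threshold = 65536 */
	return 0;
}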

Considering that you created your hash table with an initial size of 16
buckets, specified a minimum number of buckets of 16, and a maximum number of
buckets of 1 << 16 (65536), you'll always be using the "hash-chain length"
resize algorithm on a 64-core machine. The split-counter algorithm would never
kick in.

You should therefore try tweaking the @max_nr_buckets parameter to a larger
value when calling cds_lfht_new().
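
For example (a minimal sketch; the 1 << 20 ceiling is just an arbitrary larger
value for illustration, adjust it to your expected maximum table size):

#include <stdio.h>
#include <urcu.h>		/* default liburcu flavor */
#include <urcu/rculfhash.h>	/* struct cds_lfht, cds_lfht_new() */

int main(void)
{
	struct cds_lfht *ht;

	rcu_register_thread();

	/* Same initial/min sizes as before, but a larger bucket ceiling so
	 * the split-counter resize algorithm has room to kick in. */
	ht = cds_lfht_new(16, 16, 1UL << 20,
			  CDS_LFHT_AUTO_RESIZE | CDS_LFHT_ACCOUNTING, NULL);
	if (!ht) {
		fprintf(stderr, "cds_lfht_new() failed\n");
		rcu_unregister_thread();
		return 1;
	}

	/* ... insertions, lookups, removals as usual ... */

	cds_lfht_destroy(ht, NULL);
	rcu_unregister_thread();
	return 0;
}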

How many cores do you have ? 

Let me know how it goes,

Thanks, 

Mathieu 

> On 26 March 2018 at 18:32, Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:

>> ----- On Mar 26, 2018, at 11:01 AM, Denis Yevgenyevich <denis.yevgenyev...@gmail.com> wrote:

>>> Hello,

>>> I think rculfhash autoresizing (as enabled by CDS_LFHT_AUTO_RESIZE) does not
>>> behave well. Each time _cds_lfht_add finds a hash bucket with more than 2
>>> items, it basically tries to quadruple the hash table size. But even with a
>>> perfect hash function, hash collisions are not that rare given a large enough
>>> table. That is, every now and then the algorithm will come across a 3-long
>>> chain and will grow the table unnecessarily.

>>> I created a table with cds_lfht_new(16, 16, 1<<16, CDS_LFHT_AUTO_RESIZE, NULL)
>>> and began adding an item every 10ms. Each item had a unique key and the hash
>>> function was very good (full 64-bit values of SipHash). Judging by messages I
>>> added to rculfhash.c, this was happening:
>>> after about 100 insertions, the table was resized to 512
>>> after about 500 insertions, the table was resized to 4096
>>> after about 1300 insertions, the table was resized to 16384 and immediately
>>> to 32768

>>> With CDS_LFHT_AUTO_RESIZE|CDS_LFHT_ACCOUNTING the behaviour was different but
>>> still unacceptable: the table got resized to 4096 after 900 insertions and
>>> then stopped growing!

>> The "accounting" option is meant to be enabled, otherwise you're using the 
>> hash
>> table only with
>> the bucket-length resize threshold, which has the imperfections you noticed.

>> With accounting enabled, the bucket length is used as an approximation for 
>> when
>> resize is needed
>> only for small tables. Beyond a certain size, the numbers of elements in the
>> hash table are used
>> as a criterion for resize.

>> Why is it unexpected that the table stops growing beyond 4096 after 900
>> insertions ? How many
>> more insertions did you do beyond 900 ?

>> Thanks,

>> Mathieu

>>> Could someone please implement saner resize conditions, or better yet, add a
>>> customisable hook for deciding when and by how much a table should grow?

>>> Cheers, Denis.


>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
