> Actually, this is not the average chain length: the resize happens when the first chain going > beyond the threshold is encountered. I would expect the average chain length to be somewhere > below that threshold. It would be interesting if you could provide insight into the distribution of > your hash table bucket chain length by adding instrumentation code. Please have a look at the attached files. I printed out histograms <chain length: frequency> right before and after each _do_cds_lfht_resize(). The average chain length computed as items / buckets exceeded 8. Each millisecond I was adding a unique item in the table. The test was run twice with different hash functions, siphash (64-bit) and lookup3 (32-bit).
Chains that long are ok, I think, until tables get huge and most node accesses during a lookup become cache misses. > > Perhaps you could make CHAIN_LEN_RESIZE_THRESHOLD a tunable? > That would be a possibility, yes. Would you like to propose a patch ? Perhaps later, too busy. On 27 March 2018 at 19:17, Mathieu Desnoyers <mathieu.desnoy...@efficios.com > wrote: > (Please keep the lttng-dev mailing list in CC.) > > ----- On Mar 27, 2018, at 12:07 PM, Denis Yevgenyevich > denis.yevgenyev...@gmail.com wrote: > > >> When both flags are set, check_resize() is only for the bucket-local > chain > > > length algorithm that deals > >> with very small tables. Then, when the table size goes above that > threshold, > > > split-counters are used to > >> adjust the required table size. We don't use the split-counters > approximation of > > > the number of elements > >> in the table for small tables because they are not accurate enough for > small > > > tables. It's a tradeoff > > > between accuracy and accounting overhead here. > > Thanks for the explanation. > > >> Notice that ht_count_add() and ht_count_del() both invoke > > > cds_lfht_resize_lazy_count(). It's that > > > function which is responsible for the split-counter based resize > algorithm. > >> If both (CDS_LFHT_ACCOUNTING | CDS_LFHT_AUTO_RESIZE) flags are set, the > table > > > will > > > still grow after it has gone beyond this condition. > > That's what I overlooked. I've just repeated my test, letting it run for > a > > longer time. > > After 4096 items (4-core cpu) check_resize() stopped growing the table. > > 4096 buckets is indeed the threshold for a 4-core cpu system. > > > After 32768 items ht_cound_add() finally grew the table to 32768. That > is when > > the check > > if ((count >> CHAIN_LEN_RESIZE_THRESHOLD) < size) > > went false. The average chain length might have been 8 by that time. > Isn't it > > too much? > > Actually, this is not the average chain length: the resize happens when > the first chain going > beyond the threshold is encountered. I would expect the average chain > length to be somewhere > below that threshold. It would be interesting if you could provide insight > into the distribution of > your hash table bucket chain length by adding instrumentation code. > > > Perhaps you could make CHAIN_LEN_RESIZE_THRESHOLD a tunable? > > That would be a possibility, yes. Would you like to propose a patch ? > > > Anyway, I am glad now I understand how this stuff works. Thank you again! > > You're welcome! > > Thanks, > > Mathieu > > > On 27 March 2018 at 16:02, Mathieu Desnoyers < [ > > mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com > ] > > > wrote: > > >> ----- On Mar 27, 2018, at 5:55 AM, Denis Yevgenyevich < [ > >> mailto:denis.yevgenyev...@gmail.com | denis.yevgenyev...@gmail.com ] > > wrote: > > >>>> The "accounting" option is meant to be enabled, otherwise you're > using the hash > >>> > table only with > >>> > the bucket-length resize threshold, which has the imperfections you > noticed > >>> Well, given that my software has to maintain precise counts of items > inserted > >>> into some of hashtables, > >>> I'd like to be able to avoid any performance overhead due to > accounting. > > >>>> only for small tables. Beyond a certain size, the numbers of elements > in the > >>> > hash table are used > >>> > as a criterion for resize. > >>>> Why is it unexpected that the table stops growing beyond 4096 after > 900 > >>> > insertions ? > >>> Ah, thanks. Turns out that the table stopped growing because of the > check > >>> if (count >= (1UL << (COUNT_COMMIT_ORDER + split_count_order))) > > >> If both (CDS_LFHT_ACCOUNTING | CDS_LFHT_AUTO_RESIZE) flags are set, the > table > >> will > >> still grow after it has gone beyond this condition. > > >> When both flags are set, check_resize() is only for the bucket-local > chain > >> length algorithm that deals > >> with very small tables. Then, when the table size goes above that > threshold, > >> split-counters are used to > >> adjust the required table size. We don't use the split-counters > approximation of > >> the number of elements > >> in the table for small tables because they are not accurate enough for > small > >> tables. It's a tradeoff > >> between accuracy and accounting overhead here. > > >> Notice that ht_count_add() and ht_count_del() both invoke > >> cds_lfht_resize_lazy_count(). It's that > >> function which is responsible for the split-counter based resize > algorithm. > > >>> I am a bit disappointed about that. Why can't we have tables that > resize > >>> 1. automatically > >>> 2. from 10s to 100,000s of buckets > >>> 3. without becoming too large too soon? > > >> The only thing here is that the bucket-length algorithm used for small > tables > >> can grow the > >> table more quickly than you would expect if you have hash collisions. > But it > >> should not > >> matter that much because it only applies to small tables. Do you > experience "too > >> aggressive" > >> hash table growth even beyond the threshold selecting the > split-counters resize > >> algorithm ? > > >> That threshold is calculated as: > > >> (1UL << (COUNT_COMMIT_ORDER + split_count_order) > > >> Where COUNT_COMMIT_ORDER = 10 > >> and split_count_order = log2(nr_cpus ceiled to next power of 2) > > >> Assuming a 64-core machine: > > >> log2(64) = 6 > >> 2 ^ (10 + 6) = 65536 > > >> Considering that you created your hash table with an initial size of 16 > buckets, > >> specified a minimum > >> number of buckets of 16, and a maximum number of buckets of 1 << 16 > (65536), > >> you'll always be > >> using the "hash-chain length" algorithm for resize on a 64-core > machine. The > >> split-counter algorithm > >> would never kick in. > > >> You should therefore try tweaking the @max_nr_buckets parameter to a > larger > >> value when > >> calling cds_lfht_new(). > > >> How many cores do you have ? > > >> Let me know how is goes, > > >> Thanks, > > >> Mathieu > > >>> On 26 March 2018 at 18:32, Mathieu Desnoyers < [ > >>> mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com > ] > > >>> wrote: > > >>>> ----- On Mar 26, 2018, at 11:01 AM, Denis Yevgenyevich < [ > >>>> mailto:denis.yevgenyev...@gmail.com | denis.yevgenyev...@gmail.com ] > > wrote: > > >>>>> Hello, > > >>>>> I think rculfhash autoresizing (as enabled by CDS_LFHT_AUTO_RESIZE) > does not > >>>>> behave well. Each time _cds_lfht_add finds a hash bucket with more > than 2 > >>>>> items, it basically tries to quadruple the hash table size. But even > with a > >>>>> perfect hash function hash collisions are not that rare, given a > large enough > >>>>> table. That is, every now and then the algorithm will come across a > 3-long > >>>>> chain and will grow the table unnecessarily. > > >>>>> I created a table with cds_lfht_new(16, 16, 1<<16, > CDS_LFHT_AUTO_RESIZE, NULL) > >>>>> and began adding an item each 10ms. Each item had a unique key and > the hash > >>>>> function was very good (full 64-bit values of Siphash). Judging by > messages I > >>>>> added to rculfhash.c, this was happening: > >>>>> after about 100 insertions, the table was resized to 512 > >>>>> after about 500 insertions, the table was resized to 4096 > >>>>> after about 1300 insertions, the table was resized to 16384 and > immediately to > >>>>> 32768 > > >>>>> With CDS_LFHT_AUTO_RESIZE|CDS_LFHT_ACCOUNTING the behaviour was > different but > >>>>> still unacceptable: the table got resized to 4096 after 900 > insertions and then > >>>>> stopped growing! > > >>>> The "accounting" option is meant to be enabled, otherwise you're > using the hash > >>>> table only with > >>>> the bucket-length resize threshold, which has the imperfections you > noticed. > > >>>> With accounting enabled, the bucket length is used as an > approximation for when > >>>> resize is needed > >>>> only for small tables. Beyond a certain size, the numbers of elements > in the > >>>> hash table are used > >>>> as a criterion for resize. > > >>>> Why is it unexpected that the table stops growing beyond 4096 after > 900 > >>>> insertions ? How many > >>>> more insertions did you do beyond 900 ? > > >>>> Thanks, > > >>>> Mathieu > > >>>>> Could someone please implement saner resize conditions, or better > yet, add a > >>>>> customisable hook for decisions as to when and how much should a > table grow? > > >>>>> Cheers, Denis. > > >>>>> _______________________________________________ > >>>>> lttng-dev mailing list > >>>>> [ mailto:lttng-dev@lists.lttng.org | lttng-dev@lists.lttng.org ] > >>>>> [ https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev | > >>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ] > > >>>> -- > >>>> Mathieu Desnoyers > >>>> EfficiOS Inc. > >>>> [ http://www.efficios.com/ | http://www.efficios.com ] > > >> -- > >> Mathieu Desnoyers > >> EfficiOS Inc. > >> [ http://www.efficios.com/ | http://www.efficios.com ] > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com >
;;; ----------------- 0: 2 1: 3 2: 3 3: 4 4: 2 5: 2 ;;; 39 items, 16 buckets, avg.len 2.44 ;;; RCU GROWING 16 TO 64 0: 34 1: 19 2: 11 ;;; 41 items, 64 buckets, avg.len 0.64 ;;; RCU RESIZED TO 64 ;;; ----------------- 0: 15 1: 16 2: 14 3: 13 4: 4 5: 2 ;;; 109 items, 64 buckets, avg.len 1.70 ;;; RCU GROWING 64 TO 256 0: 415 1: 85 2: 10 3: 2 ;;; 111 items, 512 buckets, avg.len 0.22 ;;; RCU RESIZED TO 512 ;;; ----------------- 0: 295 1: 155 2: 49 3: 10 4: 1 5: 0 6: 2 ;;; 299 items, 512 buckets, avg.len 0.58 ;;; RCU GROWING 512 TO 4096 0: 3807 1: 279 2: 10 ;;; 299 items, 4096 buckets, avg.len 0.07 ;;; RCU RESIZED TO 4096 ;;; ----------------- 0: 4 1: 9 2: 28 3: 85 4: 192 5: 297 6: 415 7: 544 8: 528 9: 596 10: 450 11: 322 12: 270 13: 161 14: 95 15: 49 16: 23 17: 11 18: 10 19: 5 20: 1 21: 0 22: 1 ;;; 34771 items, 4096 buckets, avg.len 8.49 ;;; RCU GROWING 4096 TO 32768 0: 11338 1: 12054 2: 6363 3: 2235 4: 627 5: 125 6: 22 7: 4 ;;; 34778 items, 32768 buckets, avg.len 1.06 ;;; RCU RESIZED TO 32768 ;;; ----------------- 0: 11 1: 81 2: 357 3: 854 4: 1842 5: 2968 6: 3845 7: 4568 8: 4549 9: 4211 10: 3350 11: 2337 12: 1648 13: 973 14: 584 15: 307 16: 154 17: 76 18: 30 19: 16 20: 4 21: 2 22: 0 23: 1 ;;; 264060 items, 32768 buckets, avg.len 8.06 ;;; RCU GROWING 32768 TO 262144 0: 95656 1: 96470 2: 48596 3: 16395 4: 4057 5: 808 6: 143 7: 16 8: 3 ;;; 264109 items, 262144 buckets, avg.len 1.01 ;;; RCU RESIZED TO 262144
;;; ----------------- 0: 2 1: 1 2: 2 3: 9 4: 1 5: 0 6: 1 ;;; 42 items, 16 buckets, avg.len 2.62 ;;; RCU GROWING 16 TO 64 0: 91 1: 32 2: 5 ;;; 42 items, 128 buckets, avg.len 0.33 ;;; RCU RESIZED TO 128 ;;; ----------------- 0: 52 1: 42 2: 26 3: 7 4: 1 ;;; 119 items, 128 buckets, avg.len 0.93 ;;; RCU GROWING 128 TO 512 0: 911 1: 107 2: 6 ;;; 119 items, 1024 buckets, avg.len 0.12 ;;; RCU RESIZED TO 1024 ;;; ----------------- 0: 531 1: 362 2: 103 3: 26 4: 2 ;;; 654 items, 1024 buckets, avg.len 0.64 ;;; RCU GROWING 1024 TO 4096 0: 3492 1: 555 2: 48 3: 1 ;;; 654 items, 4096 buckets, avg.len 0.16 ;;; RCU RESIZED TO 4096 ;;; ----------------- 0: 2 1: 5 2: 35 3: 75 4: 199 5: 319 6: 458 7: 550 8: 567 9: 512 10: 428 11: 384 12: 229 13: 137 14: 89 15: 48 16: 28 17: 16 18: 8 19: 4 20: 2 21: 0 22: 0 23: 1 ;;; 34435 items, 4096 buckets, avg.len 8.41 ;;; RCU GROWING 4096 TO 32768 0: 11457 1: 11984 2: 6410 3: 2219 4: 542 5: 121 6: 30 7: 5 ;;; 34449 items, 32768 buckets, avg.len 1.05 ;;; RCU RESIZED TO 32768 ;;; ----------------- 0: 10 1: 80 2: 336 3: 905 4: 1785 5: 3019 6: 3964 7: 4587 8: 4487 9: 4093 10: 3324 11: 2470 12: 1468 13: 1008 14: 611 15: 309 16: 176 17: 77 18: 35 19: 17 20: 5 21: 1 22: 1 ;;; 263868 items, 32768 buckets, avg.len 8.05 ;;; RCU GROWING 32768 TO 262144 0: 95735 1: 96539 2: 48544 3: 16180 4: 4167 5: 807 6: 148 7: 20 8: 4 ;;; 263930 items, 262144 buckets, avg.len 1.01 ;;; RCU RESIZED TO 262144
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev