Hello, l...@gnu.org (Ludovic Courtès) skribis:
> Total time: 66.515425859 seconds (31.859444991 seconds in GC) So, more investigation on the GC side… ‘perf’ shows a profile like this: --8<---------------cut here---------------start------------->8--- 12.33% guile libgc.so.1.0.3 [.] GC_mark_from 10.67% guile libgc.so.1.0.3 [.] GC_move_disappearing_link_inner 8.47% guile libgc.so.1.0.3 [.] GC_header_cache_miss 7.06% guile libgc.so.1.0.3 [.] GC_mark_and_push 5.98% guile libgc.so.1.0.3 [.] GC_finalize 4.08% guile libpthread-2.25.so [.] pthread_mutex_trylock 4.03% guile libgc.so.1.0.3 [.] GC_register_disappearing_link_inner 3.95% guile libpthread-2.25.so [.] __pthread_mutex_unlock_usercnt 3.07% guile libguile-2.2.so.1.2.0 [.] weak_table_put_x 3.05% guile libgc.so.1.0.3 [.] GC_base 2.49% guile libguile-2.2.so.1.2.0 [.] resize_table 2.47% guile libgc.so.1.0.3 [.] GC_is_marked 1.76% guile libguile-2.2.so.1.2.0 [.] rob_from_rich 1.54% guile libgc.so.1.0.3 [.] GC_grow_table 1.22% guile libgc.so.1.0.3 [.] GC_find_header 1.19% guile libgc.so.1.0.3 [.] GC_clear_mark_bit 1.17% guile libguile-2.2.so.1.2.0 [.] mem2uinteger 1.13% guile libgc.so.1.0.3 [.] GC_malloc_kind 0.98% guile libguile-2.2.so.1.2.0 [.] peek_byte_or_eof 0.98% guile libguile-2.2.so.1.2.0 [.] scm_getc 0.71% guile libguile-2.2.so.1.2.0 [.] scm_get_byte_or_eof 0.68% guile libguile-2.2.so.1.2.0 [.] scm_to_uint64 0.68% guile libguile-2.2.so.1.2.0 [.] read_token 0.67% guile libgc.so.1.0.3 [.] GC_is_heap_ptr 0.64% guile libguile-2.2.so.1.2.0 [.] scm_unget_bytes 0.60% guile libguile-2.2.so.1.2.0 [.] read_inner_expression 0.59% guile libguile-2.2.so.1.2.0 [.] scm_flush 0.58% guile libguile-2.2.so.1.2.0 [.] peek_utf8_codepoint 0.55% guile libguile-2.2.so.1.2.0 [.] scm_set_cdr_x 0.53% guile libgc.so.1.0.3 [.] GC_push_marked 0.53% guile libguile-2.2.so.1.2.0 [.] scm_c_weak_set_lookup 0.51% guile libgc.so.1.0.3 [.] GC_call_with_alloc_lock 0.51% guile libguile-2.2.so.1.2.0 [.] scm_read_sexp 0.50% guile libguile-2.2.so.1.2.0 [.] mark_weak_key_table 0.49% guile libguile-2.2.so.1.2.0 [.] move_weak_entry.part.0 0.49% guile libguile-2.2.so.1.2.0 [.] scm_ungetc 0.47% guile libguile-2.2.so.1.2.0 [.] scm_to_int32 0.46% guile libguile-2.2.so.1.2.0 [.] flush_ws 0.45% guile libguile-2.2.so.1.2.0 [.] scm_i_string_ref 0.41% guile libgc.so.1.0.3 [.] GC_reclaim_clear 0.39% guile libgc.so.1.0.3 [.] GC_move_disappearing_link --8<---------------cut here---------------end--------------->8--- As you hinted on #guix a while back Andy, the mark procedures Guile uses break libgc’s expectations. Specifically, ‘mark_weak_key_table’ typically pushes lots of objects on the mark stack (in my example the source properties table has a 3595271 ‘scm_t_weak_entry’ objects, so at every mark phase, we push roughly all of that on the mark stack.) Internally, libgc would never do that: in ‘GC_mark_from’ it has a limited “credit” for marking, and it stops when it has consumed all of it. However, with mark procedures, it cannot do that because the mark procedure obviously keeps running until it’s done, and from libgc’s viewpoint, the mark procedure marks one object (in this case, the big weak entry array.) (Besides, libgc recommends against using mark procedures in the first place, but we cannot use the GC_DS_BITMAP approach here because it only works for objects up to 30 words, not 200+ MiB arrays…) In addition to this, ‘GC_move_disappearing_link’ is expensive, as shown in the profile, yet it’s called quite a lot on table resizing. I’ve come to the conclusion that the 2.2 weak-table implementation strategy cannot work efficiently with libgc. I’m also skeptical (perhaps that’s also because I’m insufficiently informed, tell me!) about the “open-addressed” strategy that is used. To me, it’s necessarily less space-efficient than a regular hash table with chaining since we always have at least 10% more weak entries than the number of actual entries in the table (and in practice it’s usually much more than 10% AFAICS, because of the gap between subsequent sizes.) All in all, given that these issues are very likely causes of the execution time and memory consumption issues that plague the compiler (where we have huge symbol and source property tables), I’m in favor of switching back to the 2.0 implementation of weak hash tables. That can be done in an API-compatible way, I think. WDYT? Better ideas maybe? Thanks, Ludo’.