Andy Wingo <wi...@pobox.com> writes: > On Tue 06 Mar 2012 10:55, Mark H Weaver <m...@netris.org> writes: > >> +static void >> +set_stringbuf_shared (SCM buf) >> +{ >> + /* Don't modify BUF if it's already marked as shared since it >> + might be a read-only, statically allocated stringbuf. */ >> + if (!STRINGBUF_SHARED (buf)) >> + { >> + scm_i_pthread_mutex_lock (&stringbuf_write_mutex); >> + SCM_SET_CELL_WORD_0 (buf, SCM_CELL_WORD_0 (buf) | STRINGBUF_F_SHARED); >> + scm_i_pthread_mutex_unlock (&stringbuf_write_mutex); >> + } >> +} >> + > > Does this work, with C's memory model? It seems that if thread A sets > the shared flag on stringbuf S, a concurrent call to > set_stringbuf_shared(S) from thread B has no guarantee as to what value > to see, as the initial flag check is outside the lock (a synchronization > point).
That's true. However, in that case, the shared flag is already being set by another thread, so it doesn't matter, because the only requirement of this function is to make sure the flag gets set. We could do it unconditionally if not for the existence of read-only statically allocated stringbufs. The flag is _never_ cleared. Also note that 'set_stringbuf_shared' does not return the value of the flag, so nobody else sees this undefined value. In fact, there are only two places where the flag is accessed: 1. %symbol-dump, which only prints it out for debugging purposes. 2. scm_i_string_start_writing, which accesses it while the 'stringbuf_write_mutex' is locked. > Perhaps it doesn't matter. This is the only place that the SHARED flag > is accessed outside of a mutex, yes? The only other place is '%symbol-dump', which merely prints the flag's value for debugging purposes. > Adding Ken for thoughts on threadsafety. If you are convinced it's > right, please add a comment to the source code. Given the above considerations, I'm quite confident that this is safe, and will add a comment. Of course Ken's thoughts are always welcome :) >> + return scm_double_cell (scm_tc7_symbol | SCM_I_F_SYMBOL_LAZY_GENSYM, >> + SCM_UNPACK (prefix_stringbuf), (scm_t_bits) 0, >> + SCM_UNPACK (scm_cons (SCM_BOOL_F, SCM_EOL))); > > Would be nice to avoid the plist cons if possible, but that's another > issue. Yes, that definitely irked me as well. If we could get rid of the "hash value" in the third slot, we could put the function slot there instead and avoid the cons. This 'equal?' hash value (the same as the hash of the name string) is very unfortunate, because it means that a lazy gensym must be forced whenever it's added to an 'equal?' hash table. This shouldn't be necessary. Since 'equal?' compares symbols the same way that 'eq?' does, the 'equal?' hash value should be the same as the 'eq?' hash value. I guess this is part of the hack to implement the weird symbol table keyed on the symbols themselves. But as you say, this is a separate issue that deserves its own thread. >> + /* Attempt to intern the symbol */ >> + handle = scm_hash_fn_create_handle_x (symbols, sym, SCM_UNDEFINED, >> + symbol_lookup_hash_fn, >> + symbol_lookup_assoc_fn, >> + NULL); >> + } while (SCM_UNLIKELY (!scm_is_eq (sym, SCM_CAR (handle)))); > > Note that this is racy: this is a weak key hash table, so it's not safe > to access the car of the handle outside the alloc lock. It's not an issue here, because the only thing I'm doing with the 'car' is checking that it's 'eq?' to the lazy gensym that's being forced. If it _is_ 'eq?' to our gensym, then there's no possibility that it will be nulled out, because we hold a reference to it. If it's _not_ 'eq?' to our gensym, then we don't care whether it's null or not; in either case we have failed to intern this name and we try again with the next counter value. However, note that 'intern_symbol', 'lookup_interned_symbol', and 'lookup_interned_latin1_symbol' all access the 'car' of a handle of the symbol table outside of the alloc lock, and those might indeed be problems. Those issues are not from this patch though. The relevant code was last changed by you in 2011. >> + /* We must not clear the lazy gensym flag until our symbol has >> + been interned. The lock does not save us here, because another >> + thread could retrieve our gensym's name or hash outside of any >> + lock. */ >> + SCM_SET_CELL_WORD_0 (sym, (SCM_CELL_WORD_0 (sym) >> + & ~SCM_I_F_SYMBOL_LAZY_GENSYM)); >> + } >> + scm_i_pthread_mutex_unlock (&symbols_lock); >> +} > > Doing all this work within a mutex is prone to deadlock, if allocation > causes a finalizer to run that forces another lazy symbol. Ugh. Well, we already have a problem then, because 'intern_symbol' also does allocation while holding this lock, via 'symbol_lookup_assoc_fn', which calls 'scm_symbol_to_string', which must allocate the string object (symbols hold only stringbufs). Therefore, with Guile 2.0.5, if anyone calls 'scm_from_*_symbol' within a finalizer, there is already the possibility for deadlock. Have I mentioned lately how much I hate locks? :/ > This is not an issue on `master'. Excellent! > If we can get around this potential problem, then we should indeed apply > this to stable-2.0 as well. The good news is that it should be possible to fix this (pre-existing) class of problems for 'symbols_lock' in stable-2.0 by changing 'symbol_lookup_assoc_fn' to avoid allocation. It should also be possible to avoid allocation within the lock in 'scm_i_force_lazy_gensym'. I'll work on it. >> #define scm_is_symbol(x) (!SCM_IMP (x) \ >> && (SCM_TYP7 (x) == scm_tc7_symbol)) >> -#define scm_i_symbol_hash(x) ((unsigned long) SCM_CELL_WORD_2 (x)) >> #define scm_i_symbol_is_interned(x) \ >> (!(SCM_CELL_WORD_0 (x) & SCM_I_F_SYMBOL_UNINTERNED)) >> +#define scm_i_symbol_is_lazy_gensym(x) \ >> + (SCM_CELL_WORD_0 (x) & SCM_I_F_SYMBOL_LAZY_GENSYM) >> >> #define SCM_I_F_SYMBOL_UNINTERNED 0x100 >> +#define SCM_I_F_SYMBOL_LAZY_GENSYM 0x200 > > Can we make this change in stable-2.0? It is an ABI change of sorts. The only potential problem is if someone is already using 'scm_i_symbol_hash' in external code. They shouldn't be doing that because it has the 'scm_i_' prefix and is undocumented. However, even if they are using it, in the worst case they'll get an incorrect hash value for lazy gensyms. The hash value they will get for a lazy gensym will be '0', unless the gensym is currently being forced in another thread. > If you are convinced that we can, please surround the scm_i_* with > #ifdef BUILDING_LIBGUILE. Okay. > OK, I think that's it. I'm very much looking forward to this going in: > on master, meta/guile examples/web/hello.scm spends 13% of its > instructions in gensym and resulting GC foo. Sounds good. Thanks for the review! :) Mark