Hi Mark! Thanks for the response. I have various minor thoughts here and one serious note on GC.
On Wed 07 Mar 2012 17:43, Mark H Weaver <[email protected]> writes: >>> + if (!STRINGBUF_SHARED (buf)) >>> + { >>> + scm_i_pthread_mutex_lock (&stringbuf_write_mutex); >>> + SCM_SET_CELL_WORD_0 (buf, SCM_CELL_WORD_0 (buf) | >>> STRINGBUF_F_SHARED); >>> + scm_i_pthread_mutex_unlock (&stringbuf_write_mutex); >>> + } >> >> Does this work, with C's memory model? > > That's true. However, in that case, the shared flag is already being > set by another thread, so it doesn't matter, because the only > requirement of this function is to make sure the flag gets set. I think it will be fine. Thanks for walking through it with me. >>> + /* Attempt to intern the symbol */ >>> + handle = scm_hash_fn_create_handle_x (symbols, sym, >>> SCM_UNDEFINED, >>> + symbol_lookup_hash_fn, >>> + symbol_lookup_assoc_fn, >>> + NULL); >>> + } while (SCM_UNLIKELY (!scm_is_eq (sym, SCM_CAR (handle)))); >> >> Note that this is racy: this is a weak key hash table, so it's not safe >> to access the car of the handle outside the alloc lock. > > It's not an issue here, because the only thing I'm doing with the 'car' > is checking that it's 'eq?' to the lazy gensym that's being forced. If > it _is_ 'eq?' to our gensym, then there's no possibility that it will be > nulled out, because we hold a reference to it. If it's _not_ 'eq?' to > our gensym, then we don't care whether it's null or not; in either case > we have failed to intern this name and we try again with the next > counter value. > > However, note that 'intern_symbol', 'lookup_interned_symbol', and > 'lookup_interned_latin1_symbol' all access the 'car' of a handle of the > symbol table outside of the alloc lock, and those might indeed be > problems. Those issues are not from this patch though. The relevant > code was last changed by you in 2011. Yes, it was part of an attempt to correct this situation, and part of a learning process as well. I'm more satisfied with master's correctness in this regard. >>> + /* We must not clear the lazy gensym flag until our symbol has >>> + been interned. The lock does not save us here, because another >>> + thread could retrieve our gensym's name or hash outside of any >>> + lock. */ >>> + SCM_SET_CELL_WORD_0 (sym, (SCM_CELL_WORD_0 (sym) >>> + & ~SCM_I_F_SYMBOL_LAZY_GENSYM)); >>> + } >>> + scm_i_pthread_mutex_unlock (&symbols_lock); >>> +} >> >> Doing all this work within a mutex is prone to deadlock, if allocation >> causes a finalizer to run that forces another lazy symbol. > > Ugh. Well, we already have a problem then, because 'intern_symbol' also > does allocation while holding this lock, via 'symbol_lookup_assoc_fn', > which calls 'scm_symbol_to_string', which must allocate the string > object (symbols hold only stringbufs). Therefore, with Guile 2.0.5, if > anyone calls 'scm_from_*_symbol' within a finalizer, there is already > the possibility for deadlock. Yuck. > Have I mentioned lately how much I hate locks? :/ :) Locks aren't really the problem here though -- it's the finalizer-introduced concurrency, specifically in the case in which your program is in a critical section. If we ran finalizers in a separate thread, we would not have these issues. > The good news is that it should be possible to fix this (pre-existing) > class of problems for 'symbols_lock' in stable-2.0 by changing > 'symbol_lookup_assoc_fn' to avoid allocation. That's not enough. Adding spine segments, ribs, and associating a disappearing link all allocate memory, and those are internal to the hash table implementation. ^ The serious note :) Maybe you'll never hit it. I don't know. It depends very much on your allocation pattern. What's the likelihood that a finalizer accesses a symbol's characters? Who knows. Maybe it's good enough to document this defect in 2.0. "Don't try to string->symbol or symbol->string in a finalizer". Andy -- http://wingolog.org/
