On Aug 28, 2010, at 15:20, Andy Wingo wrote:

>> Unfortunately this applies to some internals of the implementation
>> too.  For example, "set-object-property!" and friends use a hash
>> table and assoc lists internally.
>
> Fixed, thanks.
>
>> scm_c_issue_deprecation_warning and scm_c_register_extension don't
>> look thread-safe to me (in the sense described above).
>
> Fixed also.
Thanks!  Though... looking at deprecation.c, I'd suggest not printing
stuff while the lock is held; just keep track of whether it needs to
be printed, and do the printing after releasing the lock (a sketch of
that pattern is below).  Partly because locks should be held only
briefly, and printing can block, but also because if the error port
can call back into other Scheme code, which may call a deprecated
function, then you could get an error trying to acquire the lock
(which we won't notice, because we never check), or a deadlock.

I was starting to look through the port code for this, but got
distracted by scm_ptobs, a pointer which appears to be dynamically
reallocated as needed when scm_make_port_type is called.  Which means
every function that reads it should be locking the same mutex used
when updating it; in this case, that appears to be the generic
"critical section" stuff, and it looks like we're not doing that.

So, hmm, where else do we use hash tables internally?

foreign.c: pointer_weak_refs is a weakly keyed hash table, which we
add things to without locking.

gc.c: scm_protects is protected; good.

instructions.c: Can scm_lookup_instruction_by_name be called the
"first" time from two threads concurrently?  It uses a hash table
stored in a static variable, and the variable is updated before the
hash table is filled in.  (Even if you simply assign to the variable
afterwards, there's no requirement that the compiler or CPU not
reorder those assignments.)  A sketch of a safer initialization
pattern is below.  But, as an aside, do we need all that in C, or
could instruction-list just export a big Scheme list that could be
examined (or turned into a hash table) by Scheme versions of the other
functions?

keywords.c: A critical section protects keyword_obarray; good.

modules.c: scm_pre_modules_obarray doesn't look protected.

ports.c: scm_i_port_weak_hash is protected in some places; others I
haven't investigated further.  But in scm_c_port_for_each, it looks to
me as though, if ports are being created while the function is setting
up, you could end up raising an exception if scm_internal_hash_fold
iterates over more ports than were in the hash table when the count
was initially taken (and the vector size set).

procprop.c: There's a mutex to protect overrides, but it looks like
set-procedure-property! doesn't use it correctly; it should look more
like set-object-property! does.

properties.c: I'm not familiar with this code at all, but I'd guess
properties_whash should be protected.  Though I wonder if the call-out
from primitive-property-ref could invoke other primitive-property
functions, causing a deadlock if it isn't done carefully.

smob.c: I don't think tramp_weak_map is adequately protected, but I'm
not certain.

srcprop.c: scm_source_whash isn't protected; it also appears to be
exposed by the name "source-whash" to Scheme code, which wouldn't be
able to use a mutex defined and used purely in the C code.  (What's
that, four different kinds of property-list mappings, all implemented
separately?)

struct.c: Even calling struct-vtable-name can cause a hash table entry
to be created, it appears.  So that's not thread-safe, never mind the
call to actually change the name.

symbols.c: I don't think 'symbols' is handled safely.  But this code
is all starting to run together in my mind. :-)

weaks.c: This one only makes things for Scheme code, where we're
making locking the user's problem.

It would be a bit easier if hash tables themselves were thread-safe.
A read/write lock might make more sense than a simple mutex, though,
if reading is more common than writing (sketch below).
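Here's a rough sketch of the "decide under the lock, print afterwards"
pattern I mean for the deprecation warnings.  The names and the little
table here are stand-ins I made up, not the actual Guile internals,
and it uses raw pthreads rather than Guile's own mutex wrappers:

  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>

  static pthread_mutex_t warn_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Tiny stand-in for the "messages already issued" table; the real
     code uses a hash table. */
  #define MAX_WARNED 64
  static const char *warned[MAX_WARNED];
  static int n_warned;

  static int
  already_seen (const char *msg)        /* caller holds warn_lock */
  {
    int i;
    for (i = 0; i < n_warned; i++)
      if (strcmp (warned[i], msg) == 0)
        return 1;
    return 0;
  }

  void
  issue_warning (const char *msg)
  {
    int need_to_print = 0;

    pthread_mutex_lock (&warn_lock);
    if (!already_seen (msg) && n_warned < MAX_WARNED)
      {
        warned[n_warned++] = msg;
        need_to_print = 1;
      }
    pthread_mutex_unlock (&warn_lock);

    /* Print only after releasing the lock, so blocking output, or a
       callback that itself issues a warning, can't deadlock on
       warn_lock. */
    if (need_to_print)
      fprintf (stderr, "%s\n", msg);
  }

The point is just that the only work done while holding the lock is
the table check and update; anything that can block or call back out
happens afterwards.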
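And for the instructions.c case, one way (of several) to make the lazy
initialization safe is to build the table completely before publishing
it, and let pthread_once handle both the "first caller" race and the
memory ordering.  Again, just an illustrative sketch with a made-up
table, not the real instruction data:

  #include <pthread.h>
  #include <string.h>

  struct entry { const char *name; int opcode; };

  static pthread_once_t table_once = PTHREAD_ONCE_INIT;
  static struct entry table[2];

  static void
  build_table (void)
  {
    /* Fill in every entry before returning; no other thread gets past
       pthread_once until this has finished, and pthread_once supplies
       the ordering guarantee that they then see the filled-in
       table. */
    table[0].name = "halt";  table[0].opcode = 0;
    table[1].name = "drop";  table[1].opcode = 1;
  }

  int
  lookup_instruction (const char *name)
  {
    size_t i;
    pthread_once (&table_once, build_table);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
      if (strcmp (table[i].name, name) == 0)
        return table[i].opcode;
    return -1;
  }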
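For the hash tables themselves, the read/write-lock version might look
something like this -- again with a made-up table, just to show the
shape of it:

  #include <pthread.h>

  static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
  static int table[256];        /* stand-in for a shared hash table */

  int
  table_ref (unsigned key)
  {
    int value;
    pthread_rwlock_rdlock (&table_lock);   /* shared: readers overlap */
    value = table[key & 0xff];
    pthread_rwlock_unlock (&table_lock);
    return value;
  }

  void
  table_set (unsigned key, int value)
  {
    pthread_rwlock_wrlock (&table_lock);   /* exclusive */
    table[key & 0xff] = value;
    pthread_rwlock_unlock (&table_lock);
  }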
In my experience, read/write locks sometimes perform worse than simple
mutexes, so they're a win mainly when concurrent reading dominates.

Of course, hash tables aren't the only places where we can have
problems, and I'm not even touching on hash tables that might be
created by Scheme code (but exposed to users with APIs that don't
scream "unsafe hash table used here!") -- but I've already spent my
afternoon and evening on this.

>> That's just from a spot check; I'm not even trying to be thorough.
>
> I await your more thorough check :)

Yeah, in my copious free time... :-(  Maybe some more next weekend, if
I'm not doing the stuff I was supposed to get done last weekend.

>> And don't get me going on memory ordering models....
>
> I'm interested!

There's a lot to read out there, and I'm no expert, but the main idea
is that, depending on the specific architecture, memory loads and
stores can be almost arbitrarily reordered between CPUs unless you do
something to prevent it.  So, suppose you do something like this
(preconditions: z is a pair visible to both threads; SCM_CAR(z) is a
pair):

  thread 1:
    allocate cons cell x
    SCM_SETCAR(x, v1)
    SCM_SETCDR(x, v2)
    SCM_SETCAR(z, x)

  thread 2:
    t = SCM_CAR(z)
    read SCM_CAR(t)

Assuming that reads and writes of SCM values are atomic, thread 2 will
read t as either the old value or x.  If it's x, though, SCM_CAR(x)
might *not* be read back as v1, but as whatever garbage was in there
when (or before) the cell was allocated.  And instead of the CAR slot,
the not-yet-visible store could be some bits in a non-immediate value
that indicate a subtype, causing the reading thread to call the wrong
routines to manipulate it.  (A sketch of one way to fix this follows
below.)

Basically, any time you make an object handle visible to other
threads, you need to make sure the object's contents have been stored
first, and that the other processor hasn't loaded that location yet --
or, if it might be cached, that it knows to flush that entry.
(Multiple threads can run on one processor, or on multiple cores
sharing a cache, in which case it's probably not a problem.  A thread
can also be moved between cores, but the OS is responsible for keeping
things in sync in that case.)

Changing an object already visible to other threads probably requires
at least as much caution.  But given that malloc/free are likely to
reuse storage, it's not even reasonable to assume that a location in a
freshly allocated cell isn't cached by some other processor from back
when it was part of some other object, so I think the two problems are
largely equivalent; a visible live object might merely be more likely
to be in another cache.

According to the Wikipedia article on memory ordering, the Alpha (some
of them are still out there) has a really interesting property: it can
reorder dependent loads, i.e., load data before loading the pointer to
that data.  The article doesn't describe it in detail, but I assume
it's related to speculative loading or something.

Mutexes encapsulate "something to prevent it" pretty well; hacks like
"volatile" don't, though they can get the compiler somewhat out of the
way.  Of course, adding a mutex to every non-immediate value is far
too expensive in memory use, even if the almost complete lack of
contention means the locking could probably be done with few context
switches into the kernel.  Using one global mutex can work, but it
basically means only one thread can be examining or manipulating Guile
objects at a time.  For I/O or DNS lookups that's probably fine, but
we do want to be able to manipulate data structures concurrently.
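Back to the thread 1 / thread 2 example above: to make that kind of
publication safe, the store of the new pointer has to be ordered after
the stores that initialize the cell.  Here's a sketch in plain C of
what I mean, using a made-up pair type (not the real SCM
representation) and GCC's __sync_synchronize builtin as the barrier:

  #include <stdlib.h>

  struct pair { void *car; void *cdr; };

  struct pair *shared;          /* visible to both threads */

  void
  publisher (void *v1, void *v2)
  {
    struct pair *x = malloc (sizeof *x);
    x->car = v1;
    x->cdr = v2;
    /* Without a barrier here, the store to 'shared' may become
       visible to another CPU before the stores to x->car and x->cdr
       do. */
    __sync_synchronize ();
    shared = x;
  }

  void *
  reader (void)
  {
    struct pair *t = shared;
    /* On most machines the data-dependent load below is ordered after
       the load of the pointer; the Alpha mentioned above is the
       exception and would need a read-side barrier too. */
    return t ? t->car : NULL;
  }

A full barrier is overkill here -- a store barrier on the writing side
would do -- but it shows where the ordering constraint has to go.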
There are lower-level facilities generally available, like
memory-barrier instructions and atomic operations (usually
compare-and-swap or load-link/store-conditional), which could probably
be used more efficiently; there are even lock-free algorithms for
implementing some simple data structures safely using these primitives
(a tiny example is below).  But the details vary, and doing more
synchronization than you actually need can hurt performance, so you
really have to tune it well.  This is the sort of thing that kernel
and C-library developers sometimes spend a lot of effort trying to get
right, so that the kernel and libc facilities (e.g., mutexes) will be
both correct and efficient.

To make it worse, the compiler can reorder memory access instructions
if it can show there aren't aliasing problems (i.e., that the
locations being accessed cannot be the same, assuming the source code
follows the rules of the language -- and that last clause may let a
compiler reorder accesses via "int" and "double" types that happen to
map to the same location!), and it can even write (or not write)
intermediate values to an output location.  So, for example, in:

  register SCM x = ...;   // so it can't be aliased by memory accesses
  SCM_SETCDR(x, v1);
  SCM_SETCAR(x, v2);

the compiler could change the order of the writes, since it knows the
locations are different (there's a fixed nonzero offset between them),
and under the specs of the C language there's no way to stop in
between the two statements and examine the values stored, so the two
versions are the same under the "as if" rule, which says only the
overall side effects matter.  The original order will often be
retained -- to help with debugging, because no optimization benefit
was found in reordering, etc. -- but you can't make your program's
semantics depend on that.  Or:

  register int a, b, c;
  n = foo() + a*b*c;   // store to mem

could -- at least theoretically -- be implemented as:

  n = foo();
  tmp = a*b;
  tmp *= c;
  n += tmp;

though offhand I'm having a hard time thinking of a realistic case in
which a compiler would be likely to do that.  This is where "volatile"
would help a bit.  But simply making all non-immediate objects
volatile would probably be overkill.

Now, how *much* of a problem is this for user-mode programs in
practice?  I don't know.  I've read a few papers, but I haven't
actually worked on code at this level or researched the architecture
details of many processors and memory subsystems.  (The
multiprocessing stuff I've done all used mutexes, if not always very
efficiently.)  A couple of web pages I've run across have suggested
that some published processor errata have pointed out memory ordering
problems, so the actual requirements for a given processor family may
be stricter than advertised by the architecture manuals.  And
sometimes updates to the architecture manuals change these specs.  I
think we'd need someone with more low-level experience to help answer
that.  I suspect many programs can run just fine for a long time
without the race conditions in them causing visible problems; then,
the next day, maybe they'll crash.

I recommend the "Memory ordering" page at Wikipedia as a good starting
point for reading up on it, and at least the first couple of
references there.  Boehm's paper "Threads Cannot be Implemented as a
Library" is also an interesting read on the compiler interactions,
though if we're trying to get by without using locks altogether, the
problems we face are probably larger than the ones he describes.
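As a taste of what the compare-and-swap style looks like, here's a
minimal lock-free push onto a shared list, using GCC's
__sync_bool_compare_and_swap builtin (the node type is made up):

  #include <stdlib.h>

  struct node { struct node *next; int value; };

  static struct node *head;     /* shared list head */

  void
  push (int value)
  {
    struct node *n = malloc (sizeof *n);
    struct node *old;
    n->value = value;
    do
      {
        old = head;
        n->next = old;
        /* Retry if another thread changed head between the read
           above and this compare-and-swap. */
      }
    while (!__sync_bool_compare_and_swap (&head, old, n));
  }

Even something this small needs care, though: a matching lock-free pop
runs straight into the well-known ABA problem, which is exactly the
kind of detail that makes me want the kernel and libc people to have
done the tuning for us.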
> I think we agree, but I prefer to paint this in a more optimistic
> light -- that things are mostly there, but bugs are possible.  Bug
> fixes are also possible :)

You might guess from the above that I don't think they're mostly
there.  That bugs are possible, yes, that I'll grant you. ;-)

A real review should check:

 * any non-const static-storage variables defined in the library, and
   how they're used;

 * any functions modifying objects, to determine the effect of
   concurrent calls;

 * any uses of those functions or objects from our C or Scheme code,
   to see if we're presenting the correct semantics in the face of
   concurrent calls;

 * that we actually specify the semantics clearly, including when
   results are undefined;

 * anything dealing with OS interfaces (or other library APIs) that
   aren't thread-safe.

And if those aren't going to take long enough already:

 * getting *real* answers to the memory ordering issues;

 * possibly reviewing every function manipulating memory for ordering
   issues;

 * probably more.

Then I might have some confidence that "things are mostly there."  I
don't think it's going to happen at all for 2.0, and I don't expect to
have time myself to do anything more than little bits of work on the
first part of the list, if even that much, in the near future.

My work today didn't cover any of this; I just started by looking over
the patches you checked in, and then ran "grep hash_ta *.[ch]" in
libguile to see where else we've got hash tables, since we've already
seen that they're a problem in at least one other place.  But even
with just that, I found quite a few things that I think need to be
looked at.

Ken