On Aug 28, 2010, at 15:20, Andy Wingo wrote:

>> Unfortunately this applies to some internals of the implementation
>> too.  For example, "set-object-property!" and friends use a hash
>> table and assoc lists internally.
>
> Fixed, thanks.
>
>> scm_c_issue_deprecation_warning and scm_c_register_extension don't
>> look thread-safe to me (in the sense described above).
>
> Fixed also.
Thanks!  Though... looking at deprecation.c, I'd suggest not printing
stuff while the lock is held; just keep track of whether it needs to
be printed, and do the printing after releasing the lock (a sketch of
that pattern is below).  Partly because locks should be held only
briefly, and printing can block, but also because if the error port
can call back into other Scheme code, which may call a deprecated
function, then you could get an error trying to acquire the lock
(which we won't notice, because we never check), or a deadlock.

I was starting to look through the port code for this, but got
distracted by scm_ptobs, a pointer which appears to be dynamically
reallocated as needed when scm_make_port_type is called.  Which means
every function that reads it should be locking the same mutex used
when updating it; in this case, that appears to be the generic
"critical section" stuff, and it looks like we're not doing that.

So, hmm, where else do we use hash tables internally?

foreign.c: pointer_weak_refs is a weakly keyed hash table, which we
add things to without locking.

gc.c: scm_protects is protected; good.

instructions.c: Can scm_lookup_instruction_by_name be called the
"first" time from two threads concurrently?  It uses a hash table
stored in a static variable, and the variable is updated before the
hash table is filled in.  (Even if you simply assign to the variable
afterwards, there's no requirement that the compiler or CPU not
reorder those assignments.)  A sketch of a safer initialization
pattern is below.  But, as an aside, do we need all that in C, or
could instruction-list just export a big Scheme list that could be
examined (or turned into a hash table) by Scheme versions of the other
functions?

keywords.c: A critical section protects keyword_obarray; good.

modules.c: scm_pre_modules_obarray doesn't look protected.

ports.c: scm_i_port_weak_hash is protected in some places; others I
haven't investigated further.  But in scm_c_port_for_each, it looks to
me as though, if ports are being created while the function is setting
up, you could end up raising an exception if scm_internal_hash_fold
iterates over more ports than were in the hash table when the count
was initially taken (and the vector size set).

procprop.c: There's a mutex to protect overrides, but it looks like
set-procedure-property! doesn't use it correctly; it should look more
like set-object-property! does.

properties.c: I'm not familiar with this code at all, but I'd guess
properties_whash should be protected.  Though I wonder if the call-out
from primitive-property-ref could invoke other primitive-property
functions, causing a deadlock if it isn't done carefully.

smob.c: I don't think tramp_weak_map is adequately protected, but I'm
not certain.

srcprop.c: scm_source_whash isn't protected; it also appears to be
exposed by the name "source-whash" to Scheme code, which wouldn't be
able to use a mutex defined and used purely in the C code.  (What's
that, four different kinds of property-list mappings, all implemented
separately?)

struct.c: Even calling struct-vtable-name can cause a hash table entry
to be created, it appears.  So that's not thread-safe, never mind the
call to actually change the name.

symbols.c: I don't think 'symbols' is handled safely.  But this code
is all starting to run together in my mind. :-)

weaks.c: This one only makes things for Scheme code, where we're
making locking the user's problem.

It would be a bit easier if hash tables themselves were thread-safe.
A read/write lock might make more sense than a simple mutex, though,
if reading is more common than writing (sketch below).
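Here's a rough sketch of the "decide under the lock, print afterwards"
pattern I mean for the deprecation warnings.  The names and the little
table here are stand-ins I made up, not the actual Guile internals,
and it uses raw pthreads rather than Guile's own mutex wrappers:

  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>

  static pthread_mutex_t warn_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Tiny stand-in for the "messages already issued" table; the real
     code uses a hash table. */
  #define MAX_WARNED 64
  static const char *warned[MAX_WARNED];
  static int n_warned;

  static int
  already_seen (const char *msg)        /* caller holds warn_lock */
  {
    int i;
    for (i = 0; i < n_warned; i++)
      if (strcmp (warned[i], msg) == 0)
        return 1;
    return 0;
  }

  void
  issue_warning (const char *msg)
  {
    int need_to_print = 0;

    pthread_mutex_lock (&warn_lock);
    if (!already_seen (msg) && n_warned < MAX_WARNED)
      {
        warned[n_warned++] = msg;
        need_to_print = 1;
      }
    pthread_mutex_unlock (&warn_lock);

    /* Print only after releasing the lock, so blocking output, or a
       callback that itself issues a warning, can't deadlock on
       warn_lock. */
    if (need_to_print)
      fprintf (stderr, "%s\n", msg);
  }

The point is just that the only work done while holding the lock is
the table check and update; anything that can block or call back out
happens afterwards.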
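And for the instructions.c case, one way (of several) to make the lazy
initialization safe is to build the table completely before publishing
it, and let pthread_once handle both the "first caller" race and the
memory ordering.  Again, just an illustrative sketch with a made-up
table, not the real instruction data:

  #include <pthread.h>
  #include <string.h>

  struct entry { const char *name; int opcode; };

  static pthread_once_t table_once = PTHREAD_ONCE_INIT;
  static struct entry table[2];

  static void
  build_table (void)
  {
    /* Fill in every entry before returning; no other thread gets past
       pthread_once until this has finished, and pthread_once supplies
       the ordering guarantee that they then see the filled-in
       table. */
    table[0].name = "halt";  table[0].opcode = 0;
    table[1].name = "drop";  table[1].opcode = 1;
  }

  int
  lookup_instruction (const char *name)
  {
    size_t i;
    pthread_once (&table_once, build_table);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
      if (strcmp (table[i].name, name) == 0)
        return table[i].opcode;
    return -1;
  }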
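For the hash tables themselves, the read/write-lock version might look
something like this -- again with a made-up table, just to show the
shape of it:

  #include <pthread.h>

  static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
  static int table[256];        /* stand-in for a shared hash table */

  int
  table_ref (unsigned key)
  {
    int value;
    pthread_rwlock_rdlock (&table_lock);   /* shared: readers overlap */
    value = table[key & 0xff];
    pthread_rwlock_unlock (&table_lock);
    return value;
  }

  void
  table_set (unsigned key, int value)
  {
    pthread_rwlock_wrlock (&table_lock);   /* exclusive */
    table[key & 0xff] = value;
    pthread_rwlock_unlock (&table_lock);
  }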
In my experience, read/write locks sometimes perform worse than simple
mutexes, so they're a win mainly when concurrent reading dominates.

Of course, hash tables aren't the only places where we can have
problems, and I'm not even touching on hash tables that might be
created by Scheme code (but exposed to users with APIs that don't
scream "unsafe hash table used here!") -- but I've already spent my
afternoon and evening on this.

>> That's just from a spot check; I'm not even trying to be thorough.
>
> I await your more thorough check :)

Yeah, in my copious free time... :-(  Maybe some more next weekend, if
I'm not doing the stuff I was supposed to get done last weekend.

>> And don't get me going on memory ordering models....
>
> I'm interested!

There's a lot to read out there, and I'm no expert, but the main idea
is that, depending on the specific architecture, memory loads and
stores can be almost arbitrarily reordered between CPUs unless you do
something to prevent it.  So, suppose you do something like this
(preconditions: z is a pair visible to both threads; SCM_CAR(z) is a
pair):

  thread 1:
    allocate cons cell x
    SCM_SETCAR(x, v1)
    SCM_SETCDR(x, v2)
    SCM_SETCAR(z, x)

  thread 2:
    t = SCM_CAR(z)
    read SCM_CAR(t)

Assuming that reads and writes of SCM values are atomic, thread 2 will
read t as either the old value or x.  If it's x, though, SCM_CAR(x)
might *not* be read back as v1, but as whatever garbage was in there
when (or before) the cell was allocated.  And instead of the CAR slot,
the not-yet-visible store could be some bits in a non-immediate value
that indicate a subtype, causing the reading thread to call the wrong
routines to manipulate it.  (A sketch of one way to fix this follows
below.)

Basically, any time you make an object handle visible to other
threads, you need to make sure the object's contents have been stored
first, and that the other processor hasn't loaded that location yet --
or, if it might be cached, that it knows to flush that entry.
(Multiple threads can run on one processor, or on multiple cores
sharing a cache, in which case it's probably not a problem.  A thread
can also be moved between cores, but the OS is responsible for keeping
things in sync in that case.)

Changing an object already visible to other threads probably requires
at least as much caution.  But given that malloc/free are likely to
reuse storage, it's not even reasonable to assume that a location in a
freshly allocated cell isn't cached by some other processor from back
when it was part of some other object, so I think the two problems are
largely equivalent; a visible live object might merely be more likely
to be in another cache.

According to the Wikipedia article on memory ordering, the Alpha (some
of them are still out there) has a really interesting property: it can
reorder dependent loads, i.e., load data before loading the pointer to
that data.  The article doesn't describe it in detail, but I assume
it's related to speculative loading or something.

Mutexes encapsulate "something to prevent it" pretty well; hacks like
"volatile" don't, though they can get the compiler somewhat out of the
way.  Of course, adding a mutex to every non-immediate value is far
too expensive in memory use, even if the almost complete lack of
contention means the locking could probably be done with few context
switches into the kernel.  Using one global mutex can work, but it
basically means only one thread can be examining or manipulating Guile
objects at a time.  For I/O or DNS lookups that's probably fine, but
we do want to be able to manipulate data structures concurrently.
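Back to the thread 1 / thread 2 example above: to make that kind of
publication safe, the store of the new pointer has to be ordered after
the stores that initialize the cell.  Here's a sketch in plain C of
what I mean, using a made-up pair type (not the real SCM
representation) and GCC's __sync_synchronize builtin as the barrier:

  #include <stdlib.h>

  struct pair { void *car; void *cdr; };

  struct pair *shared;          /* visible to both threads */

  void
  publisher (void *v1, void *v2)
  {
    struct pair *x = malloc (sizeof *x);
    x->car = v1;
    x->cdr = v2;
    /* Without a barrier here, the store to 'shared' may become
       visible to another CPU before the stores to x->car and x->cdr
       do. */
    __sync_synchronize ();
    shared = x;
  }

  void *
  reader (void)
  {
    struct pair *t = shared;
    /* On most machines the data-dependent load below is ordered after
       the load of the pointer; the Alpha mentioned above is the
       exception and would need a read-side barrier too. */
    return t ? t->car : NULL;
  }

A full barrier is overkill here -- a store barrier on the writing side
would do -- but it shows where the ordering constraint has to go.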
There are lower-level facilities generally available, like
memory-barrier instructions and atomic operations (usually
compare-and-swap or load-link/store-conditional), which could probably
be used more efficiently; there are even lock-free algorithms for
implementing some simple data structures safely using these primitives
(a tiny example is below).  But the details vary, and doing more
synchronization than you actually need can hurt performance, so you
really have to tune it well.  This is the sort of thing that kernel
and C-library developers sometimes spend a lot of effort trying to get
right, so that the kernel and libc facilities (e.g., mutexes) will be
both correct and efficient.

To make it worse, the compiler can reorder memory access instructions
if it can show there aren't aliasing problems (i.e., that the
locations being accessed cannot be the same, assuming the source code
follows the rules of the language -- and that last clause may let a
compiler reorder accesses via "int" and "double" types that happen to
map to the same location!), and it can even write (or not write)
intermediate values to an output location.  So, for example, in:

  register SCM x = ...;   // so it can't be aliased by memory accesses
  SCM_SETCDR(x, v1);
  SCM_SETCAR(x, v2);

the compiler could change the order of the writes, since it knows the
locations are different (there's a fixed nonzero offset between them),
and under the specs of the C language there's no way to stop in
between the two statements and examine the values stored, so the two
versions are the same under the "as if" rule, which says only the
overall side effects matter.  The original order will often be
retained -- to help with debugging, because no optimization benefit
was found in reordering, etc. -- but you can't make your program's
semantics depend on that.  Or:

  register int a, b, c;
  n = foo() + a*b*c;   // store to mem

could -- at least theoretically -- be implemented as:

  n = foo();
  tmp = a*b;
  tmp *= c;
  n += tmp;

though offhand I'm having a hard time thinking of a realistic case in
which a compiler would be likely to do that.  This is where "volatile"
would help a bit.  But simply making all non-immediate objects
volatile would probably be overkill.

Now, how *much* of a problem is this for user-mode programs in
practice?  I don't know.  I've read a few papers, but I haven't
actually worked on code at this level or researched the architecture
details of many processors and memory subsystems.  (The
multiprocessing stuff I've done all used mutexes, if not always very
efficiently.)  A couple of web pages I've run across have suggested
that some published processor errata have pointed out memory ordering
problems, so the actual requirements for a given processor family may
be stricter than advertised by the architecture manuals.  And
sometimes updates to the architecture manuals change these specs.  I
think we'd need someone with more low-level experience to help answer
that.  I suspect many programs can run just fine for a long time
without the race conditions in them causing visible problems; then,
the next day, maybe they'll crash.

I recommend the "Memory ordering" page at Wikipedia as a good starting
point for reading up on it, and at least the first couple of
references there.  Boehm's paper "Threads Cannot be Implemented as a
Library" is also an interesting read on the compiler interactions,
though if we're trying to get by without using locks altogether, the
problems we face are probably larger than the ones he describes.
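As a taste of what the compare-and-swap style looks like, here's a
minimal lock-free push onto a shared list, using GCC's
__sync_bool_compare_and_swap builtin (the node type is made up):

  #include <stdlib.h>

  struct node { struct node *next; int value; };

  static struct node *head;     /* shared list head */

  void
  push (int value)
  {
    struct node *n = malloc (sizeof *n);
    struct node *old;
    n->value = value;
    do
      {
        old = head;
        n->next = old;
        /* Retry if another thread changed head between the read
           above and this compare-and-swap. */
      }
    while (!__sync_bool_compare_and_swap (&head, old, n));
  }

Even something this small needs care, though: a matching lock-free pop
runs straight into the well-known ABA problem, which is exactly the
kind of detail that makes me want the kernel and libc people to have
done the tuning for us.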
> I think we agree, but I prefer to paint this in a more optimistic
> light -- that things are mostly there, but bugs are possible.  Bug
> fixes are also possible :)

You might guess from the above that I don't think they're mostly
there.  That bugs are possible, yes, that I'll grant you. ;-)

A real review should check:

 * any non-const static-storage variables defined in the library, and
   how they're used;

 * any functions modifying objects, to determine the effect of
   concurrent calls;

 * any uses of those functions or objects from our C or Scheme code,
   to see if we're presenting the correct semantics in the face of
   concurrent calls;

 * that we actually specify the semantics clearly, including when
   results are undefined;

 * anything dealing with OS interfaces (or other library APIs) that
   aren't thread-safe.

And if those aren't going to take long enough already:

 * getting *real* answers to the memory ordering issues;

 * possibly reviewing every function manipulating memory for ordering
   issues;

 * probably more.

Then I might have some confidence that "things are mostly there."  I
don't think it's going to happen at all for 2.0, and I don't expect to
have time myself to do anything more than little bits of work on the
first part of the list, if even that much, in the near future.

My work today didn't cover any of this; I just started by looking over
the patches you checked in, and then ran "grep hash_ta *.[ch]" in
libguile to see where else we've got hash tables, since we've already
seen that they're a problem in at least one other place.  But even
with just that, I found quite a few things that I think need to be
looked at.

Ken