Ryan Stuart <ryan.stuart...@gmail.com> writes:

> My point is malloc, something further up (down?) the stack, is making
> modifications to shared state when threads are involved. Modifying
> shared state makes it infinitely more difficult to reason about the
> correctness of your software.

If you're saying the libc malloc might have bugs that affect
multithreaded apps but not single-threaded ones, then sure, but the
Linux kernel might also have such bugs and it's inherently
multithreaded, so there's no escape: even if your app isn't threaded,
you're still susceptible to threading bugs in the kernel. If malloc
works properly then it's thread-safe, and you can use it without
worrying about how your app's state interacts with malloc's internal
state.

> We clearly got completely different things from the article. My
> interpretation was that it was making *the exact opposite* point to
> what you stated mainly because non-threading approaches don't share
> state.

It gave the example of asyncio, which is non-threaded but (according to
the article) still susceptible to shared-state bugs, because you can
accidentally insert yield points into critical sections by doing things
like logging.
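For concreteness, here's a minimal sketch of the kind of bug I took the
article to be describing (modern async/await syntax; the names
balances and audit_log are made up). The await hidden inside the
logging helper is a yield point in the middle of a check-then-act
sequence, so two transfers can both see the old balance:

    import asyncio

    balances = {"alice": 100, "bob": 0}

    async def audit_log(msg):
        # Stand-in for writing to a remote log service; the await below
        # is a yield point where other coroutines can run.
        await asyncio.sleep(0)

    async def transfer(src, dst, amount):
        if balances[src] >= amount:
            # Yield point in the middle of what looks atomic: another
            # transfer() can run here and see the stale balance.
            await audit_log("moving %d from %s to %s" % (amount, src, dst))
            balances[src] -= amount
            balances[dst] += amount

    async def main():
        # Two concurrent transfers that together overdraw the account.
        await asyncio.gather(transfer("alice", "bob", 100),
                             transfer("alice", "bob", 100))
        print(balances)

    asyncio.run(main())

Running that prints a negative balance for alice, the same lost update
you'd get from a thread race, even though there's only one thread.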
> It states that quite clearly. For example "it is – literally –
> exponentially more difficult to reason about a routine that may be
> executed from an arbitrary number of threads concurrently".

I didn't understand what it was getting at with that n**n claim. Of
course arbitrary code (even single-threaded) is incalculably difficult
to reason about (halting problem, Rice's theorem). But we're talking
about code following a particular set of conventions, not arbitrary
code. The conventions are supposed to facilitate reasoning and
verification. Again, there's tons of solid theory in the OS literature
about this stuff.

> by default Haskell looks to use lightweight threads where only 1
> thread can be executing at a time [1]... That doesn't seem to be
> shared state multithreading, which is what the article is referring to.

Haskell uses lightweight, shared-state threads with synchronization
primitives that do the usual things (the API is somewhat different from
POSIX threads, though). You have to use the +RTS command-line option to
run on multiple cores; I don't know why the default is to stay on a
single core. There might be a performance hit if you use the multicore
runtime with a single-threaded program, or something like that. There
is a book about Haskell concurrency and parallelism that I've been
wanting to read (full text online):

http://chimera.labs.oreilly.com/books/1230000000929/index.html

> 2) it has a weird story about the brass cockroach, that basically
> signified that they didn't have a robust enough testing system to
> be able to reproduce the bug.
>
> The point was that it wasn't feasible to have a robust testing suite
> because, you guessed it,

No really, they observed this bug happening repeatedly under what
sounded like fairly light load with real users, so a stress-testing
framework should have been able to reproduce it. Do you really think
it's impossible to debug this kind of problem? OS developers do it all
the time. There is no getting around it.

> This is probably correct. Is there any STM implementations out that
> that don't significantly compromise performance?

STM is fast as long as there's not much contention for shared data
between threads. In the "account balance" example that should almost
always be the case. The slowdown comes when multiple threads are
fighting over the same data and transactions keep having to be rolled
back and restarted.

> multiprocessing module looks pretty nice and I should try it
>
> It's 1 real advantage is that it side-steps the GIL. So, if you need
> to utilise multiple cores for CPU bound tasks, then it might well be
> the only option.

Its one real advantage compared to what? I thought you were saying it
avoids the shared-data hazards of threads. The four alternatives in
that article were threads, multiprocessing, old-fashioned async
(callback hell), and asyncio (still contorted, and it relies on Python
3 coroutines). If you eliminate threads because of data sharing and
asyncio because you need Python 2 compatibility, you're left with
multiprocessing if you want to avoid the control inversion of callback
style.

It's true, though, that this started out being about the GIL in PyPy
(was Laura going to post about that?), so using multiple cores may
indeed be relevant.
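To make the multiprocessing point concrete, here's a minimal sketch of
farming a CPU-bound job out to a pool of worker processes (the
count_primes workload and the numbers are made up for illustration):

    import multiprocessing

    def count_primes(limit):
        # CPU-bound toy workload: count primes below limit by trial division.
        count = 0
        for n in range(2, limit):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return count

    if __name__ == '__main__':
        limits = [50000, 60000, 70000, 80000]
        # Each worker is a separate process with its own interpreter and
        # its own GIL, so the jobs really do run on separate cores.
        pool = multiprocessing.Pool(processes=4)
        print(pool.map(count_primes, limits))
        pool.close()
        pool.join()

Arguments and results get pickled across the process boundary, so the
workers share no mutable state with each other, which I take to be the
other property you were after.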
--
https://mail.python.org/mailman/listinfo/python-list