On Oct 24, 3:02 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote: > On approximately 10/23/2008 2:24 PM, came the following characters from the > keyboard of Rhamphoryncus: >> >> On Oct 23, 11:30 am, Glenn Linderman <[EMAIL PROTECTED]> wrote: >> >>> >>> On approximately 10/23/2008 12:24 AM, came the following characters from >>> the keyboard of Christian Heimes >>>> >>>> Andy wrote: >>>> I'm very - not absolute, but very - sure that Guido and the initial >>>> designers of Python would have added the GIL anyway. The GIL makes >>>> Python faster on single core machines and more stable on multi core >>>> machines. > > Actually, the GIL doesn't make Python faster; it is a design decision that > reduces the overhead of lock acquisition, while still allowing use of global > variables. > > Using finer-grained locks has higher run-time cost; eliminating the use of > global variables has a higher programmer-time cost, but would actually run > faster and more concurrently than using a GIL. Especially on a > multi-core/multi-CPU machine.
Those "globals" include classes, modules, and functions. You can't have *any* objects shared. Your interpreters are entirely isolated, much like processes (and we all start wondering why you don't use processes in the first place.) Or use safethread. It imposes safe semantics on shared objects, so you can keep your global classes, modules, and functions. Still need garbage collection though, and on CPython that means refcounting and the GIL. >> Another peeve I have is his characterization of the observer pattern. >> The generalized form of the problem exists in both single-threaded >> sequential programs, in the form of unexpected reentrancy, and message >> passing, with infinite CPU usage or infinite number of pending >> messages. >> > > So how do you get reentrancy is a single-threaded sequential program? I > think only via recursion? Which isn't a serious issue for the observer > pattern. If you add interrupts, then your program is no longer sequential. Sorry, I meant recursion. Why isn't it a serious issue for single-threaded programs? Just the fact that it's much easier to handle when it does happen? >> Try looking at it on another level: when your CPU wants to read from a >> bit of memory controlled by another CPU it sends them a message >> requesting they get it for us. They send back a message containing >> that memory. They also note we have it, in case they want to modify >> it later. We also note where we got it, in case we want to modify it >> (and not wait for them to do modifications for us). >> > > I understand that level... one of my degrees is in EE, and I started college > wanting to design computers (at about the time the first microprocessor chip > came along, and they, of course, have now taken over). But I was side-lined > by the malleability of software, and have mostly practiced software during > my career. > > Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs > articles I mentioned. And the overhead of doing that at the level of a cache > line is high, if there is lots of contention for particular memory locations > between threads running on different cores/CPUs. So to achieve concurrency, > you must not only limit explicit software locks, but must also avoid memory > layouts where data needed by different cores/CPUs are in the same cache > line. I suspect they'll end up redesigning the caching to use a size and alignment of 64 bits (or smaller). Same cache line size, but with masking. You still need to minimize contention of course, but that should at least be more predictable. Having two unrelated mallocs contend could suck. >> Message passing vs shared memory isn't really a yes/no question. It's >> about ratios, usage patterns, and tradeoffs. *All* programs will >> share data, but in what way? If it's just the code itself you can >> move the cache validation into software and simplify the CPU, making >> it faster. If the shared data is a lot more than that, and you use it >> to coordinate accesses, then it'll be faster to have it in hardware. >> > > I agree there are tradeoffs... unfortunately, the hardware architectures > vary, and the languages don't generally understand the hardware. So then it > becomes an OS API, which adds the overhead of an OS API call to the cost of > the synchronization... It could instead be (and in clever applications is) a > non-portable assembly level function that wraps on OS locking or waiting > API. In practice I highly doubt we'll see anything that doesn't extend traditional threading (posix threads, whatever MS has, etc). > Nonetheless, while putting the shared data accesses in hardware might be > more efficient per unit operation, there are still tradeoffs: A software > solution can group multiple accesses under a single lock acquisition; the > hardware probably doesn't have enough smarts to do that. So it may well > require many more hardware unit operations for the same overall concurrently > executed function, and the resulting performance may not be any better. Speculative ll/sc? ;) > Sidestepping the whole issue, by minimizing shared data in the application > design, avoiding not only software lock calls, and hardware cache > contention, is going to provide the best performance... it isn't the things > you do efficiently that make software fast — it is the things you don't do > at all. Minimizing contention, certainly. Minimizing the shared data itself is iffier though. -- http://mail.python.org/mailman/listinfo/python-list