Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:

>> A minimal naive implementation indeed doubles the memory requirements,
>> but from a Python perspective where every integer takes something like
>> 24 bytes already, even that doesn't seem so terrible.
>
> Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding,
> right?

No, it would be doubling 4 or 8 bytes.  The extra overhead like the
reference count would not be there to bloat up the integer like in
Python.
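(You can check the per-object overhead directly; a quick sanity check
in CPython, where the exact number varies by version and platform:

    # In CPython every int is a full heap object carrying a reference
    # count and a type pointer, not a bare machine word.
    import sys
    print(sys.getsizeof(1))   # 24 on 64-bit Python 2.x, 28 on 3.x

versus the 4 or 8 bytes a bare machine integer occupies.)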
>> More sophisticated implementations use multiple small heaps or other
>> tricks.
>
> More and more complications to patch up the idea. At some point, you
> have to admit there is something fundamentally flawed about the whole
> concept.

Oh sheesh, that's silly.  Every area of programming where performance
matters is full of optimization tricks.  Look at any serious database
implementation, for example.  Or any compiler.  Look at Python's
implementation of dictionaries.  Yeah, the optimizations add complexity
to improve performance, sometimes even in heuristic ways that can fail.
That doesn't mean the concepts are fundamentally flawed.  GC is no
different.

>> The new heap is filled sequentially so accesses to it will have good
>> locality.
>
> what does matter is the frequency distribution of references,

Sorry, I just meant during the gc operation itself.  The gc's access
pattern in the new heap is completely sequential, since the gc just
copies stuff to it linearly from the old heap, bumping a pointer
upwards.  The access pattern while the user application is running is
of course not predictable.

> Your generational garbage collector completely breaks this assumption,
> by regularly forcing an access to every single object in the heap.
> Cache-thrashing, anyone?

In the minor collections, the whole minor heap fits in cache, so
there's no thrashing.  The major collections can use a different
strategy, or you can simply rely on their relative infrequency.  Why
do you speculate like this?  If you run a GHC program with profiling
active (the runtime's +RTS -s flag gives a gc time summary, for
instance), it tells you exactly how much time is spent in minor gc and
how much in major gc, and it's all generally pretty tolerable unless
your program has bugs.

(Unfortunately, Haskell programs are notoriously susceptible to a
certain type of bug that causes them to still give the right answers,
but use much more memory than they should.  The usual sign of that
happening is high gc load.)

>> It's also the case that programs with very large memory consumption
>> tend to use most of the memory for large arrays that don't contain
>> pointers
>
> But your generational garbage collector still has to copy those very
> large objects to the new heap, with all the cache-hostile consequences
> therefrom.

Not necessarily; it depends on how you write the program and how the
gc works.

> By the way, isn’t this the opposite of the array-of-pointers example
> you were using earlier to try to cast reference-counting in a bad
> light?

I wasn't trying to cast reference counting in a bad light.  I was
pointing out that reference counting can experience pauses just like
traditional gc approaches.  Most programs, including "soft" real time
programs, can tolerate an occasional pause.  If your program is not
one of those, and you need guaranteed upper bounds on pauses so that
you can't use traditional gc, then switching from gc to reference
counting won't save you.

> It seems to me a reference count would work very well for such a
> large, simple object.

Mark/sweep would do it too.  Some gc's use a hybrid approach, with
mark/sweep for older or larger objects.
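To make the pause point concrete, here's a toy measurement (the
numbers are purely illustrative and machine-dependent) of CPython's
reference counting doing a big burst of work all at once when the last
reference to a large structure goes away:

    import time

    # Build a couple of million small objects hanging off one list.
    big = [[0] * 10 for _ in range(2000000)]

    t0 = time.perf_counter()
    del big   # last reference dropped: a cascade of decrefs frees
              # every inner list right here, in one go
    print("teardown: %.3f seconds" % (time.perf_counter() - t0))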
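And going back to the copying collector described above, here is a
minimal sketch of a Cheney-style copy (a toy model, with objects
represented as Python lists and references as ("ref", index) pairs --
not any particular collector's implementation) showing why the gc's
own accesses to the new heap are strictly sequential:

    def collect(from_space, roots):
        to_space = []   # the new heap: only ever appended to
        forward = {}    # old index -> new index (forwarding pointers)

        def copy(idx):
            if idx not in forward:
                forward[idx] = len(to_space)            # bump the pointer
                to_space.append(list(from_space[idx]))  # sequential write
            return forward[idx]

        new_roots = [copy(r) for r in roots]
        scan = 0
        while scan < len(to_space):   # linear scan over the new heap
            obj = to_space[scan]
            for i, field in enumerate(obj):
                if isinstance(field, tuple) and field[0] == "ref":
                    obj[i] = ("ref", copy(field[1]))
            scan += 1
        return to_space, new_roots

    # Three objects; object 1 is unreachable garbage and never copied.
    heap = [[("ref", 2), "a"], ["garbage"], [("ref", 0), "b"]]
    new_heap, new_roots = collect(heap, [0])
    print(new_heap)   # the two live objects, packed contiguously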
> But not garbage collection. This is because of the asymmetric way in
> which hardware has become faster: ... the effectiveness of that
> caching depends crucially on certain assumptions about the runtime
> behaviour of the programs: and garbage collection breaks those
> assumptions. ...
>
> You may want to enlighten yourself by meditating on this seeming
> paradox of modern computing hardware: memory is cheap, but accessing
> memory is expensive.

I'm interested in receiving enlightenment if you've got some pointers
into the research literature that back up your views.  Right now it
sounds like you're going by some intuitions you have that aren't
backed up by evidence.  Anyone who has performance-tuned a program
knows that intuition can give reasonable starting points for
experiments and measurements, but once the results start coming in, a
lot of the intuitions end up invalidated.

By now there is enough experimental and theoretical literature about
gc that opinions like yours, which don't seem to be grounded in any
knowledge of that literature, are not very persuasive, no matter how
superficially attractive the raw intuition might be.  Especially in
your case, where you seem to have decided ahead of time what
conclusion you want to reach and are looking for ways to justify it.

-- 
http://mail.python.org/mailman/listinfo/python-list