On Thu, 12 Dec 2013 13:40:36 -0500, Terry Reedy wrote:

> On 12/12/2013 7:08 AM, Steven D'Aprano wrote:
>
>> Please don't focus on the algorithm I gave. Focus on the fact that I
>> could write it like this:
>>
>> if some condition to do with the computer's available memory:
>>     make modifications in place
>> else:
>>     make a copy of the data containing the modifications
>>
>> if only I knew what that condition was and how to test for it.
>
> In other words, you want a magic formula that depends on just about
> everything: the (static) memory in the machine, the other programs
> running, the memory otherwise used by the current program, the number
> of items, the nature of the items, and possibly the memory
> fragmentation state.
Yes. Surely there's a way to check whether allocating a list of N items
will succeed without paging, without actually having to allocate a list
of N items? (I've put a rough sketch of the sort of test I mean in a
postscript below.)

Back in the 1980s, when I was a Mac user and occasional programmer,
there were memory manager routines which (if I recall correctly) would
tell you whether or not an allocation would succeed. Macs of that
vintage didn't have virtual memory, it is true, but the principle
remains sound: "Is there a single contiguous block of N bytes available?
If so, use a list comprehension; if not, switch to an in-place
algorithm."

Until Python, I had never used a programming language that didn't give
you at least a rough-and-ready way to determine how much memory was
available, so you could make an informed guess as to whether an
expensive operation would succeed before actually attempting it. I will
be very disappointed (but hardly surprised) to learn that functionality
which was standard on 1984 computers is not possible in 2013.

But I don't *specifically* care about measuring memory size, only as a
proxy for whether or not I should switch algorithms. That's why this
post is called "Optimizing list processing" rather than "Finding out how
much memory is available". If there's another way to solve the same
problem, I'm happy to hear it.

[...]

>> I'm not terribly fussed about micro-optimizations here. I'm concerned
>> about *macro-optimizing* the case where creating two (or more) lists
>> forces the computer to page memory in and out of virtual memory.
>> Saving a few milliseconds? I don't care. Saving 5 or 6 seconds? That
>> I care about.
>
> Why would you spend an hour or more of your time to save 5 seconds?

For a number of reasons:

- Trading off increased memory use for faster code is a good
  optimization technique so long as it actually does result in faster
  code. I measured, and discovered that there comes a point where it
  results in serious performance degradation: not just a little bit
  slower, 5% or 10%, but nearly 300% slower. We're told not to optimize
  until we've measured. Well, I've measured. Switching to an in-place
  algorithm for large enough lists is a BIG time saver.

- It's not just the five seconds that I save, but the 30 or 50 seconds
  of general slowdown of *everything* on the computer as it frantically
  pages blocks of memory. And not just on *this* computer: if I'm
  running the code on a server, it will affect all the other machines
  relying on that server. That five seconds of time for a single
  process may affect all the users of that machine for a minute or two
  before the system settles down to normal performance again.

Paging is *deadly* for performance -- moving memory around is typically
O(N**2), and hard drive speeds are typically millions of times slower
than RAM speeds. Here is a graphical view of the latency of various
media:

http://i.imgur.com/X1Hi1.gif

[...]

> The simplest answer is to always avoid catastrophe by always modifying
> the one list. For 'short' lists, the time difference will be relatively
> small.

It seems perverse to avoid writing idiomatic, and usually faster, Python
code just because it occasionally performs badly. What you describe is a
last resort.

-- Steven
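
P.S. For what it's worth, here is a rough sketch of the kind of test I
have in mind. Treat it as an illustration rather than working code: it
assumes the third-party psutil package (which nothing above depends on),
uses a crude 8-bytes-per-reference figure for a 64-bit CPython list, and
pads the estimate with a safety factor rather than trying to be exact.

    import sys

    import psutil  # third-party: pip install psutil


    def copy_would_fit(data, headroom=2.0):
        """Heuristic: will building a second list of len(data) references
        (probably) fit in physical memory without pushing us into swap?

        Only the new list's pointer array is counted -- the items
        themselves are assumed to be shared -- and the estimate is
        padded by `headroom` to stay on the safe side.
        """
        estimated = (sys.getsizeof([]) + len(data) * 8) * headroom
        return psutil.virtual_memory().available > estimated


    def transform(item):
        """Stand-in for whatever per-item modification the real code does."""
        return item


    def process(data):
        if copy_would_fit(data):
            # Plenty of room: use the idiomatic list comprehension.
            return [transform(item) for item in data]
        else:
            # Memory is tight: modify the one list in place.
            for i, item in enumerate(data):
                data[i] = transform(item)
            return data

The exact numbers don't matter much; the point is that the choice
between the two algorithms can be made cheaply, up front, instead of
being discovered the hard way once the machine starts paging.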