Marko Rauhamaa <ma...@pacujo.net> writes:
> dieter <die...@handshake.de>:
> ...
>> I work in the domain of web applications. And I made there a nasty
>> experience with garbage collection: occasionally, the web application
>> stopped responding for about a minute. A (quite difficult) analysis
>> revealed that some (stupid) component created in some situations (a
>> search) hundreds of thousands of temporary objects and thereby
>> triggered a complete garbage collection. The garbage collector started
>> its mark and sweep phase to detect unreachable objects - traversing a
>> graph of millions of objects.
>>
>> As garbage collection becomes drastically more complex if the object
>> graph can change during this phase (and this was Python), a global
>> lock prevented any other activity -- leading to the observed
>> latencies.
>
> Yes. The occasional global freeze is unavoidable in any
> garbage-collected runtime environment regardless of the programming
> language.
>
> However, I challenge the notion that creating hundreds of thousands of
> temporary objects is stupid. I suspect that the root cause of the
> lengthy pauses is that the program maintains millions of *nongarbage*
> objects in RAM (a cache, maybe?).

Definitely. The application concerned was a long-running web
application; caching was an important feature to speed up its typical
use cases.

I do not say that creating hundreds of thousands of temporary objects
is always stupid. But in this case, those temporary objects were used
to wrap, early on, the document ids found in an index entry, just to
get a comfortable interface for accessing the corresponding documents.
While the index authors were aware that they were handling mass data,
and therefore stored it compactly as C-level objects with efficient
filtering operations implemented in C, the search author neglected
this aspect and wrapped all document ids in Python objects.

A search is essentially a filtering operation: typically, you need to
access far fewer documents (at most those in a prefiltered result set)
than document ids (the input to the filtering). In this case, it is
stupid to create temporary objects for all document ids just to access
far fewer documents later in a comfortable way.

-- 
https://mail.python.org/mailman/listinfo/python-list
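
To make the difference concrete, here is a minimal sketch of the two
approaches. It is not the code of the application discussed above: the
names DocumentWrapper, search_eager and search_lazy are hypothetical,
a plain dict stands in for the document store, an array of integers
stands in for the compact C-level index entry, and a Python generator
stands in for the C-level filtering.

```python
from array import array


class DocumentWrapper:
    """Hypothetical convenience wrapper around a single document id."""

    created = 0  # counts how many wrappers were instantiated

    def __init__(self, doc_id, store):
        DocumentWrapper.created += 1
        self.doc_id = doc_id
        self._store = store

    @property
    def document(self):
        # Look the document up only when it is actually needed.
        return self._store[self.doc_id]


def search_eager(index_entry, store, predicate):
    """Wrap every document id first, then filter (the costly variant)."""
    wrappers = [DocumentWrapper(doc_id, store) for doc_id in index_entry]
    return [w for w in wrappers if predicate(w.doc_id)]


def search_lazy(index_entry, store, predicate):
    """Filter on the compact ids first, then wrap only the hits."""
    hits = (doc_id for doc_id in index_entry if predicate(doc_id))
    return [DocumentWrapper(doc_id, store) for doc_id in hits]
```

Both functions return the same hits, but the eager variant creates one
wrapper per document id in the index entry, while the lazy variant
creates one per hit. With an index entry such as
array("l", range(1000)) and a predicate that accepts every hundredth
id, that is 1000 wrappers versus 10.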