Bugs item #1565525, was opened at 2006-09-26 02:58
Message generated for change (Comment added) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1565525&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Greg Hazel (ghazel)
Assigned to: Nobody/Anonymous (nobody)
Summary: gc allowing tracebacks to eat up memory

Initial Comment:
Attached is a file which demonstrates an oddity about traceback
objects and the gc.

The observed behaviour is that when the tuple from sys.exc_info() is
stored on an object which is inside the local scope, the object, and
thus the exc_info tuple, is not collected even when both leave scope.

If you run the test with "s.e = sys.exc_info()" commented out, the
observed memory footprint of the process quickly approaches and sits
at 5,677,056 bytes. Totally reasonable.

If you uncomment that line, the memory footprint climbs to
283,316,224 bytes quite rapidly. That's a two order of magnitude
difference!

If you uncomment the "gc.collect()" line, the process still hits
148,910,080 bytes.

This was observed in production code, where exc_info tuples are saved
for re-raising later to get the stack-appending behaviour tracebacks
and 'raise' perform. The example includes a large array to simulate
application state. I assume this is bad behaviour occurring because
the traceback object holds frames, and those frames hold a reference
to the local objects, thus the exc_info tuple itself, thus causing a
circular reference which involves the entire stack.

Either the gc needs to be improved to prevent this from growing so
wildly, or the traceback objects need to (optionally?) hold frames
which do not have references or have weak references instead.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2006-10-26 16:48

Message:
Logged In: YES
user_id=11375

A quick grep of the stdlib turns up various uses of sys.exc_info that
do put it in a local variable, e.g. doctest._exception_traceback,
unittest._exc_info_to_string, SimpleXMLRPCServer._marshaled_dispatch.
Do these all need to be fixed?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-09-27 23:48

Message:
Logged In: YES
user_id=31435

[Martin]
> tim_one: Why do you think your proposed modification of
> introducing get_traceback would help? The frame foo still
> refers to s (which is an O), and s.e will still refer
> to the traceback that includes foo.

Sorry about that! It was an illusion, of course. I wanted to suggest
a quick fix, and "tested it" too hastily in a program that didn't
actually bloat with /or/ without it.

For the OP, I had need last year of capturing a traceback and
(possibly) displaying it later in ZODB. It never would have occurred
to me to try saving away exc_info(), though. Instead I used the
`traceback` module to capture the traceback output (a string), which
was (possibly) displayed later, with annotations, by a different
thread. No cycles, no problems.

BTW, I must repeat that there is no simple-minded way to 'repair'
this. That isn't based on general principle, but on knowledge of how
Python is implemented.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-27 23:03

Message:
Logged In: YES
user_id=21627

I disagree that the circular reference is non-obvious. I'm not sure
what your application is, but I would expect that callers of
sys.exc_info should be fully aware what a traceback is, and how it
refers to the current frames. I do agree that it is unavoidable; I
fail to see that it is a bug because of that (something unavoidable
cannot be a bug).

If you are saying that it is unavoidable in your application: I have
difficulties believing this. For example, you could do

    s.e = sys.exc_info()[:2]

This would drop the traceback, and thus not create a cyclic
reference. Since, in the program you present, the traceback is never
used, this looks like a "legal" simplification (of course, you don't
use s.e at all in this program, so I can only guess that you don't
need the traceback in your real application).

As for the time of cleanup not being controllable: you can certainly
control frequency of gc with gc.set_threshold; no need to invoke gc
explicitly.

tim_one: Why do you think your proposed modification of introducing
get_traceback would help? The frame foo still refers to s (which is
an O), and s.e will still refer to the traceback that includes foo.

----------------------------------------------------------------------

Comment By: Greg Hazel (ghazel)
Date: 2006-09-27 17:07

Message:
Logged In: YES
user_id=731668

The bug is the circular reference which is non-obvious and
unavoidable, and cleaned up at some uncontrollable (unless you run a
full collection) time in the future.

There are many better situations or solutions to this bug, depending
on which you think it is. I think those should be investigated.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-27 03:49

Message:
Logged In: YES
user_id=21627

I'm still having problems figuring out what the bug is that you are
reporting. Ok, in this case, it consumes a lot of memory. Why is that
a bug?

----------------------------------------------------------------------

Comment By: Greg Hazel (ghazel)
Date: 2006-09-26 23:20

Message:
Logged In: YES
user_id=731668

I have read the exc_info suggestions before, but they have never made
any difference. Neither change you suggest modifies the memory
footprint behaviour in any way.

Weakrefs might be slow; I offered them as an alternative to just
removing the references entirely. I understand this might cause
problems with existing code, but the current situation causes a
problem which is more difficult to work around. Code that needs
locals and globals can explicitly store a reference to it - it is
impossible to dig in to the traceback object and remove those
references.

The use-case of storing the exc_info is fairly simple, for example:
Two threads. One queues a task for the other to complete. That task
fails and raises an exception. The exc_info is caught, passed back to
the first thread, and the exc_info is raised from there. The goal is
to get the whole execution stack, which it does quite nicely, except
that it has this terrible memory side effect.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-09-26 06:04

Message:
Logged In: YES
user_id=31435

Your memory bloat is mostly due to the

    d = range(100000)

line. Python has no problem collecting the cyclic trash, but you're
creating 100000 * 100 = 10 million integer objects hanging off trash
cycles before invoking gc.collect(), and those integers require at
least 10 million * 12 ~= 120MB all by themselves. Worse, memory
allocated to "short" integers is both immortal and unbounded: it can
be reused for /other/ integer objects, but it never goes away.

Note that memory usage in your program remains low and steady if you
force gc.collect() after every call to bar(). Then you only create
100K integers, instead of 10M, before the trash gets cleaned up.

There is no simple-minded way to "repair" this, BTW. For example,
/of course/ a frame has to reference all its locals, and moving to
weak references for those instead would be insanely inefficient
(among other, and deeper, problems).

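The cycle under discussion can be reproduced with a small sketch. This is not the attached test file, which is not part of this message. The names O, foo, bar, s, e and d follow the thread; the reduced list size and the weakref probe are assumptions added only to make the effect observable, and the code targets a current CPython 3 rather than Python 2.5.

```python
import gc
import sys
import weakref


class O(object):
    """Stand-in for the reporter's object; only the name comes from the thread."""


def foo(s):
    d = list(range(1000))  # scaled-down stand-in for d = range(100000)
    try:
        raise ValueError("boom")
    except ValueError:
        # traceback -> foo's frame -> local s -> s.e -> traceback: a cycle.
        s.e = sys.exc_info()


def bar():
    s = O()
    foo(s)
    # Hypothetical probe: a weak reference lets us see when s is freed.
    return weakref.ref(s)


gc.disable()                       # keep automatic collection out of the picture
ref = bar()
alive_before = ref() is not None   # reference counting alone cannot free s
gc.collect()                       # a full collection breaks the cycle
alive_after = ref() is not None
gc.enable()
print(alive_before, alive_after)
```

With automatic collection disabled, the object is expected to survive the return from bar() and to disappear only after gc.collect(), which matches the observation that forcing a collection after every call to bar() keeps memory low and steady.
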
Note that the library reference manual warns against storing the
result of exc_info() in a local variable (which you're /effectively/
doing, since the formal parameter `s` is a local variable within
foo()), and suggests other approaches. Sorry, but I really couldn't
tell from your description why you want to store this stuff in an
instance attribute, so can't guess whether another more-or-less
obvious approach would help.

For example, no cyclic trash is created if you add this method to
your class O:

    def get_traceback(self):
        self.e = sys.exc_info()

and inside foo() invoke:

    s.get_traceback()

instead of doing:

    s.e = sys.exc_info()

Is that unreasonable? Perhaps simpler is to define a function like:

    def get_exc_info():
        return sys.exc_info()

and inside foo() do:

    s.e = get_exc_info()

No cyclic trash gets created that way either. These are the kinds of
things the manual has suggested doing for the last 10 years ;-)

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
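Tim Peters's comment of 2006-09-27 describes keeping the traceback output as a string from the `traceback` module instead of saving exc_info(). A minimal sketch of that idea follows, assuming a current CPython 3; the function name run_task and the weakref probe are hypothetical and appear nowhere in the thread.

```python
import gc
import traceback
import weakref


class O(object):
    """Stand-in object, named as in the thread."""


def run_task(s):
    # Hypothetical task that fails; only the formatted text is kept.
    try:
        raise ValueError("boom")
    except ValueError:
        # A string holds no reference to the traceback or its frames,
        # so storing it on s cannot form a cycle through this frame.
        s.e = traceback.format_exc()


gc.disable()                 # show that no cycle collection is needed
s = O()
run_task(s)
text = s.e                   # the text can be handed to another thread later
ref = weakref.ref(s)
del s
freed_without_gc = ref() is None
gc.enable()
print(text)
```

The trade-off is that only the text survives, so the exception cannot be re-raised with its original stack, which is what the reporter wanted. Note also that on Python 3 the exception value itself carries a `__traceback__` attribute, so the `sys.exc_info()[:2]` suggestion from the thread would not by itself drop the traceback there.
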