Poking around, I found someone somewhere saying that Python's gc adds a 4-7% speed penalty.
So, since I was pretty sure I was not creating reference cycles in nucular, I tried running the tests with garbage collection disabled. To my delight I found that index builds run 30-40% faster without gc. This is really nice, because calling gc.collect() afterward shows that gc was not actually doing anything. I haven't analyzed memory consumption, but I suspect it should be significantly improved too, since the index builds construct some fairly large data structures with lots of references for a garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling gc can greatly improve performance with no downside if you don't create reference cycles. I couldn't find anything like this on the Python site or elsewhere. As Paul (I think) said, this should be a FAQ.

Further, maybe Python should include some sort of "backoff" heuristic, which might go like this: if gc didn't find anything and memory size is stable, wait longer before the next gc cycle. It's silly to have gc kicking in thousands of times in a multi-hour run, finding nothing every time.

Just my 2c.

-- Aaron Watters

nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=dingus%20fish
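
For anyone who wants to try the experiment described above, here is a minimal sketch of how it might be set up. It is not the nucular test suite: build_index() and timed_build() are hypothetical stand-ins invented for illustration, the workload size is arbitrary, and the code assumes a current Python 3 (time.perf_counter). Only gc.disable(), gc.enable(), gc.isenabled() and gc.collect() are the real standard-library calls the post relies on. Whatever speedup you see will depend heavily on the workload and the Python version, so treat the 30-40% figure as specific to the index builds mentioned above.

```python
import gc
import time


def build_index(n=200_000):
    # Hypothetical stand-in for an index build: lots of small containers
    # holding references to each other's contents, but no reference cycles.
    index = {}
    for i in range(n):
        index.setdefault(i % 1000, []).append((i, str(i)))
    return index


def timed_build(disable_gc):
    # Remember the collector state so it can be restored afterward.
    was_enabled = gc.isenabled()
    gc.collect()  # clear out any pre-existing garbage before timing
    if disable_gc:
        gc.disable()
    else:
        gc.enable()
    try:
        start = time.perf_counter()
        index = build_index()
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    # gc.collect() returns the number of unreachable objects it found.
    # If the build created no cycles, this should be at or near zero,
    # which is the "gc was not actually doing anything" check.
    unreachable = gc.collect()
    return elapsed, unreachable, len(index)


if __name__ == "__main__":
    for flag in (False, True):
        elapsed, unreachable, size = timed_build(disable_gc=flag)
        label = "gc disabled" if flag else "gc enabled"
        print(f"{label}: {elapsed:.3f}s, "
              f"unreachable objects found afterward: {unreachable}, "
              f"index buckets: {size}")
```

If the count reported after the gc-disabled run stays at or near zero, that suggests the code really is cycle-free and reference counting alone is freeing everything. A large count would mean cycles are being created, and leaving gc off would leak memory.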