I have a large (10 GB) data file. I want to parse each line into an object and append that object to a list for sorting and further processing. I have noticed, however, that as the length of the list increases, the rate at which objects are appended to it drops dramatically.

My first thought was that I was nearing the memory capacity of the machine and that the slowdown was caused by the OS swapping things in and out of memory. When I looked at the memory usage, this was not the case: my process was the only job running, it was consuming 40 GB of the total 130 GB, and no swapping was taking place. To make sure there was not some problem with the rest of my code, or with the server's file system, I ran the program again exactly as before but without the line that appends items to the list, and it completed without problem. That indicates the slowdown comes from some part of the process of appending to the list.

Since other people have observed this problem as well (http://tek-tips.com/viewthread.cfm?qid=1096178&page=13, http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i) I did not bother to analyze or benchmark it further. Since the answers in the threads above do not seem very definitive, I thought I would ask here: what is the reason for this decrease in performance, and is there a way, or another data structure, that avoids the problem?
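
For reference, the loop follows roughly the pattern sketched below; the Record class and the file name here are just stand-ins for my actual parsing code:

class Record(object):
    # Stand-in for the real per-line object; the real one parses more fields.
    __slots__ = ('key', 'fields')

    def __init__(self, line):
        parts = line.rstrip('\n').split('\t')
        self.key = parts[0]
        self.fields = parts[1:]

records = []
with open('data.txt') as fh:          # 'data.txt' stands in for the 10 GB file
    for line in fh:
        records.append(Record(line))  # this append gets slower as the list grows

records.sort(key=lambda r: r.key)     # sorting and further processing happen here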

