Creating Long Lists
I have a large (10 GB) data file. I want to parse each line into an object and append that object to a list for sorting and further processing. I have noticed, however, that as the list grows, the rate at which objects are added to it decreases dramatically. My first thought was that I was nearing the memory capacity of the machine and that the slowdown was due to the OS swapping things in and out of memory. When I looked at the memory usage, this was not the case: my process was the only job running, it was consuming 40 GB of the total 130 GB, and no swapping was occurring. To make sure the problem was not in the rest of my code or in the server's file system, I ran my program again, unchanged except for the line that appends items to the list. It completed without a problem, which indicates that the decrease in performance comes from some part of the process of appending to the list. Since other people have observed this problem as well (http://tek-tips.com/viewthread.cfm?qid=1096178&page=13, http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i), I did not bother to analyze or benchmark it further. Since the answers in the above forums do not seem very definitive, I thought I would ask here what the reason for this decrease in performance is, and whether there is a way, or another data structure, that would avoid this problem. -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating Long Lists
The answer, it turns out, is the garbage collector. When I disable the garbage collector before the loop that loads the data into the list and enable it again after the loop, the program runs without issue. This raises a question, though: can the logic of the garbage collector be changed so that it is not triggered in instances like this, where you really do want to put lots and lots of stuff in memory? Turning the garbage collector off and on is not a big deal, but it would obviously be nicer not to have to.
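For anyone finding this thread later, here is a minimal sketch of the workaround described above, using the standard gc module. The names load_records and parse are hypothetical placeholders for the real loading loop and line parser, which are not shown in this thread.

```python
import gc


def load_records(lines, parse):
    """Parse every line into an object and collect the results in a list.

    `parse` is a hypothetical stand-in for the real line parser.
    """
    records = []
    was_enabled = gc.isenabled()
    gc.disable()  # pause the cyclic collector while the list grows
    try:
        for line in lines:
            records.append(parse(line))
    finally:
        if was_enabled:
            gc.enable()  # restore the collector even if parsing fails
    return records


# Tiny illustrative input; the real data would come from the 10 GB file.
sample = ["3,c", "1,a", "2,b"]
records = load_records(sample, lambda line: tuple(line.split(",")))
records.sort()
```

The try/finally is only there so that an exception in the parser does not leave the collector switched off for the rest of the program.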
Re: Creating Long Lists
I am using Python 2.6.2, so it may no longer be a problem. I am open to using another data type, but the way I read the documentation, array.array only supports numeric types, not arbitrary objects. I also tried playing around with numpy arrays, albeit only for a short time. Although they do support arbitrary objects, they seem to be geared toward numbers as well, and I found it cumbersome to manipulate objects with them. It could be, though, that if I understood them better they would work fine. Also, do numpy arrays support sorting arbitrary objects? I only saw a method that sorts numbers.
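To illustrate the array.array limitation mentioned above, here is a small sketch. The Record class is a hypothetical stand-in for whatever object each line is parsed into. It shows that array.array rejects arbitrary objects, while a plain list accepts them and sorts on a supplied key; it does not say anything about numpy.

```python
import array
from operator import attrgetter


class Record(object):
    """Hypothetical stand-in for one parsed line."""

    def __init__(self, key, payload):
        self.key = key
        self.payload = payload


# array.array stores only the primitive type named by its typecode.
numbers = array.array("i", [3, 1, 2])
try:
    numbers.append(Record(4, "d"))
    accepted = True
except TypeError:
    accepted = False  # arbitrary objects are rejected

# A plain list holds any object and sorts on whatever key is supplied.
records = [Record(3, "c"), Record(1, "a"), Record(2, "b")]
records.sort(key=attrgetter("key"))
sorted_keys = [r.key for r in records]
```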
Has Next in Python Iterators
I have been programming in Python for a while now and by and large love it. One thing I don't love, though, is that as far as I know iterators have no has_next type of functionality. As a result, if I want to iterate until an element that might or might not be present is found, I either wrap a while loop in a try block or break out of a for loop. Since an iterator having an end is not actually an exceptional case, and the for construct is really for iterating through the entirety of a list, both of these solutions feel like greasy workarounds and thus not very Pythonic. Is there something I am missing? Is there a reason Python iterators don't have has_next functionality? What is the standard solution to this problem?
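For concreteness, the two workarounds described above might look roughly like this. The function names and the search-for-a-target scenario are made up for illustration; next() is the builtin available from Python 2.6 onward.

```python
def find_with_while(items, target):
    """Workaround 1: a while loop wrapped in try/except StopIteration."""
    it = iter(items)
    try:
        while True:
            if next(it) == target:
                return True
    except StopIteration:
        return False  # ran off the end without finding target


def find_with_for(items, target):
    """Workaround 2: a for loop that breaks out early."""
    found = False
    for item in items:
        if item == target:
            found = True
            break
    return found
```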
Re: Has Next in Python Iterators
The example I have in mind is a list like [2,2,2,2,2,2,1,3,3,3,3], where you want to loop until you see something that is not a 2, and then loop until you see something that is not a 3. In this situation you cannot use a for loop as follows:

    foo_list_iter = iter([2,2,2,2,2,2,1,3,3,3,3])
    for foo_item in foo_list_iter:
        if foo_item != 2:
            break

because it will eat the 1 and not allow the second loop to find it. takewhile and dropwhile have the same problem. It is possible to use a while loop as follows:

    foo_list_item = next(foo_list_iter)
    while foo_list_item == 2:
        foo_list_item = next(foo_list_iter)
    while foo_list_item == 3:
        foo_list_item = next(foo_list_iter)

but if you can't be sure that the list is non-empty, or that it is not all 2s followed by all 3s, you need to surround this code with a try block. Unless there is a good reason for having to do this, I think it is undesirable, because it means that the second clause of the loop invariant, namely that you are not off the end of the list, is being controlled outside of the loop.

As for the feasibility of implementing a has_next function, I agree that you cannot write one method that will provide the proper functionality in all cases, and thus that you cannot create a has_next for generators. Iterators, however, are a different beast: they are returned by the thing they are iterating over, and thus any special cases can be covered by writing a specific implementation for the iterable in question. This sort of functionality is possible to implement, because Java does it.
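One way to get this behaviour without changing the iterator protocol is a small wrapper that buffers a single item. The sketch below is only an illustration under that assumption: PeekableIterator, has_next, peek and count_run are hypothetical names, not part of the standard library.

```python
class PeekableIterator(object):
    """Hypothetical wrapper that buffers one item to offer has_next/peek."""

    _EMPTY = object()  # marks "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def has_next(self):
        # Pull one item ahead if nothing is buffered yet.
        if self._buffer is self._EMPTY:
            try:
                self._buffer = next(self._it)
            except StopIteration:
                return False
        return True

    def peek(self):
        """Return the next item without consuming it."""
        if not self.has_next():
            raise StopIteration
        return self._buffer

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        item, self._buffer = self._buffer, self._EMPTY
        return item

    next = __next__  # Python 2 spelling of the same method


def count_run(it, value):
    """Consume leading items equal to value and return how many there were."""
    count = 0
    while it.has_next() and it.peek() == value:
        next(it)
        count += 1
    return count


foo = PeekableIterator([2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3])
twos = count_run(foo, 2)   # stops in front of the 1 without eating it
leftover = foo.peek()      # the 1 is still available
```

With this wrapper, both halves of the loop condition live inside the while statement, so no surrounding try block is needed. The trade-off is that has_next reads one item ahead from the underlying iterator, which matters if producing an item has side effects.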