Creating Long Lists

2011-02-21 Thread Kelson Zawack
I have a large (10 GB) data file, and I want to parse each line into
an object and append that object to a list for sorting and further
processing. However, I have noticed that as the list grows, the rate
at which objects are added to it drops dramatically. My first thought
was that I was nearing the memory capacity of the machine and that the
slowdown was due to the OS swapping pages in and out of memory. When I
looked at the memory usage, this was not the case: my process was the
only job running, it was consuming 40 GB of the machine's 130 GB, and
no swapping was going on. To rule out a problem elsewhere in my code
or in the server's file system, I ran the program again unchanged
except for the line that appends items to the list, and it completed
without problems. This indicates that the slowdown comes from some
part of appending to the list. Since other people have observed this
problem as well
(http://tek-tips.com/viewthread.cfm?qid=1096178&page=13,
http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i)
I did not bother to analyze or benchmark it further. Since the answers
in those forums do not seem very definitive, I thought I would ask
here what causes this slowdown, and whether there is a way, or another
data structure, that avoids it.
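A minimal sketch (Python 3) of how the slowdown described above could be measured: it appends objects to one list in fixed-size chunks and records the wall-clock time per chunk, so a per-chunk time that keeps growing would show the append rate falling as the list grows. The `Record` class and `time_appends` function are hypothetical stand-ins for the parsed objects and the loading loop; no particular timings are claimed, since they depend on the interpreter version and machine.

```python
import time


class Record:
    """Hypothetical stand-in for one parsed line of the data file."""

    def __init__(self, key, payload):
        self.key = key
        self.payload = payload


def time_appends(n_chunks, chunk_size):
    """Append n_chunks * chunk_size Record objects to a single list and
    return the elapsed seconds for each chunk of appends."""
    records = []
    chunk_times = []
    for chunk in range(n_chunks):
        start = time.perf_counter()
        for i in range(chunk_size):
            key = chunk * chunk_size + i
            # Each Record holds a list, so it is tracked by the cyclic GC.
            records.append(Record(key, [key]))
        chunk_times.append(time.perf_counter() - start)
    return chunk_times


if __name__ == "__main__":
    for n, seconds in enumerate(time_appends(n_chunks=10, chunk_size=50_000)):
        print("chunk {}: {:.3f} s".format(n, seconds))
```

Running it once as written and once with the `records.append(...)` line removed mirrors the isolation test described in the post.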


--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Kelson Zawack
The answer, it turns out, is the garbage collector. When I disable the
garbage collector before the loop that loads the data into the list
and re-enable it after the loop, the program runs without issue.
This raises a question, though: can the logic of the garbage collector
be changed so that it is not triggered in cases like this, where you
really do want to put lots and lots of stuff in memory? Turning the
garbage collector off and on is not a big deal, but it would
obviously be nicer not to have to.
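A sketch (Python 3, standard `gc` module) of the two options raised above: pausing the collector around the loading loop, as the post describes, and, as an alternative, temporarily raising the generation-0 threshold with `gc.set_threshold` so collections run less often instead of not at all. The helpers `gc_paused`, `gc_threshold0`, and `load_records` are hypothetical names for illustration. In CPython, disabling the collector only stops automatic cycle detection; objects without reference cycles are still freed by reference counting.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic cyclic garbage collection inside the block and
    restore the previous on/off state afterwards, even on an exception."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


@contextmanager
def gc_threshold0(threshold0):
    """Temporarily raise the generation-0 threshold so that collections
    are triggered less often; the old thresholds are restored afterwards."""
    old_thresholds = gc.get_threshold()
    # Only the first threshold is changed; the other two are kept as-is.
    gc.set_threshold(threshold0, *old_thresholds[1:])
    try:
        yield
    finally:
        gc.set_threshold(*old_thresholds)


def load_records(lines):
    """Hypothetical loader: parse each line into a (key, fields) tuple."""
    records = []
    with gc_paused():
        for line in lines:
            fields = line.split()
            records.append((fields[0], fields[1:]))
    return records
```

`with gc_threshold0(100_000):` could replace `with gc_paused():` in `load_records`; 100_000 is an arbitrary illustrative value, not a tuned one. Using a context manager with `try`/`finally` means the collector is restored even if parsing raises partway through the file.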


Re: Creating Long Lists

2011-02-22 Thread Kelson Zawack
I am using Python 2.6.2, so it may no longer be a problem.

I am open to using another data type, but as I read the
documentation, array.array only supports numeric types, not arbitrary
objects. I also tried playing around with NumPy arrays, albeit only
briefly, and although they do support arbitrary objects, they seem
to be geared toward numbers as well, and I found it cumbersome to
manipulate objects with them. It could be that if I understood them
better they would work fine. Also, do NumPy arrays support sorting
arbitrary objects? I only saw a method that sorts numbers.
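The `array.array` limitation mentioned above can be checked directly: an array stores values of a single numeric (or character) type code, so putting an arbitrary object into one raises `TypeError`. A plain list, by contrast, sorts arbitrary objects with `list.sort(key=...)`. A small sketch (Python 3); the `Record` class and `array_accepts_objects` helper are hypothetical, and NumPy is deliberately left out.

```python
import array
from operator import attrgetter


class Record:
    """Hypothetical parsed-line object with a sort key."""

    def __init__(self, key, name):
        self.key = key
        self.name = name


def array_accepts_objects():
    """Return True if array.array('i', ...) will store an arbitrary object."""
    try:
        array.array("i", [Record(1, "a")])
    except TypeError:
        return False
    return True


records = [Record(3, "c"), Record(1, "a"), Record(2, "b")]
# list.sort works on arbitrary objects; key= says what to compare.
records.sort(key=attrgetter("key"))
print([r.name for r in records])  # ['a', 'b', 'c']
```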


Has Next in Python Iterators

2010-10-21 Thread Kelson Zawack
I have been programming in Python for a while now and by and large love
it. One thing I don't love, though, is that as far as I know iterators
have no has_next-type functionality. As a result, if I want to iterate
until an element that might or might not be present is found, I either
wrap a while loop in a try block or break out of a for loop. Since an
iterator having an end is not actually an exceptional case, and the for
construct is really for iterating through the entirety of a list, both
of these solutions feel like greasy workarounds and thus not very
Pythonic. Is there something I am missing? Is there a reason Python
iterators don't have has_next functionality? What is the standard
solution to this problem?
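The two workarounds described above look roughly like this when written as a search for the first occurrence of `target`. This is a hypothetical illustration (Python 3), not the poster's code: the first version relies on `for`/`break` with an `else` clause, and the second wraps a `while` loop over `next()` in a `try` block that catches `StopIteration`.

```python
def find_with_break(items, target):
    """Workaround 1: a for loop that breaks early. The else clause runs
    only if the loop finished without hitting break."""
    for index, item in enumerate(items):
        if item == target:
            break
    else:
        return None
    return index


def find_with_try(items, target):
    """Workaround 2: a while loop over next(), where the end of the
    iterator is signalled by StopIteration."""
    it = iter(items)
    index = 0
    try:
        while next(it) != target:
            index += 1
    except StopIteration:
        return None
    return index
```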




Re: Has Next in Python Iterators

2010-10-25 Thread Kelson Zawack
The example I have in mind is a list like [2,2,2,2,2,2,1,3,3,3,3], where
you want to loop until you see something other than a 2, and then loop
until you see something other than a 3. In this situation you cannot use
a for loop as follows:

    foo_list_iter = iter([2,2,2,2,2,2,1,3,3,3,3])
    for foo_item in foo_list_iter:
        if foo_item != 2:
            break

because it will consume the 1 and not allow the second loop to find it.
itertools.takewhile and itertools.dropwhile have the same problem. It is
possible to use a while loop as follows:

    foo_list_item = next(foo_list_iter)
    while foo_list_item == 2:
        foo_list_item = next(foo_list_iter)
    while foo_list_item == 3:
        foo_list_item = next(foo_list_iter)

but if you can't be sure the list is not empty, all 2s, or all 2s
followed by all 3s, then you need to surround this code in a try block,
because next() raises StopIteration at the end. Unless there is a good
reason for having to do this, I think it is undesirable, because it
means that the second clause of the loop invariant, namely that you
are not off the end of the list, is being controlled outside of the
loop.
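A sketch (Python 3) of the try-wrapped version described above, next to a variant that uses the two-argument form of the built-in `next(iterator, default)`, which returns `default` instead of raising `StopIteration`. With a private sentinel object as the default, the end-of-data check stays inside the loop conditions. The function names and the `_END` sentinel are hypothetical.

```python
def skip_runs(items):
    """Skip a leading run of 2s, then a run of 3s, and return the item
    that ends the runs, or None if the data runs out first."""
    it = iter(items)
    try:
        item = next(it)
        while item == 2:
            item = next(it)
        while item == 3:
            item = next(it)
    except StopIteration:
        # The end of the data is detected here, outside both loops.
        return None
    return item


_END = object()  # sentinel that cannot compare equal to any real item


def skip_runs_sentinel(items):
    """Same logic, but next(it, _END) returns the sentinel at the end
    instead of raising, so no try block is needed."""
    it = iter(items)
    item = next(it, _END)
    while item == 2:
        item = next(it, _END)
    while item == 3:
        item = next(it, _END)
    return None if item is _END else item
```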

As for the feasibility of implementing a has_next function, I agree that
you cannot write one method that will provide the proper functionality
in all cases, and thus that you cannot create a has_next for
generators. Iterators, however, are a different beast: they are
returned by the thing they are iterating over, and thus any special
cases can be covered by writing a specific implementation for the
iterable in question. This sort of functionality is possible to
implement, because Java does it.
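One common way to get has_next behaviour in Python is a wrapper that buffers one item of lookahead. The sketch below (Python 3) shows this; `PeekableIterator` is a hypothetical class and not part of the standard library. It works on any iterator, generators included, but `has_next()` pulls the next item from the underlying iterator early. For a generator with side effects, those effects therefore happen at `has_next()` time rather than at `next()` time, which is the caveat that prevents a single general-purpose has_next for generators.

```python
_MISSING = object()  # private marker meaning "nothing buffered / exhausted"


class PeekableIterator:
    """Hypothetical wrapper adding has_next() and peek() to any iterator
    by buffering at most one item ahead."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = _MISSING

    def __iter__(self):
        return self

    def has_next(self):
        """Fetch one item ahead if needed; True if an item is available."""
        if self._buffer is _MISSING:
            self._buffer = next(self._it, _MISSING)
        return self._buffer is not _MISSING

    def peek(self):
        """Return the next item without consuming it."""
        if not self.has_next():
            raise ValueError("peek() on an exhausted iterator")
        return self._buffer

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        item = self._buffer
        self._buffer = _MISSING
        return item


it = PeekableIterator([2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3])
while it.has_next() and it.peek() == 2:
    next(it)
separator = next(it)  # the 1 is still available because peek() did not consume it
while it.has_next() and it.peek() == 3:
    next(it)
print(separator, it.has_next())  # 1 False
```

With this wrapper, the end-of-data check sits in each loop condition, which is the loop-invariant placement the post asks for.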