On 23/04/2013 02:47, Dave Angel wrote: > On 04/22/2013 05:32 PM, Blind Anagram wrote: >> On 22/04/2013 22:03, Oscar Benjamin wrote: >>> On 22 April 2013 21:18, Oscar Benjamin <oscar.j.benja...@gmail.com> >>> wrote: >>>> On 22 April 2013 17:38, Blind Anagram <blindanag...@nowhere.org> wrote: >>>>> On 22/04/2013 17:06, Oscar Benjamin wrote: >>>>> >>>>>> I don't know what your application is but I would say that my first >>>>>> port of call here would be to consider a different algorithmic >>>>>> approach. An obvious question would be about the sparsity of this >>>>>> data >>>>>> structure. How frequent are the values that you are trying to count? >>>>>> Would it make more sense to store a list of their indices? >>>>> >>>>> Actually it is no more than a simple prime sieve implemented as a >>>>> Python >>>>> class (and, yes, I realize that there are plenty of these around). >>>> >>>> If I understand correctly, you have a list of roughly a billion >>>> True/False values indicating which integers are prime and which are >>>> not. You would like to discover how many prime numbers there are >>>> between two numbers a and b. You currently do this by counting the >>>> number of True values in your list between the indices a and b. >>>> >>>> If my description is correct then I would definitely consider using a >>>> different algorithmic approach. The density of primes from 1 to 1 >>>> billlion is about 5%. Storing the prime numbers themselves in a sorted >>>> list would save memory and allow a potentially more efficient way of >>>> counting the number of primes within some interval. >>> >>> In fact it is probably quicker if you don't mind using all that memory >>> to just store the cumulative sum of your prime True/False indicator >>> list. This would be the prime counting function pi(n). You can then >>> count the primes between a and b in constant time with pi[b] - pi[a]. >> >> I did wonder whether, after creating the sieve, I should simply go >> through the list and replace the True values with a count. This would >> certainly speed up the prime count function, which is where the issue >> arises. I will try this and see what sort of performance trade-offs >> this involves. >> > > By doing that replacement, you'd increase memory usage manyfold (maybe > 3:1, I don't know). As long as you're only using bools in the list, you > only have the list overhead to consider, because all the objects > involved are already cached (True and False exist only once each). If > you have integers, you'll need a new object for each nonzero count.
Thank you, Dave, you have answered a question that I was going to ask before I even asked it! -- http://mail.python.org/mailman/listinfo/python-list