On Sat, 07 May 2005 02:29:48 GMT, [EMAIL PROTECTED] (Bengt Richter) wrote:

>On Sat, 07 May 2005 11:08:31 +1000, Maurice LING <[EMAIL PROTECTED]> wrote:
>>
>>It doesn't seem to help. I'm thinking that it might be a SOAPpy
>>problem. The allocation fails when I grab a list of more than 150k
>>elements through SOAP, but allocating a 1 million element list is fine
>>in Python.
>>
>>Now I have a performance problem...
>>
>>Say I have 3 lists (20K elements, 1G elements, and 0 elements); call
>>them 'a', 'b', and 'c'. I want to filter everything that is in 'b' but
>>not in 'a' into 'c'...
>>
>> >>> a = range(1, 100000, 5)
>> >>> b = range(0, 1000000)
>> >>> c = []
>> >>> for i in b:
>> ...     if i not in a: c.append(i)
>> ...
>>
>>This takes forever to complete. Is there any way to optimize this?
>>
>Checking whether something is in a list means, on average, checking
>equality against half the elements of the list. Checking for membership
>in a set should be much faster for any set/list of significant size.
>I.e., just changing to
>
>    a = set(range(1, 100000, 5))
>
>should help. I assume those aren't examples of your real data ;-)
>You must have a lot of memory if you are keeping 1G elements there and
>copying a significant portion of them. Do you need to do this
>file-to-file, keeping a in memory? Perhaps page-file thrashing is part
>of the time problem?
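[The set-based filter Bengt suggests can be sketched as follows, in modern Python 3 syntax (where range() no longer returns a list) and with the same toy sizes as the quoted session, not Maurice's real data:]

```python
# Build 'a' as a set so each "i not in a" lookup is O(1) on average,
# instead of scanning the whole 20K-element list for every element of b.
a = set(range(1, 100000, 5))   # 20000 elements: 1, 6, 11, ..., 99996
b = range(0, 1000000)

# Keep everything in b that is not in a.
c = [i for i in b if i not in a]

# Every element of a lies inside b's range, so exactly len(a)
# elements are filtered out.
assert len(c) == 1000000 - len(a)
```

[With the list version the loop does roughly 10**6 * 10**4 comparisons; with the set it does about 10**6 constant-time hash lookups, which is the whole speedup.]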
Since when was 1000000 == 1G?? Maurice, is this mucking about with 1M or
1G lists part of the same exercise as the "vm_malloc fails when
allocating a 20K-element list" problem?

Again, it would be a good idea if you gave us a little more detail. You
haven't even posted the actual *PYTHON* error message and stack trace
that you got from the original problem. In fact, there's a possible
interpretation that the (system?) malloc merely prints the vm_malloc
message and staggers on somehow ...

Regards,
John
--
http://mail.python.org/mailman/listinfo/python-list
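[As an aside on the stack trace John asks for: if the failure is caught somewhere and only a message is printed, the full traceback can still be captured as a string with the standard traceback module. A minimal sketch, using a simulated MemoryError in place of the real SOAP call (which is an assumption; the original failing call is not shown in the thread):]

```python
import traceback

def grab_big_list():
    # Hypothetical stand-in for the SOAP call that reportedly fails;
    # here it just raises so there is a traceback to capture.
    raise MemoryError("simulated allocation failure")

try:
    grab_big_list()
except Exception:
    # format_exc() returns the same text the interpreter would print,
    # suitable for pasting into a bug report or mailing-list post.
    trace = traceback.format_exc()

print(trace)
```

[The last line of the captured text names the exception type and message, which is exactly the detail missing from the original report.]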