Maybe this should be implemented in C. But I believe the algorithm
itself must be wrong, regardless of the language. I really think I'm
doing something wrong: the processing time does not appear to be
linear in the number of rows, and not even n*log(n). There should be
a more efficient way to do this. But how?
I had the idea of sorting the rows by a given dimension. Then it
would be straightforward to speed up the indexing for that dimension.
Probably sorting all rows would be faster than calling list.index()
for each row. But... it does not seem very efficient either. What if
I have 1 million rows and 10 dimensions? Do I sort 1 million rows on
the disk 10 times? Some of you may have run into the same problem
before and can tell me the most efficient way to do this.
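For reference, the indexing step is roughly like this (a simplified
sketch; the real variable and function names differ):

    # Simplified sketch of the current slow version: for each row,
    # look up every column value with list.index(), which scans the
    # dimension's value list linearly on every call.
    def index_rows_slow(rows, dimension_values):
        indexed = []
        for row in rows:
            indexed.append([dimension_values[col_idx].index(value)
                            for col_idx, value in enumerate(row)])
        return indexed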
The .index method does a linear search, checking on average 1/2 of the
items in the list. That's why it's so slow.
To avoid that, you could build a dict mapping each value in
dimension_values[col_idx] to its index in a single pass, so that each
lookup becomes a quick constant-time operation.
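Something like this sketch (assuming rows and dimension_values are
shaped like in the code above; adapt the names to your real code):

    # Build one value -> index dict per dimension in a single pass,
    # then replace the linear list.index() calls with O(1) lookups.
    def index_rows_fast(rows, dimension_values):
        lookups = [{value: idx for idx, value in enumerate(values)}
                   for values in dimension_values]
        return [[lookups[col_idx][value]
                 for col_idx, value in enumerate(row)]
                for row in rows]

Building the dicts costs one pass per dimension; after that, every
row is processed with constant-time lookups instead of linear scans.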
Changed to dicts and hashed lookups. Now the processing time is O(n),
and throughput went up to 8000 rows/sec.
Probably I'll never want to process more than 1M rows. That will take
about 125 seconds. Fair enough.
Thank you very much!
Laszlo