PyPK wrote: > I was testing this piece of code and it takes about 24-30 seconds to do > a look up on a list(m) of size 1000x1000 > m -> list of size 1000x1000 > import time > print time.ctime() > box = {} > for r,row in enumerate(m): > for c,e in enumerate(row): > if box.has_key(e): > params = box[e] > box[e] = ( min(c, params[0]), min(r, params[1]), > max(c, params[2]), max(r, params[3] ) ) > else: > box[e] = [c, r, c, r] > print time.ctime() > > Can some one tell me what exactly is taking more time here. Is it > because I am using dictionaries or what is the problem. Can some one > help me improve this .Is there a better way to write this. >
Without gross changes to the algorithm, and in no particular order: 0. Stop worrying. Find something else to do during the 30 seconds. 1. Use psyco, if your [unspecified] platform supports it. 2. Why has_key()? Try "if e in box:" instead, if your [unspecified] version of Python supports it. If it doesn't, (a) consider upgrading (b) try doing box_has_key = box.has_key outside the loops and using the result inside the loops. 3. Ensure this code is inside a function, not at global level in a script [not apparent from your post]. 4. Outside the loops, put "local_min = min", ditto for max. Then use box[e] = (local_min(c, etc etc 5. Upgrade your [unspecified] hardware. 6. Use try/except, if indicated by the [unspecified] percentage of times "e in box" is true. Hint: printing len(box) at the end might be useful. 7. Consider using pyrex. 8. Consider using numarray/mumeric. Info req'd for discussing algorithm changes: 1. How much free memory do you have? 2. What can you tell us about the distribution of "e"? 3. Is "m" a rectangle, or are the rows of variable length? 4. Do you have control over "m" i.e. are you stuck with that data structure, or can we design a different one? -- http://mail.python.org/mailman/listinfo/python-list