Steven Bethard wrote: > > As you noted, you'll get an error if you try to concatenate B as a set > to the list. > > Steve
Whoops, remind me to check things work before I type them. In the mean time, here are some more interesting timing results: With a larger data set, 500 elements instead of 100, the times were almost the same: python -m timeit -s "A=range(500); B=range(40,60,2); B_set=set(B)" "A=B+list(x for x in A if x not in B_set)" 10000 loops, best of 3: 103 usec per loop python -m timeit -s "A=range(500); B=range(40,60,2); B_set = set(B)" "A.sort(key=B_set.__contains__, reverse=True)" 10000 loops, best of 3: 99.2 usec per loop But for the last recommended one: python -m timeit -s "A=range(500); B=range(40,60,2); C = set(A) -set(B)" "A=B+filter(C.__contains__, A)" 10000 loops, best of 3: 38.3 usec per loop Much faster! But that does not take into account the set creation and set difference calculations. Let's try including that: python -m timeit -s "A=range(500); B=range(40,60,2)" "C=set(A)-set(B)" "a=B+filter(C.__contains__, A)" 10000 loops, best of 3: 105 usec per loop and compare to our other two versions including set creation: python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B); A=B+list(x for x in A if X not in B_set" 10000 loops, best of 3: 105 usec per loop python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B); A.sort(key=B_set.__contains__, reverse=True)" 10000 loops, best of 3: 101 usec per loop Pretty similar again. However, converting each element to a string first ("A=list(str(x) for x in range(500)" etc.), making it more similar to my data, gave: 102usec per loop for the A.sort() method 132usec per loop for the A=B+filter() method 104usec per loop for the A=B+list() method It is interesting to note the increased time taken by the set difference/filter method. It appears not to like strings as much as ints. We can also shave another 7 usec off the filter time: python -m timeit -s "A=list(str)x) for x in range(500)); B=list(str(x) for x in range(40,60,2)); not_in_B_set=lambda x: x not in B_set" "B_set=set(B); A=B+filter(not_in_B_set, A)" 10000 loops, best of 3: 125 usec per loop In conclusion, it appears I should use the clearest of all the methods, since for the data sizes I am dealing with, they are of approximately the same speed, and the time taken is negligible anyway. Thanks for all your help, Cameron. -- http://mail.python.org/mailman/listinfo/python-list