On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote: > Howdy guys, I am new. > > I've been converting lists to sets, then back to lists again to get > unique lists. > e.g > > Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48) [GCC 4.2.4 (Ubuntu > 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or > "license" for more information. >>>> foo = ['1','2','3'] >>>> bar = ['2','5'] >>>> foo.extend(bar) >>>> foo = list(set(foo)) >>>> foo > ['1', '3', '2', '5'] > > I used to use list comps to do this instead. >>>> foo = ['1','2','3'] >>>> bar = ['2','5'] >>>> foo.extend([a for a in bar if a not in foo]) foo > ['1', '2', '3', '5'] > > A very long time ago, we all used dictionaries, but I'm not interested > in that ever again. ;-) > Is there any performance hit to using one of these methods over the > other for rather large lists?
Absolutely! Membership testing in lists is O(N), that is, it gets slower as the number of items in the list increases. So if list foo above is huge, "a not in foo" may take time proportional to how huge it is. For sets, membership testing is (roughly) O(1), that it, it takes (roughly) constant time no matter how big the set is. There are special cases where this does not hold, but you have to work hard to find one, so in general you can assume that any lookup in a dict or set will take the same amount of time. However, the cost of that extra power is that sets are limited to hashable objects (e.g. ints, strings, but not lists, etc.), they can't contain duplicates, and they are unordered. If you care about the order of the list, it is better to do something like this: # pseudo-code target = ['1', '3', '2', '7', '5', '9'] source = ['6', '2', '1', '0'] # add unique items of source to target, keeping the order tmp = sorted(target) for item in source: flag = binary search for item in tmp if not flag: insert item in tmp keeping the list sorted append item to the end of target del tmp The bisect module will be useful here. For small lists, it really doesn't matter what you do. This probably only matters beyond a few tens of thousands of items. -- Steven -- http://mail.python.org/mailman/listinfo/python-list