The sets module or the built-in set type (depending on which Python version you have) will do this if you make each record a tuple.
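A minimal sketch of what I mean, assuming the fields are whitespace-separated and the last two fields are always longitude and latitude. The names `lines` and `parse_record` are made up for illustration, and the sample rows are copied from the data quoted below:

```python
# Hypothetical sample input; rows copied from the quoted dataset.
lines = [
    "Abies concolor -106.601 35.868",
    "Accipiter cooperi -77.38333 39.68333",
    "Accipiter cooperi -77.38333 39.68333",
    "Accipiter cooperi -75.99153 40.633335",
    "Accipiter cooperi -75.99153 40.633335",
]

def parse_record(line):
    # Split from the right so a multi-word species name stays in one piece.
    species, longitude, latitude = line.rsplit(None, 2)
    return (species, longitude, latitude)

# Tuples are hashable, so a set keeps only one copy of each record.
unique_records = set(parse_record(line) for line in lines)

# A set has no defined order, so sort before printing.
for record in sorted(unique_records):
    print(" ".join(record))
```

The coordinates are kept as strings here, so two records only count as duplicates if their text matches exactly. Unlike the generator approach quoted below, a plain set does not preserve the original order of the file.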

-----Original Message-----
From: python-list-bounces+frsells=adventistcare....@python.org [mailto:python-list-bounces+frsells=adventistcare....@python.org] On Behalf Of Peter Otten
Sent: Tuesday, July 26, 2011 5:04 AM
To: python-list@python.org
Subject: Re: Selecting unique values

Kumar Mainali wrote:

> I have a dataset with occurrence records of multiple species. I need to
> get rid of multiple listings of the same occurrence point for a species
> (as you see below in red and blue typeface). How do I create a dataset
> only with unique set of longitude and latitude for each species? Thanks in
> advance.
>
> Species_name Longitude Latitude
> Abies concolor -106.601 35.868
> Abies concolor -106.493 35.9682
> Abies concolor -106.489 35.892
> Abies concolor -106.496 35.8542
> Accipiter cooperi -119.688 34.4339
> Accipiter cooperi -119.792 34.5069
> Accipiter cooperi -118.797 34.2581
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -75.99153 40.633335
> Accipiter cooperi -75.99153 40.633335

>>> def uniquify(items):
...     seen = set()
...     for item in items:
...         if item not in seen:
...             seen.add(item)
...             yield item
...
>>> import sys
>>> sys.stdout.writelines(uniquify(open("species.txt")))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335

If you need to massage the lines a bit:

>>> def uniquify(items, key=None):
...     seen = set()
...     for item in items:
...         if key is None:
...             keyval = item
...         else:
...             keyval = key(item)
...         if keyval not in seen:
...             seen.add(keyval)
...             yield item
...

Unique latitudes:

>>> sys.stdout.writelines(uniquify(open("species.txt"), key=lambda s: s.rsplit(None, 1)[-1]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335

Unique species names:

>>> sys.stdout.writelines(uniquify(open("species.txt"), key=lambda s: s.rsplit(None, 2)[0]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Accipiter cooperi -119.688 34.4339

Bonus: open() is not the built-in here:

>>> from StringIO import StringIO
>>> def open(filename):
...     return StringIO("""Species_name Longitude Latitude
... Abies concolor -106.601 35.868
... Abies concolor -106.493 35.9682
... Abies concolor -106.489 35.892
... Abies concolor -106.496 35.8542
... Accipiter cooperi -119.688 34.4339
... Accipiter cooperi -119.792 34.5069
... Accipiter cooperi -118.797 34.2581
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -75.99153 40.633335
... Accipiter cooperi -75.99153 40.633335
... """)
...

--
http://mail.python.org/mailman/listinfo/python-list