Tom Pinckney <[EMAIL PROTECTED]> added the comment:

My latest need for something like this looked roughly like this:

src1 = db_query(query_1)
src2 = db_query(query_2)
results = deduped(src1 + src2, key=lambda x: x.field2)

Basically, I wanted data from src1 if it existed and otherwise from src2,
while preserving the order of src1 (I didn't care about the order of
src2).
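
For concreteness, here is roughly the generator I have in mind. deduped()
is not an existing stdlib function; the name and signature are just what
my local helper uses:

def deduped(iterable, key=None):
    """Yield items from iterable in first-seen order, skipping duplicates.

    Duplicates are detected by key(item), or by the item itself when no
    key is given; keys must be hashable.
    """
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item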

A previous example was reading from a file and wanting to de-dupe lines
based on a field in each line. Again, order mattered to me, since I wanted
to process the non-duplicated lines in file order.
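
With the sketch above, that looks something like this (the file name is
made up, and the lines are assumed to have at least three
whitespace-separated fields):

import sys

with open('events.log') as f:
    # De-dupe on the third whitespace-separated field, keeping the
    # first occurrence of each and preserving file order.
    for line in deduped(f, key=lambda line: line.split()[2]):
        sys.stdout.write(line)   # stand-in for the real per-line processing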

A final example was generating a bunch of error messages from a variety 
of sources and then wanting to make sure there were no duplicate errors. 
Instead of: 

errors = set(errors)

I find this much clearer:

errors = deduped(errors)

In reality, none of these examples strictly needs to be written as a
generator. The lists being de-duped are probably not so huge in practice
as to preclude instantiating a new list (given the reality of
multi-gigabyte RAM machines, etc.). It just seemed particularly clear to
write this using a yield.

An ordered dictionary would probably work for me too. I don't think a
Bag would, given its lack of ordering.

I do find it very simple to just be able to apply deduped() to any
existing sequence or iterator, rather than having to be more verbose and
explicitly iterate while filling in an ordered dictionary myself.
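
For comparison, the more verbose version would look roughly like this,
assuming an OrderedDict-style class is available (the name
deduped_via_odict is just for illustration):

from collections import OrderedDict   # assumed available here

def deduped_via_odict(iterable, key=None):
    # Build an ordered mapping keyed on key(item) so later duplicates
    # are dropped while first-seen order is preserved.
    d = OrderedDict()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in d:
            d[k] = item
    return d.values()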

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4615>
_______________________________________