On 2016-09-29 10:20, Steve D'Aprano wrote:
> On Thu, 29 Sep 2016 05:10 am, Tim Chase wrote:
> > data1 = [ # key, data1
> >     (1, "one A"),
> >     (1, "one B"),
> >     (2, "two"),
> >     (5, "five"),
> >     ]
>
> So data1 has keys 1, 1, 2, 5.
> Likewise data2 has keys 1, 2, 3, 3, 3, 4 and data3 has keys 2, 4, 5.
Correct.

> (data3 also has *two* values, not one, which is an additional
> complication.)

As commented towards the end, the source is a set of CSV files, so
each row is a list where a particular (identifiable) item is the key.
Assume that one can use something like get_key(row) to return the
key, which in the above could be implemented as

  get_key = lambda row: row[0]

and for my csv.DictReader data, would be something like

  get_key = lambda row: row["Account Number"]

> > And I'd like to do something like
> >
> >   for common_key, d1, d2, d3 in magic_happens_here(data1, data2,
> >           data3):
>
> What's common_key? In particular, given that data1, data2 and data3
> have the first key each of 1, 1 and 2 respectively, how do you get:
>
> > So in the above data, the outer FOR loop would
> > happen 5 times with common_key being [1, 2, 3, 4, 5]
>
> I'm confused. Is common_key a *constant* [1, 2, 3, 4, 5] or are you
> saying that it iterates over 1, 2, 3, 4, 5?

Your later interpretation is correct: it's each unique key once, in
order. So if you

  data1.append((17, "seventeen"))

the outer loop would iterate over [1, 2, 3, 4, 5, 17] (so not
constant, to hopefully answer that part of your question).

The actual keys are account numbers, so they're ascii-sorted strings
of the form "1234567-8901", ascending in order through the files. But
for equality/less-than/greater-than comparisons, they work
effectively as integers in my example.

> If the latter, it sounds like you want something like a cross
> between itertools.groupby and the "merge" stage of mergesort.

That's a pretty good description at some level. I looked into
groupby() but was having trouble getting it to do what I wanted.

> Note that I have modified data3 so instead of three columns, (key
> value value), it has two (key value) and value is a 2-tuple.

I'm cool with that.
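For the record, a rough sketch of that groupby/merge cross, assuming
each input iterator yields (key, value) pairs already sorted by key
(buckets here are lists rather than the lazy generators I'd
ultimately want):

```python
import heapq
import itertools

def magic_happens_here(*sources):
    """N-way merge of key-sorted (key, value) iterators, grouped by key.

    Yields (common_key, bucket_0, ..., bucket_N-1), where bucket_i
    holds the values from sources[i] for that key ([] if none).
    """
    def tag(index, source):
        # Tag each pair with its source index so the data's origin
        # survives the merge (needed to pick process_a/b/c later).
        return ((key, index, value) for key, value in source)

    # heapq.merge keeps the combined stream sorted by key, so
    # groupby sees each unique key exactly once, in order.
    merged = heapq.merge(
        *(tag(i, src) for i, src in enumerate(sources)),
        key=lambda item: item[0])
    for common_key, group in itertools.groupby(
            merged, key=lambda item: item[0]):
        buckets = [[] for _ in sources]
        for _, index, value in group:
            buckets[index].append(value)
        yield (common_key, *buckets)
```
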
Since they're CSV rows, you can imagine the source data then as a
generator, something like

  data1 = ((get_key(row), row) for row in my_csv_iter1)

to get the data to look like your example input data.

> So first you want an iterator that does an N-way merge:
>
> merged = [(1, "one A"), (1, "one B"), (1, "uno"),
>     (2, "two"), (2, "dos"), (2, ("ii", "extra alpha")),
>     (3, "tres x"), (3, "tres y"), (3, "tres z"),
>     (4, "cuatro"), (4, ("iv", "extra beta")),
>     (5, "five"), (5, ("v", "extra gamma")),
>     ]

This seems to discard the data's origin (data1/data2/data3), which is
how I determine whether to use process_a(), process_b(), or
process_c() in my original example where N iterators were returned,
one for each input iterator. So the desired output would be akin to
(converting everything to tuples as you suggest below)

  [
   (1, [("one A",), ("one B",)], [("uno",)], []),
   (2, [("two",)], [("dos",)], [("ii", "extra alpha")]),
   (3, [], [("tres x",), ("tres y",), ("tres z",)], []),
   (4, [], [("cuatro",)], [("iv", "extra beta")]),
   (5, [("five",)], [], [("v", "extra gamma")]),
   ]

only instead of N list()s, having N generators that are smart enough
to yield the corresponding data.

> You might find it easier to have *all* the iterators yield (key,
> tuple) pairs, where data1 and data2 yield a 1-tuple and data3
> yields a 2-tuple.

Right. Sorry my example obscured that shoulda-obviously-been-used
simplification.

-tkc

--
https://mail.python.org/mailman/listinfo/python-list
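P.S. For concreteness, the (key, row) wrapping of one CSV source
might look like the following; the file contents, column names, and
values here are invented stand-ins for the real account data:

```python
import csv
import io

# Stand-in for one of the real CSV files (contents are invented).
csv_text = (
    "Account Number,Description\n"
    "1234567-8901,one A\n"
    "1234567-8901,one B\n"
    "1234567-8902,two\n"
)

get_key = lambda row: row["Account Number"]

# Wrap the DictReader so it yields (key, row) pairs, matching the
# shape of the example input data above.
reader = csv.DictReader(io.StringIO(csv_text))
data1 = ((get_key(row), row) for row in reader)
```
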