On 9/28/2016 3:10 PM, Tim Chase wrote:
I've got several iterators sharing a common key in the same order and
would like to iterate over them in parallel, operating on all items
with the same key. I've simplified the data a bit here, but it would
be something like
  data1 = [ # key, data1
      (1, "one A"),
      (1, "one B"),
      (2, "two"),
      (5, "five"),
      ]
  data2 = [ # key, data1
      (1, "uno"),
      (2, "dos"),
      (3, "tres x"),
      (3, "tres y"),
      (3, "tres z"),
      (4, "cuatro"),
      ]
  data3 = [ # key, data1, data2
      (2, "ii", "extra alpha"),
      (4, "iv", "extra beta"),
      (5, "v", "extra gamma"),
      ]
And I'd like to do something like
  for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
      for row in d1:
          process_a(common_key, row)
      for thing in d2:
          process_b(common_key, thing)
      for thing in d3:
          process_c(common_key, thing)
which would yield the common_key, along with enough of each of those
iterators (note that gaps can happen, but the sortable order should
remain the same). So in the above data, the outer FOR loop would
happen 5 times with common_key being [1, 2, 3, 4, 5], and each of
[d1, d2, d3] being an iterator that deals with just that data.
You just need d1, d2, and d3 to be iterables, such as lists. Write a
magic generator that opens the three files and reads one line from each
(with next()). Then, in a while True loop, find the minimum key and make
3 lists (up to 2 of them possibly empty) of the items in each file with
that key. This will require up to 3 inner loops. The read-ahead makes
this slightly messy. If any list is not empty, yield the key and the 3
lists. Otherwise, break out of the outer loop.
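The steps above can be sketched roughly as follows. This is only one
possible reading of the description, not tested against real CSV data.
The names grouped_by_key and pending are made up for illustration, and
the sketch assumes every source is already sorted by its first field and
that no row is empty or None. Instead of checking whether all lists came
back empty, it stops as soon as every read-ahead slot is exhausted, which
amounts to the same thing.

```python
def grouped_by_key(*sources):
    """Yield (key, rows_from_source_0, rows_from_source_1, ...) in key order.

    Hypothetical helper: assumes each source is sorted by row[0] and that
    keys from different sources can be compared with one another.
    """
    iterators = [iter(source) for source in sources]
    # Read-ahead: one pending row per source, None once that source is done.
    pending = [next(it, None) for it in iterators]
    while True:
        live_keys = [row[0] for row in pending if row is not None]
        if not live_keys:
            break  # every source is exhausted
        key = min(live_keys)
        groups = []
        for i, it in enumerate(iterators):
            group = []
            # Drain this source while its pending row carries the current key.
            while pending[i] is not None and pending[i][0] == key:
                group.append(pending[i])
                pending[i] = next(it, None)
            groups.append(group)
        yield (key, *groups)
```

Only the rows for the current key plus one read-ahead row per source are
held in memory at a time. With csv.reader objects as sources the keys are
strings, so min() compares them as text; if the files are sorted
numerically, the key would need converting before comparison.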
My original method was hauling everything into memory and making
multiple passes filtering on the data. However, the actual sources
are CSV-files, some of which are hundreds of megs in size, and my
system was taking a bit of a hit. So I was hoping for a way to do
this with each iterator making only one complete pass through each
source (since they're sorted by common key).
It's somewhat similar to the *nix "join" command, only dealing with
N files.
It is also somewhat similar to a 3-way mergesort.
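To make the mergesort comparison concrete, here is a rough sketch that
leans on the standard library: heapq.merge does the N-way merge of the
sorted sources and itertools.groupby splits the merged stream by key.
This is a different technique from the hand-written read-ahead loop, and
the names merged_groups and _tag are invented for this example. It again
assumes each source is sorted by its first field.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(index, source):
    """Label each row with its key and the index of the source it came from."""
    for row in source:
        yield row[0], index, row


def merged_groups(*sources):
    """Yield (key, rows_from_source_0, rows_from_source_1, ...) in key order.

    Hypothetical helper built on an N-way merge of already-sorted sources.
    """
    tagged = [_tag(i, source) for i, source in enumerate(sources)]
    # Tuples sort by key first, then by source index, so rows from
    # different sources never need to be compared with each other.
    merged = heapq.merge(*tagged)
    for key, rows in groupby(merged, key=itemgetter(0)):
        groups = [[] for _ in sources]
        for _, index, row in rows:
            groups[index].append(row)  # route the row back to its source
        yield (key, *groups)
```

Like the read-ahead version, this makes a single pass over each source
and keeps only one key's worth of rows in memory at a time.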
--
Terry Jan Reedy