tag 13089 + notabug
close 13089
On 12/05/2012 02:11 PM, Ole Tange wrote:
> I often have data that can be processed in parallel.
> It would be great if split --filter could look at every n'th line
> instead of chunking into n chunks:
>
>   cat bigfile | split --every-nth -n 8 --filter "grep foo"
>
> The above should start 8 greps and give each a line in a round-robin
> manner. Ideally it should be possible to do this in a non-blocking
> way, so that if some lines take longer for one instance of grep, the
> rest of the greps are not blocked.
So that's mostly supported already (notice the r/ below):
$ seq 8000 | split -n r/8 --filter='wc -l' | uniq -c
      8 1000
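So the requested command maps onto the existing syntax roughly as
follows (an untested sketch; bigfile and the grep pattern are just
the placeholders from the request above):

$ split -n r/8 --filter='grep foo' bigfile

That starts 8 greps and hands the input lines to them in
round-robin order.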
The concurrency is achieved through the standard I/O buffers
between split and the filters (note also split's -u option).
I'm not sure non-blocking I/O would be of much benefit,
since the filter instances will all be the same, and if we did
that we'd have to worry about internal buffering in split.
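For what it's worth, the effect of -u (--unbuffered) can be seen
with a small timing-dependent sketch like this (illustrative only;
with -u the first line should appear immediately, while without it
both lines tend to arrive together once split flushes its buffer):

$ { echo fast; sleep 2; echo slow; } | split -u -n r/2 --filter='cat'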
We had a similar question about tee yesterday, and I
think the answer is the same here: the complexity
doesn't seem warranted for such edge cases.
thanks,
Pádraig.