> I understand the structure, but the concurrent pipelines
> need separate data sources (process or file copy), or otherwise
> deadlock may happen as data overflows various buffers.
> I suppose this could be encapsulated in tee(1) with non-blocking
> writes and internal buffering, but that would just end up
> being a data copy anyway, so I'm not sure it's warranted.

So the part to improve would be tee? If so, I'm glad we have at least figured this out.

Let me explain why file copying can be bad. In my pathological example based on sort, sort needs to consume all of its input before generating any output. However, many applications only need to see a portion of the input (let's call it the lookahead) before generating any output. In that case, copying the whole input is a waste, especially when the input is very large (say, hundreds of GB) and the lookahead is much smaller (say, a few MB).

Could the maintainer of tee see whether anything can be improved?

--
Regards,
Peng