On 07/03/15 11:49, Ole Tange wrote:
> These two commands give the same output:
>
> $ yes `echo {1..1000}` | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038 -
>
> $ yes "`echo {1..1000}`" | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038 -
>
> But the time to run is quite different:
>
> $ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null
>
> real 0m0.897s
> user 0m0.384s
> sys 0m1.343s
>
> $ time yes `echo {1..1000}` | head -c 2300M >/dev/null
>
> real 0m11.352s
> user 0m10.571s
> sys 0m2.590s
>
> WTF?!
>
> I imagine 'yes' spends a lot of time collecting the 1000 args. But why
> does it do that more than once?
The stdio calls dominate here. The slow case makes 1000 times as many
fputs_unlocked() calls, because it writes each of the 1000 arguments
separately instead of writing one whole line.
Yes, we could build the line once and output that.
If we did, we could also fill a BUFSIZ buffer with complete lines
and output that in one go, in which case we could probably avoid stdio altogether.
BTW, I noticed that tee uses stdio calls, which are currently redundant overhead.
They wouldn't be if we added a --buffered option to tee so that it could
honor stdbuf(1), though I'm not sure that flexibility is worth it in tee.
I'll look at improving these.
thanks,
Pádraig.