bug#20029: 'yes' surprisingly slow

Pádraig Brady Sat, 07 Mar 2015 04:11:21 -0800

On 07/03/15 11:49, Ole Tange wrote:
> These two commands give the same output:
> 
> $ yes `echo {1..1000}` | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038  -
> 
> $ yes "`echo {1..1000}`" | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038  -
> 
> But the time to run is quite different:
> 
> $ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null
> 
> real    0m0.897s
> user    0m0.384s
> sys     0m1.343s
> 
> $ time yes `echo {1..1000}` | head -c 2300M >/dev/null
> 
> real    0m11.352s
> user    0m10.571s
> sys     0m2.590s
> 
> WTF?!
> 
> I imagine 'yes' spends a lot of time collecting the 1000 args. But why
> does it do that more than once?


The stdio interactions dominate here.
The slow case passes 1000 separate arguments rather than one,
so it makes 1000 times more fputs_unlocked() calls per line.
Yes, we could build the line up once and output that.
If doing that, we could also build up a BUFSIZ of complete lines
to output at a time, in which case we'd probably avoid stdio altogether.
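
A minimal sketch of that idea, not the actual coreutils change: join the
arguments into one line, repeat it until the buffer holds as many complete
lines as fit, then hand the buffer to write(2) directly. The helper names
fill_lines() and write_all() are hypothetical, and the buffer size is left
to the caller (BUFSIZ would be one choice).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: join words[0..n-1] with spaces, end the line with
   '\n', and repeat that line in buf as many whole times as fit.
   Returns the number of bytes filled, or 0 if a single line does not fit
   (the caller would then fall back to some other output path).  */
static size_t
fill_lines (char *buf, size_t bufsize, char *const *words, size_t n)
{
  size_t line_len = 0;
  for (size_t i = 0; i < n; i++)
    line_len += strlen (words[i]) + 1;  /* word plus ' ' or '\n' */
  if (line_len == 0 || line_len > bufsize)
    return 0;

  /* Build the line once at the start of the buffer.  */
  char *p = buf;
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (words[i]);
      memcpy (p, words[i], len);
      p += len;
      *p++ = (i + 1 < n) ? ' ' : '\n';
    }
  assert ((size_t) (p - buf) == line_len);

  /* Append further copies while another complete line still fits.  */
  size_t used = line_len;
  while (used + line_len <= bufsize)
    {
      memcpy (buf + used, buf, line_len);
      used += line_len;
    }
  return used;
}

/* Hypothetical helper: write all LEN bytes to FD, retrying after short
   writes and EINTR.  Returns 0 on success, -1 on error.  */
static int
write_all (int fd, const char *buf, size_t len)
{
  while (len > 0)
    {
      ssize_t w = write (fd, buf, len);
      if (w < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      buf += w;
      len -= (size_t) w;
    }
  return 0;
}
```

A caller would fill the buffer once and then loop on
write_all (STDOUT_FILENO, buf, used) until it fails, so the per-argument
cost is paid a single time rather than on every output line.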

BTW I noticed tee uses stdio calls, which is redundant overhead currently.
It wouldn't be if we added a --buffered option to tee so that it might
honor stdbuf(1), though I'm not sure it's worth that flexibility in tee.

I'll look at improving these.

thanks,
Pádraig.
