Hi all, I just read and tested this paper http://www.laurustech.com/Learning%20DTrace_Part4.pdf which compares cp(1) and dd(1) and explains why plain dd(1) is so slow. With similar scripts you can even check how much worse a virtual machine performs than a real machine. E.g. I saw 400 write(2) and 401 read(2) calls in a VM (VMware Player on a non-SMP machine) when cp(1) copied 50MB of data. They had 7 write(2) and 0 read(2) calls on a real machine. I will test it on VirtualBox on SMP and on ESX at work in production. The syscall part is possible with ktrace(1) too, but the output is not as "nice" and you have to post-process it to get the counts.
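
For the ktrace(1) route, here is a rough sketch of the post-processing I mean. It is not from the paper. It assumes kdump(1) prints one line per event in the shape "<pid> <command> CALL  <syscall>(<args>)", which may differ between systems and kdump options, so adjust the pattern if your output looks different. The function name count_syscalls and the script name below are just my own picks.

```python
import re
import sys
from collections import Counter

# Assumed kdump(1) line shape (may vary by system and flags):
#   " 1234 cp       CALL  read(3,0x7f7ffffd0000,0x10000)"
CALL_RE = re.compile(r"^\s*\d+\s+(\S+)\s+CALL\s+(\w+)")


def count_syscalls(lines, command=None):
    """Count CALL records per syscall name, optionally for one command only."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if m and (command is None or m.group(1) == command):
            counts[m.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Read kdump output from stdin and print the busiest syscalls first.
    for name, n in count_syscalls(sys.stdin).most_common():
        print(f"{n:8d} {name}")
```

Saved as e.g. count_calls.py (hypothetical name), it could be used like this: run "ktrace cp src dst", then "kdump | python3 count_calls.py". Comparing the read and write counts from a VM and from a real machine gives the same kind of numbers as above.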
-- http://www.openbsd.org/lyrics.html