Hi Greg,

OK.  I start the segvn_fault.d script, then in a second window run
a "dd if=/dev/dsk/c0d0s2 of=/dev/null bs=8k",
and then in a third window do the cp of the 1000-page file.
Now I get ~1000 segvn_faults for the cp.
I expected to get a larger count than before, because the dd is
contending with the cp for the disk, so the read-ahead can no
longer stay ahead of the cp.
So, either your disk is very slow or there are other
busy processes on your old system.
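
(The segvn_fault.d script itself isn't shown in this thread; a minimal
sketch of what such a script might look like, counting segvn_fault()
entries per process with the fbt provider, is:

  #!/usr/sbin/dtrace -s

  /* Hypothetical reconstruction: count segvn_fault() calls per process. */
  fbt::segvn_fault:entry
  {
          @faults[execname] = count();
  }

Start it in one window, run the cp in another, then Ctrl-C to print the
per-process fault counts.)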

max

Quoting Brendan Gregg <[EMAIL PROTECTED]>:

G'Day Max,

On Wed, 11 Jan 2006 [EMAIL PROTECTED] wrote:

Hi Greg,

Upon further reflection (and running your script), I am
very puzzled that you are getting anywhere near 1000 segvn_fault
calls for the cp.

I get the full 1000 almost every time. I'm on an old x86 box,

  # psrinfo -vp
  The physical processor has 1 virtual processor (0)
    x86 (GenuineIntel family 6 model 8 step 10 clock 867 MHz)
          Intel(r) Pentium(r) III
  ... with 512 MB of RAM, a 40 GB disk, and a 1 GB UFS filesystem for testing.

When I run your script and do a cp on a 1000-page file,
I get about 128 segvn_fault calls for the cp.

I've now tried it on an Ultra 5 and get 132 segvn_fault()s every time,
similar to you. This is quite different indeed!

This is a much more reasonable number, since cp
is mmap-ing the file, and the file system code
should be faulting in multiple pages at a time.

Thanks - so a read-ahead-style algorithm for these faults isn't a
far-fetched idea at all. :)

segvn_fault appears to be triggered for 1 page at a time - I need to find
the part where it triggers multiple pages.
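
One way to check that (a sketch, assuming segvn_fault()'s argument
order is still hat, seg, addr, len, type, rw, so that arg3 is the fault
length in bytes):

  /* Distribution of the length requested by each segvn_fault() for cp. */
  fbt::segvn_fault:entry
  /execname == "cp"/
  {
          @len["fault length (bytes)"] = quantize(arg3);
  }

If every fault is one PAGESIZE long, the multi-page work must be
happening below segvn_fault(), in the getpage path.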

For instance, if the file system brings in 56k (14 pages
on my amd64 box), and my disk is reasonably fast,
by the time cp gets a bit into the first 56k, I suspect
that all of the data is in memory and there is no
trapping into the kernel at all until the next 56k
needs to be read in.

That would make sense. In this case (testing here anyway) it's not going
near disk for reading (only writing the destination file),

x86,
                   extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   74.0    0.0 3999.8  0.0  0.1    0.0    0.9   0   6 c0d0
sparc,
                   extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   65.0    0.0 7995.6 23.6  1.8  363.7   27.7  88  90 c0t0d0

So, considering both systems undercount the expected faults, one fault
must be triggering some form of "read ahead" from the page cache, not
disk. I'm thinking the path is something like,

  ufs_getpage_ra -> pvn_read_kluster -> page_create_va -> (read many?)
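
(A sketch for checking that path - it assumes fbt can instrument
pvn_read_kluster() on this build - is to aggregate the kernel stacks
that reach it during the cp; with ~128 faults covering 1000 pages, each
fault would be klustering roughly 8 pages:

  /* Which code path reaches pvn_read_kluster() during the cp? */
  fbt::pvn_read_kluster:entry
  /execname == "cp"/
  {
          @stacks[stack()] = count();
  }

The hottest stack should show whether ufs_getpage_ra() is the common
caller.)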

(I guess I am assuming the hat
layer is setting up PTEs as the pages are brought in,
not as cp is accessing them).

Yep - and that sort of problem would be the very thing that throws a
spanner in the works. If it always waited for cp to access them, then I'd
have consistent events to trace...

Thanks Max! :)

Brendan
