G'Day Max, Thanks for your reply. Things have become a little stranger...
On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:

> Hi Greg,
>
> OK. I start the segvn_fault.d script, then in a second window
> a "dd if=/dev/dsk/c0d0s2 of=/dev/null bs=8k"
> and then in a third window do the cp of the 1000 page file.
> Now I get ~1000 segvn_faults for the cp.
> I expected to get a larger count because the dd is
> contending with the cp.
> So, either your disk is very slow or there are other
> busy processes on your old system.

If my disk was slow or busy when cp wanted to read, then I too would expect
more faults, as cp can fault faster than the disk read-ahead. But cp isn't
reading from disk! A modified segvn.d (attached),

# segvn.d
Sampling...
^C
segvn_fault
-----------
CMD              FILE                              COUNT
[...]
cp               /extra1/1000                       1000

CMD              FILE                              BYTES
[...]
cp               /extra1/1000                    4096000

io:::start
----------
CMD              FILE                         DIR    BYTES
cp               /extra1/1000                  R     12288
cp               /extra2/1000                  W   4096000

The input file /extra1/1000 is not being read from disk (only 12 KB).

Repeating your test kicked my Ultra 5 from a consistent 132 segvn_faults up to
1000 segvn_faults, and it stayed at 1000 for subsequent runs without the dd
running. I suspect the dd from /dev/dsk thrashed the cache (or changed how it
held pages) such that cache read-ahead stopped working, even though the pages
were still cached.

OK, this is still sounding far-fetched. Returning my Ultra 5 to the consistent
132 segvn_faults state was a real challenge: remounting didn't work, nor did
an init 6! What did work was rewriting my /extra1/1000 file using dd. Hmmm.

It appears that using dd to WRITE to a file leaves that file cached in a
read-ahead optimal way (a repeatable 132 segvn_faults). Then either remount
the file system or dd the /dev/dsk device (both affect the cache) and we go to
a repeatable 1000 segvn_faults.

I rewrote my /extra1/1000 file on my x86 server, and yes - it now consistently
faults at 129. Phew! ...

I came up with the following simple test to check this out,

# dd if=/dev/urandom bs=4k of=/extra1/5000 count=5000
0+5000 records in
0+5000 records out
# ptime cp -f /extra1/5000 /tmp

real        0.077      --- fast, as we just created it
user        0.001
sys         0.075
# ptime cp -f /extra1/5000 /tmp

real        0.076      --- still fast...
user        0.001
sys         0.074
# ptime cp -f /extra1/5000 /tmp

real        0.076      --- still fast...
user        0.001
sys         0.074
# umount /extra1; mount /extra1
# ptime cp -f /extra1/5000 /tmp

real        0.129      --- slow, as we just remounted the FS
user        0.001
sys         0.099
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- faster, as the file is now cached
user        0.001
sys         0.081
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- hrm...
user        0.001
sys         0.081
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- not getting any faster than this.
user        0.001
sys         0.081

So after creation, /extra1/5000 is copied in 0.076 secs (consistently).
After remounting, /extra1/5000 is copied in 0.084 secs (consistently).
I haven't found a way to get a file cached as well as it is at creation.
It seems that newly created files are blessed.

How does this all sound? :)

cheers,

Brendan

> max
>
> Quoting Brendan Gregg <[EMAIL PROTECTED]>:
[...]
> >> For instance, if the file system brings in 56k (14 pages
> >> on my amd64 box), and my disk is reasonably fast,
> >> by the time cp gets a bit into the first 56k, I suspect
> >> that all of the data is in memory and there is no
> >> trapping into the kernel at all until the next 56k
> >> needs to be read in.
> >
> > That would make sense.
> > In this case (testing here anyway) it's not going
> > near disk for reading (only writing the destination file),
> >
> > x86,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   74.0    0.0 3999.8  0.0  0.1    0.0    0.9   0   6 c0d0
> >
> > sparc,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   65.0    0.0 7995.6 23.6  1.8  363.7   27.7  88  90 c0t0d0
> >
> > So, considering both systems undercount expected faults - one fault must
> > be triggering some form of "read ahead" from the page cache, not disk.
> > I'm thinking the path is something like,
> >
> >   ufs_getpage_ra -> pvn_read_kluster -> page_create_va -> (read many?)
> >
> >> (I guess I am assuming the hat
> >> layer is setting up pte's as the pages are brought in,
> >> not as cp is accessing them).
> >
> > Yep - and that sort of problem would be the very thing that throws a
> > spanner in the works. If it always waited for cp to access them, then I'd
> > have consistent events to trace...
> >
> > Thanks Max! :)
> >
> > Brendan
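
PS. A rough way to check the guessed read-ahead path above would be to count
those functions directly while the cp runs. This is just a sketch (untested
here), and it assumes ufs_getpage_ra, pvn_read_kluster and page_create_va are
all visible to fbt on this build:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* count calls to the suspected read-ahead functions made in cp's context */
fbt::ufs_getpage_ra:entry,
fbt::pvn_read_kluster:entry,
fbt::page_create_va:entry
/execname == "cp"/
{
	@calls[probefunc] = count();
}

dtrace:::END
{
	printa("%-20s %@8d\n", @calls);
}

If ufs_getpage_ra fires in the 132-fault state but not in the 1000-fault
state, that would support the "cache read-ahead stopped working" theory
(page_create_va may also fire for the write side, so treat its count as a
rough indicator only).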
#!/usr/sbin/dtrace -s
/*
 * segvn.d - count segvn_fault() activity and physical I/O by process and file.
 */

#pragma D option quiet

dtrace:::BEGIN
{
	trace("Sampling...\n");
}

/* vnode-backed segvn faults: count them, and sum the fault length */
fbt::segvn_fault:entry
/(int)((struct segvn_data *)args[1]->s_data)->vp != NULL/
{
	self->vn = (struct vnode *)((struct segvn_data *)args[1]->s_data)->vp;
	@faults[execname, stringof(self->vn->v_path)] = count();
	@bytes[execname, stringof(self->vn->v_path)] = sum(args[3]);
}

/* physical I/O: sum bytes by process, file and direction */
io:::start
{
	@iobytes[execname, args[2]->fi_pathname,
	    args[0]->b_flags & B_READ ? "R" : "W"] = sum(args[0]->b_bcount);
}

dtrace:::END
{
	printf("segvn_fault\n-----------\n");
	printf("%-16s %32s %8s\n", "CMD", "FILE", "COUNT");
	printa("%-16s %32s %@8d\n", @faults);
	printf("\n%-16s %32s %14s\n", "CMD", "FILE", "BYTES");
	printa("%-16s %32s %@14d\n", @bytes);
	printf("\nio:::start\n----------\n");
	printf("%-16s %32s %3s %10s\n", "CMD", "FILE", "DIR", "BYTES");
	printa("%-16s %32s %3s %@10d\n", @iobytes);
}
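
PS2. If it's useful for the next round, a small companion to segvn.d could
show whether the 132-fault runs are faulting in larger chunks than the
1000-fault runs. Another sketch, using the same args[3] fault length that
segvn.d sums as BYTES:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* distribution of segvn_fault lengths (bytes) seen by cp */
fbt::segvn_fault:entry
/execname == "cp"/
{
	@len = quantize(args[3]);
}

dtrace:::END
{
	printa(@len);
}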