G'Day Max, Thanks for your reply. Things have become a little stranger...
On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:

> Hi Greg,
>
> OK. I start the segvn_fault.d script, then in a second window
> a "dd if=/dev/dsk/c0d0s2 of=/dev/null bs=8k"
> and then in a third window do the cp of the 1000 page file.
> Now I get ~1000 segvn_faults for the cp.
> I expected to get a larger count because the dd is
> contending with the cp.
> So, either your disk is very slow or there are other
> busy processes on your old system.

If my disk was slow or busy when cp wanted to read, then I too would expect
more faults, as cp can fault faster than the disk read-ahead. But cp isn't
reading from disk! A modified segvn.d (attached),

# segvn.d
Sampling...
^C
segvn_fault
-----------
CMD              FILE                              COUNT
[...]
cp               /extra1/1000                       1000

CMD              FILE                              BYTES
[...]
cp               /extra1/1000                    4096000

io:::start
----------
CMD              FILE                         DIR    BYTES
cp               /extra1/1000                  R     12288
cp               /extra2/1000                  W   4096000

The input file /extra1/1000 is not being read from disk (only 12 KB).

Repeating your test kicked my Ultra 5 from a consistent 132 segvn_faults up to
1000 segvn_faults, and it stayed at 1000 for subsequent runs without the dd
running. I suspect the dd from /dev/dsk thrashed the cache (or changed how it
held pages) such that cache read-ahead stopped working, even though the pages
were still cached.

OK, this is still sounding far-fetched. Returning my Ultra 5 to the consistent
132 segvn_faults state was a real challenge: remounting didn't work, nor did
an init 6! What did work was rewriting my /extra1/1000 file using dd. Hmmm.

It appears that using dd to WRITE to a file leaves that file cached in a
read-ahead optimal way (a repeatable 132 segvn_faults). Then either remount
the file system or dd the /dev/dsk device (both affect the cache) and we go to
a repeatable 1000 segvn_faults.

I rewrote my /extra1/1000 file on my x86 server, and yes - it now consistently
faults at 129. Phew! ...

I came up with the following simple test to check this out,

# dd if=/dev/urandom bs=4k of=/extra1/5000 count=5000
0+5000 records in
0+5000 records out
# ptime cp -f /extra1/5000 /tmp

real        0.077      --- fast, as we just created it
user        0.001
sys         0.075
# ptime cp -f /extra1/5000 /tmp

real        0.076      --- still fast...
user        0.001
sys         0.074
# ptime cp -f /extra1/5000 /tmp

real        0.076      --- still fast...
user        0.001
sys         0.074
# umount /extra1; mount /extra1
# ptime cp -f /extra1/5000 /tmp

real        0.129      --- slow, as we just remounted the FS
user        0.001
sys         0.099
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- faster, as the file is now cached
user        0.001
sys         0.081
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- hrm...
user        0.001
sys         0.081
# ptime cp -f /extra1/5000 /tmp

real        0.084      --- not getting any faster than this.
user        0.001
sys         0.081

So after creation, /extra1/5000 is copied in 0.076 secs (consistently).
After remounting, /extra1/5000 is copied in 0.084 secs (consistently).
I haven't found a way to get a file cached as well as it is at creation.
It seems that newly created files are blessed.

How does this all sound? :)

cheers,

Brendan

> max
>
> Quoting Brendan Gregg <[EMAIL PROTECTED]>:
[...]
> >> For instance, if the file system brings in 56k (14 pages
> >> on my amd64 box), and my disk is reasonably fast,
> >> by the time cp gets a bit into the first 56k, I suspect
> >> that all of the data is in memory and there is no
> >> trapping into the kernel at all until the next 56k
> >> needs to be read in.
> >
> > That would make sense.
> > In this case (testing here anyway) it's not going
> > near disk for reading (only writing the destination file),
> >
> > x86,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   74.0    0.0 3999.8  0.0  0.1    0.0    0.9   0   6 c0d0
> >
> > sparc,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   65.0    0.0 7995.6 23.6  1.8  363.7   27.7  88  90 c0t0d0
> >
> > So, considering both systems undercount expected faults - one fault must
> > be triggering some form of "read ahead" from the page cache, not disk.
> > I'm thinking the path is something like,
> >
> >   ufs_getpage_ra -> pvn_read_kluster -> page_create_va -> (read many?)
> >
> >> (I guess I am assuming the hat
> >> layer is setting up pte's as the pages are brought in,
> >> not as cp is accessing them).
> >
> > Yep - and that sort of problem would be the very thing that throws a
> > spanner in the works. If it always waited for cp to access them, then I'd
> > have consistent events to trace...
> >
> > Thanks Max! :)
> >
> > Brendan
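
PS. A rough way to check the guessed read-ahead path above would be to count
those functions directly while the cp runs. This is just a sketch (untested
here), and it assumes ufs_getpage_ra, pvn_read_kluster and page_create_va are
all visible to fbt on this build:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* count calls to the suspected read-ahead functions made in cp's context */
fbt::ufs_getpage_ra:entry,
fbt::pvn_read_kluster:entry,
fbt::page_create_va:entry
/execname == "cp"/
{
	@calls[probefunc] = count();
}

dtrace:::END
{
	printa("%-20s %@8d\n", @calls);
}

If ufs_getpage_ra fires in the 132-fault state but not in the 1000-fault
state, that would support the "cache read-ahead stopped working" theory
(page_create_va may also fire for the write side, so treat its count as a
rough indicator only).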
#!/usr/sbin/dtrace -s
/*
 * segvn.d - count segvn_fault() activity and physical I/O by process and file.
 */

#pragma D option quiet

dtrace:::BEGIN
{
	trace("Sampling...\n");
}

/* vnode-backed segvn faults: count them, and sum the fault length */
fbt::segvn_fault:entry
/(int)((struct segvn_data *)args[1]->s_data)->vp != NULL/
{
	self->vn = (struct vnode *)((struct segvn_data *)args[1]->s_data)->vp;
	@faults[execname, stringof(self->vn->v_path)] = count();
	@bytes[execname, stringof(self->vn->v_path)] = sum(args[3]);
}

/* physical I/O: sum bytes by process, file and direction */
io:::start
{
	@iobytes[execname, args[2]->fi_pathname,
	    args[0]->b_flags & B_READ ? "R" : "W"] = sum(args[0]->b_bcount);
}

dtrace:::END
{
	printf("segvn_fault\n-----------\n");
	printf("%-16s %32s %8s\n", "CMD", "FILE", "COUNT");
	printa("%-16s %32s %@8d\n", @faults);
	printf("\n%-16s %32s %14s\n", "CMD", "FILE", "BYTES");
	printa("%-16s %32s %@14d\n", @bytes);
	printf("\nio:::start\n----------\n");
	printf("%-16s %32s %3s %10s\n", "CMD", "FILE", "DIR", "BYTES");
	printa("%-16s %32s %3s %@10d\n", @iobytes);
}
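
PS2. If it's useful for the next round, a small companion to segvn.d could
show whether the 132-fault runs are faulting in larger chunks than the
1000-fault runs. Another sketch, using the same args[3] fault length that
segvn.d sums as BYTES:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* distribution of segvn_fault lengths (bytes) seen by cp */
fbt::segvn_fault:entry
/execname == "cp"/
{
	@len = quantize(args[3]);
}

dtrace:::END
{
	printa(@len);
}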