(Repost, sorry for cc'ing the other forums.) I'm running into an issue where a high number of read IOPS is hitting the disks, and physical free memory is fluctuating between 200MB and 450MB out of 16GB total. We have the L2ARC configured on a 32GB Intel X25-E SSD and the slog on another 32GB X25-E SSD.
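Something along these lines should show how big the ARC actually is relative to its target and to physical memory (a rough sketch, assuming the standard Solaris 10 zfs:0:arcstats kstat and mdb's ::memstat dcmd are available on this rev; this is not output from our box):

  # current ARC size vs. its target and ceiling, in bytes
  # (field names assumed from the standard zfs:0:arcstats kstat)
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

  # kernel-wide view of where physical memory is going
  echo "::memstat" | mdb -k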
According to our tester, Oracle writes are extremely slow (high latency). Below is a snippet of iostat:

     r/s    w/s   Mr/s   Mw/s  wait   actv  wsvc_t  asvc_t  %w   %b  device
     0.0    0.0    0.0    0.0   0.0    0.0     0.0     0.0   0    0  c0
     0.0    0.0    0.0    0.0   0.0    0.0     0.0     0.0   0    0  c0t0d0
  4898.3   34.2   23.2    1.4   0.1  385.3     0.0    78.1   0 1246  c1
     0.0    0.8    0.0    0.0   0.0    0.0     0.0    16.0   0    1  c1t0d0
   401.7    0.0    1.9    0.0   0.0   31.5     0.0    78.5   1  100  c1t1d0
   421.2    0.0    2.0    0.0   0.0   30.4     0.0    72.3   1   98  c1t2d0
   403.9    0.0    1.9    0.0   0.0   32.0     0.0    79.2   1  100  c1t3d0
   406.7    0.0    2.0    0.0   0.0   33.0     0.0    81.3   1  100  c1t4d0
   414.2    0.0    1.9    0.0   0.0   28.6     0.0    69.1   1   98  c1t5d0
   406.3    0.0    1.8    0.0   0.0   32.1     0.0    79.0   1  100  c1t6d0
   404.3    0.0    1.9    0.0   0.0   31.9     0.0    78.8   1  100  c1t7d0
   404.1    0.0    1.9    0.0   0.0   34.0     0.0    84.1   1  100  c1t8d0
   407.1    0.0    1.9    0.0   0.0   31.2     0.0    76.6   1  100  c1t9d0
   407.5    0.0    2.0    0.0   0.0   33.2     0.0    81.4   1  100  c1t10d0
   402.8    0.0    2.0    0.0   0.0   33.5     0.0    83.2   1  100  c1t11d0
   408.9    0.0    2.0    0.0   0.0   32.8     0.0    80.3   1  100  c1t12d0
     9.6   10.8    0.1    0.9   0.0    0.4     0.0    20.1   0   17  c1t13d0
     0.0   22.7    0.0    0.5   0.0    0.5     0.0    22.8   0   33  c1t14d0

Is this an indicator that we need more physical memory? From http://blogs.sun.com/brendan/entry/test, the order in which a read request is satisfied is:

  1) ARC
  2) vdev cache of L2ARC devices
  3) L2ARC devices
  4) vdev cache of disks
  5) disks

Using arc_summary.pl, we determined that prefetch was not helping much, so we disabled it:

CACHE HITS BY DATA TYPE:
  Demand Data:        22%   158853174
  Prefetch Data:      17%   123009991   <--- not helping???
  Demand Metadata:    60%   437439104
  Prefetch Metadata:   0%     2446824

After that, write IOPS started to kick in and latency dropped on the spinning disks (same iostat columns as above):

     0.0    0.0    0.0    0.0   0.0    0.0     0.0     0.0   0    0  c0
     0.0    0.0    0.0    0.0   0.0    0.0     0.0     0.0   0    0  c0t0d0
  1629.0  968.0   17.4    7.3   0.0   35.9     0.0    13.8   0 1088  c1
     0.0    1.9    0.0    0.0   0.0    0.0     0.0     1.7   0    0  c1t0d0
   126.7   67.3    1.4    0.2   0.0    2.9     0.0    14.8   0   90  c1t1d0
   129.7   76.1    1.4    0.2   0.0    2.8     0.0    13.7   0   90  c1t2d0
   128.0   73.9    1.4    0.2   0.0    3.2     0.0    16.0   0   91  c1t3d0
   128.3   79.1    1.3    0.2   0.0    3.6     0.0    17.2   0   92  c1t4d0
   125.8   69.7    1.3    0.2   0.0    2.9     0.0    14.9   0   89  c1t5d0
   128.3   81.9    1.4    0.2   0.0    2.8     0.0    13.1   0   89  c1t6d0
   128.1   69.2    1.4    0.2   0.0    3.1     0.0    15.7   0   93  c1t7d0
   128.3   80.3    1.4    0.2   0.0    3.1     0.0    14.7   0   91  c1t8d0
   129.2   69.3    1.4    0.2   0.0    3.0     0.0    15.2   0   90  c1t9d0
   130.1   80.0    1.4    0.2   0.0    2.9     0.0    13.6   0   89  c1t10d0
   126.2   72.6    1.3    0.2   0.0    2.8     0.0    14.2   0   89  c1t11d0
   129.7   81.0    1.4    0.2   0.0    2.7     0.0    12.9   0   88  c1t12d0
    90.4   41.3    1.0    4.0   0.0    0.2     0.0     1.2   0    6  c1t13d0
     0.0   24.3    0.0    1.2   0.0    0.0     0.0     0.2   0    0  c1t14d0

Is it true that if your MFU hit rate goes over 50%, more memory is needed?

CACHE HITS BY CACHE LIST:
  Anon:                        10%    74845266               [ New Customer, First Cache Hit ]
  Most Recently Used:          19%   140478087 (mru)         [ Return Customer ]
  Most Frequently Used:        65%   475719362 (mfu)         [ Frequent Customer ]
  Most Recently Used Ghost:     2%    20785604 (mru_ghost)   [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:   1%     9920089 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]

CACHE HITS BY DATA TYPE:
  Demand Data:        22%   158852935
  Prefetch Data:      17%   123009991
  Demand Metadata:    60%   437438658
  Prefetch Metadata:   0%     2446824

My theory is that there isn't enough memory for the ARC to cache the working set, so reads fall through to the L2ARC, miss there as well, and have to go to disk. Those disk reads then contend with the writes, which inflates the service times. (The kstat sketch at the end of this post is how I'd check whether the L2ARC really is missing that often.)

uname: 5.10 Generic_141445-09 i86pc i386 i86pc

Sun Fire X4270:
  11+1 raidz (SAS)
  l2arc: Intel X25-E
  slog:  Intel X25-E

Thoughts?
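To confirm (or kill) the theory above, the L2ARC hit/miss counters in the same arcstats kstat are what I'd look at; a rough sketch, assuming the l2_* fields exist on this kernel rev:

  # cumulative L2ARC hit/miss/size counters (field names assumed from arcstats)
  kstat -p zfs:0:arcstats | egrep 'l2_(hits|misses|size|read_bytes)'

  # overall ARC hits/misses for comparison
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses

If l2_misses dwarfs l2_hits while the spindles sit at ~100% busy, that would line up with both the ARC and the L2ARC being too small for the working set.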