Hi,

I have a Netra X1 server with 512MB of RAM and two ATA disks, model ST340016A.
The processor is a 500MHz UltraSPARC-IIe.

The Solaris version is: Solaris 10 10/09 s10s_u8wos_08a SPARC

I jumpstarted the server with a ZFS root, using the two disks as a mirror:


========================================================
pool: rpool
state: ONLINE
scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t2d0s0  ONLINE       0     0     0

errors: No known data errors
========================================================


Here is the dataset layout, from zfs list:

========================================================
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             28.9G  7.80G    98K  /rpool
rpool/ROOT        4.06G  7.80G    21K  legacy
rpool/ROOT/newbe  4.06G  7.80G  4.06G  /
rpool/dump         512M  7.80G   512M  -
rpool/opt         23.8G  7.80G  4.11G  /opt
rpool/opt/export  19.7G  7.80G  19.7G  /opt/export
rpool/swap         512M  8.12G   187M  -
========================================================

When I run this command (BIGFILE is 4.6GB):
# digest -a md5 /opt/export/BIGFILE

It took around 1 hour and 45 minutes to complete.  I can understand that a
Netra X1 with ATA disks can be slow, but not this slow.  Also, there are some
inconsistencies between the "zpool iostat 1" and DTrace outputs.
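
Back-of-the-envelope, that's well under 1MB/s of effective throughput:

========================================================
4.6GB in 1h45m  =  (4.6 * 1024 * 1024) KB / 6300 s  ~=  765 KB/s
========================================================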

Here's a small snippet of zpool iostat 1:
========================================================
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       28.5G  8.70G     33      1  3.28M  5.78K
rpool       28.5G  8.70G    124      0  15.5M      0
rpool       28.5G  8.70G    150      0  18.8M      0
rpool       28.5G  8.70G    134      0  16.8M      0
rpool       28.5G  8.70G    135      0  16.7M      0
========================================================

Not so bad for ATA IDE drives, I would say.  But it looks different if I dig
deeper, for example with the iostat -x 1 command:

========================================================
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
dad0     63.4    0.0 8115.1    0.0 32.7  2.0  547.5 100 100
dad1     63.4    0.0 8115.1    0.0 33.0  2.0  551.9 100 100
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
dad0     52.0    0.0 6152.3    0.0  9.3  1.6  210.9  75  84
dad1     70.0    0.0 8714.7    0.0 16.0  2.0  256.5  93  99
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
dad0     75.3    0.0 9634.7    0.0 23.1  1.9  332.7  96  97
dad1     71.3    0.0 9121.0    0.0 22.3  1.9  339.4  91  95

========================================================

zpool iostat says ~15MB/s worth of data is being read from the pool, but
iostat says only ~8MB/s per disk.
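
(If I add the two mirror halves together, though, the per-disk numbers do
roughly account for the pool figure:

========================================================
dad0 8115.1 KB/s + dad1 8115.1 KB/s  ~=  15.8MB/s  ~=  the 15.5M that
zpool iostat reports
========================================================
)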

We can also clearly see that the disks are saturated, with both the wait
queue and the drives themselves pegged (%w and %b at or near 100).  Let's
look at the per-process I/O percentages with iotop from the DTraceToolkit:

========================================================
2009 Nov 13 09:06:03,  load: 0.37,  disk_r:  93697 KB,  disk_w:      0 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0   2250   1859 digest           dad1    136   8 R      3
    0   2250   1859 digest           dad0    136   0 R      4
    0      0      0 sched            dad0    136   0 R     88
    0      0      0 sched            dad1    136   8 R     89
========================================================

sched is taking up to 90% of the I/O.  I tried to trace it:

========================================================
[x]: ps -efl |grep sche
 1 T     root     0     0   0   0 SY        ?      0          21:31:44 ?        
   0:48 sched
[x]: truss -f -p 1
1:      pollsys(0x0002B9DC, 1, 0xFFBFF588, 0x00000000) (sleeping...)
[x]:
========================================================

Only one line of output from truss, as if the process is doing nothing, yet
sched gets charged with up to 90% of the I/O.  (To be fair, sched is PID 0,
the kernel itself, so truss can't attach to it at all; the truss -f -p 1
above was actually attached to init.)
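
So I suppose the next step is to ask the DTrace io provider directly which
execname the physical reads are charged to; something like this (a quick
sketch, I haven't run it yet):

========================================================
# dtrace -n 'io:::start {
    @[execname, args[0]->b_flags & B_READ ? "R" : "W"] =
        sum(args[0]->b_bcount); }'
========================================================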

prstat shows this (I'm only showing the first process at the top):

=======================================================
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2250 root     2464K 1872K run      0    0   0:01:57  15% digest/1
=======================================================

The digest command is using only 15% of the CPU, so it is not CPU-bound; it
is the I/O that is slow.
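
To confirm from the process side that digest is sleeping on I/O rather than
waiting for CPU, microstate accounting should show it (the SLP column should
be near 100 if so); something like:

========================================================
# prstat -mL -p 2250 1
========================================================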

pfilestat from the DTraceToolkit reports this:
=======================================================
     STATE   FDNUM      Time Filename
   running       0        6%
   waitcpu       0        7%
      read       4       13% /opt/export/BIGFILE
   sleep-r       0       69%

     STATE   FDNUM      KB/s Filename
      read       4       614 /opt/export/BIGFILE
=======================================================

Around 614KB is actually read each second by the application.  That is very
slow!  This is not normal, I think, even for ATA drives.
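
At least that figure is consistent with the wall-clock time above:

========================================================
4.6GB / 614 KB/s  ~=  7860 s  ~=  2h11m
========================================================

(614KB/s is a single pfilestat sample; the overall average works out closer
to 765KB/s, hence the 1h45m.)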

So my questions are the following:

1.- Why is zpool iostat reporting 15MB/s of data read when in reality only
~614KB/s reaches the application?
2.- Why is sched doing so much I/O?
3.- What can I do to improve I/O performance?  I find it hard to believe
that this is the best performance the current hardware can provide...
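
If it helps with #3: the box has only 512MB of RAM, so I assume the ARC is
tiny.  I can post memory/ARC stats too, e.g.:

========================================================
# kstat -m zfs -n arcstats | egrep 'size|c_max'
# echo "::memstat" | mdb -k
========================================================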

Thank you!