> If those servers are on physical boxes right now, I'd do some perfmon
> captures and add up the IOPS.

Using perfmon to get a sense of what is required is a good idea; use the
95th percentile to be conservative. The counters I have used are in the
PhysicalDisk object. Don't ignore the latency counters either. In my
book, anything consistently over 20ms or so is excessive.
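If you would rather script the capture than babysit the perfmon GUI,
typeperf (built into Windows) can log the same PhysicalDisk counters to
a CSV. Just a sketch: the 30-second interval and 24-hour window (2880
samples) are my own assumptions, and you can swap (_Total) for (*) if
you want per-disk numbers:

typeperf "\PhysicalDisk(_Total)\Disk Transfers/sec" ^
         "\PhysicalDisk(_Total)\Avg. Disk sec/Read" ^
         "\PhysicalDisk(_Total)\Avg. Disk sec/Write" ^
         -si 30 -sc 2880 -o disk_iops.csv

Disk Transfers/sec is reads plus writes, i.e. your IOPS, and the two
latency counters report in seconds, so 0.020 is the 20ms line I
mentioned above.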
I run 30+ VMs on an EqualLogic array with 14 SATA disks, broken up as
two striped 6-disk RAID5 sets (RAID 50) with 2 hot spares. That array
is, on average, about 25% loaded from an I/O standpoint; obviously my
VMs are pretty light. And the EQL gear is *fast*, which makes me feel
better about spending all of that money :).

>> Regarding ZIL usage, from what I have read you will only see
>> benefits if you are using NFS-backed storage, but that it can be
>> significant.
>
> link?

From the ZFS Evil Tuning Guide
(http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide):

"ZIL stands for ZFS Intent Log. It is used during synchronous write
operations."

and further down:

"If you've noticed terrible NFS or database performance on a SAN
storage array, the problem is not with ZFS, but with the way the disk
drivers interact with the storage devices. ZFS is designed to work with
storage devices that manage a disk-level cache. ZFS commonly asks the
storage device to ensure that data is safely placed on stable storage
by requesting a cache flush. For JBOD storage, this works as designed
and without problems. For many NVRAM-based storage arrays, a problem
might come up if the array takes the cache flush request and actually
does something with it rather than ignoring it. Some storage will flush
their caches despite the fact that the NVRAM protection makes those
caches as good as stable storage.

ZFS issues infrequent flushes (every 5 seconds or so) after the
uberblock updates. The problem here is fairly inconsequential. No
tuning is warranted here. ZFS also issues a flush every time an
application requests a synchronous write (O_DSYNC, fsync, NFS commit,
and so on). The completion of this type of flush is waited upon by the
application and impacts performance. Greatly so, in fact. From a
performance standpoint, this neutralizes the benefits of having
NVRAM-based storage."

When I was testing iSCSI vs. NFS, it was clear that iSCSI was not doing
sync writes while NFS was. Here are some zpool iostat numbers.

iSCSI testing, using iometer with the RealLife workload (65% read, 60%
random, 8k transfers - see the link in my previous post). It is clear
that writes are being cached in RAM and then spun off to disk in
bursts:

# zpool iostat data01 1
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0
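As an aside, adding -v to the same command breaks those numbers out per
vdev and per disk, which makes it easy to see whether one vdev is doing
all the work:

# zpool iostat -v data01 1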
Testing against NFS shows writes going to disk continuously:

NFS testing:

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     57    216   352K  1.74M
data01      59.6G  20.4T     41     21   660K  2.74M
data01      59.6G  20.4T     44     24   655K  3.09M
data01      59.6G  20.4T     41     23   598K  2.97M
data01      59.6G  20.4T     34     33   552K  4.21M
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M
data01      59.7G  20.4T     35     23   536K  2.97M
data01      59.7G  20.4T     32     23   483K  2.97M
data01      59.7G  20.4T     37     37   538K  4.70M

Note that the ZIL is being used here, just not on a separate device;
the periodic bursts of writes show it being flushed. You can also see
reads stall to nearly zero while the ZIL is dumping. Not good. This
thread discusses the behavior:

http://www.opensolaris.org/jive/thread.jspa?threadID=106453

Coming from a mostly Windows world, I really like the tools you get on
OpenSolaris for seeing this kind of thing.

-Scott
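P.S. For anyone wanting to try the separate-device route, moving the
ZIL onto its own log vdev is a one-liner. This is just a sketch: the
device name below is made up, you would want something with a fast,
protected write cache (SSD or NVRAM), and last I checked a log device
could not be removed once added, so experiment on a scratch pool first:

# zpool add data01 log c2t0d0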