Re: [zfs-discuss] ZFS hang and boot hang when iSCSI device removed

2008-02-06 Thread Ross
Yes, I've learnt that I get the e-mail reply a long while before it appears on
the boards.  I'm not entirely sure how these boards are run; it's certainly odd
for somebody used to forums rather than mailing lists, but they do seem to work
eventually :)

Thanks for the help Vic, will try to get back into that server this morning.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] dos programs on a

2008-02-06 Thread Maurilio Longo
Alan,

I'm using Nexenta Core RC4, which is based on Nevada 81/82.

zfs casesensitivity is set to 'insensitive'
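
(For reference, the property can be checked with "zfs get"; it can only be set
at dataset creation time.  The dataset name below is just an example:)

# zfs get casesensitivity tank/dosfiles
# zfs create -o casesensitivity=insensitive tank/dosfiles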

Best regards.

Maurilio.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] available space?

2008-02-06 Thread Jure Pečar

Maybe a basic zfs question ...

I have a pool:

# zpool status backup
  pool: backup
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
backupONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s1  ONLINE   0 0 0
c2t0d0s1  ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
c1t5d0ONLINE   0 0 0
c1t6d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 0
c2t1d0ONLINE   0 0 0
c2t2d0ONLINE   0 0 0
c2t3d0ONLINE   0 0 0
c2t4d0ONLINE   0 0 0
c2t5d0ONLINE   0 0 0
c2t6d0ONLINE   0 0 0
c2t7d0ONLINE   0 0 0

For which list reports:

# zpool list backup
NAME      SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
backup   13.5T    434K   13.5T     0%  ONLINE  -


Yet df and zfs list shows something else:

[EMAIL PROTECTED]:~# zfs list
NAME USED  AVAIL  REFER  MOUNTPOINT
backup   262K  11.5T  1.78K  none
backup/files32.0K  11.5T  32.0K  /export/files
...

# df -h
Filesystem             size   used  avail capacity  Mounted on
...
backup/files            11T    32K    11T     1%    /export/files


Why does AVAIL differ for such a large amount?

(NexentaOS_20080131)

-- 

Jure Pečar
http://jure.pecar.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] status of zfs boot netinstall kit

2008-02-06 Thread Roman Morokutti
Hi,

I would like to continue this (maybe a bit outdated) thread with two questions:

   1. How do I create a netinstall image?
   2. How do I write the netinstall image back to DVD as an ISO 9660 image
      (after patching it for zfs boot)?

Roman
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread William Fretts-Saxton
Hi Marc,

# cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86

I don't know if my application uses synchronous I/O transactions... I'm using
Sun's Glassfish v2u1.

I've deleted the ZFS partition and have set up an SVM stripe/mirror just to see
if "ZFS" is getting in the way.  I'll try out the prefetching idea when I'm
done with the SVM testing.

Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status -x strangeness on b78

2008-02-06 Thread Ben Miller
We run a cron job that does a 'zpool status -x' to check for any degraded
pools.  This morning we happened to run 'zpool status' by hand and were
surprised to find a pool degraded, as we never got a notice from the cron job.
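
A simplified sketch of that kind of check (not our exact script; the mail
command and recipient are placeholders):

#!/bin/sh
# cron job: complain if 'zpool status -x' reports anything other than healthy
out=`/usr/sbin/zpool status -x`
if [ "$out" != "all pools are healthy" ]; then
        echo "$out" | mailx -s "zpool problem on `hostname`" root
fi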

# uname -srvp
SunOS 5.11 snv_78 i386

# zpool status -x
all pools are healthy

# zpool status pool1
  pool: pool1
 state: DEGRADED
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
pool1DEGRADED 0 0 0
  raidz1 DEGRADED 0 0 0
c1t8d0   REMOVED  0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0

errors: No known data errors

I'm now going to look into why the disk is listed as removed.

Does this look like a bug with 'zpool status -x'?

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mounting a copy of a zfs pool /file system while orginal is still active

2008-02-06 Thread eric kustarz
>
> While browsing the ZFS source code, I noticed that "usr/src/cmd/ 
> ztest/ztest.c", includes ztest_spa_rename(), a ZFS test which  
> renames a ZFS storage pool to a different name, tests the pool  
> under its new name, and then renames it back. I wonder why this  
> functionality was not exposed as part of zpool support?
>

See 6280547 want to rename pools.

It just hasn't been high on the priority list.

eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS number of file systems scalability

2008-02-06 Thread Shawn Ferry

There is a write up of similar findings and more information about  
sharemgr
http://developers.sun.com/solaris/articles/nfs_zfs.html

Unfortunately I don't see anything that says those changes will be in  
u5.

Shawn

On Feb 5, 2008, at 8:21 PM, Paul B. Henson wrote:

>
> I was curious to see about how many filesystems one server could
> practically serve via NFS, and did a little empirical testing.
>
> Using an x4100M2 server running S10U4x86, I created a pool from a  
> slice of
> the hardware raid array created from the two internal hard disks,  
> and set
> sharenfs=on for the pool.
>
> I then created filesystems, 1000 at a time, and timed how long it  
> took to
> create each thousand filesystems, to set sharenfs=off for all  
> filesystems
> created so far, and to set sharenfs=on again for all filesystems. I
> understand sharetab optimization is one of the features in the latest
> OpenSolaris, so just for fun I tried symlinking /etc/dfs/sharetab to  
> a mfs
> file system to see if it made any difference. I also timed a  
> complete boot
> cycle (from typing 'init 6' until the server was again remotely  
> available)
> at 5000 and 10,000 filesystems.
>
> Interestingly, filesystem creation itself scaled reasonably well. I
> recently read a thread where someone was complaining it took over  
> eight
> minutes to create a filesystem at the 10,000 filesystem count. In my  
> tests,
> while the first 1000 filesystems averaged only a little more than  
> half a
> second each to create, filesystems 9000-10000 only took roughly
> twice that,
> averaging about 1.2 seconds each to create.
>
> Unsharing scalability wasn't as good, time requirements increasing  
> by a
> factor of six. Having sharetab in mfs made a slight difference, but  
> nothing
> outstanding. Sharing (unsurprisingly) was the least scalable,  
> increasing by
> a factor of eight.
>
> Boot-wise, the system took about 10.5 minutes to reboot at 5000
> filesystems. This increased to about 35 minutes at the 10,000 file  
> system
> counts.
>
> Based on these numbers, I don't think I'd want to run more than 5-7
> thousand filesystems per server to avoid extended outages. Given our  
> user
> count, that will probably be 6-10 servers 8-/. I suppose we could  
> have a
> large number of smaller servers rather than a small number of beefier
> servers; although that seems less than efficient. It's too bad  
> there's no
> way to fast track backporting of openSolaris improvements to  
> production
> Solaris, from what I've heard there will be virtually no ZFS  
> improvements
> in S10U5 :(.
>
> Here are the raw numbers for anyone interested. The first column is  
> number
> of file systems. The second column is total and average time in  
> seconds to
> create that block of filesystems (eg, the first 1000 took 589  
> seconds to
> create, the second 1000 took 709 seconds). The third column is the  
> time in
> seconds to turn off NFS sharing for all filesystems created so far  
> (eg, 14
> seconds for 1000 filesystems, 38 seconds for 2000 filesystems). The  
> fourth
> is the same operation with sharetab in a memory filesystem (I  
> stopped this
> measurement after 7000 because sharing was starting to take so  
> long). The
> final column is how long it took to turn on NFS sharing for all  
> filesystems
> created so far.
>
>
> #FS     create/avg   off/avg   off(mfs)/avg   on/avg
> 1000    589/.59      14/.01    9/.01          32/.03
> 2000    709/.71      38/.02    25/.01         107/.05
> 3000    783/.78      70/.02    50/.02         226/.08
> 4000    836/.84      112/.03   83/.02         388/.10
> 5000    968/.97      178/.04   124/.02        590/.12
> 6000    930/.93      245/.04   172/.03        861/.14
> 7000    961/.96      319/.05   229/.03        1172/.17
> 8000    1045/1.05    405/.05                  1515/.19
> 9000    1098/1.10    500/.06                  1902/.21
> 10000   1165/1.17    599/.06                  2348/.23
>
>
> -- 
> Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/ 
> ~henson/
> Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
> California State Polytechnic University  |  Pomona CA 91768
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
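
For anyone who wants to reproduce this, a rough sketch of the kind of timing
loop Paul describes above (the pool name "tank" is just a placeholder, and the
commands are untested):

# time creating a block of 1000 filesystems
ptime sh -c 'i=0; while [ $i -lt 1000 ]; do zfs create tank/fs$i; i=`expr $i + 1`; done'

# time toggling NFS sharing for everything created so far
# (sharenfs was set on the pool dataset, so the children inherit it)
ptime zfs set sharenfs=off tank
ptime zfs set sharenfs=on tank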

--
Shawn Ferry  shawn.ferry at sun.com
Senior Primary Systems Engineer
Sun Managed Operations
571.291.4898





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Did MDB Functionality Change?

2008-02-06 Thread spencer
On Solaris 10 u3 (11/06) I can execute the following:

bash-3.00# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcipsy ip sctp usba 
nca md zfs random ipc nfs crypto cpc fctl fcip logindmux ptm sppp ]
> arc::print
{
anon = ARC_anon
mru = ARC_mru
mru_ghost = ARC_mru_ghost
mfu = ARC_mfu
mfu_ghost = ARC_mfu_ghost
size = 0x6b800
p = 0x3f83f80
c = 0x7f07f00
c_min = 0x7f07f00
c_max = 0xbe8be800
hits = 0x30291
misses = 0x4f
deleted = 0xe
skipped = 0
hash_elements = 0x3a
hash_elements_max = 0x3a
hash_collisions = 0x3
hash_chains = 0x1
hash_chain_max = 0x1
no_grow = 0
}

However, when I execute the same command on Solaris 10 u4 (8/07) I receive the 
following error:

bash-3.00# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs ssd fcp fctl qlc pcisch 
md ip hook neti sctp arp usba nca lofs logindmux ptm cpc fcip sppp random sd 
crypto zfs ipc nfs ]
> arc::print
mdb: failed to dereference symbol: unknown symbol name

In addition, u3 doesn't recognize "::arc", whereas u4 does.  u3 displays
memory locations with "arc::print -a", whereas "::arc -a" doesn't work on u4.

I'm posting this to the zfs discussion forum because this limited u4
functionality prevents you from dynamically changing the ARC in ZFS by
following the ZFS Tuning instructions.
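
As a workaround sketch (assuming the zfs_arc_max tunable is available on your
update; the value below is just an example meaning 2 GB), the ARC can still be
inspected with the "::arc" dcmd on u4, and capped statically via /etc/system
rather than changed on the fly:

# echo "::arc" | mdb -k

/etc/system:
set zfs:zfs_arc_max = 0x80000000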


Spencer
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread William Fretts-Saxton
I disabled file prefetch and there was no effect.

Here are some performance numbers.  Note that, when the application server used 
a ZFS file system to save its data, the transaction took TWICE as long.  For 
some reason, though, iostat is showing 5x as much disk writing (to the physical 
disks) on the ZFS partition.  Can anyone see a problem here?

-
Average application server client response time (1st run/2nd run):

SVM - 12/18 seconds
ZFS - 35/38 seconds

SVM Performance
---
# iostat -xnz 5
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  195.1  414.3 1465.9 1657.3  0.0  1.7    0.0    2.7   0  98 md/d100
   97.5  414.3  730.2 1657.3  0.0  1.0    0.0    1.9   0  74 md/d101
   97.7  414.1  735.8 1656.5  0.0  0.8    0.0    1.5   0  59 md/d102
   54.4  203.6  370.7  814.2  0.0  0.5    0.0    2.1   0  42 c0t2d0
   52.8  210.6  359.5  842.2  0.0  0.5    0.0    1.9   0  40 c0t3d0
   54.0  203.6  374.7  814.2  0.0  0.3    0.0    1.2   0  26 c0t4d0
   52.2  210.6  361.1  842.2  0.0  0.5    0.0    1.8   0  38 c0t5d0

ZFS Performance
---
# iostat -xnz 5
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   23.2  148.8 1496.7 3806.8  0.0  2.5    0.0   14.7   0  21 c0t2d0
   22.8  148.8 1470.9 3806.8  0.0  2.4    0.0   13.9   0  22 c0t3d0
   24.2  149.0 1561.1 3805.0  0.0  1.5    0.0    8.6   0  18 c0t4d0
   23.4  149.4 1509.6 3805.0  0.0  2.5    0.0   14.7   0  25 c0t5d0

# zpool iostat 5
           capacity     operations    bandwidth
pool     used  avail   read  write   read  write
------  -----  -----  -----  -----  -----  -----
pool1   5.69G   266G     12    243   775K  7.20M
pool1   5.69G   266G     88    232  5.53M  7.12M
pool1   5.69G   266G     78    216  4.87M  6.81M
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-02-06 Thread eric kustarz

On Feb 4, 2008, at 5:10 PM, Marion Hakanson wrote:

> [EMAIL PROTECTED] said:
>> FYI, you can use the '-c' option to compare results from various  
>> runs   and
>> have one single report to look at.
>
> That's a handy feature.  I've added a couple of such comparisons:
>   http://acc.ohsu.edu/~hakansom/thumper_bench.html
>
> Marion
>
>

Your findings for random reads with or without NCQ match mine:
http://blogs.sun.com/erickustarz/entry/ncq_performance_analysis

Disabling NCQ looks like a very tiny win for the multi-stream read case.  I
found a much bigger win, but I was doing RAID-0 instead of RAID-Z.

eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] available space?

2008-02-06 Thread Richard Elling
Jure Pečar wrote:
> Maybe a basic zfs question ...
>
> I have a pool:
>
> # zpool status backup
>   pool: backup
>  state: ONLINE
>  scrub: none requested
> config:
>
> NAME  STATE READ WRITE CKSUM
> backupONLINE   0 0 0
>   mirror  ONLINE   0 0 0
> c1t0d0s1  ONLINE   0 0 0
> c2t0d0s1  ONLINE   0 0 0
>   raidz2  ONLINE   0 0 0
> c1t1d0ONLINE   0 0 0
> c1t2d0ONLINE   0 0 0
> c1t3d0ONLINE   0 0 0
> c1t4d0ONLINE   0 0 0
> c1t5d0ONLINE   0 0 0
> c1t6d0ONLINE   0 0 0
> c1t7d0ONLINE   0 0 0
> c2t1d0ONLINE   0 0 0
> c2t2d0ONLINE   0 0 0
> c2t3d0ONLINE   0 0 0
> c2t4d0ONLINE   0 0 0
> c2t5d0ONLINE   0 0 0
> c2t6d0ONLINE   0 0 0
> c2t7d0ONLINE   0 0 0
>
> For which list reports:
>
> # zpool list backup
> NAME      SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
> backup   13.5T    434K   13.5T     0%  ONLINE  -
>
>
> Yet df and zfs list shows something else:
>
> [EMAIL PROTECTED]:~# zfs list
> NAME USED  AVAIL  REFER  MOUNTPOINT
> backup   262K  11.5T  1.78K  none
> backup/files32.0K  11.5T  32.0K  /export/files
> ...
>
> # df -h
> Filesystem             size   used  avail capacity  Mounted on
> ...
> backup/files            11T    32K    11T     1%    /export/files
>
>
> Why does AVAIL differ for such a large amount?
>   

They represent two different things. See the man pages for
zpool and zfs for a description of their meanings.
 -- richard
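
Roughly speaking, zpool list reports raw space across all devices (raidz2
parity included), while zfs list and df report what is usable after parity.
With 2 of the 14 raidz2 disks' worth of space going to parity, and assuming
roughly 1 TB data disks, the arithmetic works out to about:

  raw (zpool list):      ~0.5 TB mirror + 14 x ~0.93 TB  = ~13.5 TB
  usable (zfs list/df):  ~0.5 TB mirror + 12 x ~0.93 TB  = ~11.5 TB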


> (NexentaOS_20080131)
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs send / receive between different opensolaris versions?

2008-02-06 Thread Michael Hale
Hello everybody,

I'm thinking of building out a second machine as a backup for our mail  
spool where I push out regular filesystem snapshots, something like a  
warm/hot spare situation.

Our mail spool is currently running snv_67 and the new machine would  
probably be running whatever the latest opensolaris version is (snv_77  
or later).

My first question is whether or not zfs send / receive is portable between
differing releases of OpenSolaris.  My second question (kind of off topic for
this list) is how difficult it would be to upgrade snv_67 to a later version of
OpenSolaris, given that we're running a ZFS root boot configuration.
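
For the first question, what I have in mind is roughly the following (pool,
snapshot, and host names are just placeholders):

# zfs snapshot spool/mail@20080206
# zfs send -i spool/mail@20080205 spool/mail@20080206 | \
      ssh backuphost zfs receive spool/mail

i.e. an initial full send followed by periodic incrementals, which is why I
care whether the stream format is compatible across builds.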
--
Michael Hale <[EMAIL PROTECTED]>
Manager of Engineering Support, Enterprise Engineering Group
Transcom Enhanced Services
http://www.transcomus.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Will Murnane
On Feb 6, 2008 6:36 PM, William Fretts-Saxton
<[EMAIL PROTECTED]> wrote:
> Here are some performance numbers.  Note that, when the
> application server used a ZFS file system to save its data, the
> transaction took TWICE as long.  For some reason, though, iostat is
> showing 5x as much disk writing (to the physical disks) on the ZFS
> partition.  Can anyone see a problem here?
What is the disk layout of the zpool in question?  Striped?  Mirrored?
 Raidz?  I would suggest either a simple stripe or striping+mirroring
as the best-performing layout.
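A plain stripe would be created with, for example (device names purely
illustrative):

  # zpool create fastpool c0t2d0 c0t3d0 c0t4d0 c0t5d0

and striped mirrors with:

  # zpool create fastpool mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0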
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread William Fretts-Saxton
It is a stripe of mirrors:

 # zpool status
NAMESTATE READ WRITE CKSUM
pool1   ONLINE   0 0 0
  mirrorONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c0t4d0  ONLINE   0 0 0
c0t5d0  ONLINE   0 0 0
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub halts

2008-02-06 Thread Lida Horn
I now have a improved sata and marvell88sx driver modules that
deal with various error conditions in a much more solid way.
Changes include reducing the number of required device resets,
properly reporting media errors (rather than "no additional sense"),
clearing aborted packets more rapidly so that after an hardware error
progress is again made much more quickly.  Further the driver is
much quieter (far fewer messages in /var/adm/messages).

If there is still interest, I can make those binaries available for testing
prior to their availability in Solaris Nevada (OpenSolaris).  These changes
will be checked in soon, but the process always inserts a significant delay, so
if anyone would like, please e-mail me and I will make those binaries
available via e-mail.

Regards,
Lida
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Vincent Fox
Solaris 10u4, eh?

Sounds a lot like the fsync issues we ran into trying to run Cyrus mail-server
spools on ZFS.

This was highlighted for us by the filebench varmail test.

OpenSolaris nv78, however, worked very well.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marc Bevand
William Fretts-Saxton  sun.com> writes:
> 
> I disabled file prefetch and there was no effect.
> 
> Here are some performance numbers.  Note that, when the application server
> used a ZFS file system to save its data, the transaction took TWICE as long.
> For some reason, though, iostat is showing 5x as much disk
> writing (to the physical disks) on the ZFS partition.  Can anyone see a
> problem here?

Possible explanation: the Glassfish applications are using synchronous
writes, causing the ZIL (ZFS Intent Log) to be intensively used, which
leads to a lot of extra I/O. Try to disable it:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
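
(For completeness, the mechanism that guide describes is the zil_disable
tunable; as a temporary test it can be flipped with mdb, or set in /etc/system
for the next boot.  This is only for diagnosis, not something to leave on:)

# echo zil_disable/W0t1 | mdb -kw          (revert with W0t0)

/etc/system:
set zfs:zil_disable = 1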

Since disabling it is not recommended, if you find out it is the cause of your
perf problems, you should instead try to use a SLOG (separate intent log, see
the link above).  Unfortunately your OS version (Solaris 10 8/07) doesn't
support SLOGs; they were only added in OpenSolaris build snv_68:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] MySQL, Lustre and ZFS

2008-02-06 Thread kilamanjaro
Hi all,

Any thoughts on if and when ZFS, MySQL, and Lustre 1.8 (and beyond) will work
together and be supported as such by Sun?

- Network Systems Architect
   Advanced Digital Systems Internet 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Neil Perrin
Marc Bevand wrote:
> William Fretts-Saxton  sun.com> writes:
>   
>> I disabled file prefetch and there was no effect.
>>
>> Here are some performance numbers.  Note that, when the application server
>> used a ZFS file system to save its data, the transaction took TWICE as long.
>> For some reason, though, iostat is showing 5x as much disk
>> writing (to the physical disks) on the ZFS partition.  Can anyone see a
>> problem here?
>> 
>
> Possible explanation: the Glassfish applications are using synchronous
> writes, causing the ZIL (ZFS Intent Log) to be intensively used, which
> leads to a lot of extra I/O.

The ZIL doesn't do a lot of extra IO.  It usually just does one write per
synchronous request and will batch up multiple writes into the same log block
if possible.  However, it does need to wait for the writes to be on stable
storage before returning to the application, which is what the application has
requested.  It does this by waiting for the write to complete and then flushing
the disk write cache.  If the write cache is battery backed for all zpool
devices then the global zfs_nocacheflush can be set to give dramatically better
performance.
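
(For reference, and only if every device in the pool really has non-volatile
write cache, that is the /etc/system setting below, followed by a reboot; the
tunable name is the one used on recent builds, so check that it applies to your
release:)

set zfs:zfs_nocacheflush = 1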
>  Try to disable it:
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
>
> Since disabling it is not recommended, if you find out it is the cause of your
> perf problems, you should instead try to use a SLOG (separate intent log, see
> above link). Unfortunately your OS version (Solaris 10 8/07) doesn't support
> SLOGs, they have only been added to OpenSolaris build snv_68:
>
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
>
> -marc
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-02-06 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> Your finding for random reads with or without NCQ match my findings: http://
> blogs.sun.com/erickustarz/entry/ncq_performance_analysis
> 
> Disabling NCQ looks like a very tiny win for the multi-stream read   case.  I
> found a much bigger win, but i was doing RAID-0 instead of   RAID-Z. 

I didn't set out to do the with/without NCQ comparisons.  Rather, my
first runs of filebench and bonnie++ triggered a number of I/O errors
and controller timeout/resets on several different drives, so I disabled
NCQ based on bug 6587133's workaround suggestion.  No more errors
during subsequent testing, so we're running with NCQ disabled until
a patch comes along.
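
If it helps anyone else, the workaround we applied was the /etc/system tunable
from that bug report, which as far as I recall is the line below (verify it
against the bug report before using), followed by a reboot:

set sata:sata_max_queue_depth = 0x1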

It was useful, however, to see what effect disabling NCQ had.  I find
filebench easier to use than bonnie++, mostly because filebench is
automatically multithreaded, which is necessary to generate a heavy
enough workload to exercise anything more than a few drives (esp.
on machines like T2000's).  The HTML output doesn't hurt, either.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> Here are some performance numbers.  Note that, when the application server
> used a ZFS file system to save its data, the transaction took TWICE as long.
> For some reason, though, iostat is showing 5x as much disk writing (to the
> physical disks) on the ZFS partition.  Can anyone see a problem here? 

I'm not familiar with the application in use here, but your iostat numbers
remind me of something I saw during "small overwrite" tests on ZFS.  Even
though the test was doing only writing, because it was writing over only a
small part of existing blocks, ZFS had to read (the unchanged part of) each
old block in before writing out the changed block to a new location (COW).

This is a case where you want to set the ZFS recordsize to match your
application's typical write size, in order to avoid the read overhead
inherent in partial-block updates.  UFS by default has a smaller max
blocksize than ZFS' default 128k, so in addition to the ZIL/fsync issue
UFS will also suffer less overhead from such partial-block updates.
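
If the application really is doing small (say 8 KB) updates, the experiment is
cheap (dataset name hypothetical; note that recordsize only affects files
written after the change):

# zfs set recordsize=8k pool1/data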

Again, this may not be what's going on, but it's worth checking if you
haven't already done so.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marc Bevand
Neil Perrin  Sun.COM> writes:
> 
> The ZIL doesn't do a lot of extra IO. It usually just does one write per 
> synchronous request and will batch up multiple writes into the same log
> block if possible.

Ok. I was wrong then. Well, William, I think Marion Hakanson has the
most plausible explanation. As he suggests, experiment with "zfs set
recordsize=XXX" to force the filesystem to use small records. See
the zfs(1) manpage.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-06 Thread Nathan Kroenert
Hey all -

I'm working on an interesting issue where I'm seeing ZFS being quite cranky
about O_SYNC writes.

Bottom line is that I have a small test case that does essentially this:

open file for writing with O_SYNC
loop {
    write() 8KB of random data
    print time taken to write the data
}

It's taking anywhere up to 80 seconds per 8KB block.  When the 'problem' is
not in evidence (and it's not always happening), I can do around 1200 O_SYNC
writes per second...

It seems to be waiting here virtually all of the time:

 > 0t11021::pid2proc | ::print proc_t p_tlist|::findstack -v
stack pointer for thread 30171352960: 2a118052df1
[ 02a118052df1 cv_wait+0x38() ]
   02a118052ea1 zil_commit+0x44(1, 6b50516, 193, 60005db66bc, 6b50570,
   60005db6640)
   02a118052f51 zfs_write+0x554(0, 14000, 2a1180539e8, 6000af22840, 
2000,
   2a1180539d8)
   02a118053071 fop_write+0x20(304898cd100, 2a1180539d8, 10, 
300a27a9e48, 0,
   7b7462d0)
   02a118053121 write+0x268(4, 8058, 60051a3d738, 2000, 113, 1)
   02a118053221 dtrace_systrace_syscall32+0xac(4, ffbfdaf0, 2000, 21e80,
   ff3a00c0, ff3a0100)
   02a1180532e1 syscall_trap32+0xcc(4, ffbfdaf0, 2000, 21e80, ff3a00c0,
   ff3a0100)

And this also evident in a dtrace of it, following the write in...

<...>
  28-> zil_commit
  28  -> cv_wait
  28-> thread_lock
  28<- thread_lock
  28-> cv_block
  28  -> ts_sleep
  28  <- ts_sleep
  28  -> new_mstate
  28-> cpu_update_pct
  28  -> cpu_grow
  28-> cpu_decay
  28  -> exp_x
  28  <- exp_x
  28<- cpu_decay
  28  <- cpu_grow
  28<- cpu_update_pct
  28  <- new_mstate
  28  -> disp_lock_enter_high
  28  <- disp_lock_enter_high
  28  -> disp_lock_exit_high
  28  <- disp_lock_exit_high
  28<- cv_block
  28-> sleepq_insert
  28<- sleepq_insert
  28-> disp_lock_exit_nopreempt
  28<- disp_lock_exit_nopreempt
  28-> swtch
  28  -> disp
  28-> disp_lock_enter
  28<- disp_lock_enter
  28-> disp_lock_exit
  28<- disp_lock_exit
  28-> disp_getwork
  28<- disp_getwork
  28-> restore_mstate
  28<- restore_mstate
  28  <- disp
  28  -> pg_cmt_load
  28  <- pg_cmt_load
  28<- swtch
  28-> resume
  28  -> savectx
  28-> schedctl_save
  28<- schedctl_save
  28  <- savectx
<...>

At this point, it waits for up to 80 seconds.

I'm also seeing zil_commit() being called around 7-15 times per second.
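
As a cross-check, something like this should quantize how long each
zil_commit() takes (a sketch, assuming the fbt probes fire cleanly on this
kernel):

# dtrace -n '
  fbt::zil_commit:entry  { self->ts = timestamp; }
  fbt::zil_commit:return /self->ts/ {
      @["zil_commit latency (ms)"] = quantize((timestamp - self->ts) / 1000000);
      self->ts = 0;
  }'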

For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a 
pinch of difference. :)

For what it's worth, this is a T2000, running Oracle, connected to an HDS 9990
(using 2Gb fibre), with an 8KB recordsize for the Oracle filesystems, and I'm
only seeing the issue on the ZFS filesystems that hold the active Oracle
tables.

The O_SYNC test case is just trying to help me understand what's 
happening. The *real* problem is that oracle is running like rubbish 
when it's trying to roll forward archive logs from another server. It's 
an almost 100% write workload. At the moment, it cannot even keep up 
with the other server's log creation rate, and it's barely doing 
anything. (The other box is quite different, so not really valid for 
direct comparison at this point).

6513020 looked interesting for a while, but I already have 120011-14 and
127111-03 installed.

I'm looking into the cache flush settings of the 9990 array to see if that's
what's killing me, but I'm also looking for any other ideas on what might be
hurting me.

I also have set
zfs:zfs_nocacheflush = 1
in /etc/system

The Oracle logs are on a separate zpool and I'm not seeing the issue on those
filesystems.

The lockstats I have run are not yet all that interesting. If anyone has 
ideas on specific incantations I should use or some specific D or 
anything else, I'd be most appreciative.

Cheers!

Nathan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss