Re: [zfs-discuss] ZFS Panic

2009-04-09 Thread Remco Lengers

Grant,

Didn't see a response so I'll give it a go.

Ripping a disk away and silently inserting a new one is asking for 
trouble, IMHO. I am not sure what you were trying to accomplish, but 
generally replacing a drive/LUN would entail commands like:


# zpool offline tank c1t3d0
# cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0            disk         connected    configured   ok
# cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
# cfgadm | grep sata1/3
sata1/3                        disk         connected    unconfigured ok

# cfgadm -c configure sata1/3

Taken from this page:

http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view
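
Once the new disk has been configured again, the ZFS-side follow-up would look 
roughly like this (same pool/device names as in the example above; a sketch, 
not a verified transcript):

# zpool replace tank c1t3d0     # resilver onto the new disk at the same device path
# zpool status tank             # watch the resilver progress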

..Remco

Grant Lowe wrote:

Hi All,

Don't know if this is worth reporting, as it's human error.  Anyway, I had a 
panic on my zfs box.  Here's the error:

marksburg /usr2/glowe> grep panic /var/log/syslog
Apr  8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after panic: 
assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, 
&numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
Apr  8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after panic: 
assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, 
&numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
marksburg /usr2/glowe>

What we did to cause this was pull a LUN from ZFS and replace it with a new 
LUN.  We then tried to shut down the box, but it wouldn't go down.  We had to 
send a break to the box and reboot.  This is an Oracle sandbox, so we're not 
really concerned.  Ideas?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and nfs

2009-04-09 Thread OpenSolaris Forums
> I'm using Solaris 10 (10/08). This feature is what
> exactly i want. thank for response.


Duh. What I meant previously was that this feature
is not available in the Solaris 10 releases.

Cindy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool import error! - Help Needed

2009-04-09 Thread OpenSolaris Forums
I have a similar problem:

r...@moby1:~# zpool import
  pool: bucket
    id: 12835839477558970577
 state: UNAVAIL
action: The pool cannot be imported due to damaged devices or data.
config:

        bucket      UNAVAIL  insufficient replicas
          raidz2    UNAVAIL  corrupted data
            c3t0d0  ONLINE
            c3t1d0  ONLINE
            c4t0d0  ONLINE
            c4t1d0  ONLINE
            c4t2d0  ONLINE
            c4t3d0  ONLINE

How is this possible?

This is with osol b108.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread OpenSolaris Forums
If you rsync data to ZFS over existing files, you need to take something more 
into account:

If you have a snapshot of your files and rsync the same files again, you need 
to use the rsync "--inplace" option; otherwise completely new blocks will be 
allocated for the new files. That's because rsync writes an entirely new file 
and renames it over the old one.

Not sure if this applies here, but I think it's worth mentioning and not 
obvious.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Harry Putnam
Jeff Bonwick  writes:

>> > Yes, I made note of that in my OP on this thread.  But is it enough to
>> > end up with 8gb of non-compressed files measuring 8gb on
>> > reiserfs(linux) and the same data showing nearly 9gb when copied to a
>> > zfs filesystem with compression on.  
>> 
>> whoops.. a hefty exaggeration it only shows about 16mb difference.
>> But still since zfs side is compressed, that seems like quite a lot..
>
> That's because ZFS reports *all* space consumed by a file, including
> all metadata (dnodes, indirect blocks, etc).  For an 8G file stored
> in 128K blocks, there are 8G / 128K = 64K block pointers, each of
> which is 128 bytes, and is two-way replicated (via ditto blocks),
> for a total of 64K * 128 * 2 = 16M.  So this is exactly as expected.

All good info, thanks.  Still, one thing doesn't quite work in your line
of reasoning: the data on the Gentoo Linux end is uncompressed, whereas
it is compressed on the ZFS side.

A number of the files are themselves in compressed formats (jpg, mpg,
avi, pdf, maybe a few more), which aren't going to compress much
further, but thousands of the files are text files (HTML).  So
compression should show some reduction in size.

Your calculation appears to be based on both ends being uncompressed.
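
(For what it's worth, Jeff's 16 MB figure itself checks out from a shell -- 
purely illustrative arithmetic:)

$ echo '8 * 1024 * 1024 / 128' | bc    # 65536 block pointers for an 8 GB file in 128K records
$ echo '65536 * 128 * 2' | bc          # 16777216 bytes, i.e. ~16 MB with ditto copies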

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs as a cache server

2009-04-09 Thread Francois

Hello list,

What would be the best zpool configuration for a cache/proxy server 
(probably based on squid)?


In other words, with which zpool configuration could I expect the best 
read performance? (There will be some writes too, but far fewer.)



Thanks.

--
Francois

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs as a cache server

2009-04-09 Thread Greg Mason

Francois,

Your best bet is probably a stripe of mirrors, i.e. a zpool made of many 
mirrors.


This way you have redundancy and fast reads as well. You'll also enjoy 
pretty quick resilvering in the event of a disk failure.


For even faster reads, you can add dedicated L2ARC cache devices (folks 
typically use SSDs or very fast (15k RPM) SAS drives for this).
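
Something along these lines, for example (device names made up, so adjust for 
your hardware -- just a sketch):

# zpool create cachepool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 cache c2t0d0
# zpool status cachepool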


-Greg

Francois wrote:

Hello list,

What would be the best zpool configuration for a cache/proxy server
(probably based on squid) ?

In other words with which zpool configuration I could expect best
reading performance ? (there'll be some writes too but much less).


Thanks.

--
Francois

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Jonathan
OpenSolaris Forums wrote:
> if you rsync data to zfs over existing files, you need to take
> something more into account:
> 
> if you have a snapshot of your files and rsync the same files again,
> you need to use "--inplace" rsync option , otherwise completely new
> blocks will be allocated for the new files. that`s because rsync will
> write entirely new file and rename it over the old one.

ZFS will allocate new blocks either way; see
http://all-unix.blogspot.com/2007/03/zfs-cow-and-relate-features.html
for more information about how copy-on-write works.

Jonathan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Efficient backup of ZFS filesystems?

2009-04-09 Thread Henk Langeveld

Gary Mills wrote:

I've been watching the ZFS ARC cache on our IMAP server while the
backups are running, and also when user activity is high.  The two
seem to conflict.  Fast response for users seems to depend on their
data being in the cache when it's needed.  Most of the disk I/O seems
to be writes in this situation.  However, the backup needs to stat
all files and read many of them.  I'm assuming that all of this
information is also added to the ARC cache, even though it may never
be needed again.  It must also evict user data from the cache, causing
it to be reloaded every time it's needed.

We use Networker for backups now.  Is there some way to configure ZFS
so that backups don't churn the cache?  Is there a different way to
perform backups to avoid this problem?  We do keep two weeks of daily
ZFS snapshots to use for restores of recently-lost data.  We still
need something for longer-term backups.



Hi Gary,

Find out whether you have a problem first.  If not, don't worry, but
read on.  If you do have a problem, add memory or an L2ARC device.

The ARC was designed to mitigate the effect of any single burst of
sequential I/O, but the size of the cache dedicated to more Frequently
used pages (the current working set) will still be reduced, depending
on the amount of activity on either side of the cache.

As the ARC maintains a shadow list of recently evicted pages from both
sides of the cache, such pages that are accessed again will then return
to the 'Frequent' side of the cache.

There will be continuous competition between 'Recent' and 'Frequent'
sides of the ARC (and for convenience, I'm glossing over the existence
of 'Locked' pages).

Several reasons might cause pathological behaviour - a backup process
might access the same metadata multiple times, causing that data to
be promoted to 'Frequent', flushing out application related data.
(ZFS does not differentiate between data and metadata for resource
 allocation, they all use the same I/O mechanism and cache.)

On the other hand, you might just not have sufficient memory to keep
most of your metadata in the cache, or the backup process is just too
aggressive.   Adding memory or an L2ARC device might help.
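
For example, a quick way to see whether the ARC really is being churned is to 
watch the arcstats kstats before and during a backup run; and an L2ARC device 
can be added to an existing pool (pool/device names below are made up):

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max \
        zfs:0:arcstats:hits zfs:0:arcstats:misses
# zpool add mailpool cache c4t0d0    # add an SSD as an L2ARC cache device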



Cheers,

Henk




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS stripe over EMC write performance.

2009-04-09 Thread Yuri Elson
What is the best write performance improvement anyone has seen (if any) 
on a ZFS stripe over EMC SAN?
I'd be interested to hear results for both - striped and non-striped EMC 
config.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs as a cache server

2009-04-09 Thread Jean-Noël Mattern

Hi François,

You should take care with the recordsize of your filesystems. It should 
be tuned according to the size of the most-accessed files.
Disabling "atime" is probably also a good idea (but that's probably 
something you already know ;) ).
We've also noticed some cases where enabling compression gave better I/O 
results (but don't use gzip); this should be done only if your machine 
is exclusively running the proxy server.


As for the topology of your pool, performance-wise, prefer striped 
mirrors if you can afford them, or raidz if not!
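
For instance (dataset name is hypothetical and the values only indicative -- a 
sketch, not a tuning recommendation):

# zfs set recordsize=64K tank/squid-cache    # roughly match the typical cached object size
# zfs set atime=off tank/squid-cache         # no access-time updates on cache hits
# zfs set compression=on tank/squid-cache    # default (lzjb) compression, cheap on CPU; avoid gzip here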


HTH,

Jnm.

--


Francois wrote:

Hello list,

What would be the best zpool configuration for a cache/proxy server 
(probably based on squid) ?


In other words with which zpool configuration I could expect best 
reading performance ? (there'll be some writes too but much less).



Thanks.

--
Francois

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Daniel Rock

Jonathan wrote:

OpenSolaris Forums wrote:

if you have a snapshot of your files and rsync the same files again,
you need to use "--inplace" rsync option , otherwise completely new
blocks will be allocated for the new files. that`s because rsync will
write entirely new file and rename it over the old one.


ZFS will allocate new blocks either way


No, it won't. With --inplace, rsync doesn't rewrite blocks that are 
identical on source and target, only blocks that have changed.


I use rsync to synchronize a directory with a few large files (each up 
to 32 GB). Data normally gets appended to one file until it reaches the 
size limit of 32 GB. Before I used --inplace a snapshot needed on 
average ~16 GB. Now with --inplace it is just a few kBytes.
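
A hedged example of the kind of invocation meant here (paths made up); note 
that for local copies rsync defaults to --whole-file, so --no-whole-file is 
needed if you also want the delta-transfer algorithm applied:

rsync -av --inplace --no-whole-file /source/bigfiles/ /tank/backup/bigfiles/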



Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Jonathan
Daniel Rock wrote:
> Jonathan wrote:
>> OpenSolaris Forums wrote:
>>> if you have a snapshot of your files and rsync the same files again,
>>> you need to use "--inplace" rsync option , otherwise completely new
>>> blocks will be allocated for the new files. that`s because rsync will
>>> write entirely new file and rename it over the old one.
>>
>> ZFS will allocate new blocks either way
> 
> No it won't. --inplace doesn't rewrite blocks identical on source and
> target but only blocks which have been changed.
> 
> I use rsync to synchronize a directory with a few large files (each up
> to 32 GB). Data normally gets appended to one file until it reaches the
> size limit of 32 GB. Before I used --inplace a snapshot needed on
> average ~16 GB. Now with --inplace it is just a few kBytes.

It appears I may have misread the initial post.  I don't really know how
I misread it, but I think I missed the snapshot portion of the message
and got confused.  I understand the interaction between snapshots,
rsync, and --inplace being discussed now.

My apologies,
Jonathan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Greg Mason

Harry,

ZFS will only compress data if it is able to gain more than 12% of space 
by compressing the data (I may be wrong on the exact percentage). If ZFS 
can't get at least that 12% compression, it doesn't bother and will just 
store the block uncompressed.


Also, the default ZFS compression algorithm isn't gzip, so you aren't 
going to get the greatest compression possible, but it is quite fast.


Depending on the type of data, it may not compress well at all, leading 
ZFS to store that data completely uncompressed.
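
You can check what you're actually getting per dataset, e.g. (dataset name 
hypothetical):

# zfs get compression,compressratio tank/backup
# du -sh /tank/backup     # space actually allocated on disk, after compression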


-Greg



All good info thanks.  Still one thing doesn't quite work in your line
of reasoning.   The data on the gentoo linux end is uncompressed.
Whereas it is compressed on the zfs side.

A number of the files are themselves compressed formats such as jpg
mpg avi pdf maybe a few more, which aren't going to compress further
to speak of, but thousands of the files are text files (html).  So
compression should show some downsize.

Your calculation appears to be based on both ends being uncompressed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread reader
Greg Mason  writes:

> Harry,
>
> ZFS will only compress data if it is able to gain more than 12% of
> space by compressing the data (I may be wrong on the exact
> percentage). If ZFS can't get get that 12% compression at least, it
> doesn't bother and will just store the block uncompressed.
>
> Also, the default ZFS compression algorithm isn't gzip, so you aren't
> going to get the greatest compression possible, but it is quite fast.
>
> Depending on the type of data, it may not compress well at all,
> leading ZFS to store that data completely uncompressed.

Thanks for another little addition to my knowledge of zfs.  Good stuff
to know.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread reader
OpenSolaris Forums  writes:

> if you rsync data to zfs over existing files, you need to take
> something more into account:
>
> if you have a snapshot of your files and rsync the same files again,
> you need to use "--inplace" rsync option , otherwise completely new
> blocks will be allocated for the new files. that`s because rsync
> will write entirely new file and rename it over the old one.
>
> not sure if this applies here, but i think it`s worth mentioning and
> not obvious.

In this particular case it didn't apply, as it was a first-time run, but
it's good to know what happens with rsync. 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread reader
Jonathan  writes:

> It appears I may have misread the initial post.  I don't really know how
> I misread it, but I think I missed the snapshot portion of the message
> and got confused.  I understand the interaction between snapshots,
> rsync, and --inplace being discussed now.

I don't think you did misread it. The initial post had nothing to do
with snapshots.  It had only to do with a single run of rsync from a
Linux box to a ZFS filesystem, and noticing that the data had grown even
though the ZFS filesystem has compression turned on.

I'm not sure how snapshots crept in here, but I'm interested to know
more about the interaction with rsync in the case of snapshots.

It was a post authored by OpenSolaris Forums
(Message-ID: <1811927823.191239282659293.javamail.tweb...@sf-app2>)
that first mentioned snapshots.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs as a cache server

2009-04-09 Thread Scott Lawson

Hi Francois,

I use ZFS with Squid proxies here at MIT.  (MIT New Zealand that is ;))

My basic set up is like so.

- 2 x Sun SPARC V240s, dual CPUs, with 2 x 36 GB boot disks and 2 x 73 
GB cache disks. Each machine has 4 GB RAM.

- Each has a copy of Squid, SquidGuard and an Apache server.
- The Apache server serves .pac files for client machines, and each .pac 
file binds you to that proxy.
- Clients request a .pac from the round-robin DNS name 
"proxy.manukau.ac.nz", which then gives you the real system name of one 
of these two proxies.

Boot disks are mirrored using disksuite; the cache and log file systems 
are ZFS. My cache pool is just a mirrored pool which is then split into 
three file systems. The cache volume is restricted to 30 GB in the squid 
config, and the max cache object size is 2 MB. Internet bandwidth 
available to these machines is ~15 Mbit/s.

[r...@x /]#> zpool status
  pool: proxpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        proxpool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

[r...@x /]#> zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
proxpool               39.5G  27.4G    27K  /proxpool
proxpool/apache-logs   2.40G  27.4G  2.40G  /proxpool/apache-logs
proxpool/proxy-cache2  29.5G  27.4G  29.5G  /proxpool/proxy-cache2
proxpool/proxy-logs    7.54G  27.4G  7.54G  /proxpool/proxy-logs


This config works very well for our site and has done for several years 
using ZFS, and quite a few more with UFS before that. These two machines 
support ~4500 desktops, give or take a few. ;)


A mirror or a stripe of mirrors will give you the best read performance. 
Also chuck in as much RAM as you can for ARC caching.
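
(If you ever wanted to cap the cache dataset at the ZFS level as well, rather 
than only in squid.conf, a quota would do it -- shown against the dataset 
above purely as a sketch:)

# zfs set quota=30G proxpool/proxy-cache2
# zfs get quota,used proxpool/proxy-cache2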

Hope this real world case is of use to you. Feel free to ask any more 
questions..


Cheers,

Scott.

Francois wrote:

Hello list,

What would be the best zpool configuration for a cache/proxy server 
(probably based on squid) ?


In other words with which zpool configuration I could expect best 
reading performance ? (there'll be some writes too but much less).



Thanks.

--
Francois

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
_

Scott Lawson
Systems Architect
Information Communication Technology Services

Manukau Institute of Technology
Private Bag 94006
South Auckland Mail Centre
Manukau 2240
Auckland
New Zealand

Phone  : +64 09 968 7611
Fax: +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz

http://www.manukau.ac.nz

__

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'

__



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Panic

2009-04-09 Thread Grant Lowe

Hi Remco.

Yes, I realize that was asking for trouble.  It wasn't supposed to be a test of 
yanking a LUN.  We needed a LUN for a VxVM/VxFS system and that LUN was 
available.  I was just surprised at the panic, since the system was quiesced at 
the time.  But there is coming a time when we will be doing this.  Thanks for 
the feedback.  I appreciate it.




- Original Message 
From: Remco Lengers 
To: Grant Lowe 
Cc: zfs-discuss@opensolaris.org
Sent: Thursday, April 9, 2009 5:31:42 AM
Subject: Re: [zfs-discuss] ZFS Panic

Grant,

Didn't see a response so I'll give it a go.

Ripping a disk away and silently inserting a new one is asking for trouble 
imho. I am not sure what you were trying to accomplish but generally replace a 
drive/lun would entail commands like

# zpool offline tank c1t3d0
# cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0            disk         connected    configured   ok
# cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
# cfgadm | grep sata1/3
sata1/3                        disk         connected    unconfigured ok

# cfgadm -c configure sata1/3

Taken from this page:

http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view

..Remco

Grant Lowe wrote:
> Hi All,
> 
> Don't know if this is worth reporting, as it's human error.  Anyway, I had a 
> panic on my zfs box.  Here's the error:
> 
> marksburg /usr2/glowe> grep panic /var/log/syslog
> Apr  8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after 
> panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, 
> FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> Apr  8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after 
> panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, 
> FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> marksburg /usr2/glowe>
> 
> What we did to cause this is we pulled a LUN from zfs, and replaced it with a 
> new LUN.  We then tried to shutdown the box, but it wouldn't go down.  We had 
> to send a break to the box and reboot.  This is an oracle sandbox, so we're 
> not really concerned.  Ideas?
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raidz on-disk layout

2009-04-09 Thread m...@bruningsystems.com

Hi,

For anyone interested, I have blogged about raidz on-disk layout at:
http://mbruning.blogspot.com/2009/04/raidz-on-disk-format.html

Comments/corrections are welcome.

thanks,
max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread David Magda

On Apr 7, 2009, at 16:43, OpenSolaris Forums wrote:

if you have a snapshot of your files and rsync the same files again,  
you need to use "--inplace" rsync option , otherwise completely new  
blocks will be allocated for the new files. that`s because rsync  
will write entirely new file and rename it over the old one.




not sure if this applies here, but i think it`s worth mentioning and  
not obvious.


With ZFS new blocks will always be allocated: it's a copy-on-write (COW)  
file system.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great

2009-04-09 Thread Patrick Skerrett

Hi folks,

I would appreciate it if someone could help me understand some weird 
results I'm seeing while doing performance testing with an SSD-offloaded 
ZIL.



I'm attempting to improve my infrastructure's burstable write capacity 
(ZFS-based WebDAV servers), and naturally I'm looking at implementing 
SSD-based ZIL devices.
I have a test machine with the crummiest hard drive I could find 
installed in it, a Quantum Fireball ATA-100 (4500 RPM, 128K cache), and 
an Intel X25-E 32 GB SSD.
I'm trying to do A-B comparisons and am coming up with some very odd 
results:


The first test involves doing IOZone write testing on the fireball 
standalone, the SSD standalone, and the fireball with the SSD as a log 
device.


My test command is:  time iozone -i 0 -a -y 64 -q 1024 -g 32M

Then I check the time it takes to complete this operation in each scenario:

Fireball alone - 2m15s (told you it was crappy)
SSD alone - 0m3s
Fireball + SSD zil - 0m28s

This looks great! Watching 'zpool iostat -v' during this test further 
proves that the ZIL device is doing the brunt of the heavy lifting 
during this test. If I can get this kind of write result in my prod 
environment, I would be one happy camper.




However, ANY other test I can think of to run on this test machine shows 
absolutely no performance improvement of the Fireball+SSD ZIL over the 
Fireball by itself. Watching 'zpool iostat -v' shows no activity on the 
ZIL whatsoever.

Other tests I've tried to run:

A scripted batch job of 10,000 -
dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000

A scripted batch job of 10,000 -
cat /sourcedrive/$file > /fireball/$file

A scripted batch job of 10,000 -
cp /sourcedrive/$file /fireball/$file

And a scripted batch job moving 10,000 files onto the fireball using 
Apache Webdav mounted on the fireball (similar to my prod environment):

curl -T /sourcedrive/$file http://127.0.0.1/fireball/




So what is IOZone doing differently than any other write operation I can 
think of???



Thanks,

Pat S.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great

2009-04-09 Thread Neil Perrin

Patrick,

The ZIL is only used for synchronous requests like O_DSYNC/O_SYNC and
fsync(). Your iozone command must be doing some synchronous writes.
All the other tests (dd, cat, cp, ...) do everything asynchronously.
That is they do not require the data to be on stable storage on
return from the write. So asynchronous writes get cached in memory
(the ARC) and written out periodically (every 30 seconds or less)
when the transaction group commits.

The ZIL would be heavily used if your system were a NFS server.
Databases also do synchronous writes.
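
A quick (hedged) way to check whether a given workload issues synchronous
writes at all is a DTrace one-liner counting the sync-flavoured syscalls while
the test runs -- exact probe names can vary between builds:

# dtrace -n 'syscall::*sync:entry { @[execname, probefunc] = count(); }'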

Neil.

On 04/09/09 15:13, Patrick Skerrett wrote:

Hi folks,

I would appreciate it if someone can help me understand some weird 
results I'm seeing with trying to do performance testing with an SSD 
offloaded ZIL.



I'm attempting to improve my infrastructure's burstable write capacity 
(ZFS based WebDav servers), and naturally I'm looking at implementing 
SSD based ZIL devices.
I have a test machine with the crummiest hard drive I can find installed 
in it, Quantum Fireball ATA-100 4500RPM 128K cache, and an Intel X25-E 
32gig SSD drive.
I'm trying to do A-B comparisons and am coming up with some very odd 
results:


The first test involves doing IOZone write testing on the fireball 
standalone, the SSD standalone, and the fireball with the SSD as a log 
device.


My test command is:  time iozone -i 0 -a -y 64 -q 1024 -g 32M

Then I check the time it takes to complete this operation in each scenario:

Fireball alone - 2m15s (told you it was crappy)
SSD alone - 0m3s
Fireball + SSD zil - 0m28s

This looks great! Watching 'zpool iostat-v' during this test further 
proves that the ZIL device is doing the brunt of the heavy lifting 
during this test. If I can get these kind of write results in my prod 
environment, I would be one happy camper.




However, ANY other test I can think of to run on this test machine shows 
absolutely no performance improvement of the Fireball+SSD Zil over the 
Fireball by itself. Watching zpool iostat -v shows no activity on the 
ZIL at all whatsoever.

Other tests I've tried to run:

A scripted batch job of 10,000 -
dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000

A scripted batch job of 10,000 -
cat /sourcedrive/$file > /fireball/$file

A scripted batch job of 10,000 -
cp /sourcedrive/$file /fireball/$file

And a scripted batch job moving 10,000 files onto the fireball using 
Apache Webdav mounted on the fireball (similar to my prod environment):

curl -T /sourcedrive/$file http://127.0.0.1/fireball/




So what is IOZone doing differently than any other write operation I can 
think of???



Thanks,

Pat S.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-04-09 Thread Jorgen Lundman


We finally managed to upgrade the production x4500s to Sol 10 10/08 
(unrelated to this) but with the hope that it would also make "zfs send" 
usable.


Exactly how does "build 105" translate to Solaris 10 10/08?  My current 
speed test has sent 34 GB in 24 hours, which isn't great. Perhaps the 
next version of Solaris 10 will have the improvements.
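
(One hedged way to see whether the bottleneck is "zfs send" itself or the 
transport is to time a send with the stream discarded locally -- 
dataset/snapshot names made up:)

# zfs snapshot tank/data@speedtest
# time zfs send tank/data@speedtest > /dev/null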




Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people, including me. Now I'm easily able to
saturate a GbE link while zfs send|recv'ing.


Since build 105 it should be *MUCH* faster.



--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] vdev_disk_io_start() sending NULL pointer in ldi_ioctl()

2009-04-09 Thread Shyamali . Chakravarty

Hi All,

I have a corefile where we see a NULL pointer dereference panic, as the 
code has (deliberately) passed a NULL pointer for the return value.



vdev_disk_io_start()
...
...

error = ldi_ioctl(dvd->vd_lh, zio->io_cmd,
   (uintptr_t)&zio->io_dk_callback,
   FKIOCTL, kcred, NULL);


ldi_ioctl() expects the last parameter to be an integer pointer (int 
*rvalp).  I see that in strdoictl().  The corefile I am analysing has a 
similar BAD TRAP while trying to execute 'stw %g0, [%i5]' (i.e. 
'clr [%i5]'):


/*
 * Set return value.
 */
*rvalp = iocbp->ioc_rval;

Is this a bug?  That call is all we do in vdev_disk_io_start().  I would 
appreciate any feedback on this.


regards,
--shyamali
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Panic

2009-04-09 Thread Rince
FWIW, I strongly expect live ripping of a SATA device to not panic the disk
layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to be
"fault-tolerant" and "drive dropping away at any time" is a rather expected
scenario.

[I've popped disks out live in many cases, both when I was experimenting
with ZFS+RAID-Z on various systems and occasionally, when I've had to
replace a disk live. In the latter case, I've done cfgadm about half the
time - the rest, I've just live ripped and then brought the disk up after
that, and it's Just Worked.]

- Rich

On Thu, Apr 9, 2009 at 3:21 PM, Grant Lowe  wrote:

>
> Hi Remco.
>
> Yes, I realize that was asking for trouble.  It wasn't supposed to be a
> test of yanking a LUN.  We needed a LUN for a VxVM/VxFS system and that LUN
> was available.  I was just surprised at the panic, since the system was
> quiesced at the time.  But there is coming a time when we will be doing
> this.  Thanks for the feedback.  I appreciate it.
>
>
>
>
> - Original Message 
> From: Remco Lengers 
> To: Grant Lowe 
> Cc: zfs-discuss@opensolaris.org
> Sent: Thursday, April 9, 2009 5:31:42 AM
> Subject: Re: [zfs-discuss] ZFS Panic
>
> Grant,
>
> Didn't see a response so I'll give it a go.
>
> Ripping a disk away and silently inserting a new one is asking for trouble
> imho. I am not sure what you were trying to accomplish but generally replace
> a drive/lun would entail commands like
>
> # zpool offline tank c1t3d0
> # cfgadm | grep c1t3d0
> sata1/3::dsk/c1t3d0            disk         connected    configured   ok
> # cfgadm -c unconfigure sata1/3
> Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
> This operation will suspend activity on the SATA device
> Continue (yes/no)? yes
> # cfgadm | grep sata1/3
> sata1/3                        disk         connected    unconfigured ok
>
> # cfgadm -c configure sata1/3
>
> Taken from this page:
>
> http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view
>
> ..Remco
>
> Grant Lowe wrote:
> > Hi All,
> >
> > Don't know if this is worth reporting, as it's human error.  Anyway, I
> had a panic on my zfs box.  Here's the error:
> >
> > marksburg /usr2/glowe> grep panic /var/log/syslog
> > Apr  8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after
> panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size,
> FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> > Apr  8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after
> panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size,
> FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> > marksburg /usr2/glowe>
> >
> > What we did to cause this is we pulled a LUN from zfs, and replaced it
> with a new LUN.  We then tried to shutdown the box, but it wouldn't go down.
>  We had to send a break to the box and reboot.  This is an oracle sandbox,
> so we're not really concerned.  Ideas?
> >
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 

BOFH excuse #439: Hot Java has gone cold
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Panic

2009-04-09 Thread Andre van Eyssen

On Fri, 10 Apr 2009, Rince wrote:


FWIW, I strongly expect live ripping of a SATA device to not panic the disk
layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to be
"fault-tolerant" and "drive dropping away at any time" is a rather expected
scenario.


Ripping a SATA device out runs a goodly chance of confusing the 
controller. If you'd had this problem with fibre channel or even SCSI, I'd 
find it a far bigger concern. IME, IDE and SATA just don't hold up to the 
abuses we'd like to level at them. Of course, this boils down to 
controller and enclosure and a lot of other random chances for disaster.


In addition, where there is a procedure to gently remove the device, use 
it. We don't just yank disks from the FC-AL backplanes on V880s, because 
there is a procedure for handling this even for failed disks. The five 
minutes to do it properly is a good investment compared to much longer 
downtime from a fault condition arising from careless manhandling of 
hardware.


--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses http://www2.purplecow.org
purplecow.org: PCOWpix http://pix.purplecow.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Panic

2009-04-09 Thread Rince
On Fri, Apr 10, 2009 at 12:43 AM, Andre van Eyssen wrote:

> On Fri, 10 Apr 2009, Rince wrote:
>
>  FWIW, I strongly expect live ripping of a SATA device to not panic the
>> disk
>> layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to
>> be
>> "fault-tolerant" and "drive dropping away at any time" is a rather
>> expected
>> scenario.
>>
>
> Ripping a SATA device out runs a goodly chance of confusing the controller.
> If you'd had this problem with fibre channel or even SCSI, I'd find it a far
> bigger concern. IME, IDE and SATA just don't hold up to the abuses we'd like
> to level at them. Of course, this boils down to controller and enclosure and
> a lot of other random chances for disaster.
>
> In addition, where there is a procedure to gently remove the device, use
> it. We don't just yank disks from the FC-AL backplanes on V880s, because
> there is a procedure for handling this even for failed disks. The five
> minutes to do it properly is a good investment compared to much longer
> downtime from a fault condition arising from careless manhandling of
> hardware.
>

IDE isn't supposed to do this, but SATA explicitly has hotplug as a
"feature".

(I think this might be SATA 2, so any SATA 1 controllers out there are
hedging your bets, but...)

I'm not advising this as a recommended procedure, but the failure of the
controller isn't my point.

*ZFS* shouldn't panic under those conditions. The disk layer, perhaps, but
not ZFS. As far as it should be concerned, it's equivalent to ejecting a
disk via cfgadm without telling ZFS first, which *IS* a supported operation.

- Rich
-- 

Procrastination means never having to say you're sorry.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss