Re: [zfs-discuss] ZFS Panic
Grant,

Didn't see a response so I'll give it a go. Ripping a disk away and silently inserting a new one is asking for trouble, imho. I am not sure what you were trying to accomplish, but generally replacing a drive/LUN would entail commands like:

zpool offline tank c1t3d0
cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0    disk    connected    configured    ok
# cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
# cfgadm | grep sata1/3
sata1/3    disk    connected    unconfigured    ok
# cfgadm -c configure sata1/3

Taken from this page: http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view

..Remco

Grant Lowe wrote:
Hi All,

Don't know if this is worth reporting, as it's human error. Anyway, I had a panic on my zfs box. Here's the error:

marksburg /usr2/glowe> grep panic /var/log/syslog
Apr 8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
Apr 8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
marksburg /usr2/glowe>

What we did to cause this is we pulled a LUN from zfs, and replaced it with a new LUN. We then tried to shutdown the box, but it wouldn't go down. We had to send a break to the box and reboot. This is an oracle sandbox, so we're not really concerned. Ideas?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs and nfs
> I'm using Solaris 10 (10/08). This feature is what exactly i want. thank for response.

Duh. What I meant previously was that this feature is not available in the Solaris 10 releases.

Cindy

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zpool import error! - Help Needed
I have a similar problem:

r...@moby1:~# zpool import
  pool: bucket
    id: 12835839477558970577
 state: UNAVAIL
action: The pool cannot be imported due to damaged devices or data.
config:

        bucket      UNAVAIL  insufficient replicas
          raidz2    UNAVAIL  corrupted data
            c3t0d0  ONLINE
            c3t1d0  ONLINE
            c4t0d0  ONLINE
            c4t1d0  ONLINE
            c4t2d0  ONLINE
            c4t3d0  ONLINE

How is this possible? This is with osol b108.

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
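For anyone hitting the same thing, a minimal diagnostic sketch, not a fix (the pool and device names are taken from the output above; the label dump only shows what configuration each disk thinks it belongs to):

    # Re-scan an explicit device directory, in case device paths changed
    # since the pool was last imported:
    zpool import -d /dev/dsk bucket

    # Dump the ZFS labels from one of the member disks; all four labels
    # should be present and should agree on the pool/vdev configuration:
    zdb -l /dev/dsk/c3t0d0s0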
Re: [zfs-discuss] Data size grew.. with compression on
If you rsync data to zfs over existing files, you need to take something more into account: if you have a snapshot of your files and rsync the same files again, you need to use the "--inplace" rsync option, otherwise completely new blocks will be allocated for the new files. That's because rsync will write an entirely new file and rename it over the old one. Not sure if this applies here, but I think it's worth mentioning and not obvious. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
Jeff Bonwick writes:
>> > Yes, I made note of that in my OP on this thread. But is it enough to
>> > end up with 8gb of non-compressed files measuring 8gb on
>> > reiserfs(linux) and the same data showing nearly 9gb when copied to a
>> > zfs filesystem with compression on.
>>
>> whoops.. a hefty exaggeration, it only shows about 16mb difference.
>> But still, since the zfs side is compressed, that seems like quite a lot..
>
> That's because ZFS reports *all* space consumed by a file, including
> all metadata (dnodes, indirect blocks, etc). For an 8G file stored
> in 128K blocks, there are 8G / 128K = 64K block pointers, each of
> which is 128 bytes, and is two-way replicated (via ditto blocks),
> for a total of 64K * 128 * 2 = 16M. So this is exactly as expected.

All good info, thanks. Still, one thing doesn't quite work in your line of reasoning. The data on the gentoo linux end is uncompressed, whereas it is compressed on the zfs side. A number of the files are themselves compressed formats such as jpg, mpg, avi, pdf, and maybe a few more, which aren't going to compress further to speak of, but thousands of the files are text files (html). So compression should show some reduction in size. Your calculation appears to be based on both ends being uncompressed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs as a cache server
Hello list, What would be the best zpool configuration for a cache/proxy server (probably based on squid) ? In other words with which zpool configuration I could expect best reading performance ? (there'll be some writes too but much less). Thanks. -- Francois ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs as a cache server
Francois,

Your best bet is probably a stripe of mirrors, i.e. a zpool made of many mirrors. This way you have redundancy and fast reads as well. You'll also enjoy pretty quick resilvering in the event of a disk failure. For even faster reads, you can add dedicated L2ARC cache devices (folks typically use SSDs or very fast (15k RPM) SAS drives for this).

-Greg

Francois wrote: Hello list, What would be the best zpool configuration for a cache/proxy server (probably based on squid) ? In other words with which zpool configuration I could expect best reading performance ? (there'll be some writes too but much less). Thanks. -- Francois

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
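For illustration, a rough sketch of the layout Greg describes, with placeholder device names (the SSD used as an L2ARC cache device is optional):

    # Stripe of mirrors (three 2-way mirrors) plus an SSD as L2ARC:
    zpool create cachepool \
        mirror c1t0d0 c1t1d0 \
        mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0 \
        cache  c2t0d0

    # Widening the stripe later is just another mirror:
    zpool add cachepool mirror c1t6d0 c1t7d0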
Re: [zfs-discuss] Data size grew.. with compression on
OpenSolaris Forums wrote:
> if you rsync data to zfs over existing files, you need to take
> something more into account:
>
> if you have a snapshot of your files and rsync the same files again,
> you need to use "--inplace" rsync option , otherwise completely new
> blocks will be allocated for the new files. that`s because rsync will
> write entirely new file and rename it over the old one.

ZFS will allocate new blocks either way; see http://all-unix.blogspot.com/2007/03/zfs-cow-and-relate-features.html for more information about how Copy-On-Write works.

Jonathan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Efficient backup of ZFS filesystems?
Gary Mills wrote:
I've been watching the ZFS ARC cache on our IMAP server while the backups are running, and also when user activity is high. The two seem to conflict. Fast response for users seems to depend on their data being in the cache when it's needed. Most of the disk I/O seems to be writes in this situation. However, the backup needs to stat all files and read many of them. I'm assuming that all of this information is also added to the ARC cache, even though it may never be needed again. It must also evict user data from the cache, causing it to be reloaded every time it's needed. We use Networker for backups now. Is there some way to configure ZFS so that backups don't churn the cache? Is there a different way to perform backups to avoid this problem? We do keep two weeks of daily ZFS snapshots to use for restores of recently-lost data. We still need something for longer-term backups.

Hi Gary,

Find out whether you have a problem first. If not, don't worry, but read on. If you do have a problem, add memory or an L2ARC device.

The ARC was designed to mitigate the effect of any single burst of sequential I/O, but the size of the cache dedicated to more frequently used pages (the current working set) will still be reduced, depending on the amount of activity on either side of the cache. As the ARC maintains a shadow list of recently evicted pages from both sides of the cache, such pages that are accessed again will then return to the 'Frequent' side of the cache. There will be continuous competition between the 'Recent' and 'Frequent' sides of the ARC (and for convenience, I'm glossing over the existence of 'Locked' pages).

Several things might cause pathological behaviour: a backup process might access the same metadata multiple times, causing that data to be promoted to 'Frequent' and flushing out application-related data. (ZFS does not differentiate between data and metadata for resource allocation; they all use the same I/O mechanism and cache.) On the other hand, you might just not have sufficient memory to keep most of your metadata in the cache, or the backup process is just too aggressive. Adding memory or an L2ARC cache device might help.

Cheers,

Henk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
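A small sketch of the "measure first, then add cache" advice above, with placeholder pool and device names:

    # Watch ARC size and hit/miss counters while a backup runs:
    kstat -m zfs -n arcstats 5

    # Watch per-vdev traffic, including any cache device:
    zpool iostat -v imappool 5

    # If the working set really is being evicted, add an L2ARC device:
    zpool add imappool cache c3t0d0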
[zfs-discuss] ZFS stripe over EMC write performance.
What is the best write performance improvement anyone has seen (if any) on a ZFS stripe over EMC SAN? I'd be interested to hear results for both - striped and non-striped EMC config. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs as a cache server
Hi François,

You should take care of the recordsize in your filesystems; it should be tuned according to the size of the most frequently accessed files. Disabling "atime" is probably also a good idea (but that's probably something you already know ;) ). We've also noticed some cases where enabling compression gave better I/O results (but don't use gzip), though this should be done only if your machine is exclusively running the proxy server. As for the topology of your pool, for performance prefer striped mirrors if you can afford them, or raidz if not!

HTH,
Jnm.

-- Francois a écrit : Hello list, What would be the best zpool configuration for a cache/proxy server (probably based on squid) ? In other words with which zpool configuration I could expect best reading performance ? (there'll be some writes too but much less). Thanks. -- Francois ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
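The tunables Jnm mentions map to ordinary dataset properties, for example (dataset name and values are only placeholders; recordsize should match the dominant object size of the cache):

    zfs set recordsize=64K proxpool/proxy-cache
    zfs set atime=off proxpool/proxy-cache
    zfs set compression=on proxpool/proxy-cache    # the default algorithm (lzjb), not gzip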
Re: [zfs-discuss] Data size grew.. with compression on
Jonathan schrieb: OpenSolaris Forums wrote: if you have a snapshot of your files and rsync the same files again, you need to use "--inplace" rsync option , otherwise completely new blocks will be allocated for the new files. that`s because rsync will write entirely new file and rename it over the old one. ZFS will allocate new blocks either way No it won't. --inplace doesn't rewrite blocks identical on source and target but only blocks which have been changed. I use rsync to synchronize a directory with a few large files (each up to 32 GB). Data normally gets appended to one file until it reaches the size limit of 32 GB. Before I used --inplace a snapshot needed on average ~16 GB. Now with --inplace it is just a few kBytes. Daniel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
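To see the effect Daniel describes, one can compare the space held by a snapshot after a run with and without --inplace (pool, dataset and path names are placeholders):

    zfs snapshot tank/backup@before
    rsync -a --inplace /data/bigfiles/ /tank/backup/   # only changed blocks diverge from the snapshot
    zfs list -t snapshot -o name,used -r tank/backup   # @before holds just the overwritten blocks

Without --inplace, rsync writes a whole new copy of each changed file and renames it over the old one, so @before ends up retaining the entire previous file.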
Re: [zfs-discuss] Data size grew.. with compression on
Daniel Rock wrote: > Jonathan schrieb: >> OpenSolaris Forums wrote: >>> if you have a snapshot of your files and rsync the same files again, >>> you need to use "--inplace" rsync option , otherwise completely new >>> blocks will be allocated for the new files. that`s because rsync will >>> write entirely new file and rename it over the old one. >> >> ZFS will allocate new blocks either way > > No it won't. --inplace doesn't rewrite blocks identical on source and > target but only blocks which have been changed. > > I use rsync to synchronize a directory with a few large files (each up > to 32 GB). Data normally gets appended to one file until it reaches the > size limit of 32 GB. Before I used --inplace a snapshot needed on > average ~16 GB. Now with --inplace it is just a few kBytes. It appears I may have misread the initial post. I don't really know how I misread it, but I think I missed the snapshot portion of the message and got confused. I understand the interaction between snapshots, rsync, and --inplace being discussed now. My apologies, Jonathan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
Harry,

ZFS will only compress data if it is able to gain more than 12% of space by compressing the data (I may be wrong on the exact percentage). If ZFS can't get at least that 12% compression, it doesn't bother and will just store the block uncompressed.

Also, the default ZFS compression algorithm isn't gzip, so you aren't going to get the greatest compression possible, but it is quite fast.

Depending on the type of data, it may not compress well at all, leading ZFS to store that data completely uncompressed.

-Greg

All good info thanks. Still one thing doesn't quite work in your line of reasoning. The data on the gentoo linux end is uncompressed. Whereas it is compressed on the zfs side. A number of the files are themselves compressed formats such as jpg mpg avi pdf maybe a few more, which aren't going to compress further to speak of, but thousands of the files are text files (html). So compression should show some downsize. Your calculation appears to be based on both ends being uncompressed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
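To see what compression is actually achieving on a given dataset (dataset name is a placeholder):

    zfs get compression,compressratio tank/data
    zfs list -o name,used,refer,compressratio tank/data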
Re: [zfs-discuss] Data size grew.. with compression on
Greg Mason writes: > Harry, > > ZFS will only compress data if it is able to gain more than 12% of > space by compressing the data (I may be wrong on the exact > percentage). If ZFS can't get get that 12% compression at least, it > doesn't bother and will just store the block uncompressed. > > Also, the default ZFS compression algorithm isn't gzip, so you aren't > going to get the greatest compression possible, but it is quite fast. > > Depending on the type of data, it may not compress well at all, > leading ZFS to store that data completely uncompressed. Thanks for another little addition to my knowledge of zfs. Good stuff to know. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
OpenSolaris Forums writes: > if you rsync data to zfs over existing files, you need to take > something more into account: > > if you have a snapshot of your files and rsync the same files again, > you need to use "--inplace" rsync option , otherwise completely new > blocks will be allocated for the new files. that`s because rsync > will write entirely new file and rename it over the old one. > > not sure if this applies here, but i think it`s worth mentioning and > not obvious. In the particular case it didn't apply as it was a first time run but good to know what happens with rsync. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
Jonathan writes:
> It appears I may have misread the initial post. I don't really know how
> I misread it, but I think I missed the snapshot portion of the message
> and got confused. I understand the interaction between snapshots,
> rsync, and --inplace being discussed now.

I don't think you did misread it. The initial post had nothing to do with snapshots. It had only to do with a single run of rsync from a linux box to a zfs filesystem, and noticing the data had grown even though the zfs filesystem has compression turned on. I'm not sure how snapshots crept in here.. but I'm interested to know more about the interaction with rsync in the case of snapshots. It was a post authored by Opensolaris Forums, Message-ID: <1811927823.191239282659293.javamail.tweb...@sf-app2>, that first mentioned snapshots. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs as a cache server
Hi Francois,

I use ZFS with Squid proxies here at MIT. (MIT New Zealand that is ;))

My basic set up is like so.

- 2 x Sun SPARC v240's, dual CPU's, with 2 x 36 GB boot disks and 2 x 73 GB cache disks. Each machine has 4GB RAM.
- Each has a copy of squid, Squidguard and an apache server.
- Apache server serves .pac files for client machines and each .pac file binds you to that proxy.
- Clients request a .pac from round robin DNS "proxy.manukau.ac.nz" which then gives you the real system name of one of these two proxies.

Boot disks are mirrored using disksuite and cache and log file systems are ZFS. My cache pool is just a mirrored pool which is then split into three file systems. Cache volume is restricted to 30 GB in squid config. Max cache object size is 2MB. Internet bandwidth available to these machines is ~15Mbit/s.

[r...@x /]#> zpool status
  pool: proxpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        proxpool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

[r...@x /]#> zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
proxpool                39.5G  27.4G    27K  /proxpool
proxpool/apache-logs    2.40G  27.4G  2.40G  /proxpool/apache-logs
proxpool/proxy-cache2   29.5G  27.4G  29.5G  /proxpool/proxy-cache2
proxpool/proxy-logs     7.54G  27.4G  7.54G  /proxpool/proxy-logs

This config works very well for our site and has done for several years using ZFS, and quite a few more with UFS before that. These two machines support ~4500 desktops, give or take a few. ;)

A mirror or stripe of mirrors will give you best read performance. Also chuck in as much RAM as you can for ARC caching.

Hope this real world case is of use to you. Feel free to ask any more questions..

Cheers,

Scott.

Francois wrote: Hello list, What would be the best zpool configuration for a cache/proxy server (probably based on squid) ? In other words with which zpool configuration I could expect best reading performance ? (there'll be some writes too but much less). Thanks. -- Francois

--
Scott Lawson
Systems Architect
Information Communication Technology Services
Manukau Institute of Technology
Private Bag 94006
South Auckland Mail Centre
Manukau 2240
Auckland
New Zealand

Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Panic
Hi Remco.

Yes, I realize that was asking for trouble. It wasn't supposed to be a test of yanking a LUN. We needed a LUN for a VxVM/VxFS system and that LUN was available. I was just surprised at the panic, since the system was quiesced at the time. But there is coming a time when we will be doing this. Thanks for the feedback. I appreciate it.

- Original Message
From: Remco Lengers
To: Grant Lowe
Cc: zfs-discuss@opensolaris.org
Sent: Thursday, April 9, 2009 5:31:42 AM
Subject: Re: [zfs-discuss] ZFS Panic

Grant,

Didn't see a response so I'll give it a go.

Ripping a disk away and silently inserting a new one is asking for trouble imho. I am not sure what you were trying to accomplish but generally replace a drive/lun would entail commands like

zpool offline tank c1t3d0
cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0    disk    connected    configured    ok
# cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
# cfgadm | grep sata1/3
sata1/3    disk    connected    unconfigured    ok
# cfgadm -c configure sata1/3

Taken from this page:

http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view

..Remco

Grant Lowe wrote:
> Hi All,
>
> Don't know if this is worth reporting, as it's human error. Anyway, I had a panic on my zfs box. Here's the error:
>
> marksburg /usr2/glowe> grep panic /var/log/syslog
> Apr 8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> Apr 8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
> marksburg /usr2/glowe>
>
> What we did to cause this is we pulled a LUN from zfs, and replaced it with a new LUN. We then tried to shutdown the box, but it wouldn't go down. We had to send a break to the box and reboot. This is an oracle sandbox, so we're not really concerned. Ideas?
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
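For when that time comes, a sketch of the gentler sequence, combining Remco's cfgadm steps with a zpool replace (pool, device and attachment-point names are placeholders and will differ for SAN LUNs):

    zpool offline tank c1t3d0         # take the old LUN out of service
    cfgadm -c unconfigure sata1/3     # or the equivalent for your HBA/attachment point
    #   ... swap or re-present the LUN ...
    cfgadm -c configure sata1/3
    zpool replace tank c1t3d0         # resilver onto the new device at the same path
    zpool status tank                 # watch the resilver progress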
[zfs-discuss] raidz on-disk layout
Hi, For anyone interested, I have blogged about raidz on-disk layout at: http://mbruning.blogspot.com/2009/04/raidz-on-disk-format.html Comments/corrections are welcome. thanks, max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
On Apr 7, 2009, at 16:43, OpenSolaris Forums wrote:

if you have a snapshot of your files and rsync the same files again, you need to use "--inplace" rsync option , otherwise completely new blocks will be allocated for the new files. that`s because rsync will write entirely new file and rename it over the old one. not sure if this applies here, but i think it`s worth mentioning and not obvious.

With ZFS new blocks will always be allocated: it's a copy-on-write (COW) file system. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Hi folks,

I would appreciate it if someone can help me understand some weird results I'm seeing while trying to do performance testing with an SSD-offloaded ZIL. I'm attempting to improve my infrastructure's burstable write capacity (ZFS based WebDav servers), and naturally I'm looking at implementing SSD based ZIL devices.

I have a test machine with the crummiest hard drive I can find installed in it, a Quantum Fireball ATA-100 4500RPM with 128K cache, and an Intel X25-E 32gig SSD drive. I'm trying to do A-B comparisons and am coming up with some very odd results.

The first test involves doing IOZone write testing on the fireball standalone, the SSD standalone, and the fireball with the SSD as a log device. My test command is:

time iozone -i 0 -a -y 64 -q 1024 -g 32M

Then I check the time it takes to complete this operation in each scenario:

Fireball alone - 2m15s (told you it was crappy)
SSD alone - 0m3s
Fireball + SSD zil - 0m28s

This looks great! Watching 'zpool iostat -v' during this test further proves that the ZIL device is doing the brunt of the heavy lifting. If I can get these kinds of write results in my prod environment, I would be one happy camper.

However, ANY other test I can think of to run on this test machine shows absolutely no performance improvement of the Fireball+SSD ZIL over the Fireball by itself. Watching zpool iostat -v shows no activity on the ZIL at all whatsoever. Other tests I've tried to run:

A scripted batch job of 10,000 - dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000
A scripted batch job of 10,000 - cat /sourcedrive/$file > /fireball/$file
A scripted batch job of 10,000 - cp /sourcedrive/$file /fireball/$file

And a scripted batch job moving 10,000 files onto the fireball using Apache WebDav mounted on the fireball (similar to my prod environment):

curl -T /sourcedrive/$file http://127.0.0.1/fireball/

So what is IOZone doing differently than any other write operation I can think of???

Thanks,

Pat S. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Patrick,

The ZIL is only used for synchronous requests like O_DSYNC/O_SYNC and fsync(). Your iozone command must be doing some synchronous writes. All the other tests (dd, cat, cp, ...) do everything asynchronously; that is, they do not require the data to be on stable storage on return from the write. So asynchronous writes get cached in memory (the ARC) and written out periodically (every 30 seconds or less) when the transaction group commits.

The ZIL would be heavily used if your system were an NFS server. Databases also do synchronous writes.

Neil.

On 04/09/09 15:13, Patrick Skerrett wrote: Hi folks, I would appreciate it if someone can help me understand some weird results I'm seeing with trying to do performance testing with an SSD offloaded ZIL. I'm attempting to improve my infrastructure's burstable write capacity (ZFS based WebDav servers), and naturally I'm looking at implementing SSD based ZIL devices. I have a test machine with the crummiest hard drive I can find installed in it, Quantum Fireball ATA-100 4500RPM 128K cache, and an Intel X25-E 32gig SSD drive. I'm trying to do A-B comparisons and am coming up with some very odd results: The first test involves doing IOZone write testing on the fireball standalone, the SSD standalone, and the fireball with the SSD as a log device. My test command is: time iozone -i 0 -a -y 64 -q 1024 -g 32M Then I check the time it takes to complete this operation in each scenario: Fireball alone - 2m15s (told you it was crappy) SSD alone - 0m3s Fireball + SSD zil - 0m28s This looks great! Watching 'zpool iostat -v' during this test further proves that the ZIL device is doing the brunt of the heavy lifting during this test. If I can get these kind of write results in my prod environment, I would be one happy camper. However, ANY other test I can think of to run on this test machine shows absolutely no performance improvement of the Fireball+SSD Zil over the Fireball by itself. Watching zpool iostat -v shows no activity on the ZIL at all whatsoever. Other tests I've tried to run: A scripted batch job of 10,000 - dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000 A scripted batch job of 10,000 - cat /sourcedrive/$file > /fireball/$file A scripted batch job of 10,000 - cp /sourcedrive/$file /fireball/$file And a scripted batch job moving 10,000 files onto the fireball using Apache Webdav mounted on the fireball (similar to my prod environment): curl -T /sourcedrive/$file http://127.0.0.1/fireball/ So what is IOZone doing differently than any other write operation I can think of??? Thanks, Pat S.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
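One way to confirm Neil's explanation on the test box: force synchronous writes explicitly and watch the log device; the same run without the sync flag should leave it idle (the pool name below is a placeholder, and -o is iozone's O_SYNC option):

    iozone -i 0 -o -a -y 64 -q 1024 -g 32M &   # -o opens files O_SYNC
    zpool iostat -v fireball 1                 # the log/SSD vdev should now show the writes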
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this), but with the hope that it would also make "zfs send" usable. Exactly how does "build 105" translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements.

Robert Milkowski wrote:
Hello Jorgen,
If you look at the list archives you will see that it made a huge difference for some people including me. Now I'm easily able to saturate a GbE link while zfs send|recv'ing. Since build 105 it should be *MUCH* faster.

-- Jorgen Lundman | Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
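When comparing builds it can help to separate send-side throughput from the network and the receiving pool (snapshot, host and pool names are placeholders):

    time zfs send tank/data@today > /dev/null                          # raw send throughput only
    zfs send tank/data@today | ssh backuphost zfs recv -d backuppool   # the full path, for comparison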
[zfs-discuss] vdev_disk_io_start() sending NULL pointer in ldi_ioctl()
Hi All,

I have a corefile where we see a NULL pointer de-reference PANIC, because we have (deliberately) passed a NULL pointer for the return value in vdev_disk_io_start():

...
error = ldi_ioctl(dvd->vd_lh, zio->io_cmd,
    (uintptr_t)&zio->io_dk_callback, FKIOCTL, kcred, NULL);

ldi_ioctl() expects the last parameter to be an integer pointer (int *rvalp). I see that in strdoictl(). The corefile I am analysing has a similar BAD trap while trying to execute stw %g0, [%i5] (clr [%i5]) here:

/*
 * Set return value.
 */
*rvalp = iocbp->ioc_rval;

Is it a bug?? This code is all we do in vdev_disk_io_start(). I would appreciate any feedback on this.

regards,
--shyamali ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Panic
FWIW, I strongly expect live ripping of a SATA device to not panic the disk layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to be "fault-tolerant" and "drive dropping away at any time" is a rather expected scenario. [I've popped disks out live in many cases, both when I was experimenting with ZFS+RAID-Z on various systems and occasionally, when I've had to replace a disk live. In the latter case, I've done cfgadm about half the time - the rest, I've just live ripped and then brought the disk up after that, and it's Just Worked.] - Rich On Thu, Apr 9, 2009 at 3:21 PM, Grant Lowe wrote: > > Hi Remco. > > Yes, I realize that was asking for trouble. It wasn't supposed to be a > test of yanking a LUN. We needed a LUN for a VxVM/VxFS system and that LUN > was available. I was just surprised at the panic, since the system was > quiesced at the time. But there is coming a time when we will be doing > this. Thanks for the feedback. I appreciate it. > > > > > - Original Message > From: Remco Lengers > To: Grant Lowe > Cc: zfs-discuss@opensolaris.org > Sent: Thursday, April 9, 2009 5:31:42 AM > Subject: Re: [zfs-discuss] ZFS Panic > > Grant, > > Didn't see a response so I'll give it a go. > > Ripping a disk away and silently inserting a new one is asking for trouble > imho. I am not sure what you were trying to accomplish but generally replace > a drive/lun would entail commands like > > zpool offline tank c1t3d0 > cfgadm | grep c1t3d0 > sata1/3::dsk/c1t3d0disk connectedconfigured ok > # cfgadm -c unconfigure sata1/3 > Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1 > :3 > This operation will suspend activity on the SATA device > Continue (yes/no)? yes > # cfgadm | grep sata1/3 > sata1/3disk connectedunconfigured ok > > # cfgadm -c configure sata1/3 > > Taken from this page: > > http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view > > ..Remco > > Grant Lowe wrote: > > Hi All, > > > > Don't know if this is worth reporting, as it's human error. Anyway, I > had a panic on my zfs box. Here's the error: > > > > marksburg /usr2/glowe> grep panic /var/log/syslog > > Apr 8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after > panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, > FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580 > > Apr 8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after > panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, > FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580 > > marksburg /usr2/glowe> > > > > What we did to cause this is we pulled a LUN from zfs, and replaced it > with a new LUN. We then tried to shutdown the box, but it wouldn't go down. > We had to send a break to the box and reboot. This is an oracle sandbox, > so we're not really concerned. Ideas? > > > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- BOFH excuse #439: Hot Java has gone cold ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Panic
On Fri, 10 Apr 2009, Rince wrote: FWIW, I strongly expect live ripping of a SATA device to not panic the disk layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to be "fault-tolerant" and "drive dropping away at any time" is a rather expected scenario. Ripping a SATA device out runs a goodly chance of confusing the controller. If you'd had this problem with fibre channel or even SCSI, I'd find it a far bigger concern. IME, IDE and SATA just don't hold up to the abuses we'd like to level at them. Of course, this boils down to controller and enclosure and a lot of other random chances for disaster. In addition, where there is a procedure to gently remove the device, use it. We don't just yank disks from the FC-AL backplanes on V880s, because there is a procedure for handling this even for failed disks. The five minutes to do it properly is a good investment compared to much longer downtime from a fault condition arising from careless manhandling of hardware. -- Andre van Eyssen. mail: an...@purplecow.org jabber: an...@interact.purplecow.org purplecow.org: UNIX for the masses http://www2.purplecow.org purplecow.org: PCOWpix http://pix.purplecow.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Panic
On Fri, Apr 10, 2009 at 12:43 AM, Andre van Eyssen wrote: > On Fri, 10 Apr 2009, Rince wrote: > > FWIW, I strongly expect live ripping of a SATA device to not panic the >> disk >> layer. It explicitly shouldn't panic the ZFS layer, as ZFS is supposed to >> be >> "fault-tolerant" and "drive dropping away at any time" is a rather >> expected >> scenario. >> > > Ripping a SATA device out runs a goodly chance of confusing the controller. > If you'd had this problem with fibre channel or even SCSI, I'd find it a far > bigger concern. IME, IDE and SATA just don't hold up to the abuses we'd like > to level at them. Of course, this boils down to controller and enclosure and > a lot of other random chances for disaster. > > In addition, where there is a procedure to gently remove the device, use > it. We don't just yank disks from the FC-AL backplanes on V880s, because > there is a procedure for handling this even for failed disks. The five > minutes to do it properly is a good investment compared to much longer > downtime from a fault condition arising from careless manhandling of > hardware. > IDE isn't supposed to do this, but SATA explicitly has hotplug as a "feature". (I think this might be SATA 2, so any SATA 1 controllers out there are hedging your bets, but...) I'm not advising this as a recommended procedure, but the failure of the controller isn't my point. *ZFS* shouldn't panic under those conditions. The disk layer, perhaps, but not ZFS. As far as it should be concerned, it's equivalent to ejecting a disk via cfgadm without telling ZFS first, which *IS* a supported operation. - Rich -- Procrastination means never having to say you're sorry. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss