Re: [zfs-discuss] Re: Re: zfs snapshot for backup, Quota
Richard Elling wrote:
> Anyone who is really clever will easily get past a quota, especially
> at a university -- triple that probability for an engineering college.

I studied Computing Science at Glasgow University (Scotland); the department policy was NOT to use disk quotas. This was on SunOS 4.x, so it was possible. What they did instead was use a separate filesystem (actually an NFS server, but that's not so relevant here) for each year of students, plus one more for staff and postgrads. Each student-year filesystem had a shared area that was world writable and a home dir for every student.

How did we manage diskspace hogs? Peer pressure: once things got above about 70% or so, the admins would send out weekly reports on who was hogging diskspace.

On the other hand we DID have a printer quota system that limited how much use we could make of the laser printers, because that did cost money. Of course we found various ways around that!

--
Darren J Moffat
Re: Re[7]: [zfs-discuss] Re: Re: Due to 128KB limit in ZFS it can't saturate disks
Robert Milkowski writes:
> Hello Roch,
>
> Monday, May 15, 2006, 3:23:14 PM, you wrote:
>
> RBPE> The question put forth is whether the ZFS 128K blocksize is sufficient
> RBPE> to saturate a regular disk. There is a great body of evidence showing
> RBPE> that bigger write sizes and a matching large FS cluster size lead
> RBPE> to more throughput. The counterpoint is that ZFS schedules its I/O
> RBPE> like nothing else seen before and manages to saturate a single disk
> RBPE> using enough concurrent 128K I/Os.
>
> Nevertheless I get much more throughput using UFS and writing with
> large blocks than using ZFS on the same disk. And the difference is
> actually quite big in favor of UFS.

Absolutely. Isn't this the issue, though?

    6415647 Sequential writing is jumping

We will have to fix this to allow dd to get more throughput. I'm pretty sure the fix won't need to increase the blocksize, though.

I'll be picking up this thread again, I hope, next week. I have lots of homework to do to respond properly.

-r
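For anyone who wants to reproduce the raw single-stream comparison, a timer along these lines is enough. This is only an illustrative sketch, not Robert's actual test; the default target path, file size and write size are placeholders:

/* Sequential-write timer: writes 1 GB with a chosen write size and
 * reports MB/s.  Usage: seqwrite [path] [write-size-in-bytes]      */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/tank/testfile";   /* placeholder path */
    size_t bs = argc > 2 ? (size_t)atol(argv[2]) : 128 * 1024;  /* 128K by default */
    long long total = 1024LL * 1024 * 1024;                     /* 1 GB in all */

    char *buf = malloc(bs);
    if (buf == NULL) { perror("malloc"); return 1; }
    memset(buf, 'x', bs);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (long long done = 0; done < total; done += bs)
        if (write(fd, buf, bs) != (ssize_t)bs) { perror("write"); return 1; }
    fsync(fd);                                  /* include the flush in the timing */
    gettimeofday(&t1, NULL);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%zu-byte writes: %.1f MB/s\n", bs, total / sec / (1024.0 * 1024.0));

    close(fd);
    free(buf);
    return 0;
}

Run against a file on the ZFS pool and on the UFS filesystem with a couple of write sizes (say 131072 vs. 1048576), it gives the kind of per-filesystem MB/s figures being compared in this thread.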
[zfs-discuss] Did Sol9 have a 1Tb limit on addressing IDE, does S10, would ZFS be impacted?
Hello All,

Attached is a conversation I've had with an old friend / colleague / sysadmin (cc'ed), and it raises three questions for me. I would like to ask the panel, given the context described below:

Did Sol9 have a 1Tb limit on addressing IDE?
Does S10?
Would ZFS be impacted if using >1Tb IDE devices?

...along with any general constructive commentary/feedback on the issue described...

Thanks,
- alec

snip snip

him:
>The terabyte drive is on the way.. I just hope OS writers can keep
>up; it's the underlying disk access layer that has problems.
>Solaris uses a pseudo-SCSI interface for IDE, which runs out of CHS
>at 1TB. This is why our 4.2TB RAIDs had to be split into 5 virtual
>drives. I don't know how the FC interface gets over that problem.

me:
>So you are presenting your raid arrays as C/H/S IDE using a raid
>controller, at sizes > 1Tb?

him:
>Nope, they are being presented as SCSI devices, but Solaris seems to
>treat IDE drives as pseudo-SCSI devices, at least from the
>programming point of view.

me:
>Can you paste me a uname -a on the pertinent machine? I want to ask
>a few people.

him:
>SunOS mariner 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Blade-100
>
>That's the machine which has an Ultra-3 SCSI host adaptor to which is
>connected a RAID system.

me:
>Any other information that you think is pertinent?

him:
>[It appears that] format can't handle a drive bigger than
>65535/128/128 (as that's all the sd data structures can handle).
>
>Sorry, [make that] cyl 65535 alt 2 hd 256 sec 128.
>
>If it could handle 256 sectors per cylinder then it could cope with
>up to 2TB. Not that it's a problem at the moment. It's just a pity
>that the underlying SCSI sub-system can't handle device sizes large
>enough for the filesystems which can live on them.
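As a sanity check on the geometry figures above: assuming the usual 512-byte sectors, the two CHS layouts quoted translate into the ceilings below. This is just a back-of-the-envelope sketch, not output from format or the sd driver:

/* CHS capacity ceilings for the geometries quoted above,
 * assuming 512-byte sectors.                             */
#include <stdio.h>

int main(void)
{
    const double tb = 1024.0 * 1024.0 * 1024.0 * 1024.0;
    long long cyl = 65535, heads = 256, bytes = 512;

    long long cap128 = cyl * heads * 128LL * bytes;  /* cyl 65535 hd 256 sec 128 */
    long long cap256 = cyl * heads * 256LL * bytes;  /* same, but 256 sectors/track */

    printf("65535/256/128: %lld bytes (~%.2f TB)\n", cap128, cap128 / tb);
    printf("65535/256/256: %lld bytes (~%.2f TB)\n", cap256, cap256 / tb);
    return 0;
}

That comes out at roughly 1 TB for the 128-sectors-per-track geometry and roughly 2 TB with 256 sectors, matching the limits described in the conversation.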
Re: [zfs-discuss] ZFS recovery from a disk losing power
On Thu, 2006-05-18 at 23:40 -0600, Sanjay Nadkarni wrote:
> You had a file system on top of the mirror and there was some I/O
> occurring to the mirror. The *only* time SVM puts a device into
> maintenance is when we receive an EIO from the underlying device. So,
> in case a write occurred to the mirror, then the write to the powered-off
> side failed (returned an EIO) and SVM kept going. Since all buffers
> sent to sd/ssd are marked with B_FAILFAST, the driver timeouts are low
> and the device is put into maintenance.

Sanjay,
#1 on the Pareto chart of disk error messages is the nonrecoverable read. Does SVM put the mirror in maintenance mode due to an EIO caused by a nonrecoverable read?
-- richard
[zfs-discuss] cksum errors after zpool online
I've been playing with offlining an external USB disk as a way of having a backup of a laptop drive. However, when I online the device and scrub it, I always get cksum errors.

So I just built a V880 in the lab with a mirrored zpool. I offlined 2 disks that form the mirror and then created a new file system. Then I onlined the other disks and started a scrub; again I get cksum errors:

v4u-880m-gmp03 19 # zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Fri May 19 18:12:04 2006
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     1
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     1

errors: No known data errors
v4u-880m-gmp03 20 #

I expected that the device would be brought online and resilvered (as it had claimed it was) cleanly, without any errors. Is this not the expected behaviour?
[zfs-discuss] Oracle on ZFS vs. UFS
Hi,

I'm preparing a personal TPC-H benchmark. The goal is not to measure or optimize the database performance, but to compare ZFS to UFS in similar configurations.

At the moment I'm preparing the tests at home. The test setup is as follows:
. Solaris snv_37
. 2 x AMD Opteron 252
. 4 GB RAM
. 2 x 80 GB ST380817AS
. Oracle 10gR2 (small SGA (320m))

The disks also contain the OS image (mirrored via SVM). On the remaining space I have created one zpool (on one disk) and one MD volume with a UFS filesystem on top (on the other disk).

Later I want to rerun the tests on an old E3500 (4x400MHz, 2GB RAM) with two A5200 attached (~15 still-working 9GB disks each).

The first results at home are not very promising for ZFS. I measured:
. database creation
. catalog integration (catalog + catproc)
. tablespace creation
. loading data into the database from dbgen with sqlldr

I can provide all the scripts (and precompiled binaries of qgen and dbgen for SPARC + x86) if anyone wants to verify my tests.

In most of these tests UFS was considerably faster than ZFS. I tested
. ZFS with default options
. ZFS with compression enabled
. ZFS without checksums
. UFS (newfs: -f 8192 -i 2097152; tunefs: -e 6144; mount: nologging)

Below are the (preliminary) results (with a 1GB dataset from dbgen); runtimes in minutes:seconds.

                 UFS    ZFS (default)  ZFS+comp  ZFS+nochksum
db creation      0:38       0:42         0:18        0:40
catalog          6:19      12:05        11:55       12:04
ts creation      0:13       0:14         0:04        0:16
data load[1]     8:49      26:20        25:39       26:19
index creation   0:48       0:38         0:31        0:36
key creation     1:55       1:31         1:18        1:25

[1] dbgen writes into named pipes, which are read back by sqlldr, so no interim files are created (a minimal sketch of that mechanism follows at the end of this message).

Especially on catalog creation and loading data into the database, UFS is faster than ZFS by a factor of 2-3 (regardless of ZFS options). Only for read-intensive tasks, and for file creation if compression is enabled, is ZFS faster than UFS. This is no surprise, since the machine has 4GB RAM of which at least 3GB are unused, so ZFS has plenty of space for caching (all datafiles together use just 2.8GB of disk space). If I enlarge the dataset I suspect that UFS will again take the lead, even on the tests where ZFS currently performs better.

I will now prepare the query benchmark to see how ZFS performs with a larger amount of parallelism in the database. In order to also test read throughput of ZFS vs. UFS, instead of using a larger dataset I will cap the memory the OS uses by setting physmem to 1GB.

--
Daniel
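Footnote [1] above refers to the usual named-pipe trick for feeding a loader without staging files on disk. Here is a minimal, generic sketch of that mechanism -- not the actual dbgen/sqlldr invocation; the FIFO path and the rows are invented for illustration:

/* Named-pipe producer/consumer sketch: the child writes rows into a
 * FIFO while the parent reads them, so nothing is staged on disk.   */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

int main(void)
{
    const char *fifo = "/tmp/loader.pipe";          /* hypothetical path */
    unlink(fifo);
    if (mkfifo(fifo, 0600) != 0) { perror("mkfifo"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                                 /* child: the "dbgen" side */
        FILE *out = fopen(fifo, "w");
        if (out == NULL) _exit(1);
        for (int i = 1; i <= 5; i++)
            fprintf(out, "%d|row number %d\n", i, i);
        fclose(out);
        _exit(0);
    }

    /* parent: the "sqlldr" side, consumes rows as they are produced */
    FILE *in = fopen(fifo, "r");
    if (in == NULL) { perror("fopen"); return 1; }
    char line[256];
    while (fgets(line, sizeof(line), in) != NULL)
        fputs(line, stdout);
    fclose(in);

    waitpid(pid, NULL, 0);
    unlink(fifo);
    return 0;
}

In the benchmark the same idea applies, only the producer is dbgen writing its tables into the pipes and the consumer is sqlldr loading from them.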
[zfs-discuss] Re: Did Sol9 have a 1Tb limit on addressing IDE, does S10, would ZFS be impacted?
Solaris 10 update 1 (the 1/06 release) supports SCSI disks larger than 2 TB. I believe that the same is true for IDE (as long as your controller supports 48-bit LBA). The initial release of Solaris 10 has a 2 TB limit for 64-bit kernels, and 1 TB for 32-bit kernels (or so the documentation claims, though this seems odd to me).

Solaris 9, as of the 04/03 release (for many years, then), also has 2 TB support on 64-bit kernels.

To use a disk this large, though, you need to use EFI labels -- it sounds like your friend was using the standard VTOC.
Re: [zfs-discuss] Oracle on ZFS vs. UFS
Daniel Rock wrote:
> Hi,
>
> I'm preparing a personal TPC-H benchmark. The goal is not to measure or
> optimize the database performance, but to compare ZFS to UFS in similar
> configurations.
[rest of the original benchmark description and results snipped]

How big is the database?

Since Oracle writes in small block sizes, did you set the recordsize for ZFS? From the zfs man page:

     recordsize=size

         Specifies a suggested block size for files in the file
         system. This property is designed solely for use with
         database workloads that access files in fixed-size
         records. ZFS automatically tunes block sizes according
         to internal algorithms optimized for typical access
         patterns.

         For databases that create very large files but access
         them in small random chunks, these algorithms may be
         suboptimal. Specifying a "recordsize" greater than or
         equal to the record size of the database can result in
         significant performance gains. Use of this property
         for general purpose file systems is strongly
         discouraged, and may adversely affect performance.

- Bart

Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts
[zfs-discuss] tracking error to file
In my testing, I've found the following error:

zpool status -v
  pool: local
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        local       ONLINE       0     0     0
          c0d1p0    ONLINE       0     0     0
          c2d0p1    ONLINE       0     0     0
          c3d0p1    ONLINE       0     0     0
          c0d0s7    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          1b       2402    lvl=0 blkid=1965

I haven't found a way to report in human terms what the above object refers to. Is there such a method? I can clear the error using existing tools, but I'd like to know what is broken before I destroy it.

Thanks!

-
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382          [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382             [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
Re: [zfs-discuss] Oracle on ZFS vs. UFS
Bart Smaalders wrote:
> How big is the database?

After all the data has been loaded, all datafiles together are 2.8GB; the SGA is 320MB. But I don't think size matters for this problem, since you can already see during the catalog creation phase that UFS is 2x faster.

> Since Oracle writes in small block sizes, did you set the recordsize
> for ZFS?

recordsize is at the default (128K). Oracle uses:

    db_block_size=8192
    db_file_multiblock_read_count=16

I tried with "db_block_size=32768" but the results got worse.

I have just rerun the first parts of my benchmark (database + catalog creation) with different parameters. The datafiles are deleted before each run, so I assume that when Oracle recreates the files they will already use the modified ZFS parameters (so I don't have to recreate the zpool/zfs).

Below are the results (UFS again as the reference point):

UFS written as UFS(forcedirectio?,ufs:blocksize,oracle:db_block_size)
ZFS written as ZFS(zfs:compression,zfs:recordsize,oracle:db_block_size)

These results were run with memory capping in effect (physmem=262144, i.e. 1GB).

                              db creation   catalog creation
UFS(-,8K,8K) [default]          0:41.851        6:17.530
UFS(forcedirectio,8K,8K)        0:40.479        6:03.688
UFS(forcedirectio,8K,32K)       0:48.718        8:19.359
ZFS(off,128K,8K) [default]      0:52.427       13:28.081
ZFS(on,128K,8K)                 0:50.791       14:27.919
ZFS(on,8K,8K)                   0:42.611       13:34.464
ZFS(off,32K,32K)                1:40.038       15:35.177

(times in min:sec.msec)

So you can win a few percent, but ZFS is still slower compared to UFS.

UFS catalog creation is already mostly CPU bound: during the ~6 minutes of catalog creation the corresponding oracle process consumes ~5:30 minutes of CPU time. So for UFS there is little margin for improvement.

If you have Oracle installed you can easily check for yourself. I have uploaded my init.ora file and the DB creation script to http://www.deadcafe.de/perf/

Just modify the variables
. ADMIN (location of the oracle admin files)
. DBFILES (ZFS or UFS, where the datafiles should be placed)
. and the paths in init.ora

Benchmark results will be in the "db.bench" file.

BTW: Why is maxphys still only 56 kByte by default on x86? I have increased maxphys to 8MB, but it made not much difference in the results:

                              db creation   catalog creation
ZFS(off,128K,8K) (*)            0:53.250       13:32.369

(*) maxphys = 8388608

Daniel
Re: [zfs-discuss] Oracle on ZFS vs. UFS
Richard Elling wrote:
> On Fri, 2006-05-19 at 23:09 +0200, Daniel Rock wrote:
>> (*) maxphys = 8388608
>
> Pedantically: because ZFS does 128kByte I/Os, setting maxphys > 128kBytes
> won't make any difference.

I know, but with the default maxphys value of 56kByte on x86 a 128kByte request will be split into three physical I/Os.

Daniel