Re: [zfs-discuss] Re: Re: zfs snapshot for backup, Quota

2006-05-19 Thread Darren J Moffat

Richard Elling wrote:

Anyone who is really clever will easily get past a quota, especially
at a university -- triple that probability for an engineering college.


I studied Computing Science at Glasgow University (Scotland), where the 
department policy was NOT to use disk quotas.  This was on SunOS 4.x, so 
it would have been possible.  What they did instead was use a separate 
filesystem (actually an NFS server, but that's not so relevant here) for 
each year of students, plus one more for staff and postgrads.  Each 
student-year filesystem had a shared area that was world writable and a 
home dir for every student.


How did we manage diskspace hogs?  Peer pressure: once usage got above 
about 70% or so, the admins would send out weekly reports on who was 
hogging diskspace.
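
A minimal sketch of that kind of weekly report, for anyone curious -- the 
mount point, threshold and mail alias are invented for illustration, not 
what was actually run:

  #!/bin/sh
  # Hypothetical weekly disk-hog report for a shared student filesystem.
  FS=/home/students            # assumed mount point
  THRESHOLD=70                 # start nagging above ~70% capacity

  USED=`df -k $FS | awk 'NR==2 { sub(/%/, "", $5); print $5 }'`
  if [ "$USED" -ge "$THRESHOLD" ]; then
      du -sk $FS/* 2>/dev/null | sort -rn | head -20 | \
          mailx -s "Disk usage on $FS is at ${USED}%" students@example.edu
  fi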


On the other hand we DID have a printer quota system that limited how 
much use we could make of the laser printers, because that did cost 
money.  Of course we found various ways around that!



--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[7]: [zfs-discuss] Re: Re: Due to 128KB limit in ZFS it can't saturate disks

2006-05-19 Thread Roch Bourbonnais - Performance Engineering

Robert Milkowski writes:
 > Hello Roch,
 > 
 > Monday, May 15, 2006, 3:23:14 PM, you wrote:
 > 
 > RBPE> The question put forth is whether the ZFS 128K blocksize is sufficient
 > RBPE> to saturate a regular disk. There is a great body of evidence showing
 > RBPE> that bigger write sizes and a matching large FS cluster size lead to
 > RBPE> more throughput. The counterpoint is that ZFS schedules its I/O like
 > RBPE> nothing else seen before and manages to saturate a single disk using
 > RBPE> enough concurrent 128K I/Os.
 > 
 > Nevertheless I get much more throughput using UFS and writing with
 > large blocks than using ZFS on the same disk. And the difference is
 > actually quite big in favor of UFS.
 > 

Absolutely. Isn't this the issue, though?

6415647 Sequential writing is jumping

We will have to fix this to allow dd to get more throughput.
I'm pretty sure the fix won't need to increase the
blocksize though.
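
For concreteness, a sketch of the kind of sequential dd test under 
discussion -- the pool, device and sizes are placeholders, not Robert's 
actual setup:

  # Scratch pool on a single spare disk (hypothetical device name).
  zpool create testpool c1t1d0
  zfs create testpool/seq

  # Large sequential write; bs here is the application write size,
  # independent of the ZFS on-disk blocksize (<= 128K).
  ptime dd if=/dev/zero of=/testpool/seq/bigfile bs=1024k count=2048

  # Rough throughput view while the dd runs (5-second samples).
  zpool iostat testpool 5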

I hope to pick this thread up again next week. I have lots of homework
to do before I can respond properly.

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Did Sol9 have a 1Tb limit on addressing IDE, does S10, would ZFS be impacted?

2006-05-19 Thread Alec Muffett

Hello All,


Attached is a conversation I've had with an old friend / colleague /
sysadmin (cc'ed), and it raises three questions for me.

I would like to ask the panel, from the context as described below:


Did Sol9 have a 1Tb limit on addressing IDE?

Does S10?

Would ZFS be impacted if using >1Tb IDE devices?


...along with any general constructive commentary/feedback on the
issue described...

Thanks,

- alec


 snip snip 


him:

 >The terabyte drive is on the way.. I just hope OS writers can keep
 >up; it's the underlying disk access layer that has problems.
 >Solaris uses a pseudo-SCSI interface for IDE, which runs out of CHS
 >at 1TB.  This is why our 4.2TB RAIDs had to be split into 5 virtual
 >drives.  I don't know how the FC interface gets over that problem.

me:

 >so you are presenting your raid arrays as C/H/S IDE using a raid
 >controller, at sizes > 1Tb ?

him:

 >Nope, they are being presented as SCSI devices, but Solaris seems to
 >treat IDE drives as pseudo-SCSI devices, at least from the
 >programming point of view.

me:

 >Can you paste me a uname -a on the pertinent machine? I want to ask
 >a few people

him:

 >SunOS mariner 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Blade-100
 >
 >That's the machine which has a Ultra-3 SCSI host adaptor to which is
 >connected a RAID system.

me:

 >Any other information that you think is pertinent ?

him:

 >[it appears that] format can't handle a drive bigger than
 >65535/128/128 (as that's all the sd data structures can handle)
 >
 >Sorry [make that] cyl 65535 alt 2 hd 256 sec 128
 >
 >If it could handle 256 sectors per cylinder then it could cope with
 >up to 2TB.  Not that it's a problem at the moment.  It's just a pity
 >that the underlying SCSI sub-system can't handle device sizes large
 >enough to handle the filesystems which can live on them.
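
Working the numbers from that geometry (a quick check, assuming 512-byte
sectors):

  # cyl 65535 x hd 256 x sec 128 x 512-byte sectors:
  echo '65535 * 256 * 128 * 512' | bc
  # -> 1099494850560 bytes, i.e. just under 1 TiB

  # With 256 sectors per track the same structures could address:
  echo '65535 * 256 * 256 * 512' | bc
  # -> 2198989701120 bytes, roughly 2 TB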




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS recovery from a disk losing power

2006-05-19 Thread Richard Elling
On Thu, 2006-05-18 at 23:40 -0600, Sanjay Nadkarni wrote:
> You had a file system on top of the mirror and there was some I/O 
> occurring to the mirror.  The *only* time SVM puts a device into 
> maintenance is when we receive an EIO from the underlying device.  So, 
> when a write occurred to the mirror, the write to the powered-off side 
> failed (returned an EIO) and SVM kept going.  Since all buffers sent to 
> sd/ssd are marked with B_FAILFAST, the driver timeouts are low and the 
> device is put into maintenance.

Sanjay,
#1 on the Pareto chart of disk error messages is the nonrecoverable
read.  Does SVM put the mirror in maintenance mode due to an EIO caused
by a nonrecoverable read?
 -- richard
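
For readers following the SVM side of this, a minimal sketch of how a
submirror in maintenance is typically spotted and re-enabled -- the
metadevice d10 and component c1t2d0s0 are hypothetical:

  # A component that returned an EIO shows "Needs maintenance" here.
  metastat d10

  # Once the disk is healthy again, re-enable it so SVM resyncs it
  # from the good side of the mirror.
  metareplace -e d10 c1t2d0s0

  # Watch the resync progress.
  metastat d10 | grep -i resync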


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] cksum errors after zpool online

2006-05-19 Thread Chris Gerhard
I've been playing with offlining an external USB disk as a way of having a 
backup of a laptop drive. However, when I online the device and scrub it I 
always get cksum errors.

So I just built a V880 in the lab with a mirrored zpool. I offlined 2 of the 
disks that form the mirrors and then created a new file system. Then I onlined 
those disks again and started a scrub, and again I get cksum errors:


v4u-880m-gmp03 19 # zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Fri May 19 18:12:04 2006
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     1
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     1

errors: No known data errors
v4u-880m-gmp03 20 #

I expected that the device would be brought online and resilvered (as it had 
claimed it was) cleanly without any errors. Is this not the expected behaviour?
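
For reference, a sketch of the offline/online/scrub cycle described above,
using the pool and one of the flagged devices from the status output
(whether 'zpool clear' is appropriate is exactly the open question here):

  # Take one side of the mirrors offline (stands in for the unplugged
  # USB disk), do some work, then bring it back.
  zpool offline tank c1t4d0
  # ... create filesystems / write data while the device is offline ...
  zpool online tank c1t4d0

  # Let the resilver finish, then verify with a scrub.
  zpool scrub tank
  zpool status -v tank

  # If the cksum counts are judged harmless they can be reset with:
  zpool clear tank c1t4d0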
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Oracle on ZFS vs. UFS

2006-05-19 Thread Daniel Rock

Hi,

I'm preparing a personal TPC-H benchmark. The goal is not to measure or
optimize the database performance, but to compare ZFS to UFS in similar
configurations.

At the moment I'm preparing the tests at home. The test setup is as
follows:
. Solaris snv_37
. 2 x AMD Opteron 252
. 4 GB RAM
. 2 x 80 GB ST380817AS
. Oracle 10gR2 (small SGA (320m))

The disks also contain the OS image (mirrored via SVM). On the remaining
space I have created one zpool on one disk, and one SVM metadevice with a
UFS filesystem on top on the other disk.
Later I want to rerun the tests on an old E3500 (4x400MHz, 2GB RAM) with
two A5200s attached (each with ~15 still-working 9GB disks).

The first results at home are not very promising for ZFS.

I measured:
. database creation
. catalog integration (catalog + catproc)
. tablespace creation
. loading data into the database from dbgen with sqlldr

I can provide all the scripts (and precompiled binaries for qgen and dbgen,
SPARC + x86) if anyone wants to verify my tests.


In most of these tests UFS was considerably faster than ZFS. I tested:
. ZFS with default options
. ZFS with compression enabled
. ZFS without checksums
. UFS (newfs: -f 8192 -i 2097152; tunefs: -e 6144; mount: nologging)
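
The four configurations above can be set up roughly as follows -- pool,
device and mount point names are placeholders, and the ZFS properties are
toggled (and reset to defaults) between runs:

  # ZFS test filesystem on the spare slice of one disk
  zpool create dbpool c2d0s7
  zfs create dbpool/oradata
  zfs set compression=on dbpool/oradata     # "ZFS with compression" run
  zfs set checksum=off dbpool/oradata       # "ZFS without checksums" run

  # UFS on the SVM metadevice, with the options listed above
  newfs -f 8192 -i 2097152 /dev/md/rdsk/d30
  tunefs -e 6144 /dev/md/rdsk/d30
  mount -o nologging /dev/md/dsk/d30 /oradata_ufs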


Below the (preliminary) results (with a 1GB dataset from dbgen), runtime
in minutes:seconds

                  UFS    ZFS (default)   ZFS+comp   ZFS+nochksum
db creation      0:38        0:42          0:18         0:40
catalog          6:19       12:05         11:55        12:04
ts creation      0:13        0:14          0:04         0:16
data load[1]     8:49       26:20         25:39        26:19
index creation   0:48        0:38          0:31         0:36
key creation     1:55        1:31          1:18         1:25

[1] dbgen writes into named pipes, which are read back by sqlldr. So no
interim files are created

Especially on catalog creation and on loading data into the database, UFS is
faster than ZFS by a factor of 2-3 (regardless of ZFS options).

Only for read-intensive tasks, and for file creation if compression is enabled,
is ZFS faster than UFS. This is no surprise, since the machine has 4GB RAM of
which at least 3GB are unused, so ZFS has plenty of room for caching (all
datafiles together use just 2.8GB of disk space). If I enlarge the dataset, I
suspect that UFS will take the lead again even on the tests where ZFS
currently performs better.

I will now prepare the query benchmark to see how ZFS performs with more
parallelism in the database. To also test the read throughput of ZFS vs. UFS,
instead of using a larger dataset I will cut the memory available to the OS
by setting physmem to 1GB.
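
For completeness, the memory cap mentioned here is an /etc/system setting; a
sketch (on x86 with 4 KB pages, 262144 pages is 1 GB, matching the
physmem=262144 value quoted later in the thread):

  # /etc/system -- cap usable physical memory at 1 GB, then reboot
  set physmem=262144

  # After the reboot, check the live value (in pages):
  echo "physmem/D" | mdb -k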



--
Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Did Sol9 have a 1Tb limit on addressing IDE, does S10, would ZFS be impacted?

2006-05-19 Thread Anton B. Rang
Solaris 10 update 1 (1/06 release) supports SCSI disks larger than 2 TB.  I 
believe that the same is true for IDE (as long as your controller supports 
48-bit LBA).

The initial release of Solaris 10 has a 2 TB limit for 64-bit kernels, and 1 TB for 
32-bit kernels (or so the documentation claims, though this seems odd to me).  
Solaris 9, as of the 04/03 release (so for several years now), also has 2 TB support 
on 64-bit kernels.

To use a disk this large, though, you need to use EFI labels -- it sounds like 
your friend was using the standard VTOC.
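
As an aside, a quick way to see which label a disk currently carries, and to
put an EFI label on a scratch disk (device names are placeholders; relabeling
destroys the existing partition table):

  # Show the current partition map; a traditional VTOC label is the one
  # subject to the CHS-style limit discussed above.
  prtvtoc /dev/rdsk/c1t1d0s2

  # "format -e" (expert mode) offers EFI as a label choice.
  format -e -d c1t1d0

Note that when ZFS is given a whole disk, zpool create writes an EFI label
itself, so whole-disk vdevs should not run into the VTOC limit.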
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle on ZFS vs. UFS

2006-05-19 Thread Bart Smaalders

Daniel Rock wrote:

I'm preparing a personal TPC-H benchmark. The goal is not to measure or
optimize the database performance, but to compare ZFS to UFS in similar
configurations.

[benchmark setup and results snipped -- quoted in full in Daniel's original
message above]

In most of these tests UFS was considerably faster than ZFS.


How big is the database?

Since Oracle writes in small block sizes, did you set the recordsize for
ZFS?

From the zfs man page:

 recordsize=size

 Specifies a suggested block size for files in  the  file
 system.  This  property  is designed solely for use with
 database  workloads  that  access  files  in  fixed-size
 records.  ZFS  automatically tunes block sizes according
 to internal algorithms optimized for typical access pat-
 terns.

 For databases that create very large  files  but  access
 them  in  small  random  chunks, these algorithms may be
 suboptimal. Specifying a "recordsize"  greater  than  or
 equal  to  the record size of the database can result in
 significant performance gains. Use of this property  for
 general  purpose  file  systems is strongly discouraged,
 and may adversely affect performance.
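
A minimal sketch of applying that, assuming a hypothetical dbpool/oradata
filesystem and Oracle's 8 KB block size -- note the property only affects
files created after it is set:

  zfs set recordsize=8K dbpool/oradata
  zfs get recordsize dbpool/oradata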

- Bart



Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] tracking error to file

2006-05-19 Thread Gregory Shaw

In my testing, I've found the following error:

zpool status -v
  pool: local
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        local       ONLINE       0     0     0
          c0d1p0    ONLINE       0     0     0
          c2d0p1    ONLINE       0     0     0
          c3d0p1    ONLINE       0     0     0
          c0d0s7    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          1b       2402    lvl=0 blkid=1965

I haven't found a way to report in human terms what the above object  
refers to.  Is there such a method?


I can clear the error using existing tools, but I'd like to know what  
is broken before I destroy it.
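
One avenue worth trying, based on two assumptions rather than any documented
interface: that for a plain file the OBJECT number matches the inode number
ls -i / find -inum report, and that the values printed may be hexadecimal
(mapping the DATASET id 1b back to a dataset name is a separate step):

  # Convert the object number from hex, if that assumption holds:
  printf '%d\n' 0x2402          # -> 9218

  # Search the affected dataset's mountpoint for that inode:
  find /local -xdev -inum 9218 -print
  # (also try the value as decimal, 2402, if the hex guess is wrong)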


Thanks!

-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382   [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus  
Torvalds



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle on ZFS vs. UFS

2006-05-19 Thread Daniel Rock

Bart Smaalders wrote:

How big is the database?


After all the data has been loaded, the datafiles together are 2.8GB; the SGA
is 320MB. But I don't think size matters for this problem, since you can
already see during the catalog creation phase that UFS is 2x faster.




Since oracle writes in small block sizes, did you set the recordsize for
ZFS?


recordsize is default (128K). Oracle uses:
db_block_size=8192
db_file_multiblock_read_count=16

I tried with "db_block_size=32768" but the results got worse.

I have just rerun the first parts of my benchmark (database + catalog 
creation) with different parameters.


The datafiles are deleted before each run, so I assume that when Oracle 
recreates them they will already use the modified ZFS parameters 
(so I don't have to recreate the zpool/zfs).


Below the results (UFS again as the reference point):
UFS written as UFS(forcedirectio?,ufs:blocksize,oracle:db_block_size)
ZFS written as ZFS(zfs:compression,zfs:recordsize,oracle:db_block_size)

These results are now run with memory capping in effect (physmem=262144 (1GB))

                              db creation    catalog creation
UFS(-,8K,8K) [default]          0:41.851        6:17.530
UFS(forcedirectio,8K,8K)        0:40.479        6:03.688
UFS(forcedirectio,8K,32K)       0:48.718        8:19.359

ZFS(off,128K,8K) [default]      0:52.427       13:28.081
ZFS(on,128K,8K)                 0:50.791       14:27.919
ZFS(on,8K,8K)                   0:42.611       13:34.464
ZFS(off,32K,32K)                1:40.038       15:35.177

(times in min:sec.msec)

So you gain a few percent, but ZFS is still slower compared to UFS. UFS catalog 
creation is already mostly CPU bound: during the ~6 minutes of catalog 
creation, the corresponding Oracle process consumes ~5:30 minutes of CPU 
time, so for UFS there is little margin for improvement.



If you have Oracle installed you can easily check for yourself. I have uploaded 
my init.ora file and the DB creation script to

http://www.deadcafe.de/perf/

Just modify the variables:
. ADMIN (location of the Oracle admin files)
. DBFILES (ZFS or UFS path where the datafiles should be placed)
. and the paths in init.ora

Benchmark results will be in the "db.bench" file.


BTW: Why is maxphys still only 56 kByte by default on x86? I increased 
maxphys to 8MB, but it made little difference to the results:


                              db creation    catalog creation
ZFS(off,128K,8K) (*)            0:53.250       13:32.369

(*) maxphys = 8388608
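
For reference, maxphys is an /etc/system tunable; a sketch of the change
behind the (*) footnote:

  # /etc/system -- largest single physical transfer, in bytes; reboot
  # for it to take effect.
  set maxphys=8388608

  # Check the live value:
  echo "maxphys/D" | mdb -k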


Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle on ZFS vs. UFS

2006-05-19 Thread Daniel Rock

Richard Elling wrote:

On Fri, 2006-05-19 at 23:09 +0200, Daniel Rock wrote:

(*) maxphys = 8388608


Pedantically, because ZFS does 128kByte I/Os.  Setting maxphys >
128kBytes won't make any difference.


I know, but with the default maxphys value of 56kByte on x86 a 128kByte 
request will be split into three physical I/Os.
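
The arithmetic behind that split, taking the default x86 maxphys of 57344
bytes (56 kByte):

  # One 128 kByte ZFS I/O against a 56 kByte transfer limit:
  echo '131072 / 57344' | bc -l
  # -> ~2.29, so the request becomes three physical transfers
  #    (56K + 56K + 16K)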



Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss