Re: [zfs-discuss] zfs comparison

2008-01-17 Thread Darren J Moffat
Agile Aspect wrote:
> Hi - I'm new to ZFS but not to Solaris.
> 
> Is there a searchable interface to the zfs-discuss mail archives?

Use google against the mailman archives:

http://mail.opensolaris.org/pipermail/zfs-discuss/
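
For example, a site-restricted query such as

   site:mail.opensolaris.org/pipermail/zfs-discuss compression production

usually turns up the relevant threads.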

> Pardon my ignorance, but is ZFS with compression safe to use in a
> production environment?

Yes, why wouldn't it be?  If it wasn't safe it wouldn't have been 
delivered.

What kind of "unsafe" behaviour are you worried about?  It is possible 
that other ZFS features mitigate issues you may be worried about or have 
experienced elsewhere.

-- 
Darren J Moffat


[zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen

Hey all,

I'm not sure if this is a ZFS bug or a hardware issue I'm having - any 
pointers would be great!


This message includes:
  - high-level info about my system
  - my first thought to debugging this
  - stack trace
  - format output
  - zpool status output
  - dmesg output



High-Level Info About My System
-
- fresh install of b78
- first time trying to do anything IO-intensive with ZFS
  - command was `cp -r /cdrom /tank/sxce_b78_disk1`
  - but this also fails: `cp -r /usr /tank/usr`
- system has 24 sata/sas drive bays, but only 12 of them are populated
- system has three AOC-SAT2-MV8 cards plugged into 6 mini-sas backplanes
- card1 ("c3")
- bp1  (c3t0d0, c3t1d0)
- bp2 (c3t4d0, c3t5d0)
- card2 ("c4")
- bp1  (c4t0d0, c4t1d0)
- bp2 (c4t4d0, c4t5d0)
- card3 ("c5")
- bp1  (c5t0d0, c5t1d0)
- bp2 (c5t4d0, c5t5d0)
- system has one Barcelona Opteron (step BA)
  - the one with the potential look-aside cache bug...
  - though it's not clear this is related...



My First Thought To Debugging This

After crashing my system several times (using `cp -r /usr /tank/usr`) 
and comparing the outputs, I noticed that the stack trace always points 
to device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1", which 
corresponds to all the drives connected to AOC-SAT2-MV8 card #3 (i.e. "c5").

But looking at the `format` output, this device path only differs from 
the other devices in that there is a ",1" trailing the 
"/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]" part.  Further, again looking at 
the `format` output, "c3" devices have "4/disk", "c4" devices have 
"6/disk", and "c5" devices also have "6/disk".

The only other thing I can add to this is that if I boot a Xen kernel, 
which I was *not* using for all these tests, the following IRQ errors 
are reported:

SunOS Release 5.11 Version snv_78 64-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: san
NOTICE: IRQ17 is shared
Reading ZFS config: done
Mounting ZFS filesystems: (1/1)
NOTICE: IRQ20 is shared
NOTICE: IRQ21 is shared
NOTICE: IRQ22 is shared
 
Any ideas?




Stack Trace   (note: I've done this a few times and it's always 
"/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1")
---
ATA UDMA data parity error
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x478f39ab.0x160dc688 (0x5344bb5958)
PLATFORM: i86pc, CSN: -, HOSTNAME: san
SOURCE: SunOS, REV: 5.11 snv_78
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved


panic[cpu3]/thread=ff000f7c2c80: pcie_pci-0: PCI(-X) Express Fatal Error

ff000f7c2bc0 pcie_pci:pepb_err_msi_intr+d2 ()
ff000f7c2c20 unix:av_dispatch_autovect+78 ()
ff000f7c2c60 unix:dispatch_hardint+2f ()
ff000fd09fd0 unix:switch_sp_and_call+13 ()
ff000fd0a020 unix:do_interrupt+a0 ()
ff000fd0a030 unix:cmnint+ba ()
ff000fd0a130 genunix:avl_first+1e ()
ff000fd0a1f0 zfs:metaslab_group_alloc+d1 ()
ff000fd0a2c0 zfs:metaslab_alloc_dva+1b7 ()
ff000fd0a360 zfs:metaslab_alloc+82 ()
ff000fd0a3b0 zfs:zio_dva_allocate+8a ()
ff000fd0a3d0 zfs:zio_next_stage+b3 ()
ff000fd0a400 zfs:zio_checksum_generate+6e ()
ff000fd0a420 zfs:zio_next_stage+b3 ()
ff000fd0a490 zfs:zio_write_compress+239 ()
ff000fd0a4b0 zfs:zio_next_stage+b3 ()
ff000fd0a500 zfs:zio_wait_for_children+5d ()
ff000fd0a520 zfs:zio_wait_children_ready+20 ()
ff000fd0a540 zfs:zio_next_stage_async+bb ()
ff000fd0a560 zfs:zio_nowait+11 ()
ff000fd0a870 zfs:dbuf_sync_leaf+1ac ()
ff000fd0a8b0 zfs:dbuf_sync_list+51 ()
ff000fd0a900 zfs:dbuf_sync_indirect+cd ()
ff000fd0a940 zfs:dbuf_sync_list+5e ()
ff000fd0a9b0 zfs:dnode_sync+23b ()
ff000fd0a9f0 zfs:dmu_objset_sync_dnodes+55 ()
ff000fd0aa70 zfs:dmu_objset_sync+13d ()
ff000fd0aac0 zfs:dsl_dataset_sync+5d ()
ff000fd0ab30 zfs:dsl_pool_sync+b5 ()
ff000fd0abd0 zfs:spa_sync+208 ()
ff000fd0ac60 zfs:txg_sync_thread+19a ()
ff000fd0ac70 unix:thread_start+8 ()

syncing file systems... 1 1 done
ereport.io.pciex.rc.fe-msg ena=5344b8176c00c01 dete

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen

On a lark, I decided to create a new pool not including any devices 
connected to card #3 (i.e. "c5")

It crashes again, but this time with a slightly different dump (see below)
  - actually, there are two dumps below, the first is using the xVM 
kernel and the second is not

Any ideas?

Kent



[NOTE: this one using xVM kernel - see below for dump without xVM kernel]

# zpool destroy tank
# zpool status
no pools available
# zpool create tank raidz2 c3t0d0 c3t4d0 c4t0d0 c4t4d0 raidz2 c3t1d0 
c3t5d0 c4t1d0 c4t5d0
# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz2ONLINE   0 0 0
c3t0d0  ONLINE   0 0 0
c3t4d0  ONLINE   0 0 0
c4t0d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c3t1d0  ONLINE   0 0 0
c3t5d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0

errors: No known data errors
# ls /tank
# cp -r /usr /tank/usr
Jan 17 08:48:53 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: device reset
Jan 17 08:48:53 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: link lost
Jan 17 08:48:53 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: link established
Jan 17 08:48:55 san marvell88sx: WARNING: marvell88sx1: port 4: DMA 
completed after timed out
Jan 17 08:48:55 san last message repeated 14 times
Jan 17 08:48:55 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: device reset
Jan 17 08:48:55 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: link lost
Jan 17 08:48:55 san sata: NOTICE: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: link established
Jan 17 08:48:55 san scsi: WARNING: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san Error for Command: write   Error 
Level: Retryable
Jan 17 08:48:55 san scsi:   Requested Block: 
11893 Error Block: 11893
Jan 17 08:48:55 san scsi:   Vendor: 
ATASerial Number:
Jan 17 08:48:55 san scsi:   Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:   ASC: 0x0 (no additional sense info), 
ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san Error for Command: write   Error 
Level: Retryable
Jan 17 08:48:55 san scsi:   Requested Block: 
11983 Error Block: 11983
Jan 17 08:48:55 san scsi:   Vendor: 
ATASerial Number:
Jan 17 08:48:55 san scsi:   Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:   ASC: 0x0 (no additional sense info), 
ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san Error for Command: write   Error 
Level: Retryable
Jan 17 08:48:55 san scsi:   Requested Block: 
12988 Error Block: 12988
Jan 17 08:48:55 san scsi:   Vendor: 
ATASerial Number:
Jan 17 08:48:55 san scsi:   Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:   ASC: 0x0 (no additional sense info), 
ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: 
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen



Below I create zpools isolating one card at a time
 - when just card #1 - it works
 - when just card #2 - it fails
 - when just card #3 - it works

And then again using the two cards that seem to work:
 - when cards #1 and #3 - it fails

So, at first I thought I narrowed it down to a card, but my last test 
shows that it still fails when the zpool uses two cards that succeed 
individually...


The only thing I can think to point out here is that those two cards are 
on different buses - one connected to a NEC uPD720400 and the other 
connected to an AIC-7902, which itself is then connected to the NEC uPD720400.


Any ideas?

Thanks,
Kent





OK, doing it again using just card #1 (i.e. "c3") works!

   # zpool destroy tank
   # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0
   # cp -r /usr /tank/usr
   cp: cycle detected: /usr/ccs/lib/link_audit/32
   cp: cannot access /usr/lib/amd64/libdbus-1.so.2


Doing it again using just card #2 (i.e. "c4") still fails:

   # zpool destroy tank
   # zpool create tank raidz2 c4t0d0 c4t4d0 c4t1d0 c4t5d0   
   # cp -r /usr /tank/usr

   cp: cycle detected: /usr/ccs/lib/link_audit/32
   cp: cannot access /usr/lib/amd64/libdbus-1.so.2
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error
   WARNING: marvell88sx1: error on port 1:
   ATA UDMA data parity error

   SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
   EVENT-TIME: 0x478f6148.0x376ebd4b (0xbf8f86652d)
   PLATFORM: i86pc, CSN: -, HOSTNAME: san
   SOURCE: SunOS, REV: 5.11 snv_78
   DESC: Errors have been detected that require a reboot to ensure system
   integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more
   information.
   AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
   telemetry
   IMPACT: The system will sync files, save a crash dump if needed, and
   reboot
   REC-ACTION: Save the error summary below in case telemetry cannot be
   saved


   panic[cpu3]/thread=ff000f7bcc80: pcie_pci-0: PCI(-X) Express
   Fatal Error

   ff000f7bcbc0 pcie_pci:pepb_err_msi_intr+d2 ()
   ff000f7bcc20 unix:av_dispatch_autovect+78 ()
   ff000f7bcc60 unix:dispatch_hardint+2f ()
   ff000f786ac0 unix:switch_sp_and_call+13 ()
   ff000f786b10 unix:do_interrupt+a0 ()
   ff000f786b20 unix:cmnint+ba ()
   ff000f786c10 unix:mach_cpu_idle+b ()
   ff000f786c40 unix:cpu_idle+c8 ()
   ff000f786c60 unix:idle+10e ()
   ff000f786c70 unix:thread_start+8 ()

   syncing file systems... done
   ereport.io.pciex.rc.fe-msg ena=bf8f828ea700c01 detector=[ version=0
   scheme=
"dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
rc-status=87c
   source-id=200
source-valid=1

   ereport.io.pciex.rc.mue-msg ena=bf8f828ea700c01 detector=[ version=0
   scheme=
"dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
rc-status=87c

   ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
   scheme="dev"
device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
pci-sec-status=6000
   pci-bdg-ctrl=3

   ereport.io.pci.sec-ma ena=bf8f828ea700c01 detector=[ version=0
   scheme="dev"
device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
pci-sec-status=6000
   pci-bdg-ctrl=3

   ereport.io.pciex.bdg.sec-perr ena=bf8f828ea700c01 detector=[
   version=0 scheme=
"dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]" ]
   sue-status=1800
source-id=200 source-valid=1

   ereport.io.pciex.bdg.sec-serr ena=bf8f828ea700c01 detector=[
   version=0 scheme=
"dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]" ]
   sue-status=1800

   ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
   scheme="dev"
device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
PROTECTED]" ]
   pci-sec-status=6420
pci-bdg-ctrl=7

   dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
   NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
port 0: device reset

   100% done:


And doing it again using just card #3 (i.e. "c5") works!

   # zpool destroy tank
   cannot open 'tank': no such pool
 (interesting)
   # zpool create tank raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0   
   # cp -r /usr /tank/usr





And doing it again using cards #1 and #3 (i.e. "c3" and "c5") fails!

   # zpool destroy tank
   # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0 raidz2 c5t0d0
   c5t4d0 c5t1d0 c5t5d0
   # cp -r /usr /tank/usr
   cp: cycle detected: /usr/ccs/lib/link_audit/32
   cp: cannot access

Re: [zfs-discuss] zfs comparison

2008-01-17 Thread Robert Milkowski
Hello Agile,

Comments in-between


Thursday, January 17, 2008, 2:20:42 AM, you wrote:

AA> Hi - I'm new to ZFS but not to Solaris.

AA> Is there a searchable interface to the zfs-discuss mail archives?


http://opensolaris.org/os/discussions/
and look for zfs-discuss list.

AA> We have a Windows 2003 Cluster with 200 TB SAN running under
AA> Active Directory with file system compression.

AA> Half the population is running Linux and the other half is running
AA> Windows XP.

AA> I'm interested in replacing the Window 2003 Cluster and filesystem
AA> compression with Solaris 10 and ZFS compression.

AA> Pardon my ignorance, but is ZFS with compression safe to use in a
AA> production environment?

Yes, it is.

Keep in mind that if you go for Solaris 10 the only compression
supported right now is lzjb. OpenSolaris additionally supports gzip.
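
For illustration (the dataset name is just an example), compression is a
per-dataset property:

   # zfs set compression=lzjb tank/data
   # zfs get compression tank/data

On OpenSolaris builds that support it, gzip (or gzip-1 through gzip-9)
can be given instead of lzjb; note that already-written blocks are not
rewritten, only new writes are compressed.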


-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



[zfs-discuss] SSD cache device hangs ZFS

2008-01-17 Thread Bill Moloney
I'm using a FC flash drive as a cache device to one of my pools:
  zpool  add  pool-name  cache  device-name
and I'm running random IO tests to assess performance on a 
snv-78 x86 system

I have a set of threads each doing random reads to about 25% of
its own, previously written, large file ... a test run will read in 
about 20GB on a server with 2GB of RAM

Using `zpool iostat`, I can see that the SSD device is being used
aggressively, and each time I run my random read test I find
better performance than the previous execution ... I also see my
SSD drive filling up more and more between runs
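
For reference, the per-device breakdown (including the cache device)
comes from the verbose form, e.g. sampling every 5 seconds (the pool
name is a placeholder):

   # zpool iostat -v pool-name 5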

this behavior is what I expect, and the performance improvements
I see are quite good (4X improvement over 5 runs), but I'm getting
hung from time to time

after several successful runs of my test application, some run of
my test will be running fine, but at some point before it finishes,
I see that all IO to the pool has stopped, and, while I still can use
the system for other things, most operations that involve the pool
will also hang (e.g. a `wc` on a pool-based file will hang)

any of these hung processes seem to sleep in the kernel 
at an uninterruptible level, and will not die on a  kill -9  attempt

any attempt to shutdown will hang, and the only way I can recover
is to use the `reboot -qnd` command (I think that the -d option
is the key since it keeps the system from trying to sync before
reboot)

when I reboot, everything is fine again and I can continue testing
until I run into this problem again ... does anyone have any thoughts
on this issue ? ... thanks, Bill
 
 


[zfs-discuss] zfs send/receive of an entire pool

2008-01-17 Thread James Andrewartha
Hi,

I have a zfs filesystem that I'd like to move to another host. It's part
of a pool called space, which is mounted at /space and has several child
filesystems. The first hurdle I came across was that zfs send only works
on snapshots, so I create one:
# zfs snapshot -r [EMAIL PROTECTED]
# zfs list -t snapshot  
NAME USED  AVAIL  REFER MOUNTPOINT
[EMAIL PROTECTED]   0  -  25.9G  -
space/[EMAIL PROTECTED]0  -31K  -
space/[EMAIL PROTECTED]   924K  -  52.4G  -
space/[EMAIL PROTECTED]   0  -38K  -
space/freebsd/[EMAIL PROTECTED]  0  -36K  -
space/freebsd/[EMAIL PROTECTED]   0  -  4.11G  -
space/[EMAIL PROTECTED]  0  -  47.6G  -
space/[EMAIL PROTECTED]352K  -  14.7G  -
space/netboot/[EMAIL PROTECTED]   0  -  95.5M  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -36K  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -   327M  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -36K  -
space/[EMAIL PROTECTED]   234K  -   167G  -

On the destination, I have created a zpool, again called space and
mounted at /space. However, I can't work out how to send [EMAIL PROTECTED]
to the new machine:
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn -d space"
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space"
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space2"
cannot receive: destination does not exist
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space/space2"
would receive full stream of [EMAIL PROTECTED] into space/[EMAIL PROTECTED]
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn [EMAIL PROTECTED]"
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn [EMAIL PROTECTED]"
cannot receive: destination does not exist

What am I missing here? I can't recv to space, because it exists, but I
can't make it not exist since it's the root filesystem of the pool. Do I
have to send each filesystem individually and rsync up the root fs?

Thanks,

James Andrewartha




Re: [zfs-discuss] SSD cache device hangs ZFS

2008-01-17 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> I have a set of threads each doing random reads to about 25% of its own,
> previously written, large file ... a test run will read in  about 20GB on a
> server with 2GB of RAM 
> . . .
> after several successful runs of my test application, some run of my test
> will be running fine, but at some point before it finishes, I see that all IO
> to the pool has stopped, and, while I still can use the system for other
> things, most operations that involve the pool will also hang (e.g.   a
> wcon a pool based file will hang) 


Bill,

Unencumbered by full knowledge of the history of your project, I'll say
that I think you need more RAM.  I've seen this behavior on a system
with 16GB RAM (and no SSD for cache), if heavy I/O goes on long enough.
If larger RAM is not feasible, or you don't have a 64-bit CPU, you could
try limiting the size of the ARC as well.
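
If you do try capping the ARC, a minimal sketch is a line like the
following in /etc/system (the 1GB value is only an example; it takes
effect at the next reboot, on builds that honor the zfs_arc_max tunable):

   set zfs:zfs_arc_max = 0x40000000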

That's not to say you're not seeing some other issue, but 2GB for heavy
ZFS I/O seems a little on the small side, given my experience.

Regards,

Marion




Re: [zfs-discuss] zfs send/receive of an entire pool

2008-01-17 Thread Richard Elling
You don't say which version of ZFS you are running, but what you
want is the -R option for zfs send.  See also the example of send
usage in the zfs(1m) man page.
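
With -R, a full-pool replication looks roughly like this (hostname and
pool names follow your example, the snapshot name is a placeholder, and
-F on the receive side lets it overwrite the freshly created
destination pool):

   # zfs snapshot -r space@migrate
   # zfs send -R space@migrate | ssh musundo "zfs recv -F -d space"
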
 -- richard

James Andrewartha wrote:
> Hi,
>
> I have a zfs filesystem that I'd like to move to another host. It's part
> of a pool called space, which is mounted at /space and has several child
> filesystems. The first hurdle I came across was that zfs send only works
> on snapshots, so I create one:
> # zfs snapshot -r [EMAIL PROTECTED]
> # zfs list -t snapshot  
> NAME USED  AVAIL  REFER MOUNTPOINT
> [EMAIL PROTECTED]   0  -  25.9G  -
> space/[EMAIL PROTECTED]0  -31K  -
> space/[EMAIL PROTECTED]   924K  -  52.4G  -
> space/[EMAIL PROTECTED]   0  -38K  -
> space/freebsd/[EMAIL PROTECTED]  0  -36K  -
> space/freebsd/[EMAIL PROTECTED]   0  -  4.11G  -
> space/[EMAIL PROTECTED]  0  -  47.6G  -
> space/[EMAIL PROTECTED]352K  -  14.7G  -
> space/netboot/[EMAIL PROTECTED]   0  -  95.5M  -
> space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -36K  -
> space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -   327M  -
> space/netboot/manduba-freebsd/[EMAIL PROTECTED]  0  -36K  -
> space/[EMAIL PROTECTED]   234K  -   167G  -
>
> On the destination, I have created a zpool, again called space and
> mounted at /space. However, I can't work out how to send [EMAIL PROTECTED]
> to the new machine:
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn -d space"
> cannot receive: destination 'space' exists
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space"
> cannot receive: destination 'space' exists
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space2"
> cannot receive: destination does not exist
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn space/space2"
> would receive full stream of [EMAIL PROTECTED] into space/[EMAIL PROTECTED]
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn [EMAIL PROTECTED]"
> cannot receive: destination 'space' exists
> # zfs send [EMAIL PROTECTED] | ssh musundo "zfs recv -vn [EMAIL PROTECTED]"
> cannot receive: destination does not exist
>
> What am I missing here? I can't recv to space, because it exists, but I
> can't make it not exist since it's the root filesystem of the pool. Do I
> have to send each filesystem individually and rsync up the root fs?
>
> Thanks,
>
> James Andrewartha
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   



Re: [zfs-discuss] SSD cache device hangs ZFS

2008-01-17 Thread Richard Elling
Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
>   
>> I have a set of threads each doing random reads to about 25% of its own,
>> previously written, large file ... a test run will read in  about 20GB on a
>> server with 2GB of RAM 
>> . . .
>> after several successful runs of my test application, some run of my test
>> will be running fine, but at some point before it finishes, I see that all IO
>> to the pool has stopped, and, while I still can use the system for other
>> things, most operations that involve the pool will also hang (e.g.   a
>> wcon a pool based file will hang) 
>> 
>
>
> Bill,
>
> Unencumbered by full knowledge of the history of your project, I'll say
> that I think you need more RAM.  I've seen this behavior on a system
> with 16GB RAM (and no SSD for cache), if heavy I/O goes on long enough.
> If larger RAM is not feasible, or you don't have a 64-bit CPU, you could
> try limiting the size of the ARC as well.
>
> That's not to say you're not seeing some other issue, but 2GB for heavy
> ZFS I/O seems a little on the small side, given my experience.
>   

If this is the case, you might try using arcstat to view ARC usage.
http://blogs.sun.com/realneel/entry/zfs_arc_statistics
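
Even without the script, the raw counters are available from kstat,
for example:

   # kstat -n arcstats | egrep 'size|c_max'

(arcstat.pl is essentially a friendlier front end over these statistics.)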
 -- richard



Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Richard Elling
Looks like flaky or broken hardware to me.  It could be a
power supply issue, those tend to rear their ugly head when
workloads get heavy and they are usually the easiest to
replace.
 -- richard

Kent Watsen wrote:
>
>
> Below I create zpools isolating one card at a time
>   - when just card#1 - it works
>   - when just card #2 - it fails
>   - when just card #3 - it works
>
> And then again using the two cards that seem to work:
>   - when cards #1 and #3 - it fails
>
> So, at first I thought I narrowed it down to a card, but my last test 
> shows that it still fails when the zpool uses two cards that succeed 
> individually...
>
> The only thing I can think to point out here is that those two cards 
> on on different buses - one connected to a NECuPD720400 and the other 
> connected to a AIC-7902, which itself is then connected to the 
> NECuPD720400
>
> Any ideas?
>
> Thanks,
> Kent
>
>
>
>
>
> OK, doing it again using just card #1 (i.e. "c3") works!
>
> # zpool destroy tank
> # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0
> # cp -r /usr /tank/usr
> cp: cycle detected: /usr/ccs/lib/link_audit/32
> cp: cannot access /usr/lib/amd64/libdbus-1.so.2
>
>
> Doing it again using just card #2 (i.e. "c4") still fails:
>
> # zpool destroy tank
> # zpool create tank raidz2 c4t0d0 c4t4d0 c4t1d0 c4t5d0   
> # cp -r /usr /tank/usr
> cp: cycle detected: /usr/ccs/lib/link_audit/32
> cp: cannot access /usr/lib/amd64/libdbus-1.so.2
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
>
> SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> EVENT-TIME: 0x478f6148.0x376ebd4b (0xbf8f86652d)
> PLATFORM: i86pc, CSN: -, HOSTNAME: san
> SOURCE: SunOS, REV: 5.11 snv_78
> DESC: Errors have been detected that require a reboot to ensure system
> integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more
> information.
> AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
> telemetry
> IMPACT: The system will sync files, save a crash dump if needed,
> and reboot
> REC-ACTION: Save the error summary below in case telemetry cannot
> be saved
>
>
> panic[cpu3]/thread=ff000f7bcc80: pcie_pci-0: PCI(-X) Express
> Fatal Error
>
> ff000f7bcbc0 pcie_pci:pepb_err_msi_intr+d2 ()
> ff000f7bcc20 unix:av_dispatch_autovect+78 ()
> ff000f7bcc60 unix:dispatch_hardint+2f ()
> ff000f786ac0 unix:switch_sp_and_call+13 ()
> ff000f786b10 unix:do_interrupt+a0 ()
> ff000f786b20 unix:cmnint+ba ()
> ff000f786c10 unix:mach_cpu_idle+b ()
> ff000f786c40 unix:cpu_idle+c8 ()
> ff000f786c60 unix:idle+10e ()
> ff000f786c70 unix:thread_start+8 ()
>
> syncing file systems... done
> ereport.io.pciex.rc.fe-msg ena=bf8f828ea700c01 detector=[
> version=0 scheme=
>  "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
> rc-status=87c
> source-id=200
>  source-valid=1
>
> ereport.io.pciex.rc.mue-msg ena=bf8f828ea700c01 detector=[
> version=0 scheme=
>  "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
> rc-status=87c
>
> ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
>  device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pci.sec-ma ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
>  device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ] 
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pciex.bdg.sec-perr ena=bf8f828ea700c01 detector=[
> version=0 scheme=
>  "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL 
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> sue-status=1800
>  source-id=200 source-valid=1
>
> ereport.io.pciex.bdg.sec-serr ena=bf8f828ea700c01 detector=[
> version=0 scheme=
>  "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL 
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> sue-status=1800
>
> ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
>  device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL 
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> pci-sec-status=6420
>  pci-bdg-ctrl=7
>
> dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
> NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
>  port 0: device reset
>
> 100% done:
>
>
> And doing 

Re: [zfs-discuss] Does ZFS handle a SATA II " port multiplier " ?

2008-01-17 Thread Patrick O'Sullivan
At the risk of groveling, I'd like to add one more to the set of people wishing 
for this to be completed. Any hint on a timeframe? I see reference to this bug 
back in 2006 
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6409327), so I was 
wondering if there was any progress.

Thanks!
 
 


Re: [zfs-discuss] Panic on Zpool Import (Urgent)

2008-01-17 Thread Ben Rockwood
The solution here was to upgrade to snv_78.  By "upgrade" I mean 
re-jumpstart the system.

I tested snv_67 via net-boot but the pool panicked just as below.  I also 
attempted using zfs_recover without success.

I then tested snv_78 via net-boot, used both "aok=1" and 
"zfs:zfs_recover=1" and was able to (slowly) import the pool.  Following 
that test I exported and then did a full re-install of the box.
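
For anyone hitting the same panic: the two settings above go into
/etc/system before attempting the import, roughly:

   set aok=1
   set zfs:zfs_recover=1

Both relax assertion handling, so treat them as a one-off recovery
measure rather than a permanent setting.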

A very important note to anyone upgrading a Thumper!  Don't forget about 
the NCQ bug.  After upgrading to a release more recent than snv_60 add 
the following to /etc/system:

set sata:sata_max_queue_depth = 0x1

If you don't, life will be highly unpleasant and you'll believe that disks are 
failing everywhere when in fact they are not.
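
A quick way to confirm the setting took effect after the reboot is to
read the variable back with mdb (a sketch, assuming the sata module
exposes the tunable as a 32-bit symbol):

   # echo 'sata`sata_max_queue_depth/D' | mdb -k

It should report 1.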

benr.




Ben Rockwood wrote:
> Today, suddenly, without any apparent reason that I can find, I'm 
> getting panic's during zpool import.  The system paniced earlier today 
> and has been suffering since.  This is snv_43 on a thumper.  Here's the 
> stack:
>
> panic[cpu0]/thread=99adbac0: assertion failed: ss != NULL, file: 
> ../../common/fs/zfs/space_map.c, line: 145
>
> fe8000a240a0 genunix:assfail+83 ()
> fe8000a24130 zfs:space_map_remove+1d6 ()
> fe8000a24180 zfs:space_map_claim+49 ()
> fe8000a241e0 zfs:metaslab_claim_dva+130 ()
> fe8000a24240 zfs:metaslab_claim+94 ()
> fe8000a24270 zfs:zio_dva_claim+27 ()
> fe8000a24290 zfs:zio_next_stage+6b ()
> fe8000a242b0 zfs:zio_gang_pipeline+33 ()
> fe8000a242d0 zfs:zio_next_stage+6b ()
> fe8000a24320 zfs:zio_wait_for_children+67 ()
> fe8000a24340 zfs:zio_wait_children_ready+22 ()
> fe8000a24360 zfs:zio_next_stage_async+c9 ()
> fe8000a243a0 zfs:zio_wait+33 ()
> fe8000a243f0 zfs:zil_claim_log_block+69 ()
> fe8000a24520 zfs:zil_parse+ec ()
> fe8000a24570 zfs:zil_claim+9a ()
> fe8000a24750 zfs:dmu_objset_find+2cc ()
> fe8000a24930 zfs:dmu_objset_find+fc ()
> fe8000a24b10 zfs:dmu_objset_find+fc ()
> fe8000a24bb0 zfs:spa_load+67b ()
> fe8000a24c20 zfs:spa_import+a0 ()
> fe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
> fe8000a24ce0 zfs:zfsdev_ioctl+135 ()
> fe8000a24d20 genunix:cdev_ioctl+55 ()
> fe8000a24d60 specfs:spec_ioctl+99 ()
> fe8000a24dc0 genunix:fop_ioctl+3b ()
> fe8000a24ec0 genunix:ioctl+180 ()
> fe8000a24f10 unix:sys_syscall32+101 ()
>
> syncing file systems... done
>
> This is almost identical to a post to this list over a year ago titled 
> "ZFS Panic".  There was follow up on it but the results didn't make it 
> back to the list.
>
> I spent time doing a full sweep for any hardware failures, pulled 2 
> drives that I suspected as problematic but weren't flagged as such, etc, 
> etc, etc.  Nothing helps.
>
> Bill suggested a 'zpool import -o ro' on the other post, but thats not 
> working either.
>
> I _can_ use 'zpool import' to see the pool, but I have to force the 
> import.  A simple 'zpool import' returns output in about a minute.  
> 'zpool import -f poolname' takes almost exactly 10 minutes every single 
> time, like it hits some timeout and then panics.
>
> I did notice that while the 'zpool import' is running 'iostat' is 
> useless, just hangs.  I still want to believe this is some device 
> misbehaving but I have no evidence to support that theory.
>
> Any and all suggestions are greatly appreciated.  I've put around 8 
> hours into this so far and I'm getting absolutely nowhere.
>
> Thanks
>
> benr.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   



Re: [zfs-discuss] Does ZFS handle a SATA II " port multiplier " ?

2008-01-17 Thread Richard Elling
Patrick O'Sullivan wrote:
> At the risk of groveling, I'd like to add one more to the set of people 
> wishing for this to be completed. Any hint on a timeframe? I see reference to 
> this bug back in 2006 
> (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6409327), so I 
> was wondering if there was any progress.
>   

This has nothing to do with ZFS, as it is a device driver issue
(hint: category driver:sata)
Perhaps you could ask over in the device drivers community?
http://www.opensolaris.org/os/community/device_drivers/
 -- richard



Re: [zfs-discuss] Does ZFS handle a SATA II " port multiplier " ?

2008-01-17 Thread Brian Pinkerton
Right.  I can confirm that using port-multiplier-capable cards works  
on the Mac.  I've got an 8-disk zpool with only two SATA ports.  Works  
like a charm.

bri

On Jan 17, 2008, at 12:22 PM, Richard Elling wrote:

> Patrick O'Sullivan wrote:
>> At the risk of groveling, I'd like to add one more to the set of  
>> people wishing for this to be completed. Any hint on a timeframe? I  
>> see reference to this bug back in 2006 
>> (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6409327 
>> ), so I was wondering if there was any progress.
>>
>
> This has nothing to do with ZFS, as it is a device driver issue
> (hint: category driver:sata)
> Perhaps you could ask over in the device drivers community?
> http://www.opensolaris.org/os/community/device_drivers/
> -- richard



Re: [zfs-discuss] Does ZFS handle a SATA II " port multiplier " ?

2008-01-17 Thread Patrick O'Sullivan
Thanks. I'll give that a shot. I neglected to notice what forum it was in since 
the question morphed into "when will Solaris support port multipliers?"

Thanks again.
 
 


Re: [zfs-discuss] SSD cache device hangs ZFS

2008-01-17 Thread Bill Moloney
Thanks Marion and richard,
but I've run these tests with much larger data sets
and have never had this kind of problem when no
cache device was involved

In fact, if I remove the SSD cache device from my
pool and run the tests, they seem to run with no issues
(except for some reduced performance as I would expect)

the same SSD disk works perfectly as a separate ZIL device,
providing improved IO with synchronous writes on large test
runs of > 100GBs
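
(For comparison, the same kind of device takes on either role depending
on the vdev type handed to zpool add; device names below are placeholders.)

   # zpool add pool-name cache ssd-device   # L2ARC read cache
   # zpool add pool-name log ssd-device     # separate intent log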

... Bill
 
 


Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Al Hopper
On Thu, 17 Jan 2008, Richard Elling wrote:

> Looks like flaky or broken hardware to me.  It could be a
> power supply issue, those tend to rear their ugly head when
> workloads get heavy and they are usually the easiest to
> replace.

+1  PSU or memory (run memtest86)


> -- richard
>
> Kent Watsen wrote:
>>
>>
>> Below I create zpools isolating one card at a time
>>   - when just card#1 - it works
>>   - when just card #2 - it fails
>>   - when just card #3 - it works
>>
>> And then again using the two cards that seem to work:
>>   - when cards #1 and #3 - it fails

 snip .

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen





Thanks Richard and Al,

I'll refrain from expressing how disturbing this is, as I'm trying to help
the Internet be kid-safe   ;)

As for the PSU, I'd be very surprised if that were it, as it is a
3+1 redundant PSU that came with this system, built by a reputable
integrator.  Also, the PSU is plugged into a high-end APC UPS, sucking
just 25% of its capacity.  And the UPS has a dedicated 240V 30A circuit.

As for the memory, it might be - even though the same integrator
installed the SIMMs and did a 24-hour burn-in test, you never know.  So
I'm running memtest86 now, which is 12% passed so far...

I'm going to try another hardware test, which is to switch around the
backplanes my cards are plugging into.  If the same backplanes are
failing, then I know all my AOC-SAT2-MV8 cards are OK.  Likewise, if
the same backplane doesn't fail, then I know all my backplanes are OK. 
Either way, I'll eliminate one potential hardware issue.

But I still think that it might be software related.  My first post was
trying to point out some anomalies in how the devices are being named -
see the highlighted parts below? - doesn't that look strange? - why
would Solaris use a different naming convention for some disks?

  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,11ab@6/[EMAIL PROTECTED],0
  /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,11ab@6/[EMAIL PROTECTED],0



  PS: in case you can't see it, look at the last four
disks and notice how they contain a spurious ",1" and also have the same
"@6" as the middle four disks







Any ideas, suggestions, condolences?

Thanks,
Kent









Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Richard Elling
Kent Watsen wrote:
>
> Thanks Richard and Al,
>
> I'll refrain from express how disturbing this is, as I'm trying to 
> help the Internet be kid-safe   ;)
>
> As for the PSU, I'd be very surprised there if that were it as it is a 
> 3+1 redundant PSU that came with this system, built by a reputable 
> integrator.  Also, the PSU is plugged into a high-end APC UPS, sucking 
> just 25% of its capacity.  And the UPS has a dedicated 240V 30A circuit.
>
> As for the memory, it might be - even though the same integrator 
> installed the SIMMs and did a 24-hour burn-in test, you never know.  
> So I'm running memtest86 now, which is 12% passed so far...
>
> I'm going to try another hardware test, which is to switch around the 
> backplanes my cards are plugging into.  If the same backplanes are 
> failing, then I know all my AOC-SAT2-MV8 cards are OK.  Likewise, if 
> the same backplane doesn't fail, then I know all my backplanes are 
> OK.  Either way, I'll eliminate one potential hardware issue.
>

You could also try the SunVTS system tests.  They are
what we use in the factory to prove systems work before
being shipped to customers.  Located in /usr/sunvts where
READMEs, man pages, and binaries live.  There are a
zillion options, but I highly recommend the readonly disk
tests for your case.

> But I still think that it might be software related.  My first post 
> was trying to point out some anomalies in how the devices are being 
> named - see the highlighted parts below? - doesn't that look strange?  
> - why would Solaris use different naming convention for some disks?
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>   /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL 
> PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>
>
> PS: in case you can't see it, look at the last four disks and
> notice how the contain a spurious ",1" and also have the same
> "@6" as the middle four disks
>
>

looks reasonable to me.  These are just PCI device identifiers, certainly
nothing to be worried about.
 -- richard



[zfs-discuss] MySQL/ZFS backup program posted.

2008-01-17 Thread Jason J. W. Williams
Hey Y'all,

I've posted the program (SnapBack) my company developed internally for
backing up production MySQL servers using ZFS snapshots:
http://blogs.digitar.com/jjww/?itemid=56

Hopefully, it'll save other folks some time. We use it a lot for
standing up new MySQL slaves as well.
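
For anyone who wants the gist before reading the code (this is the
general pattern, not necessarily SnapBack's exact implementation;
dataset and snapshot names are placeholders): take the ZFS snapshot
while a single MySQL session holds a read lock, e.g.

   mysql> FLUSH TABLES WITH READ LOCK;
   mysql> SHOW MASTER STATUS;   -- note the binlog position for slaves
   (from another shell, while the lock is still held)
   # zfs snapshot tank/mysql@backup-20080117
   mysql> UNLOCK TABLES;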

Best Regards,
Jason


Re: [zfs-discuss] zfs send/receive of an entire pool

2008-01-17 Thread James Andrewartha
On Thu, 2008-01-17 at 09:29 -0800, Richard Elling wrote:
> You don't say which version of ZFS you are running, but what you
> want is the -R option for zfs send.  See also the example of send
> usage in the zfs(1m) man page.

Sorry, I'm running SXCE nv75. I can't see any mention of send -R in the
man page. Ah, it's PSARC/2007/574 and nv77. I'm not convinced it'll
solve my problem (sending the root filesystem of a pool), but I'll
upgrade and give it a shot.

Thanks,

James Andrewartha


[zfs-discuss] Integrated transactional upgrades with ZFS

2008-01-17 Thread Erast Benson
Hi guys,

new article available explaining how enterprise-like upgrades are
integrated with the Nexenta Core Platform starting from RC2, using ZFS
capabilities and Debian APT:

http://www.nexenta.org/os/TransactionalZFSUpgrades

What is NexentaCP?

NexentaCP is a minimal (core) foundation that can be used to quickly
build servers, desktops, and custom distributions tailored for
specialized applications such as NexentaStor. Similar to the NexentaOS
desktop distribution, NexentaCP combines a reliable state-of-the-art
kernel with the GNU userland, and the ability to integrate open source
components in no time. However, unlike the NexentaOS desktop distribution,
NexentaCP does not aim to provide a complete desktop. The overriding
objective for NexentaCP is a stable foundation.

Enjoy!



Re: [zfs-discuss] zfs comparison

2008-01-17 Thread Anton B. Rang
> Pardon my ignorance, but is ZFS with compression safe to use in a
> production environment?

I'd say, as safe as ZFS in general.  ZFS has been well-tested by Sun, but it's 
not as mature as UFS, say.  There is not yet a fsck equivalent for ZFS, so if a 
bug results in damage to your ZFS data pool, you'll need to restore the whole 
pool from backups.  This may or may not be an issue depending on the amount of 
downtime you can tolerate (and the size of your pool).
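
(Worth noting: `zpool scrub` does verify every block's checksum and
repairs from redundancy where it can, which covers much of what fsck is
used for; what it cannot do is stitch a pool back together after the
kind of metadata damage described above.)

   # zpool scrub tank
   # zpool status -v tank   # shows scrub progress and any errors found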

Adding compression to the mix doesn't really increase risk, IMO.

As always, test yourself, in your environment before deploying.  :-)

Anton
 
 


Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Anton B. Rang
Definitely a hardware problem (possibly compounded by a bug).  Some key phrases 
and routines:

  ATA UDMA data parity error

This one actually looks like a misnomer.  At least, I'd normally expect "data 
parity error" not to crash the system!  (It should result in a retry or EIO.)

  PCI(-X) Express Fatal Error

This one's more of an issue -- it indicates that the PCI Express bus had an 
error.

  pcie_pci:pepb_err_msi_intr

This indicates an error on the PCI bus which has been reflected through to the 
PCI Express bus. There should be more detail, but it's hard to figure it out 
from what's below. (The report is showing multiple errors, including both 
parity errors & system errors, which seems unlikely unless there's a hardware 
design flaw or a software bug.)

Others have suggested the power supply or memory, but in my experience these 
types of errors are more often due to a faulty system backplane or card (and 
occasionally a bad bridge chip).
 
 


Re: [zfs-discuss] SSD cache device hangs ZFS

2008-01-17 Thread Tomas Ögren
On 17 January, 2008 - Bill Moloney sent me these 0,7K bytes:

> Thanks Marion and richard,
> but I've run these tests with much larger data sets
> and have never had this kind of problem when no
> cache device was involved
> 
> In fact, if I remove the SSD cache device from my
> pool and run the tests, they seem to run with no issues
> (except for some reduced performance as I would expect)

My uneducated guess is that without the SSD, the disk performance is low
enough that you don't need that much memory.. with the SSD, performance
goes up and so does memory usage due to caches.. Limiting the ARC or
lowering the flush timeout might help..

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se