Re: [zfs-discuss] 6410 expansion shelf
I should be able to reply to you next Tuesday -- my 6140 SATA expansion tray is due to arrive. Meanwhile, what kind of problem do you have with the 3511?

--
Just me,
Wire ...

On 3/23/07, Frank Cusack <[EMAIL PROTECTED]> wrote:
Does anyone have a 6140 expansion shelf that they can hook directly to a host? Just wondering if this configuration works. Previously I thought the expansion connector was proprietary, but now I see it's just fibre channel. I tried this before with a 3511 and it "kind of" worked, but ultimately had various problems and I had to give up on it. Hoping to avoid the cost of the RAID controller.

-frank
Re: [zfs-discuss] Re: Proposal: ZFS hotplug support and autoconfiguration
On Thu, Mar 22, 2007 at 08:39:55AM -0700, Eric Schrock wrote:
> Again, thanks to devids, the autoreplace code would not kick in here at
> all. You would end up with an identical pool.

Eric, maybe I'm missing something, but why ZFS depend on devids at all? As I understand it, devid is something that never change for a block device, eg. disk serial number, but on the other hand it is optional, so we can rely on the fact it's always there (I mean for all block devices we use).

Why we simply not forget about devids and just focus on on-disk metadata to detect pool components? The only reason I see is performance. This is probably why /etc/zfs/zpool.cache is used as well.

In FreeBSD we have the GEOM infrastructure for storage. Each storage device (disk, partition, mirror, etc.) is simply a GEOM provider. If a GEOM provider appears (eg. disk is inserted, partition is configured) all interested parties are informed about this and can 'taste' the provider by reading metadata specific for them. The same when a provider goes away - all interested parties are informed and can react accordingly.

We don't see any performance problems related to the fact that each disk that appears is read by many "GEOM classes".

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: [zfs-discuss] Re: Proposal: ZFS hotplug support and autoconfiguration
On Fri, Mar 23, 2007 at 11:31:03AM +0100, Pawel Jakub Dawidek wrote:
> On Thu, Mar 22, 2007 at 08:39:55AM -0700, Eric Schrock wrote:
> > Again, thanks to devids, the autoreplace code would not kick in here at
> > all. You would end up with an identical pool.
>
> Eric, maybe I'm missing something, but why ZFS depend on devids at all?
> As I understand it, devid is something that never change for a block
> device, eg. disk serial number, but on the other hand it is optional, so
> we can rely on the fact it's always there (I mean for all block devices

s/can/can't/

> we use).
>
> Why we simply not forget about devids and just focus on on-disk metadata
> to detect pool components?
>
> The only reason I see is performance. This is probably why
> /etc/zfs/zpool.cache is used as well.
>
> In FreeBSD we have the GEOM infrastructure for storage. Each storage
> device (disk, partition, mirror, etc.) is simply a GEOM provider. If a
> GEOM provider appears (eg. disk is inserted, partition is configured)
> all interested parties are informed about this and can 'taste' the
> provider by reading metadata specific for them. The same when a provider
> goes away - all interested parties are informed and can react
> accordingly.
>
> We don't see any performance problems related to the fact that each disk
> that appears is read by many "GEOM classes".

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
[zfs-discuss] ZFS ontop of SVM - CKSUM errors
Hi.

bash-3.00# uname -a
SunOS nfs-14-2.srv 5.10 Generic_125101-03 i86pc i386 i86pc

I created a first zpool (a stripe of 85 disks) and did some simple stress testing - everything seemed almost alright (~700MB/s sequential reads, ~430MB/s sequential writes). Then I destroyed the pool and put an SVM stripe on top of the same disks, utilizing the fact that ZFS had already put an EFI label on them, so s0 represents almost the entire disk. Then on top of the SVM volume I put ZFS, dd'ed some files, ran zpool scrub and got:

bash-3.00# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 66 errors on Fri Mar 23 12:52:36 2007
config:

        NAME                STATE     READ WRITE CKSUM
        test                ONLINE       0     0   134
          /dev/md/dsk/d100  ONLINE       0     0   134

errors: 66 data errors, use '-v' for a list
bash-3.00#

The disks are from a Clariion CX3-40 with FC 15K drives using MPxIO (2x 4Gb links). I was changing watermarks for the cache on the array, and now I wonder - is it the array or SVM+ZFS? I'm a little bit suspicious about SVM, as I can get only ~80MB/s on average with short bursts up to ~380MB/s (no matter if it's ZFS, UFS or the raw device directly), which is much, much less than ZFS alone (and on an x4500 I can get ~2GB/s reads with SVM). No errors in the logs, metastat is clean. Of course fmdump -e reports errors from ZFS, but that's expected.

So I destroyed the zpool, created it again, dd'ed from /dev/zero to the pool, and then read a file back - and right away I got CKSUM errors, so it seems repeatable (no watermark fiddling this time). Later I destroyed the pool and the SVM device, created a new pool on the same disks, ran the same dd, and this time there were no CKSUM errors and much better performance.
bash-3.00# metastat -p d100 d100 1 85 /dev/dsk/c6t6006016062231B003CBA35791CD9DB11d0s0 /dev/dsk/c6t6006016062231B0004D599691CD9DB11d0s0 /dev/dsk/c6t6006016062231B00BC373C571CD9DB11d0s0 /dev/dsk/c6t6006016062231B0032CCFE481CD9DB11d0s0 /dev/dsk/c6t6006016062231B0096CB093A1CD9DB11d0s0 /dev/dsk/c6t6006016062231B00D40FEB261CD9DB11d0s0 /dev/dsk/c6t6006016062231B00DC759B171CD9DB11d0s0 /dev/dsk/c6t6006016062231B00D68713071CD9DB11d0s0 /dev/dsk/c6t6006016062231B00CE8F64F71BD9DB11d0s0 /dev/dsk/c6t6006016062231B009005C0E61BD9DB11d0s0 /dev/dsk/c6t6006016062231B00CABCE6D81BD9DB11d0s0 /dev/dsk/c6t6006016062231B00F2B124C91BD9DB11d0s0 /dev/dsk/c6t6006016062231B0004FE5CBA1BD9DB11d0s0 /dev/dsk/c6t6006016062231B0034CFFBAB1BD9DB11d0s0 /dev/dsk/c6t6006016062231B00DCB4349F1BD9DB11d0s0 /dev/dsk/c6t6006016062231B0024C093921BD9DB11d0s0 /dev/dsk/c6t6006016062231B0090F561871BD9DB11d0s0 /dev/dsk/c6t6006016062231B000EB2C0751BD9DB11d0s0 /dev/dsk/c6t6006016062231B008CF5B2671BD9DB11d0s0 /dev/dsk/c6t6006016062231B002A6ED0561BD9DB11d0s0 /dev/dsk/c6t6006016062231B00441DFD4C1BD9DB11d0s0 /dev/dsk/c6t6006016062231B001CF022401BD9DB11d0s0 /dev/dsk/c6t6006016062231B00449925351BD9DB11d0s0 /dev/dsk/c6t6006016062231B00A01632271BD9DB11d0s0 /dev/dsk/c6t6006016062231B00F2344A1C1BD9DB11d0s0 /dev/dsk/c6t6006016062231B0048C112121BD9DB11d0s0 /dev/dsk/c6t6006016062231B004CE643031BD9DB11d0s0 /dev/dsk/c6t6006016062231B004E2E7FF61AD9DB11d0s0 /dev/dsk/c6t6006016062231B008CADB8EB1AD9DB11d0s0 /dev/dsk/c6t6006016062231B00C8C868DF1AD9DB11d0s0 /dev/dsk/c6t6006016062231B009CD37BCF1AD9DB11d0s0 /dev/dsk/c6t6006016062231B00E84C8BC31AD9DB11d0s0 /dev/dsk/c6t6006016062231B0086796DB71AD9DB11d0s0 /dev/dsk/c6t6006016062231B00B2098DA91AD9DB11d0s0 /dev/dsk/c6t6006016062231B00124185971AD9DB11d0s0 /dev/dsk/c6t6006016062231B003E7742871AD9DB11d0s0 /dev/dsk/c6t6006016062231B003C7EFE7A1AD9DB11d0s0 /dev/dsk/c6t6006016062231B00D48C6B711AD9DB11d0s0 /dev/dsk/c6t6006016062231B001C98CA641AD9DB11d0s0 /dev/dsk/c6t6006016062231B0054BE36541AD9DB11d0s0 /dev/dsk/c6t6006016062231B009A650C461AD9DB11d0s0 /dev/dsk/c6t6006016062231B005CBC5D3B1AD9DB11d0s0 /dev/dsk/c6t6006016062231B00201DD62F1AD9DB11d0s0 /dev/dsk/c6t6006016062231B00703483111AD9DB11d0s0 /dev/dsk/c6t6006016062231B00941573031AD9DB11d0s0 /dev/dsk/c6t6006016062231B00862C80F719D9DB11d0s0 /dev/dsk/c6t6006016062231B007E15C7ED19D9DB11d0s0 /dev/dsk/c6t6006016062231B00A07323E419D9DB11d0s0 /dev/dsk/c6t6006016062231B0096F8E0D819D9DB11d0s0 /dev/dsk/c6t6006016062231B00AAD5D3CC19D9DB11d0s0 /dev/dsk/c6t6006016062231B8FCDC319D9DB11d0s0 /dev/dsk/c6t6006016062231BCDE1B719D9DB11d0s0 /dev/dsk/c6t6006016062231B00BC24C8A919D9DB11d0s0 /dev/dsk/c6t6006016062231B008834709E19D9DB11d0s0 /dev/dsk/c6t6006016062231B00BC73BF9019D9DB11d0s0 /dev/dsk/c6t6006016062231B0026B0497919D9DB11d0s0 /dev/dsk/c6t6006016062231B0012E7F56319D9DB11d0s0 /dev/dsk/c6t6006016062231B00BA53C25A19D9DB11d0s0 /dev/dsk/c6t6006016062231B0052622F5119D9DB11d0s0 /dev/dsk/c6t6006016062231B008832394619D9DB11d0s0 /dev/dsk/c6t6006
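For anyone wanting to reproduce this comparison, a rough sketch of the test sequence described above (the device names are placeholders and the dd sizes are illustrative, not the exact ones used):

   # ZFS directly on the LUNs
   zpool create test c6t...d0 c6t...d0 [85 disks in total]
   dd if=/dev/zero of=/test/bigfile bs=1024k count=10240

   # ZFS on top of an SVM stripe over the same s0 slices
   zpool destroy test
   metainit d100 1 85 c6t...d0s0 c6t...d0s0 [85 slices in total]
   zpool create test /dev/md/dsk/d100
   dd if=/dev/zero of=/test/bigfile bs=1024k count=10240
   zpool scrub test
   zpool status -v test     # the CKSUM column shows any checksum errors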
[zfs-discuss] crash during snapshot operations
When I'm trying to do, in kernel, in a zfs ioctl:

1. snapshot destroy PREVIOUS
2. snapshot rename LATEST->PREVIOUS
3. snapshot create LATEST

the code is:

   /* delete previous snapshot */
   zfs_unmount_snap(snap_previous, NULL);
   dmu_objset_destroy(snap_previous);

   /* rename snapshot */
   zfs_unmount_snap(snap_latest, NULL);
   dmu_objset_rename(snap_latest, snap_previous);

   /* create snapshot */
   dmu_objset_snapshot(zc->zc_name, REPLICATE_SNAPSHOT_LATEST, 0);

I get a kernel panic.

MDB:
> ::status
debugging crash dump vmcore.3 (32-bit) from zfs.dev
operating system: 5.11 snv_56 (i86pc)
panic message: BAD TRAP: type=8 (#df Double fault) rp=fec244f8 addr=d5904ffc
dump content: kernel pages only

This happens only when the ZFS filesystem is loaded with I/O operations. (I copy a studio11 folder onto this filesystem.)

MDB ::stack shows nothing, but walking the threads I found:

stack pointer for thread d8ff9e00: d421b028
  d421b04c zio_pop_transform+0x45(d9aba380, d421b090, d421b070, d421b078)
  d421b094 zio_clear_transform_stack+0x23(d9aba380)
  d421b200 zio_done+0x12b(d9aba380)
  d421b21c zio_next_stage+0x66(d9aba380)
  d421b230 zio_checksum_verify+0x17(d9aba380)
  d421b24c zio_next_stage+0x66(d9aba380)
  d421b26c zio_wait_for_children+0x46(d9aba380, 11, d9aba570)
  d421b280 zio_wait_children_done+0x18(d9aba380)
  d421b298 zio_next_stage+0x66(d9aba380)
  d421b2d0 zio_vdev_io_assess+0x11a(d9aba380)
  d421b2e8 zio_next_stage+0x66(d9aba380)
  d421b368 vdev_cache_read+0x157(d9aba380)
  d421b394 vdev_disk_io_start+0x35(d9aba380)
  d421b3a4 vdev_io_start+0x18(d9aba380)
  d421b3d0 zio_vdev_io_start+0x142(d9aba380)
  d421b3e4 zio_next_stage_async+0xac(d9aba380)
  d421b3f4 zio_nowait+0xe(d9aba380)
  d421b424 vdev_mirror_io_start+0x151(deab5cc0)
  d421b450 zio_vdev_io_start+0x14f(deab5cc0)
  d421b460 zio_next_stage+0x66(deab5cc0)
  d421b470 zio_ready+0x124(deab5cc0)
  d421b48c zio_next_stage+0x66(deab5cc0)
  d421b4ac zio_wait_for_children+0x46(deab5cc0, 1, deab5ea8)
  d421b4c0 zio_wait_children_ready+0x18(deab5cc0)
  d421b4d4 zio_next_stage_async+0xac(deab5cc0)
  d421b4e4 zio_nowait+0xe(deab5cc0)
  d421b520 arc_read+0x3cc(d8a2cd00, da9f6ac0, d418e840, f9e55e5c, f9e249b0, d515c010)
  d421b590 dbuf_read_impl+0x11b(d515c010, d8a2cd00, d421b5cc)
  d421b5bc dbuf_read+0xa5(d515c010, d8a2cd00, 2)
  d421b5fc dmu_buf_hold+0x7c(d47cb854, 4, 0, 0, 0, 0)
  d421b654 zap_lockdir+0x38(d47cb854, 4, 0, 0, 1, 1)
  d421b690 zap_lookup+0x23(d47cb854, 4, 0, d421b6e0, 8, 0)
  d421b804 dsl_dir_open_spa+0x10a(da9f6ac0, d8fde000, f9e7378f, d421b85c, d421b860)
  d421b864 dsl_dataset_open_spa+0x2c(0, d8fde000, 1, debe83c0, d421b938)
  d421b88c dsl_dataset_open+0x19(d8fde000, 1, debe83c0, d421b938)
  d421b940 dmu_objset_open+0x2e(d8fde000, 5, 1, d421b970)
  d421b974 dmu_objset_snapshot_one+0x2c(d8fde000, d421b998)
  d421bdb0 dmu_objset_snapshot+0xaf(d8fde000, d4c6a3e8, 0)
  d421c9e8 zfs_ioc_replicate_send+0x1ab(d8fde000)
  d421ce18 zfs_ioc_sendbackup+0x126()
  d421ce40 zfsdev_ioctl+0x100(2d8, 5a1e, 8046cac, 13, d5938650, d421cf78)
  d421ce6c cdev_ioctl+0x2e(2d8, 5a1e, 8046cac, 13, d5938650, d421cf78)
  d421ce94 spec_ioctl+0x65(d6591780, 5a1e, 8046cac, 13, d5938650, d421cf78)
  d421ced4 fop_ioctl+0x27(d6591780, 5a1e, 8046cac, 13, d5938650, d421cf78)
  d421cf84 ioctl+0x151()
  d421cfac sys_sysenter+0x101()

> $r
%cs  = 0x0158        %eax = 0x
%ds  = 0x0160        %ebx = 0xe58abac0
%ss  = 0x0160        %ecx = 0x
%es  = 0x0160        %edx = 0x0018
%fs  = 0x            %esi = 0x
%gs  = 0x01b0        %edi = 0x
%eip = 0xfe8ebd71 kmem_free+0x111
%ebp = 0x
%esp = 0xfec24530
%eflags = 0x00010246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=
%uesp = 0xd5905000
%trapno = 0x8
%err = 0x0

I was trying to cause the error from the command line:
[EMAIL PROTECTED] ~]# zfs destroy solaris/[EMAIL PROTECTED] ; zfs rename solaris/[EMAIL PROTECTED] solaris/[EMAIL PROTECTED] ; zfs snapshot solaris/[EMAIL PROTECTED]

but without success. Any idea?
[zfs-discuss] ZFS over iSCSI question
Dear all.
I've setup the following scenario:

Galaxy 4200 running OpenSolaris build 59 as iSCSI target; remaining diskspace of the two internal drives with a total of 90GB is used as zpool for the two 32GB volumes "exported" via iSCSI.

The initiator is an up to date Solaris 10 11/06 x86 box using the above mentioned volumes as disks for a local zpool.

I've now started rsync to copy about 1GB of data in several thousand files. During the operation I took the network interface on the iSCSI target down, which resulted in no more disk IO on that server. On the other hand, the client happily dumps data into the ZFS cache, actually completely finishing all of the copy operation.

Now the big question: we plan to use that kind of setup for email or other important services, so what happens if the client crashes while the network is down? Does it mean that all the data in the cache is gone forever?

If so, is this a transport independent problem which can also happen if ZFS used Fibre Channel attached drives instead of iSCSI devices?

Thanks for your help
Thomas

-
GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
[zfs-discuss] ZFS machine to be reinstalled
Hello,

Our Solaris 10 machine needs to be reinstalled. Inside we have 2 HDDs in a striped ZFS pool with 4 filesystems. After Solaris is installed, how can I "mount" or recover the 4 filesystems without losing the existing data?

Thank you very much!
[zfs-discuss] Re: Re: Is there any performance problem with hard
>See fsattr(5)

It was helpful :). Thanks!
Re: [zfs-discuss] ZFS machine to be reinstalled
On 3/23/07, Ionescu Mircea <[EMAIL PROTECTED]> wrote:
> Our Solaris 10 machine needs to be reinstalled. Inside we have 2 HDDs in a
> striped ZFS pool with 4 filesystems. After Solaris is installed, how can I
> "mount" or recover the 4 filesystems without losing the existing data?

Check "zpool import".

--
Regards,
Cyril
Re: [zfs-discuss] ZFS ontop of SVM - CKSUM errors
Hello Robert,

Forget it, silly me. The pool was mounted on one host while the SVM metadevice was created on another host on the same disks at the same time, and both hosts were issuing IOs. Once I corrected that I no longer see CKSUM errors with ZFS on top of SVM, and performance is similar. :)))

I'm still wondering, however, why I'm getting only about 400MB/s of sequential writes but ~700MB/s of sequential reads to a stripe made of 85 FC disks. I would expect to get about 700MB/s in both cases.

--
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
Re: [zfs-discuss] ZFS over iSCSI question
Thomas Nau writes:
> Dear all.
> I've setup the following scenario:
>
> Galaxy 4200 running OpenSolaris build 59 as iSCSI target; remaining
> diskspace of the two internal drives with a total of 90GB is used as zpool
> for the two 32GB volumes "exported" via iSCSI.
>
> The initiator is an up to date Solaris 10 11/06 x86 box using the above
> mentioned volumes as disks for a local zpool.
>
> I've now started rsync to copy about 1GB of data in several thousand
> files. During the operation I took the network interface on the iSCSI
> target down, which resulted in no more disk IO on that server. On the other
> hand, the client happily dumps data into the ZFS cache, actually completely
> finishing all of the copy operation.
>
> Now the big question: we plan to use that kind of setup for email or other
> important services, so what happens if the client crashes while the network
> is down? Does it mean that all the data in the cache is gone forever?
>
> If so, is this a transport independent problem which can also happen if
> ZFS used Fibre Channel attached drives instead of iSCSI devices?

I assume the rsync is not issuing fsyncs (and its files are not opened O_DSYNC). If so, rsync just works against the filesystem cache and does not commit the data to disk. You might want to run sync(1M) after a successful rsync.

A larger rsync would presumably have blocked. It's just that the amount of data you needed to rsync fitted in a couple of transaction groups.

-r
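A small illustration of Roch's point (the pool and path names are hypothetical):

   rsync -a /data/mail/ /tank/mail/   # completes against the filesystem cache
   sync                               # schedule the cached writes to disk; see sync(1M)

Whether an application commits its own writes can be checked with truss(1) or DTrace by watching for fsync-style calls, as comes up later in this thread.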
Re: [zfs-discuss] crash during snapshot operations
On Mar 23, 2007, at 6:13 AM, Łukasz wrote:
> When I'm trying to do, in kernel, in a zfs ioctl:
> 1. snapshot destroy PREVIOUS
> 2. snapshot rename LATEST->PREVIOUS
> 3. snapshot create LATEST
>
> the code is:
>    /* delete previous snapshot */
>    zfs_unmount_snap(snap_previous, NULL);
>    dmu_objset_destroy(snap_previous);
>    /* rename snapshot */
>    zfs_unmount_snap(snap_latest, NULL);
>    dmu_objset_rename(snap_latest, snap_previous);
>    /* create snapshot */
>    dmu_objset_snapshot(zc->zc_name, REPLICATE_SNAPSHOT_LATEST, 0);
>
> I get a kernel panic.
>
> MDB:
> > ::status
> debugging crash dump vmcore.3 (32-bit) from zfs.dev
> operating system: 5.11 snv_56 (i86pc)
> panic message: BAD TRAP: type=8 (#df Double fault) rp=fec244f8 addr=d5904ffc
> dump content: kernel pages only

This is most likely due to stack overflow. Your stack is 0xd421cfac - 0xd421b04c = 0t8032 bytes. The PAGESIZE on x86/x64 machines is 4k, and DEFAULTSTKSZ is 8k (2 * PAGESIZE) for 32-bit and 20k (5 * PAGESIZE) for amd64. So you've blown your 8k stack.

This is mostly due to:

6354519 stack overflow in zfs due to zio pipeline

Running on a 64-bit machine would also help.

eric
[zfs-discuss] Re: ZFS machine to be reinstalled
Where the name of the pool is xyz:

  zpool export xyz
  (rebuild the system, staying clear of the pool disks)
  zpool import xyz

Ron Halstead
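If the old system went down without a clean export (as happens later in this thread), the pool can still be recovered after the reinstall; a sketch, with xyz standing in for the real pool name:

  zpool import          # with no arguments: list pools found on the attached disks
  zpool import xyz      # import by name
  zpool import -f xyz   # force it if the pool still looks in use by the old install
  zfs list -r xyz       # the filesystems and their data should reappear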
Re: [zfs-discuss] asize is 300MB smaller than lsize - why?
Robert Milkowski wrote:
> Basically we've implemented a mechanism to replicate a zfs file system,
> implementing a new ioctl based on zfs send|recv. The difference is that we
> sleep() for a specified time (default 5s) and then ask for a new transaction
> group, and if there's one we send it out. More details really soon, I hope.
>
> ps. zdb output sent privately

The smaller file has its first 320MB as a hole, while the larger file is entirely filled in. You can see this from the zdb output (the first number on each line is the offset):

Indirect blocks:
      0 L2   0:115be2400:1200 4000L/1200P F=10192 B=831417
   1400  L1  0:c0028c00:400 4000L/400P F=30 B=831370
   14c4   L0 0:b818:2 2L/2P F=1 B=831367
   14c6   L0 0:b81a:2 2L/2P F=1 B=831367
   ...

vs.

Indirect blocks:
      0 L2   0:ea1a0800:1400 4000L/1400P F=12911 B=831388
      0  L1  0:2553bb400:400 4000L/400P F=128 B=831346
      0   L0 0:25540:2 2L/2P F=1 B=831346
      2   L0 0:25542:2 2L/2P F=1 B=831346
      4   L0 0:25544:2 2L/2P F=1 B=831346
   ...

How it got that way, I couldn't really say without looking at your code. If you are able to reproduce this using OpenSolaris bits, let me know.

--matt
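A hedged way to see the same hole-vs-data difference on a scratch dataset (the names and sizes are made up, and the exact zdb output format varies by build):

   # write 1MB at an offset of 320MB, leaving the first 320MB as a hole
   dd if=/dev/urandom of=/tank/test/sparse bs=1024k oseek=320 count=1
   ls -l /tank/test/sparse    # logical size includes the hole
   du -k /tank/test/sparse    # allocated size is far smaller
   # with enough -d flags, zdb dumps the L0/L1/L2 indirect blocks as above
   zdb -dddddd tank/test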
Re: [zfs-discuss] migration/acl4 problem
> It looks like we're between a rock and a hard place. We want to use ZFS
> for one project because of snapshots and data integrity - both would give
> us considerable advantages over ufs (not to mention filesystem size).
> Unfortunately, this is critical company data and the access control has to
> be exactly right all the time: the default ACLs as implemented in UFS are
> exactly what we need and work perfectly.

The original plan was to allow the inheritance of owner/group/other permissions. Unfortunately, during ARC reviews we were forced to remove that functionality, due to POSIX compliance and security concerns.

We can look into alternatives to provide a way to force the creation of directory trees with a specified set of permissions.

-Mark
Re: [zfs-discuss] C'mon ARC, stay small...
With latest Nevada, setting zfs_arc_max in /etc/system is sufficient. Playing with mdb on a live system is more tricky and is what caused the problem here.

-r

[EMAIL PROTECTED] writes:
> Jim Mauro wrote:
> >
> > All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...
> >
> > > arc::print -tad
> > {
> > ...
> >     c02e29e8 uint64_t size = 0t299008
> >     c02e29f0 uint64_t p = 0t16588228608
> >     c02e29f8 uint64_t c = 0t33176457216
> >     c02e2a00 uint64_t c_min = 0t1070318720
> >     c02e2a08 uint64_t c_max = 0t33176457216
> > ...
> > }
> > > c02e2a08 /Z 0x2000
> > arc+0x48: 0x7b9789000 = 0x2000
> > > c02e29f8 /Z 0x2000
> > arc+0x38: 0x7b9789000 = 0x2000
> > > c02e29f0 /Z 0x1000
> > arc+0x30: 0x3dcbc4800 = 0x1000
> > > arc::print -tad
> > {
> > ...
> >     c02e29e8 uint64_t size = 0t299008
> >     c02e29f0 uint64_t p = 0t268435456       <-- p is 256MB
> >     c02e29f8 uint64_t c = 0t536870912       <-- c is 512MB
> >     c02e2a00 uint64_t c_min = 0t1070318720
> >     c02e2a08 uint64_t c_max = 0t536870912   <-- c_max is 512MB
> > ...
> > }
> >
> > After a few runs of the workload ...
> >
> > > arc::print -d size
> > size = 0t536788992
> >
> > Ah - looks like we're out of the woods. The ARC remains clamped at 512MB.
>
> Is there a way to set these fields using /etc/system?
> Or does this require a new or modified init script to
> run and do the above with each boot?
>
> Darren
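For reference, the /etc/system form Roch mentions looks roughly like this (the 512MB value just mirrors the example above; only Nevada builds recent enough to have the tunable honor it):

  * Cap the ZFS ARC at 512MB (0x20000000 bytes); takes effect at next boot
  set zfs:zfs_arc_max = 0x20000000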
[zfs-discuss] Re: asize is 300MB smaller than lsize - why?
> How it got that way, I couldn't really say without looking at your code.

It works like this: in the new ioctl operation zfs_ioc_replicate_send(zfs_cmd_t *zc) we open the filesystem (not a snapshot):

   dmu_objset_open(zc->zc_name, DMU_OST_ANY,
       DS_MODE_STANDARD | DS_MODE_READONLY, &filesystem);

then call the dmu replicate send function (txg is the transaction group number):

   dmu_replicate_send(filesystem, &txg, ...);

There we set max_txg:

   ba.max_txg = (spa_get_dsl(filesystem->os->os_spa))->dp_tx.tx_synced_txg;

and call traverse_dsl_dataset:

   traverse_dsl_dataset(filesystem->os->os_dsl_dataset, *txg,
       ADVANCE_PRE | ADVANCE_HOLES | ADVANCE_DATA | ADVANCE_NOLOCK,
       replicate_cb, &ba);

After traversing, the next txg is returned:

   if (ba.got_data != 0)
       *txg = ba.max_txg + 1;

In replicate_cb we do the same thing backup_cb does, but at the beginning we check the txg:

   /* remember last txg */
   if (bc->bc_blkptr.blk_birth) {
       if (bc->bc_blkptr.blk_birth > ba->max_txg)
           return;
       ba->got_data = 1;
   }

After a 5 second delay we call the ioctl again with the txg returned from the last operation.
Re: [zfs-discuss] Re: Proposal: ZFS hotplug support and autoconfiguration
On Fri, Mar 23, 2007 at 11:31:03AM +0100, Pawel Jakub Dawidek wrote:
>
> Eric, maybe I'm missing something, but why ZFS depend on devids at all?
> As I understand it, devid is something that never change for a block
> device, eg. disk serial number, but on the other hand it is optional, so
> we can rely on the fact it's always there (I mean for all block devices
> we use).
>
> Why we simply not forget about devids and just focus on on-disk metadata
> to detect pool components?
>
> The only reason I see is performance. This is probably why
> /etc/zfs/zpool.cache is used as well.
>
> In FreeBSD we have the GEOM infrastructure for storage. Each storage
> device (disk, partition, mirror, etc.) is simply a GEOM provider. If a
> GEOM provider appears (eg. disk is inserted, partition is configured)
> all interested parties are informed about this and can 'taste' the
> provider by reading metadata specific for them. The same when a provider
> goes away - all interested parties are informed and can react
> accordingly.
>
> We don't see any performance problems related to the fact that each disk
> that appears is read by many "GEOM classes".

We do use the on-disk metadata for verification purposes, but we can't open the device based on the metadata. We don't have a corresponding interface in Solaris, so there is no way to say "open the device with this particular on-disk data". The devid is also unique to the device (it's based on manufacturer/model/serial number), so that we can uniquely identify devices for fault management purposes.

The world of hotplug and device configuration in Solaris is quite complicated. Part of my time spent on this work has been just writing down the existing semantics. A scheme like that in FreeBSD would be nice, but unlikely to appear given the existing complexity. As part of the I/O retire work we will likely be introducing device contracts, which is a step in the right direction, but it's a very long road.

Thanks for sharing the details on FreeBSD, it's quite interesting. Since the majority of this work is Solaris-specific, I'll be interested to see how other platforms deal with this type of reconfiguration.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Re: [zfs-discuss] migration/acl4 problem
On 3/23/07, Mark Shellenbaum <[EMAIL PROTECTED]> wrote:
> The original plan was to allow the inheritance of owner/group/other
> permissions. Unfortunately, during ARC reviews we were forced to remove
> that functionality, due to POSIX compliance and security concerns.

What exactly is the POSIX compliance requirement here?

(It's also not clear to me how *not* allowing control of permissions helps security in any way.)

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] migration/acl4 problem
Peter Tribble wrote:
> On 3/23/07, Mark Shellenbaum <[EMAIL PROTECTED]> wrote:
> > The original plan was to allow the inheritance of owner/group/other
> > permissions. Unfortunately, during ARC reviews we were forced to remove
> > that functionality, due to POSIX compliance and security concerns.
>
> What exactly is the POSIX compliance requirement here?

The ignoring of a user's umask.

> (It's also not clear to me how *not* allowing control of permissions
> helps security in any way.)
[zfs-discuss] Re: crash during snapshot operations
Thanks for the advice. I removed my buffers snap_previous and snap_latest and it helped. I'm using zc->value as the buffer.
Re: [zfs-discuss] ZFS over iSCSI question
On Fri, 23 Mar 2007, Roch - PAE wrote:
> I assume the rsync is not issuing fsyncs (and its files are not opened
> O_DSYNC). If so, rsync just works against the filesystem cache and does
> not commit the data to disk. You might want to run sync(1M) after a
> successful rsync.
>
> A larger rsync would presumably have blocked. It's just that the amount
> of data you needed to rsync fitted in a couple of transaction groups.

Thanks for the hints, but this would make our worst nightmares come true. At least they could, because it means that we would have to check every application handling critical data, and I think it's not the app's responsibility. Up to a certain amount, like a database transaction, but not any further. There's always a time window where data might be cached in memory, but I would argue that caching several GB of data - in our case written data, with thousands of files - in unbuffered memory circumvents all the built-in reliability of ZFS.

I'm in a way still hoping that it's an iSCSI related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, etc. do better in this case.

Thomas

-
GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
[zfs-discuss] Re: ZFS machine to be reinstalled
Thank you all! The machine crashed unexpectedly, so no export was possible. Anyway, just using "zpool import pool_name" helped me recover everything. Thanks again for your help!
Re: [zfs-discuss] 6410 expansion shelf
On March 23, 2007 5:38:20 PM +0800 Wee Yeh Tan <[EMAIL PROTECTED]> wrote:
> I should be able to reply to you next Tuesday -- my 6140 SATA expansion
> tray is due to arrive. Meanwhile, what kind of problem do you have with
> the 3511?

I'm not sure that it had anything to do with the raid controller being present or not. The initial configuration (5x250 original sata disks) worked well. Changing the disks to 750gb disks worked well. Then I had to get 7 more drive carriers, and then some of the slots didn't work -- disks would not spin up.

The 7 addt'l carriers had different electronics than the original 5. Just a hardware revision, I suppose. Oh, and they were "dot hill" labelled instead of Sun labelled (dot hill is the OEM for the 3510/3511). When I was able to replace the 7 new carriers with ones that looked like the original 5 (same electronics and Sun branding), I had better luck, but there were still one or two slots that were SOL. Swapping hardware around, I identified that it was definitely the slot and not a carrier or drive problem. But maybe a bad carrier "broke" the slot itself. I dunno!

I was tempted to just use the array with the 10 or 11 slots that worked, since I got it for a very good price, but I was worried that there'd be more failures in the future, and the cost savings wasn't worth even the potential hassle of having to deal with that.

-frank
Re: [zfs-discuss] ZFS over iSCSI question
On March 23, 2007 6:51:10 PM +0100 Thomas Nau <[EMAIL PROTECTED]> wrote:
> Thanks for the hints, but this would make our worst nightmares come true.
> At least they could, because it means that we would have to check every
> application handling critical data, and I think it's not the app's
> responsibility.

I'd tend to disagree with that. POSIX/SUS does not guarantee data makes it to disk until you do an fsync() (or open the file with the right flags, or other techniques). If an application REQUIRES that data get to disk, it really MUST DTRT.

> Up to a certain amount, like a database transaction, but not any further.
> There's always a time window where data might be cached in memory, but I
> would argue that caching several GB of data - in our case written data,
> with thousands of files - in unbuffered memory circumvents all the
> built-in reliability of ZFS.
>
> I'm in a way still hoping that it's an iSCSI related problem, as detecting
> dead hosts in a network can be a non-trivial problem and it takes quite
> some time for TCP to time out and inform the upper layers. Just a
> guess/hope here that FC-AL, etc. do better in this case.

iscsi doesn't use TCP, does it? Anyway, the problem is really transport independent.

-frank
[zfs-discuss] gzip compression support
I recently integrated this fix into ON:

6536606 gzip compression for ZFS

With this, ZFS now supports gzip compression. To enable gzip compression just set the 'compression' property to 'gzip' (or 'gzip-N' where N=1..9). Existing pools will need to upgrade in order to use this feature, and, yes, this is the second ZFS version number update this week. Recall that once you've upgraded a pool, older software will no longer be able to access it, regardless of whether you're using the gzip compression algorithm.

I did some very simple tests to look at relative size and time requirements:

http://blogs.sun.com/ahl/entry/gzip_for_zfs_update

I've also asked Roch Bourbonnais and Richard Elling to do some more extensive tests.

Adam

From zfs(1M):

     compression=on | off | lzjb | gzip | gzip-N

         Controls the compression algorithm used for this dataset. The
         "lzjb" compression algorithm is optimized for performance while
         providing decent data compression. Setting compression to "on"
         uses the "lzjb" compression algorithm.

         The "gzip" compression algorithm uses the same compression as
         the gzip(1) command. You can specify the gzip level by using
         the value "gzip-N", where N is an integer from 1 (fastest) to
         9 (best compression ratio). Currently, "gzip" is equivalent to
         "gzip-6" (which is also the default for gzip(1)).

         This property can also be referred to by its shortened column
         name "compress".

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
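A quick usage sketch of the feature described above (the pool and dataset names are placeholders):

  zpool upgrade tank                       # bump the pool to the new on-disk version
  zfs set compression=gzip tank/docs       # or gzip-1 .. gzip-9
  zfs get compression,compressratio tank/docs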
Re: [zfs-discuss] gzip compression support
On Fri, 23 Mar 2007, Adam Leventhal wrote:
> I recently integrated this fix into ON:
>
> 6536606 gzip compression for ZFS

Cool! Can you recall into which build it went?

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
CEO, My Online Home Inventory
Voice: +1 (250) 979-1638
URLs: http://www.rite-group.com/rich
      http://www.myonlinehomeinventory.com
Re: [zfs-discuss] gzip compression support
On Fri, Mar 23, 2007 at 11:41:21AM -0700, Rich Teer wrote:
> > I recently integrated this fix into ON:
> >
> > 6536606 gzip compression for ZFS
>
> Cool! Can you recall into which build it went?

I put it back yesterday, so it will be in build 62.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Re: [zfs-discuss] migration/acl4 problem
>Peter Tribble wrote:
>> On 3/23/07, Mark Shellenbaum <[EMAIL PROTECTED]> wrote:
>>>
>>> The original plan was to allow the inheritance of owner/group/other
>>> permissions. Unfortunately, during ARC reviews we were forced to remove
>>> that functionality, due to POSIX compliance and security concerns.
>>
>> What exactly is the POSIX compliance requirement here?
>>
>The ignoring of a user's umask.

Which is what made UFS ACLs useless until we "fixed" it to break POSIX semantics.

(I think we should really have some form of uacl which, when set, forces the umask to 0 but which is used as the default acl when there is no acl present.)

Casper
[zfs-discuss] Re: /tmp on ZFS?
Well, I am aware that /tmp can be mounted on swap as tmpfs and that this is really fast, as almost all writes go straight to memory, but this is of little to no value to the server in question.

The server in question is running 2 enterprise third-party applications. No compilers are installed... in fact it's a super minimal Solaris 10 core install (06/06). The reasoning behind moving /tmp onto ZFS was to protect against the occasional misdirected administrator who accidentally fills up /tmp while transferring a file or what have you. As I said, it's a production server, so we are doing our best to insulate it from inadvertent errors.

When this server was built it was built with 8GB of swap on a dedicated slice. /tmp was left on / (root) and later mounted on a zpool. Is this dangerous given the server profile? Am I missing something here?

Some other Sun engineers say that /tmp "is" swap and vice versa on Solaris, but my understanding is that my dedicated swap slice "is" swap and is not directly accessible. /tmp is just another filesystem that happens to be mounted on a zpool with a quota, so there is no fear of user/admin error. Based on how the system was set up, is this a correct assertion?
Re: [zfs-discuss] ZFS over iSCSI question
>I'd tend to disagree with that. POSIX/SUS does not guarantee data makes
>it to disk until you do an fsync() (or open the file with the right flags,
>or other techniques). If an application REQUIRES that data get to disk,
>it really MUST DTRT.

Indeed; want your data safe? Use:

	fflush(fp);
	fsync(fileno(fp));
	fclose(fp);

and check errors. It's remarkable how often people get the above sequence wrong and only do something like:

	fsync(fileno(fp));
	fclose(fp);

Casper
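As a concrete sketch of Casper's sequence with the error checking he mentions (the file name and the exit-on-error style are just illustrative):

	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		FILE *fp = fopen("/tank/mail/important.dat", "w");

		if (fp == NULL) {
			perror("fopen");
			return (1);
		}
		if (fputs("critical record\n", fp) == EOF) {
			perror("fputs");
			return (1);
		}
		/* push stdio's user-space buffer into the kernel */
		if (fflush(fp) != 0) {
			perror("fflush");
			return (1);
		}
		/* ask the kernel to commit the data to stable storage */
		if (fsync(fileno(fp)) != 0) {
			perror("fsync");
			return (1);
		}
		if (fclose(fp) != 0) {
			perror("fclose");
			return (1);
		}
		return (0);
	}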
Re: [zfs-discuss] Re: /tmp on ZFS?
On Fri, 23 Mar 2007, Matt B wrote:
> The server in question is running 2 enterprise third-party applications.
> No compilers are installed... in fact it's a super minimal Solaris 10
> core install (06/06). The reasoning behind moving /tmp onto ZFS was to
> protect against the occasional misdirected administrator who accidentally
> fills up /tmp while transferring a file or what have you. As I said, it's
> a production server, so we are doing our best to insulate it from
> inadvertent errors.

In that case, I think the easiest approach would be to use the "size" tmpfs mount option, which limits the amount of VM /tmp can use.

> Is this dangerous given the server profile? Am I missing something?

Dangerous? I think not. But most likely suboptimal.

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
CEO, My Online Home Inventory
Voice: +1 (250) 979-1638
URLs: http://www.rite-group.com/rich
      http://www.myonlinehomeinventory.com
Re: [zfs-discuss] Re: /tmp on ZFS?
On Fri, Mar 23, 2007 at 11:57:40AM -0700, Matt B wrote:
>
> The server in question is running 2 enterprise third-party applications.
> No compilers are installed... in fact it's a super minimal Solaris 10
> core install (06/06). The reasoning behind moving /tmp onto ZFS was to
> protect against the occasional misdirected administrator who accidentally
> fills up /tmp while transferring a file or what have you. As I said, it's
> a production server, so we are doing our best to insulate it from
> inadvertent errors.

You can solve that problem by putting a size limit on /tmp. For example, we do this in /etc/vfstab:

swap    -       /tmp    tmpfs   -       yes     size=500m

The filesystem will still fill up, but you won't run out of swap space.

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
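To confirm the cap once /tmp is remounted, something like the following should do (the exact output formatting is from memory, so treat it as approximate):

  mount -v | grep /tmp    # the options field should include size=500m
  df -h /tmp              # reported total size should be capped accordingly
  swap -s                 # overall swap usage, including what tmpfs is consuming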
Re: [zfs-discuss] migration/acl4 problem
On 3/23/07, Mark Shellenbaum <[EMAIL PROTECTED]> wrote:
> Peter Tribble wrote:
> > What exactly is the POSIX compliance requirement here?
>
> The ignoring of a user's umask.

Where in POSIX does it specify the interaction of ACLs and a user's umask?

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] Re: Re: Proposal: ZFS hotplug supportandautoconfiguration
Anton B. Rang wrote:
> Is this because C would already have a devid? If I insert an unlabeled
> disk, what happens? What if B takes five minutes to spin up? If it never
> does?

N.B. you get different error messages from the disk. If a disk is not ready then it will return a not-ready code, and the sd driver will record this and patiently retry. The reason I know this in some detail is scar #523, which was inflicted when we realized that some/many/most RAID arrays don't do this. The difference is that the JBOD disk electronics start very quickly, perhaps a few seconds after power-on. A RAID array can take several minutes (or more) to get to a state where it will reply to any request.

So, if you do not perform a full, simultaneous power-on test for your entire (cluster) system, then you may not hit the problem that the slow storage start makes Solaris think that the device doesn't exist -- which can be a bad thing for highly available services. Yes, this is yet another systems engineering problem.

-- richard
Re: [zfs-discuss] Re: Re: Proposal: ZFS hotplug supportandautoconfiguration
Workaround below...

Richard Elling wrote:
> Anton B. Rang wrote:
> > Is this because C would already have a devid? If I insert an unlabeled
> > disk, what happens? What if B takes five minutes to spin up? If it
> > never does?
>
> N.B. you get different error messages from the disk. If a disk is not
> ready then it will return a not-ready code, and the sd driver will
> record this and patiently retry. The reason I know this in some detail
> is scar #523, which was inflicted when we realized that some/many/most
> RAID arrays don't do this. The difference is that the JBOD disk
> electronics start very quickly, perhaps a few seconds after power-on. A
> RAID array can take several minutes (or more) to get to a state where it
> will reply to any request.
>
> So, if you do not perform a full, simultaneous power-on test for your
> entire (cluster) system, then you may not hit the problem that the slow
> storage start makes Solaris think that the device doesn't exist -- which
> can be a bad thing for highly available services. Yes, this is yet
> another systems engineering problem.

Sorry, it was rude of me not to include the workaround. We put a delay in the SPARC OBP to slow down the power-on boot time of the servers to match the attached storage. While this worked, it is butt-ugly. You can do this with GRUB, too.

-- richard
Re: [zfs-discuss] migration/acl4 problem
Peter Tribble wrote:
> On 3/23/07, Mark Shellenbaum <[EMAIL PROTECTED]> wrote:
> > Peter Tribble wrote:
> > > What exactly is the POSIX compliance requirement here?
> >
> > The ignoring of a user's umask.
>
> Where in POSIX does it specify the interaction of ACLs and a user's umask?

Let me try and summarize the discussion that took place a few years ago.

The POSIX ACL draft stated (p 269): "The process umask is the user's way of specifying security for newly created objects. It was a goal to preserve this behavior //unless it is specifically overridden in a default ACL//."

However, that is a withdrawn specification, and Solaris is required to conform to a set of "approved standards". The main POSIX specification doesn't say anything specific about ACLs, but rather about alternate and additional access control methods. POSIX gives clear rules for file access permissions based on umask, file mode bits, additional access control mechanisms, and alternate access control mechanisms. Most of this is discussed in section 2.3 "General Concepts".

Since there is nothing in the spec that states that we *can* ignore the umask, we are therefore forced to honor it, at least until we find a way to work around this. I will open an RFE to look into alternative ways to address this issue.

-Mark
Re: [zfs-discuss] ZFS over iSCSI question
Thomas Nau wrote:
> Dear all.
> I've setup the following scenario:
>
> Galaxy 4200 running OpenSolaris build 59 as iSCSI target; remaining
> diskspace of the two internal drives with a total of 90GB is used as
> zpool for the two 32GB volumes "exported" via iSCSI.
>
> The initiator is an up to date Solaris 10 11/06 x86 box using the above
> mentioned volumes as disks for a local zpool.

Like this?
  disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app

> I'm in a way still hoping that it's an iSCSI related problem, as detecting
> dead hosts in a network can be a non-trivial problem and it takes quite
> some time for TCP to time out and inform the upper layers. Just a
> guess/hope here that FC-AL, etc. do better in this case.

Actually, this is why NFS was invented. Prior to NFS we had something like:
  disk--raw--ndserver--network--ndclient--filesystem--app

The problem is that the failure modes are very different for networks and presumably reliable local disk connections. Hence NFS has a lot of error handling code and provides well understood error handling semantics. Maybe what you really want is NFS?

-- richard
[zfs-discuss] Re: ZFS layout for 10 disk?
> Consider that 18GByte disks are old and their failure rate will
> increase dramatically over the next few years.

I guess that's why I am asking about raidz and mirrors, not just creating a huge stripe out of them.

> Do something to have redundancy. If raidz2 works for your workload,
> I'd go with that.

Well, I think so; the filesystem is currently on a raidz of three disks with no complaints.

Food for thought: how do you fit SATA disks into a SCSI array?

More food for thought: so how do these two 500GB HDDs work, then? Do I just leave them on the side and the magic of iSCSI allows them to work and be shared between boxes?

Sorry for the gitty response, but I am in the UK (see my details), so this 160-pound idea of yours is not 160 pounds. It is the cost of the HDDs plus a new box that can take them, so it's a nice cheap option then :)

> BTW, I was just at Fry's, new 500 GByte Seagate drives are $180.
> Prices for new disks tend to approach $150 (USD) after which they
> are replaced by larger drives and the inventory is price reduced
> until gone. A 2-new disk mirror will be more reliable than any
> reasonable combination of 5-year old disks. Food for thought.
> -- richard
[zfs-discuss] Re: Re: ZFS layout for 10 disk?
Just to clarify:

pool1 -> 5 disk raidz2
pool2 -> 4 disk raid 10
spare for both pools

Is that correct?
[zfs-discuss] Re: Re: /tmp on ZFS?
Ok, so you are suggesting that I simply mount /tmp as tmpfs on my existing 8GB swap slice and then put the VM limit on /tmp? Will that limit only affect users writing data to /tmp, or will it also affect the system's use of swap?
Re: [zfs-discuss] ZFS ontop of SVM - CKSUM errors
Robert Milkowski wrote:
> Hello Robert,
>
> Forget it, silly me. The pool was mounted on one host while the SVM
> metadevice was created on another host on the same disks at the same
> time, and both hosts were issuing IOs. Once I corrected that I no longer
> see CKSUM errors with ZFS on top of SVM, and performance is similar. :)))

Smiles, because ZFS detected the corruption :-)

-- richard
[zfs-discuss] Re: Re: /tmp on ZFS?
For reference... here is my disk layout currently (one disk of two, but both are identical). s4 is for the MetaDB, s5 is dedicated to ZFS.

partition> print
Current partition table (original):
Total disk cylinders available: 8921 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 -  765        5.86GB    (765/0/0)   12289725
  1       swap    wu     766 - 1785        7.81GB    (1020/0/0)  16386300
  2     backup    wm       0 - 8920       68.34GB    (8921/0/0) 143315865
  3        var    wm    1786 - 2550        5.86GB    (765/0/0)   12289725
  4 unassigned    wm    2551 - 2557       54.91MB    (7/0/0)       112455
  5 unassigned    wm    2558 - 8824       48.01GB    (6267/0/0) 100679355
  6 unassigned    wm       0               0         (0/0/0)            0
  7 unassigned    wm       0               0         (0/0/0)            0
  8       boot    wu       0 -    0        7.84MB    (1/0/0)        16065
  9 unassigned    wm       0               0         (0/0/0)            0

-- df output --

df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0       6050982 1172802 4817671    20%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                 9149740     436 9149304     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
/usr/lib/libc/libc_hwcap2.so.1
                     6050982 1172802 4817671    20%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
/dev/md/dsk/d3       6050982   43303 5947170     1%    /var
swap                 9149312       8 9149304     1%    /var/run
zpool/home           4194304      91 4194212     1%    /home
zpool/data          49545216 3799227 45745635     8%    /data
zpool/tmp           49545216      55 45745635     1%    /tmp
Re: [zfs-discuss] Re: Re: /tmp on ZFS?
On Fri, 23 Mar 2007, Matt B wrote:
> Ok, so you are suggesting that I simply mount /tmp as tmpfs on my
> existing 8GB swap slice and then put the VM limit on /tmp? Will that

Yes.

> limit only affect users writing data to /tmp, or will it also affect the
> system's use of swap?

Well, they'd potentially be sharing the slice, so yes, that's possible. If your (say) 1GB /tmp becomes full, only 7GB will remain for paging. However, if /tmp is empty, the whole 8GB will be available for paging.

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
CEO, My Online Home Inventory
Voice: +1 (250) 979-1638
URLs: http://www.rite-group.com/rich
      http://www.myonlinehomeinventory.com
[zfs-discuss] Re: Re: Re: /tmp on ZFS?
Ok, since I already have an 8GB swap slice I'd like to use, what would be the best way of setting up /tmp on this existing swap slice as tmpfs and then applying the 1GB quota limit? I know how to get rid of the zpool/tmp filesystem in ZFS, but I'm not sure how to actually get to the above in a post-install scenario with existing raw swap.

Thanks
Re: [zfs-discuss] Re: Re: Re: /tmp on ZFS?
On Fri, 23 Mar 2007, Matt B wrote:
> Ok, since I already have an 8GB swap slice I'd like to use, what would
> be the best way of setting up /tmp on this existing swap slice as tmpfs
> and then applying the 1GB quota limit?

Have a line similar to the following in your /etc/vfstab:

swap    -       /tmp    tmpfs   -       yes     size=1024m

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
CEO, My Online Home Inventory
Voice: +1 (250) 979-1638
URLs: http://www.rite-group.com/rich
      http://www.myonlinehomeinventory.com
[zfs-discuss] Re: Re: Re: Re: /tmp on ZFS?
And just doing this will automatically target my /tmp at my 8GB swap slice on s1, as well as putting the quota in place?
Re: [zfs-discuss] Re: Re: Re: Re: /tmp on ZFS?
On Fri, 23 Mar 2007, Matt B wrote:
> And just doing this will automatically target my /tmp at my 8GB swap
> slice on s1, as well as putting the quota in place?

After a reboot, yes.

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
CEO, My Online Home Inventory
Voice: +1 (250) 979-1638
URLs: http://www.rite-group.com/rich
      http://www.myonlinehomeinventory.com
[zfs-discuss] Re: Re: Re: Re: /tmp on ZFS?
Oh, one other thing...s1 (8GB swap) is part of an SVM mirror (on d1) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Re: Re: /tmp on ZFS?
On Fri, 23 Mar 2007, Matt B wrote: > Oh, one other thing...s1 (8GB swap) is part of an SVM mirror (on d1) That's not relevant in this case. -- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member CEO, My Online Home Inventory Voice: +1 (250) 979-1638 URLs: http://www.rite-group.com/rich http://www.myonlinehomeinventory.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: Re: Re: /tmp on ZFS?
Worked great. Thanks This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: ZFS layout for 10 disk?
I'd take your 10 data disks and make a single raidz2 stripe. You can sustain two disk failures before losing data, and presumably you'd replace the failed disks before that was likely to happen. If you're very concerned about failures, I'd have a single 9-wide raidz2 stripe with a hot spare. Adam On Fri, Mar 23, 2007 at 01:44:06PM -0700, John-Paul Drawneek wrote: > Just to clarify > > pool1 -> 5 disk raidz2 > pool2 -> 4 disk raid 10 > > spare for both pools > > Is that correct? > > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
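As a rough sketch of that second layout (the device names here are hypothetical; substitute your own targets):

    # nine disks in one raidz2 top-level vdev, plus one hot spare
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        c1t5d0 c1t6d0 c1t7d0 c1t8d0 spare c1t9d0

    # verify the layout and the spare
    zpool status tank

You keep raidz2's double-parity protection, and the spare can be pulled in when a disk faults, shortening the window in which you run with reduced redundancy.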
Re: [zfs-discuss] ZFS over iSCSI question
Dear Fran & Casper I'd tend to disagree with that. POSIX/SUS does not guarantee data makes it to disk until you do an fsync() (or open the file with the right flags, or other techniques). If an application REQUIRES that data get to disk, it really MUST DTRT. Indeed; want your data safe? Use: fflush(fp); fsync(fileno(fp)); fclose(fp); and check errors. (It's remarkable how often people get the above sequence wrong and only do something like fsync(fileno(fp)); fclose(fp); Thanks for clarifying! Seems I really need to check the apps with truss or dtrace to see if they use that sequence. Allow me one more question: why is fflush() required prior to fsync()? Putting all pieces together this means that if the app doesn't do it it suffered from the problem with UFS anyway just with typically smaller caches, right? Thanks again Thomas - GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] gzip compression support
snv_62

On Fri, 23 Mar 2007, Rich Teer wrote:
> On Fri, 23 Mar 2007, Adam Leventhal wrote:
> > I recently integrated this fix into ON: 6536606 gzip compression for ZFS
>
> Cool! Can you recall into which build it went?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
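For anyone wanting to try it once they are on a build with that fix, a quick sketch (pool and dataset names are made up):

    zfs set compression=gzip tank/data        # enable gzip compression for new writes
    zfs get compression,compressratio tank/data

Existing blocks stay as they were written; only data written after the property is set gets gzip-compressed.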
Re: [zfs-discuss] ZFS over iSCSI question
Richard,

> Like this?
>
>     disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app

Exactly. I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.

> Actually, this is why NFS was invented. Prior to NFS we had something like:
>
>     disk--raw--ndserver--network--ndclient--filesystem--app

The problem is that our NFS, Mail, DB and other servers use mirrored disks located in different buildings on campus. Currently we use FCAL devices and recently switched from UFS to ZFS. The drawback with FCAL is that you always need to have a second infrastructure (not the real problem) but with different components. Having all ethernet would be much easier.

> The problem is that the failure modes are very different for networks and
> presumably reliable local disk connections. Hence NFS has a lot of error
> handling code and provides well understood error handling semantics.
> Maybe what you really want is NFS?

We thought about using NFS as a backend for as many applications as possible, but we need to have redundancy for the fileserver itself too.

Thomas - GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over iSCSI question
> Thanks for clarifying! Seems I really need to check the apps with truss or
> dtrace to see if they use that sequence. Allow me one more question: why
> is fflush() required prior to fsync()?

When you use stdio, you need to make sure the data is in the system buffers prior to calling fsync(); otherwise fclose() will write the rest of the data afterwards, and that data is not sync'ed. (In S10 I fixed this for the /etc/*_* driver files; they are generally under 8K and therefore never written to disk before being fsync'ed if not preceded by fflush().)

Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?
Łukasz wrote:
> > How it got that way, I couldn't really say without looking at your code.
>
> It works like this: ... we set max_txg
>
>     ba.max_txg = (spa_get_dsl(filesystem->os->os_spa))->dp_tx.tx_synced_txg;

So, how do you send the initial stream? Presumably you need to do it with ba.max_txg = 0? If, say, the first 320MB were written before your first ba.max_txg, then you wouldn't be sending that data, thus explaining the behavior you're seeing.

It seems to me that your algorithm is fundamentally flawed -- if the filesystem is changing, it will not result in a consistent (from the ZPL's point of view) filesystem. For example: There are two directories, A and B. You last sent txg 10. In txg 13, a file is renamed from directory A to directory B. It is now txg 15, and you begin traversing to do a send, from txg 10 -> 15. While that's in progress, a new file is created in directory A, and synced out in txg 16. When you visit directory A, you see that its birth time is 16 > 15, so you don't send it. When you visit directory B, you see that its birth time is 13 <= 15, so you send it. Now the other side has two links to the file, when it should have one.

Given that you don't actually have the data from txg 15 (because you didn't take a snapshot), I don't see how you could make this work. (Also FYI, traversing changing filesystems in this way will almost certainly break once we rewrite as part of the pool space reduction work.)

--matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] mirror question
If I create a mirror, presumably if possible I use two or more identically sized devices, since it can only be as large as the smallest. However, if later I want to replace a disk with a larger one, and detach the mirror (and anything else on the disk), replace the disk (and if applicable repartition it), since it _is_ a larger disk (and/or the partitions will likely be larger since they mustn't be smaller, and blocks per cylinder will likely differ, and partitions are on cylinder boundaries), once I reattach everything, I'll now have two different sized devices in the mirror. So far, the mirror is still the original size. But what if I later replace the other disks with ones identical to the first one I replaced? With all the devices within the mirror now the larger size, will the mirror and the zpool of which it is a part expand? And if that won't happen automatically, can it (without inordinate trickery, and online, i.e. without backup and restore) be forced to do so? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mirror question
Yes, this is supported now. Replacing one half of a mirror with a larger device, letting it resilver, and then replacing the other half does indeed get you a larger mirror. I believe this is described somewhere but I can't remember where now.

Neil.

Richard L. Hamilton wrote On 03/23/07 20:45:
> If I create a mirror, presumably if possible I use two or more identically sized devices, since it can only be as large as the smallest. However, if later I want to replace a disk with a larger one, and detach the mirror (and anything else on the disk), replace the disk (and if applicable repartition it), since it _is_ a larger disk (and/or the partitions will likely be larger since they mustn't be smaller, and blocks per cylinder will likely differ, and partitions are on cylinder boundaries), once I reattach everything, I'll now have two different sized devices in the mirror. So far, the mirror is still the original size. But what if I later replace the other disks with ones identical to the first one I replaced? With all the devices within the mirror now the larger size, will the mirror and the zpool of which it is a part expand? And if that won't happen automatically, can it (without inordinate trickery, and online, i.e. without backup and restore) be forced to do so?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
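For the archives, a rough sketch of that procedure (pool and device names are hypothetical):

    # tank is a two-way mirror of c1t0d0 and c1t1d0 (the smaller disks)
    zpool replace tank c1t0d0 c2t0d0    # swap in the first larger disk
    zpool status tank                   # wait for the resilver to complete
    zpool replace tank c1t1d0 c2t1d0    # then swap in the second one
    zpool status tank                   # wait for this resilver as well
    zpool list tank                     # the pool should now report the larger size

If the extra space doesn't show up immediately, an export/import of the pool has been reported to make it visible; either way, no backup and restore is needed.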
[zfs-discuss] Backup of ZFS Filesystem with ACL 4
Hi guys! Please share your experience on how to back up ZFS with ACLs using commercially available backup software. Has anyone tested backup of ZFS with ACLs using Tivoli (TSM)? Thanks, Ayaz This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over iSCSI question
On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote: > >I'm in a way still hoping that it's a iSCSI related Problem as detecting > >dead hosts in a network can be a non trivial problem and it takes quite > >some time for TCP to timeout and inform the upper layers. Just a > >guess/hope here that FC-AL, ... do better in this case > > iscsi doesn't use TCP, does it? Anyway, the problem is really transport > independent. It does use TCP. Were you thinking UDP? Adam -- Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss