Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Will Murnane
On Jan 30, 2008 1:34 AM, Carson Gaspar <[EMAIL PROTECTED]> wrote:
> If this is Sun's cp, file a bug. It's failing to notice that it didn't
> provide a large enough buffer to getdents(), so it only got partial results.
>
> Of course, the getdents() API is rather unfortunate. It appears the only
> safe algorithm is:
>
> while ((r = getdents(...)) > 0) {
> /* process results */
> }
> if (r < 0) {
> /* handle error */
> }
>
> You _always_ have to call it at least twice to be sure you've gotten
> everything.
In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
which uses the build-a-linked-list code here:
http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
That code appears to error out and return incomplete results if a) the
filename is too long or b) an integer overflows.  Christopher's
filenames are only 96 chars; could Unicode be involved somehow?  b)
seems unlikely in the extreme.  It still seems like a bug, but I don't
see where it is.  I am only an egg ;-)

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik

>Christopher Gorski wrote:
>
>> I noticed that the first calls in the "cp" and "ls" to getdents() return 
>> similar file lists, with the same values.
>> 
>> However, in the "ls", it makes a second call to getdents():
>
>If this is Sun's cp, file a bug. It's failing to notice that it didn't 
>provide a large enough buffer to getdents(), so it only got partial results.

"cp" doesn't use getdents() but it uses readdir() instead; the whole 
buffer is hidden to it.

>Of course, the getdents() API is rather unfortunate. It appears the only 
>safe algorithm is:
>
>while ((r = getdents(...)) > 0) {
>   /* process results */
>}
>if (r < 0) {
>   /* handle error */
>}
>
>You _always_ have to call it at least twice to be sure you've gotten 
>everything.


That's why you never use getdents() directly but rather readdir(), which hides this 
for you.
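
(One way to see this from the shell, using the same truss approach as elsewhere
in the thread; the path is just an example:)

truss -tall -vall -o /tmp/ls.truss /bin/ls /pond/photos > /dev/null
grep getdents64 /tmp/ls.truss
# readdir() keeps calling getdents64() under the covers until the kernel
# returns 0; a complete listing should therefore end with a final
# getdents64() whose return value is 0.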

It appears that the off_t of the directory entries in the particular 
second read is > 2^32; so perhaps a cp which hasn't been compiled with
"handle large files" is being used?

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik


>That code appears to error out and return incomplete results if a) the
>filename is too long or b) an integer overflows.  Christopher's
>filenames are only 96 chars; could Unicode be involved somehow?  b)
>seems unlikely in the extreme.  It still seems like a bug, but I don't
>see where it is.  I am only an egg ;-)


And "ls" would fail in the same manner.


There's one piece of code in "cp" (see usr/src/cmd/mv/mv.c) which 
short-circuits a readdir-loop:

while ((dp = readdir(srcdirp)) != NULL) {
int ret;

if ((ret = traverse_attrfile(dp, source, target, 1)) == -1)
continue;
else if (ret > 0) {
++error;
goto out;
}


This is strange to me because all other failures result in cp going
over to the next file.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Roch - PAE

Jonathan Loran writes:
 > 
 > Is it true that Solaris 10 u4 does not have any of the nice ZIL controls 
 > that exist in the various recent Open Solaris flavors?  I would like to 
 > move my ZIL to solid state storage, but I fear I can't do it until I 
 > have another update.  Heck, I would be happy to just be able to turn the 
 > ZIL off to see how my NFS on ZFS performance is affected before spending 
 > the $'s.  Anyone know when will we see this in Solaris 10?
 > 

You can certainly turn it off with any release (Jim's link).

It's true that S10u4 does not have the "Separate Intent Log" 
to allow using an SSD for ZIL blocks. I believe S10U5 will
have that feature.

As noted, disabling the ZIL won't lead to ZFS pool
corruption, just DB corruption (that includes NFS
clients). To protect against that, in the event of a server
crash with zil_disable=1, you'd need to reboot all NFS
clients of the server (clear the clients' caches), and it's better to
do this before the server comes back up (kind of a raw
proposition here).
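
(For reference, a sketch of the switch being discussed; double-check it against
Jim's link before relying on it, and remember it is system-wide:)

* /etc/system: disable the ZIL globally (takes effect at the next reboot)
set zfs:zil_disable = 1

On builds that do have the separate intent log, the slog is attached with
something along the lines of "zpool add <pool> log <ssd-device>".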

-r


 > Thanks,
 > 
 > Jon
 > 
 > -- 
 > 
 > 
 > - _/ _/  /   - Jonathan Loran -   -
 > -/  /   /IT Manager   -
 > -  _  /   _  / / Space Sciences Laboratory, UC Berkeley
 > -/  / /  (510) 643-5146 [EMAIL PROTECTED]
 > - __/__/__/   AST:7731^29u18e3
 >  
 > 
 > 
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
"Will Murnane" <[EMAIL PROTECTED]> wrote:

> On Jan 30, 2008 1:34 AM, Carson Gaspar <[EMAIL PROTECTED]> wrote:
> > If this is Sun's cp, file a bug. It's failing to notice that it didn't
> > provide a large enough buffer to getdents(), so it only got partial results.
> >
> > Of course, the getdents() API is rather unfortunate. It appears the only
> > safe algorithm is:
> >
> > while ((r = getdents(...)) > 0) {
> > /* process results */
> > }
> > if (r < 0) {
> > /* handle error */
> > }
> >
> > You _always_ have to call it at least twice to be sure you've gotten
> > everything.
> In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
> which uses the build-a-linked-list code here:
> http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
> That code appears to error out and return incomplete results if a) the
> filename is too long or b) an integer overflows.  Christopher's
> filenames are only 96 chars; could Unicode be involved somehow?  b)
> seems unlikely in the extreme.  It still seems like a bug, but I don't
> see where it is.  I am only an egg ;-)

An interesting thought

We of course need to know whether the user used /bin/cp or a "shadow 
implementation" from ksh93.

I never saw any problems with star(1), and star(1)/libfind(3) are heavy
readdir(3) users...

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Christopher Gorski <[EMAIL PROTECTED]> wrote:

> > Of course, the getdents() API is rather unfortunate. It appears the only 
> > safe algorithm is:
> > 
> > while ((r = getdents(...)) > 0) {
> > /* process results */
> > }
> > if (r < 0) {
> > /* handle error */
> > }
> > 
> > You _always_ have to call it at least twice to be sure you've gotten 
> > everything.
> > 
>
> Yes, it is Sun's cp.  I'm trying, with some difficulty, to figure out 
> exactly how to reproduce this error in a way not specific to my data.  I 
> copied a set of randomly generated files with a deep directory structure 
> and cp seems to correctly call getdents() multiple times.

Note that cp (mv) does not call getdents() directly but readdir().

If there is a problem, it is most likely in readdir(), and it really looks 
strange that ls(1) (although it uses the same implementation) works for you.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
[EMAIL PROTECTED] wrote:

> And "ls" would fail in the same manner.
>
>
> There's one piece of code in "cp" (see usr/src/cmd/mv/mv.c) which 
> short-circuits a readdir-loop:
>
> while ((dp = readdir(srcdirp)) != NULL) {
> int ret;
>
> if ((ret = traverse_attrfile(dp, source, target, 1)) == -1)
> continue;
> else if (ret > 0) {
> ++error;
> goto out;
> }
>
>
> This is strange to me because all other failures result in cp going
> over to the next file.

traverse_attrfile() returns -1 only for:

if ((dp->d_name[0] == '.' && dp->d_name[1] == '\0') || 
(dp->d_name[0] == '.' && dp->d_name[1] == '.' && 
dp->d_name[2] == '\0') || 
(sysattr_type(dp->d_name) == _RO_SATTR) || 
(sysattr_type(dp->d_name) == _RW_SATTR)) 
return (-1); 

So this primarily skips '.' and '..'.

The rest seems to check for DOS extensions in extended attributes.

But this is only done to copy attributes, not files.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Robert Milkowski
Hello Christopher,

Wednesday, January 30, 2008, 7:27:01 AM, you wrote:

CG> Carson Gaspar wrote:
>> Christopher Gorski wrote:
>> 
>>> I noticed that the first calls in the "cp" and "ls" to getdents() return 
>>> similar file lists, with the same values.
>>>
>>> However, in the "ls", it makes a second call to getdents():
>> 
>> If this is Sun's cp, file a bug. It's failing to notice that it didn't 
>> provide a large enough buffer to getdents(), so it only got partial results.
>> 
>> Of course, the getdents() API is rather unfortunate. It appears the only 
>> safe algorithm is:
>> 
>> while ((r = getdents(...)) > 0) {
>>   /* process results */
>> }
>> if (r < 0) {
>>   /* handle error */
>> }
>> 
>> You _always_ have to call it at least twice to be sure you've gotten 
>> everything.
>> 

CG> Yes, it is Sun's cp.  I'm trying, with some difficulty, to figure out 
CG> exactly how to reproduce this error in a way not specific to my data.  I
CG> copied a set of randomly generated files with a deep directory structure
CG> and cp seems to correctly call getdents() multiple times.

Could you re-create empty files - exactly the same directory
structure and file names - and check if you still get the problem?
If you do, could you send a script here (mkdir -p and touch commands)
so we can investigate?

Assuming your file names and directory structure can be made public.
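
(A rough sketch of one way to generate such a replica with empty files; the
paths are examples and filenames containing newlines aren't handled:)

# Rebuild the directory tree, then touch zero-length copies of every file.
cd /pond/photos
find . -type d | while read d; do mkdir -p "/pond/emptycopy/$d"; done
find . -type f | while read f; do touch "/pond/emptycopy/$f"; done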

-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Christopher Gorski
Joerg Schilling wrote:
> "Will Murnane" <[EMAIL PROTECTED]> wrote:
> 
>> On Jan 30, 2008 1:34 AM, Carson Gaspar <[EMAIL PROTECTED]> wrote:
>>> If this is Sun's cp, file a bug. It's failing to notice that it didn't
>>> provide a large enough buffer to getdents(), so it only got partial results.
>>>
>>> Of course, the getdents() API is rather unfortunate. It appears the only
>>> safe algorithm is:
>>>
>>> while ((r = getdents(...)) > 0) {
>>> /* process results */
>>> }
>>> if (r < 0) {
>>> /* handle error */
>>> }
>>>
>>> You _always_ have to call it at least twice to be sure you've gotten
>>> everything.
>> In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
>> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
>> which uses the build-a-linked-list code here:
>> http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
>> That code appears to error out and return incomplete results if a) the
>> filename is too long or b) an integer overflows.  Christopher's
>> filenames are only 96 chars; could Unicode be involved somehow?  b)
>> seems unlikely in the extreme.  It still seems like a bug, but I don't
>> see where it is.  I am only an egg ;-)
> 
> An interesting thought
> 
> We of course need to know whether the user used /bin/cp or a "shadow 
> implementation" from ksh93.
> 
> I never saw any problems with star(1), and star(1)/libfind(3) are heavy
> readdir(3) users...
> 
> Jörg
> 

I am able to replicate the problem in bash using:
#truss -tall -vall -o /tmp/getdents.bin.cp.truss /bin/cp -pr
/pond/photos/* /pond/copytestsame/

So I'm assuming that's using /bin/cp

Also, from my _very limited_ investigation this morning, it seems that
#grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents

returns entries such as:
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
...(truncated)

whereas with a copy where everything is transferred correctly, the same
command returns no getdents64() calls with an EBADF error. This leads me
to believe that somewhere along the line getdents64() is being called on
a descriptor that has somehow been invalidated.
Again... I am only gleaning that from a very limited test.

-Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Christopher Gorski <[EMAIL PROTECTED]> wrote:

> I am able to replicate the problem in bash using:
> #truss -tall -vall -o /tmp/getdents.bin.cp.truss /bin/cp -pr
> /pond/photos/* /pond/copytestsame/
>
> So I'm assuming that's using /bin/cp
>
> Also, from my _very limited_ investigation this morning, it seems that
> #grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents
>
> returns entries such as:
> getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
> ...(truncated)

If you get this, you may need to provide the full truss output
so we can understand what's happening.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Robert Milkowski <[EMAIL PROTECTED]> wrote:

> If you could re-create empty files - exactly the same directory
> atructure and file names, check if you still got a problem.
> If you do, then if you could send a script here (mkdir's -p and touch)
> so we can investigate.

If you like to replicate a long directory structure with empty files,
you can use star:

star -c -meta f=/tmp/x.tar -C dir .

and later:

star -xp -xmeta f=/tmp/x.tar 

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Albert Shih
Hi all

I have a Sun X4500 with 48 disks of 750 GB.

The server comes with Solaris installed on two disks, which means I've got 46
disks for ZFS.

When I look at the default configuration of the zpool:

zpool create -f zpool1 raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0
zpool add -f zpool1 raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0
zpool add -f zpool1 raidz c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0
zpool add -f zpool1 raidz c0t3d0 c1t3d0 c4t3d0 c5t3d0 c6t3d0 c7t3d0
zpool add -f zpool1 raidz c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0
zpool add -f zpool1 raidz c0t5d0 c1t5d0 c4t5d0 c5t5d0 c6t5d0 c7t5d0
zpool add -f zpool1 raidz c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 c7t6d0
zpool add -f zpool1 raidz c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0

that means there are raidz vdevs with 5 disks and others with 6 disks.

When I try to do the same, I get this message:

mismatched replication level: pool uses 5-way raidz and new vdev uses 6-way 
raidz

I can force this with the «-f» option.

But what does that mean? (Sorry if the question is stupid.)

What kind of pool layout do you use with 46 disks? (46 = 2*23, and 23 is a prime
number, so there's no even split into raidz vdevs of 6 or 7 or similar sizes.)

Regards.

--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Heure local/Local time:
Mer 30 jan 2008 16:36:49 CET
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik


>Also, from my _very limited_ investigation this morning, it seems that
>#grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents
>
>returns entries such as:
>getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
>getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
>getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
>getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
>...(truncated)

Ah, this looks like "someone closed stdin" and then something weird
happened.  Hm.



We need the full truss output, specifically of all calls which return or
release file descriptors.
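
(For example, something along these lines -- the same truss style as earlier in
the thread, with -f added so children are followed and every open/close/dup/
getdents64 call shows up with its file descriptor:)

truss -f -tall -vall -o /tmp/cp.full.truss /bin/cp -pr /pond/photos/* /pond/copytestsame/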

The plot thickens.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Albert Shih
 On 30/01/2008 at 11:01:35-0500, Kyle McDonald wrote:
> Albert Shih wrote:
>> What kind of pool layout do you use with 46 disks? (46 = 2*23, and 23 is a prime
>> number, so there's no even split into raidz vdevs of 6 or 7 or similar sizes.)
>> 
>>   
> Depending on needs for space vs. performance, I'd probably pick either 5*9 
> or 9*5, with 1 hot spare.

Thanks for the tips...

How do you check the speed? (I'm a total newbie on Solaris.)

I've used

mkfile 10g

for writes and I've got the same performance with 5*9 or 9*5.

Do you have any advice about tools like iozone?

Regards.

--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Heure local/Local time:
Mer 30 jan 2008 17:10:55 CET
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Tim
On 1/30/08, Albert Shih <[EMAIL PROTECTED]> wrote:


Thanks for the tips...
>
> How do you check the speed? (I'm a total newbie on Solaris.)
>
> I've used
>
> mkfile 10g
>
> for writes and I've got the same performance with 5*9 or 9*5.
>
> Do you have any advice about tools like iozone?
>
> Regards.
>
> --
> Albert SHIH
> Observatoire de Paris Meudon
> SIO batiment 15
> Heure local/Local time:
> Mer 30 jan 2008 17:10:55 CET
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>




I'd take a look at bonnie++

http://www.sunfreeware.com/programlistintel10.html#bonnie++
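
(A hedged example invocation; the directory, size and user are placeholders,
and the test size should comfortably exceed RAM so the ARC doesn't mask the
disks:)

# -d is the test directory on the pool, -s the file size, -u the user to
# run as when started as root; the directory must be writable by that user.
bonnie++ -d /zpool1/bench -s 32g -u nobody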


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Kyle McDonald
Albert Shih wrote:
> What kind of pool layout do you use with 46 disks? (46 = 2*23, and 23 is a prime
> number, so there's no even split into raidz vdevs of 6 or 7 or similar sizes.)
>
>   
Depending on needs for space vs. performance, I'd probably pick either 
5*9 or 9*5, with 1 hot spare.
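
(For illustration, a rough sketch of the nine 5-disk raidz groups plus one hot
spare; the c?t?d? names below just follow the pattern from the original post
and the remaining seven groups are added the same way:)

zpool create tank \
    raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
    raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0
# ...seven more "zpool add tank raidz <five disks>" groups...
zpool add tank spare c7t1d0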

   -Kyle

> Regards.
>
> --
> Albert SHIH
> Observatoire de Paris Meudon
> SIO batiment 15
> Heure local/Local time:
> Mer 30 jan 2008 16:36:49 CET
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Robert Milkowski
Hello Joerg,

Wednesday, January 30, 2008, 2:56:27 PM, you wrote:

JS> Robert Milkowski <[EMAIL PROTECTED]> wrote:

>> If you could re-create empty files - exactly the same directory
>> atructure and file names, check if you still got a problem.
>> If you do, then if you could send a script here (mkdir's -p and touch)
>> so we can investigate.

JS> If you like to replicate a long directory structure with empty files,
JS> you can use star:

JS> star -c -meta f=/tmp/x.tar -C dir .

JS> and later:

JS> star -xp -xmeta f=/tmp/x.tar 


It really is a Swiss Army knife :)
That's a handy one (although it's the first time I have actually seen a
need for such functionality).


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
[EMAIL PROTECTED] wrote:

>
>
> >Also, from my _very limited_ investigation this morning, it seems that
> >#grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents
> >
> >returns entries such as:
> >getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> >getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> >getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
> >getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
> >...(truncated)
>
> Ah, this looks like "someone closed stdin and then something weird
> happened.  Hm.

stdin is usually not a directory ;-)

This looks much weirder...

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Neil Perrin


Roch - PAE wrote:
> Jonathan Loran writes:
>  > 
>  > Is it true that Solaris 10 u4 does not have any of the nice ZIL controls 
>  > that exist in the various recent Open Solaris flavors?  I would like to 
>  > move my ZIL to solid state storage, but I fear I can't do it until I 
>  > have another update.  Heck, I would be happy to just be able to turn the 
> ZIL off to see how my NFS on ZFS performance is affected before spending 
>  > the $'s.  Anyone know when will we see this in Solaris 10?
>  > 
> 
> You can certainly turn it off with any release (Jim's link).
> 
> It's true that S10u4 does not have the "Separate Intent Log" 
> to allow using an SSD for ZIL blocks. I believe S10U5 will
> have that feature.

Unfortunately it will not. A lot of ZFS fixes and features
that had existed for a while will not be in U5 (for reasons I
can't go into here). They should be in S10U6...

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Jonathan Loran


Neil Perrin wrote:
>
>
> Roch - PAE wrote:
>> Jonathan Loran writes:
>>  > Is it true that Solaris 10 u4 does not have any of the nice ZIL
>>  > controls that exist in the various recent Open Solaris flavors?  I
>>  > would like to move my ZIL to solid state storage, but I fear I
>>  > can't do it until I have another update.  Heck, I would be happy
>>  > to just be able to turn the ZIL off to see how my NFS on ZFS
>>  > performance is affected before spending the $'s.  Anyone know when
>>  > will we see this in Solaris 10?
>>
>> You can certainly turn it off with any release (Jim's link).
>>
>> It's true that S10u4 does not have the "Separate Intent Log" to allow 
>> using an SSD for ZIL blocks. I believe S10U5 will
>> have that feature.
>
Don't think we can live with this.  Thanks
> Unfortunately it will not. A lot of ZFS fixes and features
> that had existed for a while will not be in U5 (for reasons I
> can't go into here). They should be in S10U6...
>
> Neil.
I feel like we're being hung out to dry here.  I've got 70TB on 9 
various Solaris 10 u4 servers, with different data sets.  All of these 
are NFS servers.  Two servers have a ton of small files, with a lot of 
read and write updating, and NFS performance on these is abysmal.  ZFS 
is installed on SAN arrays (my first mistake).  I will test by 
disabling the ZIL, but if it turns out the ZIL needs to be on a separate 
device, we're hosed. 

Before ranting any more, I'll do the test of disabling the ZIL.  We may 
have to build out these systems with Open Solaris, but that will be hard 
as they are in production.  I would have to install the new OS on test 
systems and swap out the drives during scheduled down time.  Ouch.

Jon

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Vincent Fox
Are you already running with zfs_nocacheflush=1?   We have SAN arrays with dual 
battery-backed controllers for the cache, so we definitely have this set on all 
our production systems.  It makes a big difference for us.

As I said before I don't see the catastrophe in disabling ZIL though.

We actually run our production Cyrus mail servers using failover servers so our 
downtime is typically just the small interval to switch active & idle nodes 
anyhow.  We did this mainly for patching purposes.

But we toyed with the idea of running OpenSolaris on them, then just upgrading 
the idle node to new OpenSolaris image every month using Jumpstart and 
switching to it.  Anything goes wrong switch back to the other node.

What we ended up doing, for political reasons, was putting the squeeze on our 
Sun reps and getting a 10u4 kernel spin patch with... what did they call it?  
Oh yeah "a big wad of ZFS fixes".  So this ends up being a hug PITA because for 
the next 6 months to a year we are tied to getting any kernel patches through 
this other channel rather than the usual way.   But it does work for us, so 
there you are.

Given my choice I'd go with OpenSolaris, but that's a hard sell for datacenter 
management types.  I think it's no big deal in a production shop with good 
JumpStart and CFengine setups, where any host should be rebuildable from 
scratch in a matter of hours.  Good luck.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> I'd take a look at bonnie++
> http://www.sunfreeware.com/programlistintel10.html#bonnie++ 

Also filebench:
  http://www.solarisinternals.com/wiki/index.php/FileBench

You'll see the most difference between 5x9 and 9x5 in small random reads:

http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl
http://lindsay.at/blog/archive/2007/04/15/zfs-performance-models-for-a-streaming-server.html

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Dale Ghent
On Jan 30, 2008, at 3:44 PM, Vincent Fox wrote:

> What we ended up doing, for political reasons, was putting the  
> squeeze on our Sun reps and getting a 10u4 kernel spin patch with...  
> what did they call it?  Oh yeah "a big wad of ZFS fixes".  So this  
> ends up being a hug PITA because for the next 6 months to a year we  
> are tied to getting any kernel patches through this other channel  
> rather than the usual way.   But it does work for us, so there you  
> are.

Speaking of "big wad of ZFS fixes", is it me or is anyone else here  
getting kind of displeased over the glacial speed of the backporting  
of ZFS stability fixes to s10? It seems that we have to wait around  
4-5 months for an oft-delayed s10 update for any fixes of substance to  
come out.

Not only that, but one day zfs is its own patch, then it  
is part of the current KU, and now it's part of the nfs patch where  
"zfs" isn't mentioned anywhere in the patch's synopsis.

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> I feel like we're being hung out to dry here.  I've got 70TB on 9  various
> Solaris 10 u4 servers, with different data sets.  All of these  are NFS
> servers.  Two servers have a ton of small files, with a lot of  read and
> write updating, and NFS performance on these is abysmal.  ZFS is installed
> on SAN arrays (my first mistake).  I will test by disabling the ZIL, but if
> it turns out the ZIL needs to be on a separate  device, we're hosed.  

If you're using SAN arrays, you should be in good shape.  I'll echo what
Vincent Fox said about using either zfs_nocacheflush=1 (which is in S10U4),
or setting the arrays to ignore the cache flush (SYNC_CACHE) requests.
We do the latter here, and it makes a huge difference for NFS clients,
basically putting the ZIL in NVRAM.

However, I'm also unhappy about having to wait for S10U6 for the separate
ZIL and/or cache features of ZFS.  The lack of NV ZIL on our new Thumper
makes it painfully slow over NFS for the large number of file create/delete
type of workload.

Here's a question:  Would having the client mount with "-o nocto" have
the same effect (for that particular client) as disabling the ZIL on the
server?  If so, it might be less drastic than losing the ZIL for everyone.
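
(For reference, a hedged example of such a client mount on Solaris; see
mount_nfs(1M) for the nocto caveats, and the server/path here are made up:)

mount -F nfs -o nocto,vers=3 nfsserver:/export/data /mnt/data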

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS under VMware

2008-01-30 Thread Lewis Thompson
Hello,

I'm planning to use VMware Server on Ubuntu to host multiple VMs, one
of which will be a Solaris instance for the purposes of ZFS.
I would give the ZFS VM two physical disks for my zpool, e.g. /dev/sda
and /dev/sdb, in addition to the VMware virtual disk for the Solaris
OS.

Now I know that Solaris/ZFS likes to have total control over the disks
to ensure writes are flushed as and when it is ready for them to
happen, so I wonder if anybody can comment on what implications using the
disks in this way (i.e. through Linux and then VMware) has on the
control Solaris has over these disks?  By using a VM will I be missing
out in terms of reliability?  If so, can anybody suggest any
improvements I could make while still allowing Solaris/ZFS to run in a
VM?

Many thanks, Lewis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS under VMware

2008-01-30 Thread Torrey McMahon
Lewis Thompson wrote:
> Hello,
>
> I'm planning to use VMware Server on Ubuntu to host multiple VMs, one
> of which will be a Solaris instance for the purposes of ZFS.
> I would give the ZFS VM two physical disks for my zpool, e.g. /dev/sda
> and /dev/sdb, in addition to the VMware virtual disk for the Solaris
> OS.
>
> Now I know that Solaris/ZFS likes to have total control over the disks
> to ensure writes are flushed as and when it is ready for them to
> happen, so I wonder if anybody can comment on what implications using the
> disks in this way (i.e. through Linux and then VMware) has on the
> control Solaris has over these disks?  By using a VM will I be missing
> out in terms of reliability?  If so, can anybody suggest any
> improvements I could make while still allowing Solaris/ZFS to run in a
> VM?

I'm not sure what the perf aspects would be but it depends on what the 
VMware software passes through. Does it ignore cache sync commands in 
its i/o stack? Got me.

You won't be missing out on reliability but you will be introducing more 
layers in the stack where something could go wrong.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Jonathan Loran

Vincent Fox wrote:
> Are you already running with zfs_nocacheflush=1?   We have SAN arrays with 
> dual battery-backed controllers for the cache, so we definitely have this set 
> on all our production systems.  It makes a big difference for us.
>
>   
No, we're not using zfs_nocacheflush=1, but our SAN arrays are set 
to cache all writebacks, so it shouldn't be needed.  I may test this, if 
I get the chance to reboot one of the servers, but I'll bet the storage 
arrays are working correctly.

> As I said before I don't see the catastrophe in disabling ZIL though.
>
>   
No catastrophe, just a potential mess.

> We actually run our production Cyrus mail servers using failover servers so 
> our downtime is typically just the small interval to switch active & idle 
> nodes anyhow.  We did this mainly for patching purposes.
>   
Wish we could afford such replication.  Poor EDU environment here, I'm 
afraid.
> But we toyed with the idea of running OpenSolaris on them, then just 
> upgrading the idle node to new OpenSolaris image every month using Jumpstart 
> and switching to it.  Anything goes wrong switch back to the other node.
>
> What we ended up doing, for political reasons, was putting the squeeze on our 
> Sun reps and getting a 10u4 kernel spin patch with... what did they call it?  
> Oh yeah "a big wad of ZFS fixes".  So this ends up being a hug PITA because 
> for the next 6 months to a year we are tied to getting any kernel patches 
> through this other channel rather than the usual way.   But it does work for 
> us, so there you are.
>   
Mmmm, for us, Open Solaris may be easier.  I mainly was after stability, 
to be honest.  Our ongoing experience with bleeding edge Linux is 
painful at times, and on our big iron, I want them to just work.  But if 
they're so slow, they're not really working right, are they?  Sigh...
> Given my choice I'd go with OpenSolaris, but that's a hard sell for datacenter 
> management types.  I think it's no big deal in a production shop with good 
> JumpStart and CFengine setups, where any host should be rebuildable from 
> scratch in a matter of hours.  Good luck.
>  
>   
True, I'll think about that going forward.  Thanks,

Jon
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Neil Perrin


Jonathan Loran wrote:
> Vincent Fox wrote:
>> Are you already running with zfs_nocacheflush=1?   We have SAN arrays with 
>> dual battery-backed controllers for the cache, so we definitely have this 
>> set on all our production systems.  It makes a big difference for us.
>>
>>   
> No, we're not using the zfs_nocacheflush=1, but our SAN array's are set 
> to cache all writebacks, so it shouldn't be needed.  I may test this, if 
> I get the chance to reboot one of the servers, but I'll bet the storage 
> arrays' are working correctly.

I think there's some confusion. ZFS and the ZIL issue controller commands
to force the disk cache to be flushed to ensure data is on stable
storage. If the disk cache is battery backed then the costly flush
is unnecessary. As Vincent said, setting zfs_nocacheflush=1 can make a
huge difference.

Note that this is a system-wide variable, so all controllers serving ZFS
devices should have non-volatile caches before you enable it.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Vincent Fox
> No, we're not using zfs_nocacheflush=1, but our
> SAN arrays are set
> to cache all writebacks, so it shouldn't be needed.
> I may test this, if
> I get the chance to reboot one of the servers, but
> I'll bet the storage
> arrays are working correctly.

Bzzzt, wrong.

Read up on a few threads about this variable.  The ZFS flush command used 
equates to "flush to rust" for most any array.  What this works out to is that your 
array is not using its NV cache for what it's supposed to.  You get a little data in 
the NV cache, but it's tagged with this command that requires the array to finish its 
job and report back that the data is on disk before proceeding.  Hopefully at some 
point the array people and the ZFS people will have a meeting of the minds on 
this issue of having the array report to the OS "yes, I have battery-backed SAFE 
NV" and it will all just automagically work.  Until then, we set the variable 
in /etc/system.
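
(A sketch of the /etc/system entry in question; it needs a reboot, and it is
only appropriate when every device behind ZFS has non-volatile, battery-backed
cache:)

* /etc/system: stop ZFS issuing cache-flush (SYNCHRONIZE CACHE) requests
set zfs:zfs_nocacheflush = 1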
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] 30 second hang, ls command....

2008-01-30 Thread Neal Pollack
I'm running Nevada build 81 on x86 on an Ultra 40.
# uname -a
SunOS zbit 5.11 snv_81 i86pc i386 i86pc
Memory size: 8191 Megabytes

I started with this zfs pool many dozens of builds ago, approx a year ago.
I do live upgrade and zfs upgrade every few builds.

When I have not accessed the zfs file systems for a long time,
if I cd there and do an ls command, nothing happens for approx 30 seconds.

Any clues how I would find out what is wrong?

--

# zpool status -v
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz2ONLINE   0 0 0
c2d0ONLINE   0 0 0
c3d0ONLINE   0 0 0
c4d0ONLINE   0 0 0
c5d0ONLINE   0 0 0
c6d0ONLINE   0 0 0
c7d0ONLINE   0 0 0
c8d0ONLINE   0 0 0

errors: No known data errors


# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   172G  2.04T  52.3K  /tank
tank/arc   172G  2.04T   172G  /zfs/arc

# zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
tank  3.16T   242G  2.92T 7%  ONLINE  -



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 30 second hang, ls command....

2008-01-30 Thread Nathan Kroenert
Any chance the disks are being powered down, and you are waiting for 
them to power back up?
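
(One quick way to check, as a sketch -- if the first I/Os after the pause take
seconds to complete, spun-down disks are a good bet; the dataset path is from
the zfs list output above:)

# Watch per-device service times in one terminal while reproducing the hang:
iostat -xn 1
# And time the syscalls ls makes; -d adds a timestamp to every line:
truss -d -o /tmp/ls.truss ls /zfs/arc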

Nathan. :)

Neal Pollack wrote:
> I'm running Nevada build 81 on x86 on an Ultra 40.
> # uname -a
> SunOS zbit 5.11 snv_81 i86pc i386 i86pc
> Memory size: 8191 Megabytes
> 
> I started with this zfs pool many dozens of builds ago, approx a year ago.
> I do live upgrade and zfs upgrade every few builds.
> 
> When I have not accessed the zfs file systems for a long time,
> if I cd there and do an ls command, nothing happens for approx 30 seconds.
> 
> Any clues how I would find out what is wrong?
> 
> --
> 
> # zpool status -v
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
> 
> NAMESTATE READ WRITE CKSUM
> tankONLINE   0 0 0
>   raidz2ONLINE   0 0 0
> c2d0ONLINE   0 0 0
> c3d0ONLINE   0 0 0
> c4d0ONLINE   0 0 0
> c5d0ONLINE   0 0 0
> c6d0ONLINE   0 0 0
> c7d0ONLINE   0 0 0
> c8d0ONLINE   0 0 0
> 
> errors: No known data errors
> 
> 
> # zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> tank   172G  2.04T  52.3K  /tank
> tank/arc   172G  2.04T   172G  /zfs/arc
> 
> # zpool list
> NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
> tank  3.16T   242G  2.92T 7%  ONLINE  -
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] I.O error: zpool metadata corrupted after powercut

2008-01-30 Thread kristof
In the last 2 weeks we had 2 zpools corrupted.

The pool was visible via zpool import, but could not be imported anymore. During 
the import attempt we got an I/O error.

After a first power cut we lost our jumpstart/nfsroot zpool (another pool was 
still OK). Luckily the jumpstart data was backed up and easily restored; the nfsroot 
filesystems were not, but those were just test machines.  We thought the 
metadata corruption was caused by the zfs no-cache-flush setting we had 
configured in /etc/system (for performance reasons) in combination with a 
non-battery-backed NVRAM cache (Areca RAID controller).

zpool was raidz with 10 local sata disks (JBOD mode)


2 days ago we had another power cut in our test lab :-(

And again one pool was lost. This system was not configured with the zfs 
no-cache-flush setting. On the pool we had +/- 40 zvols used by running VMs (iscsi 
boot/swap/data disks for xen & VirtualBox guests).

The first failure was on a b68 system, the second on a b77 system.

Last zpool was using iscsi disks: 

setup:

pool
 mirror:
   iscsidisk1 san1
   iscsidisk1 san2
 mirror:
   iscsidisk2 san1
   iscsidisk2 san2

I thought zfs was always consistent on disk, but apparently a power cut can 
cause unrecoverable damage.

I can accept the first failure (because of the dangerous setting), but losing 
that second pool was unacceptable for me.

Since no fsck-like utility is available for zfs, I was wondering if there are 
any plans to create something like metadata repair tools?

Having used ZFS for almost 1 year now, I was a big fan; in one year I had not lost a single zpool 
until last week.

At this time I'm considering saying that ZFS is not yet production ready.

any comment welcome...

krdoor
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can't offline second disk in a mirror

2008-01-30 Thread Boyd Adamson
Since I spend a lot of time going from machine to machine, I thought  
I'd carry a pool with me on a couple of USB keys. It all works fine  
but it's slow, so I thought I'd attach a file vdev to the pool and  
then offline the USB devices for speed, then undo when I want to take  
the keys with me. Unfortunately, it seems that once I've offlined one  
device, the mirror is marked as degraded and then I'm not allowed to  
take the other USB key offline:

# zpool create usb mirror /dev/dsk/c5t0d0p0 /dev/dsk/c6t0d0p0
# mkfile 2g /file
# zpool attach usb c6t0d0p0 /file
# zpool status
pool: usb
  state: ONLINE
  scrub: resilver completed with 0 errors on Thu Jan 31 13:24:22 2008
config:

 NAME  STATE READ WRITE CKSUM
 usb   ONLINE   0 0 0
   mirror  ONLINE   0 0 0
 c5t0d0p0  ONLINE   0 0 0
 c6t0d0p0  ONLINE   0 0 0
 /file ONLINE   0 0 0

errors: No known data errors
# zpool offline usb c5t0d0p0
Bringing device c5t0d0p0 offline
# zpool status
   pool: usb
  state: DEGRADED
status: One or more devices has been taken offline by the administrator.
 Sufficient replicas exist for the pool to continue  
functioning in a
 degraded state.
action: Online the device using 'zpool online' or replace the device  
with
 'zpool replace'.
  scrub: resilver completed with 0 errors on Thu Jan 31 13:24:22 2008
config:

 NAME  STATE READ WRITE CKSUM
 usb   DEGRADED 0 0 0
   mirror  DEGRADED 0 0 0
 c5t0d0p0  OFFLINE  0 0 0
 c6t0d0p0  ONLINE   0 0 0
 /file ONLINE   0 0 0

errors: No known data errors
# zpool offline usb c6t0d0p0
cannot offline c6t0d0p0: no valid replicas
# cat /etc/release
 Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 16 August 2007


I've experimented with other configurations (not just keys and files,  
but slices as well) and found the same thing - once one device in a  
mirror is offline I can't offline any others, even though there are  
other (sometimes multiple) copies left.

Of course, I can detach the device, but I was hoping to avoid a full  
resilver when I reattach.
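
(Roughly, that fallback looks like this, with the pool and device names from
the status output above; the re-attach is what triggers the full resilver:)

zpool detach usb c5t0d0p0
# ...later, when the key is plugged back in:
zpool attach usb c6t0d0p0 c5t0d0p0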

Is this the expected behaviour? Am I missing something that would mean  
that what I'm trying to do is a bad idea?

Boyd
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I.O error: zpool metadata corrupted after powercut

2008-01-30 Thread Richard Elling
kristof wrote:
> In the last 2 weeks we had 2 zpools corrupted.
>
> The pool was visible via zpool import, but could not be imported anymore. During 
> the import attempt we got an I/O error.
>   

What exactly was the error message?
Also look at the fma messages, as they are often more precise.
 -- richard

> After a first power cut we lost our jumpstart/nfsroot zpool (another pool was 
> still OK). Luckily the jumpstart data was backed up and easily restored; the nfsroot 
> filesystems were not, but those were just test machines.  We thought the 
> metadata corruption was caused by the zfs no-cache-flush setting we 
> had configured in /etc/system (for performance reasons) in combination with a 
> non-battery-backed NVRAM cache (Areca RAID controller).
>
> zpool was raidz with 10 local sata disks (JBOD mode)
>
>
> 2 days ago we had another power cut in our test lab :-(
>
> And again one pool was lost. This system was not configured with the zfs 
> no-cache-flush setting. On the pool we had +/- 40 zvols used by running VMs (iscsi 
> boot/swap/data disks for xen & VirtualBox guests).
>
> The first failure was on a b68 system, the second on a b77 system.
>
> Last zpool was using iscsi disks: 
>
> setup:
>
> pool
>  mirror:
>iscsidisk1 san1
>iscsidisk1 san2
>  mirror:
>iscsidisk2 san1
>iscsidisk2 san2
>
> I thought zfs was always consistent on disk, but apparently a power cut can 
> cause unrecoverable damage.
>
> I can accept the first failure (because of the dangerous setting), but 
> losing that second pool was unacceptable for me.
>
> Since no fsck-like utility is available for zfs, I was wondering if there are 
> any plans to create something like metadata repair tools?
>
> Having used ZFS for almost 1 year now, I was a big fan; in one year I had not lost a 
> single zpool until last week.
>
> At this time I'm considering saying that ZFS is not yet production ready.
>
> any comment welcome...
>
> krdoor
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hardware RAID vs. ZFS RAID

2008-01-30 Thread Gregory Perry
Hello,

I have a Dell 2950 with a Perc 5/i, two 300GB 15K SAS drives in a RAID0 array.  
I am considering going to ZFS and I would like to get some feedback about which 
situation would yield the highest performance:  using the Perc 5/i to provide a 
hardware RAID0 that is presented as a single volume to OpenSolaris, or using 
the drives separately and creating the RAID0 with OpenSolaris and ZFS?  Or 
maybe just adding the hardware RAID0 to a ZFS pool?  Can anyone suggest some 
articles or FAQs on implementing ZFS RAID?

Which situation would provide the highest read and write throughput?
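
(For what it's worth, the ZFS-only equivalent of a two-disk RAID0 is simply a
pool with two top-level vdevs -- ZFS stripes dynamically across them; the
device names here are examples:)

zpool create fastpool c1t0d0 c1t1d0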

Thanks in advance
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] seen on freebsd-stable: reproducible zfs panic

2008-01-30 Thread James C. McPherson
Hi everybody,
Greg pointed me to
http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/040136.html
from a Daniel Eriksson:



If you import and export more than one zpool FreeBSD will panic during
shutdown. This bug is present in both RELENG_7 and RELENG_7_0 (I have
not tested CURRENT).

kgdb output:

Syncing disks, vnodes remaining...2 1 0 0 done
All buffers synced.
vput: negative ref count
0xc2ad1aa0: tag ufs, type VDIR
 usecount 0, writecount 0, refcount 2 mountedhere 0
 flags (VV_ROOT)
  VI_LOCKedv_object 0xc1030174 ref 0 pages 1
  lock type ufs: EXCL (count 1) by thread 0xc296 (pid 1)
 ino 2, on dev ad0s1a
panic: vput: negative ref cnt
KDB: stack backtrace:
db_trace_self_wrapper(c086ad8a,d3b19b68,c06265ba,c0868fd5,c08e9ca0,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0868fd5,c08e9ca0,c086f57a,d3b19b74,d3b19b74,...) at
kdb_backtrace+0x29
panic(c086f57a,c085,c086f561,c2ad1aa0,d3b19b90,...) at panic+0xaa
vput(c2ad1aa0,2,d3b19bf0,c296,c086eedd,...) at vput+0xdb
dounmount(c2ba6d0c,8,c296,0,0,...) at dounmount+0x49f
vfs_unmountall(c0868ebb,0,c2967000,8,d3b19c50,...) at
vfs_unmountall+0x33
boot(c296,8,1,c295e000,c296,...) at boot+0x3e3
reboot(c296,d3b19cfc,4,c086b882,56,...) at reboot+0x66
syscall(d3b19d38) at syscall+0x33a
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x8050903, esp =
0xbfbfe90c, ebp = 0xbfbfe9d8 ---
Uptime: 2m42s
Physical memory: 503 MB
Dumping 39 MB: 24 8


Run this script and then reboot the computer to trigger the panic:

dd if=/dev/zero of=/usr/_disk1 bs=1m count=80
dd if=/dev/zero of=/usr/_disk2 bs=1m count=80
mdconfig -f /usr/_disk1 -u 1
mdconfig -f /usr/_disk2 -u 2
/etc/rc.d/zfs forcestart
zpool create tank1 md1
zpool create tank2 md2
sleep 2
touch /tank1/testfile
touch /tank2/testfile
sleep 2
zpool export tank2
zpool export tank1
sleep 10
zpool import tank1
zpool import tank2
sleep 2
touch /tank1/testfile
touch /tank2/testfile
sleep 2
zpool export tank2
zpool export tank1
/etc/rc.d/zfs forcestop
sleep 2
mdconfig -d -u 1
mdconfig -d -u 2
rm /usr/_disk1
rm /usr/_disk2

/Daniel Eriksson




Anybody seen anything like this, on Solaris or freebsd?

I've got my doubts about whether Daniel's got a valid test.



thanks,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Steve Hillman
> 
> However, I'm also unhappy about having to wait for S10U6 for the separate
> ZIL and/or cache features of ZFS.  The lack of NV ZIL on our new Thumper
> makes it painfully slow over NFS for the large number of file create/delete
> type of workload.

I did a bit of testing on this (because I'm in the same boat) and was able to 
work around it by breaking my filesystem up into lots of individual zfs 
filesystems. Although the performance of each one isn't great, as long as your 
load is threaded and distributed across filesystems, it should balance out.
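
(A minimal sketch of that layout; the dataset names are examples:)

# One filesystem per user/project so the threaded NFS load spreads out:
zfs create tank/home
for u in user1 user2 user3; do
    zfs create tank/home/$u
done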

Steve Hillman
Simon Fraser University
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss