Re: [zfs-discuss] Help with setting up ZFS
> The two plugs that I indicated are multi-lane SAS ports, which /require/
> using a breakout cable; don't worry - that's the design for them.
> "multi-lane" means exactly that - several actual SAS connections in a
> single plug. The other 6 ports next to them (in black) are SATA ports
> connected to the ICH9R.

Just a quick question before I address everyone else. I bought this connector:
http://www.newegg.com/Product/Product.aspx?Item=N82E16812198020

However, it's pretty clear to me now (after I've ordered it) that it won't fit
the SAS connector on the board at all. What kind of cable do I need for this?
[zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Hi! I'm a Mac user, but I think I will get more responses about this question
here than on a Mac forum. And first, sorry for my approximate English.

I have a ZFS pool named "MyPool" with two devices (two external USB drives),
configured as a mirror:

        NAME              STATE     READ WRITE CKSUM
        MyPool            ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            /dev/disk2s2  ONLINE       0     0     0
            /dev/disk3s2  ONLINE       0     0     0

The drive disk3 was unexpectedly unplugged for some reason (power failure,
etc.), but the drive is perfectly functional.

I plugged in another drive before replugging the unplugged one. This new drive
took the device name "disk3"... So when I plug the unplugged drive back in, it
takes the device name "disk4" and my ZFS mirror partition the name "disk4s2".

Here is my problem. If I scrub my pool, it doesn't recognize the new device
name. It seems that the only solution is to export and import the pool again,
because:
- If I do a detach of disk3s2 and an attach of disk4s2, it tells me that the
  drive is not the right size (that's true, but I had no problem with this
  when I created my mirrored pool).
- If I do an attach of disk4s2 on disk2s2, it tells me that disk3s2 is busy
  (which is suspicious: the drive is not in use).
- If I do a replace, ZFS seems to resilver the whole drive (and this can take
  ages to finish!), but the drive doesn't need to be fully resilvered, does it
  (when the drive was unplugged, the drives were in the same state)?

So what is the official method to tell ZFS that the device name has changed,
without exporting and re-importing the pool?
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
I don't have an answer to your exact question because I'm a noob and I'm not
using a Mac, but I can say that on FreeBSD, which I'm using at the moment,
there is a method to name devices ahead of time, so if the drive letters
change you avoid this exact problem. I'm sure OpenSolaris and Mac OS X have
something similar.

As far as your issue goes, I THINK you can just export/import the pool. I'm
sure someone on this list who knows more than me will speak up on the official
way to do it =)

2009/7/28 Avérous Julien-Pierre:
> I have a ZFS pool named "MyPool" with two devices (two external USB drives),
> configured as a mirror: [...]
> So what is the official method to tell ZFS that the device name has changed,
> without exporting and re-importing the pool?
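For concreteness, a minimal sketch of the export/import cycle being suggested
here, using the pool name from this thread; it assumes nothing is using the
pool's filesystems while it is exported:

  # stop anything using the pool, then let ZFS re-scan the devices
  zpool export MyPool
  zpool import MyPool
  zpool status MyPool   # the mirror should now show the current device names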
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
I think people can understand the concept of missing flushes. The big
conceptual problem is how this manages to hose an entire filesystem, which is
assumed to have rather a lot of data which ZFS has already verified to be ok.

Hardware ignoring flushes and losing recent data is understandable; I don't
think anybody would argue with that. Losing access to your entire pool and
multiple gigabytes of data because a few writes failed is a whole different
story, and while I understand how it happens, ZFS appears to be unique among
modern filesystems in suffering such a catastrophic failure so often.

To give a quick personal example: I can plug a FAT32 USB disk into a Windows
system, drag some files to it, and pull that drive at any point. I might lose
a few files, but I've never lost the entire filesystem. Even if the absolute
worst happened, I know I can run scandisk, chkdsk, or any number of file
recovery tools and get my data back. I would never, ever attempt this with
ZFS.

For a filesystem like ZFS, whose integrity and stability are sold as being way
better than existing filesystems, losing your entire pool is a bit of a shock.
I know that work is going on to be able to recover pools, and I'll sleep a lot
sounder at night once it is available.
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Thank you for your response, wonslung.

I can export/import, yes, but for this I would have to unmount all filesystems
that depend on the pool, and that's not always possible (and it's sad to be
forced to do it).

As for giving the device the same name ahead of time, I don't know how to do
that. I will look into it.
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
There is a small mistake:

"If I do an attach of disk4s2 on disk2s2, it tells me that disk3s2 is busy
(which is suspicious: the drive is not in use)"

The correct version is:

"If I do an attach of disk4s2 on disk2s2, it tells me that disk4s2 is busy
(which is suspicious: the drive is not in use)"

(disk3s2 is no longer involved with ZFS in this case.)
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Sometimes the disk will be busy just from being in the directory, or if
something is trying to connect to it. Again, I'm no expert, so I'm going to
refrain from commenting on your issue further.

2009/7/28 Avérous Julien-Pierre:
> There is a small mistake:
> "If I do an attach of disk4s2 on disk2s2, it tells me that disk4s2 is busy
> (which is suspicious: the drive is not in use)" [...]
[zfs-discuss] How to "mirror" an entire zfs pool to another pool
We are upgrading to new storage hardware. We currently have a zfs pool with
the old storage volumes. I would like to create a new zfs pool, completely
separate, with the new storage volumes. I do not want to just replace the old
volumes with new volumes in the pool we are currently using.

I don't see a way to create a mirror of a pool. Note, I'm not talking about a
mirrored pool, meaning mirrored drives inside the pool. I want to mirror pool1
to pool2. Snapshots and clones do not seem to be what I want, as they only
work inside a given pool. I have looked at Sun Network Data Replicator (SNDR),
but that doesn't seem to be what I want either, as the physical volumes in the
new pool may be a different size than in the old pool.

Does anyone know how to do this? My only idea at the moment is to create the
new pool, create new filesystems and then use rsync from the old filesystems
to the new filesystems, but it seems like there should be a way to mirror or
replicate the pool itself rather than doing it at the filesystem level.

Thomas Walker
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
Thomas Walker wrote:
> We are upgrading to new storage hardware. We currently have a zfs pool with
> the old storage volumes. I would like to create a new zfs pool, completely
> separate, with the new storage volumes. [...] it seems like there should be
> a way to mirror or replicate the pool itself rather than doing it at the
> filesystem level.

have you looked at what 'zfs send' can do?

Michael
--
Michael Schuster  http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
Thomas Walker wrote:
> I don't see a way to create a mirror of a pool. Note, I'm not talking about
> a mirrored pool, meaning mirrored drives inside the pool. I want to mirror
> pool1 to pool2. [...]

You can do this by attaching the new disks one by one to the old ones. This is
only going to work if your new storage pool has exactly the same number of
disks (at the same size or larger). For example, if you have 12 500G drives
and your new storage is 12 1TB drives, that will work. For each drive in the
old pool do:

  zpool attach <pool> <existing-disk> <new-disk>

When you have done that and the resilver has completed, you can 'zpool detach'
all the old drives. If your existing storage is already mirrored this still
works; you just do the detach twice to get off the old pool.

On the other hand, if you have 12 500G drives and your new storage is 6 1TB
drives, then you can't do that via mirroring; you need to use zfs send and
recv, e.g.:

  zpool create newpool <new disks>
  zfs snapshot -r oldpool@sendit
  zfs send -R oldpool@sendit | zfs recv -vFd newpool

That will work providing the data will fit, and unlike rsync it will preserve
all your snapshots and you don't have to recreate the new filesystems.

--
Darren J Moffat
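To make the attach/detach route concrete, here is a minimal sketch with
made-up device names (old disks c1t0d0..c1t11d0, new disks c7t0d0..c7t11d0);
note that attach only applies to pools built from single disks or mirrors,
not to raidz vdevs:

  # attach each new disk to its old counterpart, turning each vdev into a mirror
  for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
      zpool attach oldpool c1t${i}d0 c7t${i}d0
  done

  # wait until the resilver has completed
  zpool status oldpool

  # then drop the old side of each mirror
  for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
      zpool detach oldpool c1t${i}d0
  done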
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
> zpool create newpool <new disks>
> zfs snapshot -r oldpool@sendit
> zfs send -R oldpool@sendit | zfs recv -vFd newpool

I think this is probably something like what I want; the problem is I'm not
really "getting it" yet. Could you explain just what is happening here with an
example? Let's say I have this setup:

oldpool = 10 x 500GB volumes, with two mounted filesystems: fs1 and fs2

I create newpool = 12 x 1TB volumes using new storage hardware. newpool thus
has a lot more capacity than oldpool, but not the same number of physical
volumes or the same size volumes.

I want to replicate oldpool, and thus oldpool/fs1 and oldpool/fs2, on
newpool/fs1 and newpool/fs2. And I want to do this in a way that allows me to
"switch over" from oldpool to newpool on a day that is scheduled with the
customers, and then take oldpool away.

So on Monday I take a snapshot of oldpool, like you say:

  zfs snapshot -r oldpool@sendit

And I send/recv it to newpool:

  zfs send -R oldpool@sendit | zfs recv -vFd newpool

At this point does all of that data, say 3TB or so, start copying over to the
newpool? How do I monitor the progress of the transfer? Once that initial copy
is done, on say Wednesday, how do I then do a final "sync" from oldpool to
newpool to pick up any changes that occurred since the first snapshot on
Monday? I assume that for this final snapshot I would unmount the filesystems
to prevent any changes by the customer.

Sorry I'm being dense here. I think I sort of get it, but I don't have the
whole picture.

Thomas Walker
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
> I think this is probably something like what I want; the problem is I'm not
> really "getting it" yet. Could you explain just what is happening here with
> an example? Let's say I have this setup:
>
> oldpool = 10 x 500GB volumes, with two mounted filesystems: fs1 and fs2
>
> I create newpool = 12 x 1TB volumes using new storage hardware. newpool thus
> has a lot more capacity than oldpool, but not the same number of physical
> volumes or the same size volumes.

That is fine, because the zfs send | zfs recv copies the data across.

> I want to replicate oldpool, and thus oldpool/fs1 and oldpool/fs2, on
> newpool/fs1 and newpool/fs2. And I want to do this in a way that allows me
> to "switch over" from oldpool to newpool on a day that is scheduled with the
> customers, and then take oldpool away.

So depending on the volume of data change you might need to do the snapshot
and send several times.

> So on Monday I take a snapshot of oldpool, like you say:
>
>   zfs snapshot -r oldpool@sendit
>
> And I send/recv it to newpool:
>
>   zfs send -R oldpool@sendit | zfs recv -vFd newpool
>
> At this point does all of that data, say 3TB or so, start copying over to
> the newpool?

Everything in all the oldpool datasets that was written up to the time the
@sendit snapshot was created will be.

> How do I monitor the progress of the transfer?

Unfortunately there is no easy way to do that just now. When the 'zfs recv'
finishes, it is done.

> Once that initial copy is done, on say Wednesday, how do I then do a final
> "sync" from oldpool to newpool to pick up any changes that occurred since
> the first snapshot on Monday?

Do almost the same again, e.g.:

  zfs snapshot -r oldpool@wednesday
  zfs send -R -i oldpool@sendit oldpool@wednesday | zfs recv -vFd newpool

> I assume that for this final snapshot I would unmount the filesystems to
> prevent any changes by the customer.

That is a very good idea; the filesystem does *not* need to be mounted for the
zfs send to work. Once the last send is finished do:

  zpool export oldpool

If you want to actually rename newpool back to the oldpool name, do this:

  zpool export newpool
  zpool import newpool oldpool

> Sorry I'm being dense here. I think I sort of get it, but I don't have the
> whole picture.

You are very close; there is some more info in the zfs(1M) man page.

--
Darren J Moffat
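Pulling the steps above together, a sketch of the final cut-over (pool names
as in this example), with a quick check that the received datasets and
snapshots look right before retiring the old storage:

  # after the last incremental send has completed
  zfs list -r newpool                  # the copied filesystems
  zfs list -r -t snapshot newpool      # snapshots preserved by send -R

  zpool export oldpool                 # retire the old storage
  zpool export newpool
  zpool import newpool oldpool         # bring the copy back under the old name
  zfs mount -a                         # remount everything for the customers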
[zfs-discuss] USB drive on S10u7
What is the best way to attach a USB hard disk to Solaris 10u7? I know some
program runs to auto-detect such a device (I've forgotten the name because I
do almost all my work on OSOL, where it's hal). Do I use "that program", or do
I disable it and manually attach the drive to the system?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 05/09 | OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] [indiana-discuss] zfs issues?
Thanks for that Brian. I've logged a bug:

  CR 6865661 *HOT* Created, P1 opensolaris/triage-queue
  zfs scrub rpool causes zpool hang

Just discovered, after trying to create a further crash dump, that it's
failing and rebooting with the following error (just caught it prior to the
reboot):

  panic dump timeout

so I'm not sure how else to assist with debugging this issue.

cheers,
James

On 28/07/2009, at 9:08 PM, Brian Ruthven - Solaris Network Sustaining - Sun UK
wrote:
> Yes: make sure your dumpadm is set up beforehand to enable savecore, and
> that you have a dump device. [...] Then you should get a dump saved in
> /var/crash/ on next reboot.
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
I think you've given me enough information to get started on a test of the
procedure. Thanks very much.

Thomas Walker
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
Ok Bob, but I think that is the problem with picket fencing... and there we
are talking about committing the sync operations to disk. What I'm seeing is
no read activity from the disks while the slog is being written. The disks are
at "zero" (no reads, no writes).

Thanks a lot for your reply.

Leal
[ http://www.eall.com.br/blog ]
Re: [zfs-discuss] USB drive on S10u7
Hi Dick,

The Solaris 10 volume management service is volfs. If you attach the USB hard
disk and run volcheck, the disk should be mounted under the /rmdisk directory.
If the auto-mounting doesn't occur, you can disable volfs and mount it
manually.

You can read more about this feature here:

http://docs.sun.com/app/docs/doc/817-5093/medaccess-29267?a=view

Cindy

On 07/28/09 07:56, dick hoogendijk wrote:
> What is the best way to attach a USB hard disk to Solaris 10u7? [...]
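As a rough sketch of the two routes Cindy describes (the device name below is
hypothetical; substitute whatever rmformat reports on your system):

  # automatic path: let volfs handle it
  svcs volfs          # service should be online
  volcheck            # poll for newly attached media
  ls /rmdisk          # the disk appears here once mounted

  # manual path: take volfs out of the picture and give the disk to ZFS (as root)
  svcadm disable volfs
  rmformat            # note the device name, e.g. c5t0d0
  zpool create archive c5t0d0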
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On 27/07/2009, at 10:14 PM, Tobias Exner wrote:
> Hi list,
>
> I did some tests and ran into a very strange situation...
>
> I created a zvol using "zfs create -V" and initialized a sam-filesystem on
> this zvol. After that I restored some test data using a dump from another
> system.
>
> So far so good.
>
> After some big troubles I found out that releasing files in the
> sam-filesystem doesn't create space on the underlying zvol. So staging and
> releasing files just work until "zfs list" shows me a zvol with 100% usage,
> although the sam-filesystem was only filled up to 20%. I didn't create
> snapshots, and a scrub didn't show any errors.
>
> When the zvol was filled up, even a sammkfs couldn't solve the problem. I
> had to destroy the zvol (not the zpool). After that I was able to recreate
> a new zvol with sam-fs on top.

this is a feature of block devices. once you (or samfs) use a block on the
zvol, it has no mechanism to tell the zvol when it is no longer using it.
samfs simply unreferences the blocks it frees; it doesn't actively go through
them and tell the block layer underneath it that they can be reclaimed. from
the zvol's point of view they're still being used because they were used at
some point in the past.

you might be able to get the space back in the zvol by writing a massive file
full of zeros in the samfs, but you'd have to test that.

> Is that a known behaviour? .. or did I run into a bug?

it's known.

dlg

> System: SAM-FS 4.6.85, Solaris 10 U7 X86

David Gwynne
Infrastructure Architect
Engineering, Architecture, and IT
University of Queensland
+61 7 3365 3636
Re: [zfs-discuss] [indiana-discuss] zfs issues?
Yes: make sure your dumpadm is set up beforehand to enable savecore, and that
you have a dump device. In my case the output looks like this:

$ pfexec dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash/opensolaris
  Savecore enabled: yes

Then you should get a dump saved in /var/crash/ on next reboot.

Brian

James Lever wrote:
> On 28/07/2009, at 9:22 AM, Robert Thurlow wrote:
>> I can't help with your ZFS issue, but to get a reasonable crash dump in
>> circumstances like these, you should be able to do "savecore -L" on
>> OpenSolaris.
>
> That would be well and good if I could get a login - due to the rpool being
> unresponsive, that was not possible. So the only recourse we had was via
> kmdb :/
>
> Is there a way to explicitly invoke savecore via kmdb?
>
> James

--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG
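For anyone setting this up from scratch, a sketch of one way to do it; the
device and directory are the defaults shown above, so adjust to taste:

  pfexec dumpadm -d /dev/zvol/dsk/rpool/dump   # use a dedicated dump device
  pfexec dumpadm -s /var/crash/opensolaris     # where savecore writes the dump
  pfexec dumpadm -y                            # enable savecore on reboot
  pfexec savecore -L                           # live dump of a still-running system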
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On 28 July, 2009 - David Gwynne sent me these 1,9K bytes:

> On 27/07/2009, at 10:14 PM, Tobias Exner wrote:
>> After some big troubles I found out that releasing files in the
>> sam-filesystem doesn't create space on the underlying zvol. [...]
>
> this is a feature of block devices. once you (or samfs) use a block on the
> zvol, it has no mechanism to tell the zvol when it is no longer using it.
> samfs simply unreferences the blocks it frees; it doesn't actively go
> through them and tell the block layer underneath it that they can be
> reclaimed. from the zvol's point of view they're still being used because
> they were used at some point in the past.

http://en.wikipedia.org/wiki/TRIM_(SSD_command) should make it possible I
guess.. (assuming it's implemented all the way in the chain).. Should/could
help in virtualization too..

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
On Tue, 28 Jul 2009, Marcelo Leal wrote:
> Ok Bob, but I think that is the problem with picket fencing... and there we
> are talking about committing the sync operations to disk. What I'm seeing
> is no read activity from the disks while the slog is being written. The
> disks are at "zero" (no reads, no writes).

This is an interesting issue. While synchronous writes are requested, what do
you expect a read to return? If there is a synchronous write in progress,
should readers wait for the write to be persisted in case the write influences
the data read?

Note that I am not saying that huge synchronous writes should necessarily
block reading (particularly if the reads are for unrelated blocks/files), but
it is understandable if zfs focuses more on the writes.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
On 28 July 2009 at 15:54, Darren J Moffat wrote:
>> How do I monitor the progress of the transfer?
>
> Unfortunately there is no easy way to do that just now. When the 'zfs recv'
> finishes, it is done.

I've just found pv (pipe viewer) today
(http://www.ivarch.com/programs/pv.shtml), which is packaged in /contrib
(http://pkg.opensolaris.org/contrib/p5i/0/pv.p5i).

You can do:

  zfs send -R oldpool@sendit | pv -s 3T | zfs recv -vFd newpool

and you'll see a message like this:

  8GO 0:00:05 [5,71GO/s] [=>          ] 7% ETA 0:00:58

A nice and simple way to get a progress report!

Gaëtan

--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66  fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
Re: [zfs-discuss] zpool is lain to burnination (bwahahahah!)
Hi Again,

A bit more futzing around and I notice that output from a plain 'zdb' returns
this:

store
    version=14
    name='store'
    state=0
    txg=0
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    vdev_tree
        type='root'
        id=0
        guid=13934602390719084200
        bad config type 16 for stats
        children[0]
            type='disk'
            id=0
            guid=14931103169794670927
            path='/dev/dsk/c0t22310001557D05D5d0s0'
            devid='id1,sd@x22310001557d05d5/a'
            phys_path='/scsi_vhci/disk@g22310001557d05d5:a'
            whole_disk=1
            metaslab_array=23
            metaslab_shift=35
            ashift=9
            asize=6486985015296
            is_log=0
            DTL=44
            bad config type 16 for stats

So the last line there - 'bad config type 16 for stats' - is interesting. The
only reference I can find to this error is in an IRC log for some Nexenta
folks. Doesn't look like there's much help there.

So, uh. Blow away and try again? It seems like that's the way to go here. If
anyone has any suggestions let me know! I think I'll start over at 3 PM EST on
July 28th. Yes - I did just give you a deadline to recover my data, help
forum, or I'm blowing it away!

Thanks,
Graeme
Re: [zfs-discuss] Help with setting up ZFS
On Tue, Jul 28, 2009 at 03:04, Brian wrote:
> Just a quick question before I address everyone else. I bought this
> connector:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16812198020
>
> However, it's pretty clear to me now (after I've ordered it) that it won't
> fit the SAS connector on the board at all. What kind of cable do I need for
> this?

Search "8087 forward" on provantage.com. They're about $15, unless you want
attached power connectors (which would be necessary for SAS drives, unless
some kind of backplane were in play), in which case they're $30.

Will
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On Jul 28, 2009, at 8:53 AM, Tomas Ögren wrote:
> On 28 July, 2009 - David Gwynne sent me these 1,9K bytes:
>> this is a feature of block devices. once you (or samfs) use a block on the
>> zvol, it has no mechanism to tell the zvol when it is no longer using it.
>> [...]
>
> http://en.wikipedia.org/wiki/TRIM_(SSD_command) should make it possible I
> guess.. (assuming it's implemented all the way in the chain).. Should/could
> help in virtualization too..

Or just enable compression and zero fill.
 -- richard
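A sketch of what "enable compression and zero fill" could look like in
practice; the dataset name and mount point are made up, and it's worth proving
this out on scratch data before relying on it:

  # hypothetical names: the zvol is tank/samvol, sam-fs is mounted at /sam
  zfs set compression=on tank/samvol          # runs of zeros now compress to almost nothing
  dd if=/dev/zero of=/sam/zerofill bs=1024k   # let it run until the filesystem fills
  rm /sam/zerofill                            # hand the space back to samfs
  zfs list tank/samvol                        # space used by the zvol should have dropped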
Re: [zfs-discuss] USB drive on S10u7
On Tue, 28 Jul 2009 09:03:14 -0600 cindy.swearin...@sun.com wrote:
> The Solaris 10 volume management service is volfs.

  # svcs -a | grep vol

has told me that ;-)

> If the auto-mounting doesn't occur, you can disable volfs and mount it
> manually.

I don't want the automounting to occur, so I disabled volfs. I then did a
"rmformat" to learn the device name, followed by a

  zpool create archive /dev/rdsk/devicename

All running nicely. Thanks for the advice.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 05/09 | OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
My understanding is that there's never any need for a reader to wait for a
write in progress. ZFS keeps all writes in memory until they're committed to
disk - if you ever try to read something that's either waiting to be, or is
being, written to disk, ZFS will serve it straight from RAM.

One question I do have after reading this again though is: Leal, do you have
the slog on the same controller as the disks? Have you tested whether reads
are also blocked if you're running on a separate controller?
Re: [zfs-discuss] zfs destroy slow?
On Mon, Jul 27, 2009 at 3:58 AM, Markus Kovero wrote:
> Oh well, the whole system seems to be deadlocked. Nice. A little too keen on
> keeping data safe :-P
>
> From: Markus Kovero
> Sent: 27 July 2009 13:39
> Subject: [zfs-discuss] zfs destroy slow?
>
> Hi, how come zfs destroy is so slow, e.g. destroying a 6TB dataset renders
> zfs admin commands useless for the time being, in this case for hours?
> (running osol 111b with latest patches.)
>
> Yours
> Markus Kovero

I submitted a bug, but I don't think it's been assigned a case number yet.

I see this exact same behavior on my X4540's. I create a lot of snapshots, and
when I tidy up, zfs destroy can 'stall' any and all ZFS-related commands for
hours, or even days (in the case of nested snapshots). The only resolution is
not to ever use zfs destroy, or just simply wait it out. It will eventually
finish, just not in any reasonable timeframe.

--
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] zfs destroy slow?
> I submitted a bug, but I don't think it's been assigned a case number yet.
> [...]

Correction: looks like my bug is 6855208.

--
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] USB drive on S10u7
On Tue, 28 Jul 2009, dick hoogendijk wrote:
> I don't want the automounting to occur, so I disabled volfs. I then did a
> "rmformat" to learn the device name, followed by a "zpool create archive
> /dev/rdsk/devicename"

It is better to edit /etc/vold.conf since vold is used for other purposes as
well, such as auto-mounting CDs, DVDs, and floppies. I commented out this
line:

  #use rmdisk drive /dev/rdsk/c*s2 dev_rmdisk.so rmdisk%d

and then all was good. With a bit more care, some removable devices could be
handled differently than others so you could just exclude the ones used for
zfs.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs destroy slow?
On 07/27/09 03:39, Markus Kovero wrote:
> Hi, how come zfs destroy is so slow, e.g. destroying a 6TB dataset renders
> zfs admin commands useless for the time being, in this case for hours?
> (running osol 111b with latest patches.)

I'm not sure what "latest patches" means w.r.t. ON build, but this is almost
certainly:

  6809683 zfs destroy fails to free object in open context, stops up txg train

Fixed in snv_114. With the above fix in place, destroys can still take a while
(fundamentally you have to do I/O proportional to the amount of metadata), but
it will be doing all the work in open context, which won't stop the txg train,
and admin commands should continue to work.

- Eric

--
Eric Schrock, Fishworks    http://blogs.sun.com/eschrock
Re: [zfs-discuss] zpool is lain to burnination (bwahahahah!)
On 28.07.09 20:31, Graeme Clark wrote:
> Hi Again,
>
> A bit more futzing around and I notice that output from a plain 'zdb'
> returns this:
>
> store
>     version=14
>     name='store'
>     state=0
>     txg=0
>     [...]
>     bad config type 16 for stats

This is a dump of /etc/zfs/zpool.cache. While 'stats' should not be there, it
does not matter much.

> So the last line there - 'bad config type 16 for stats' - is interesting.
> The only reference I can find to this error is in an IRC log for some
> Nexenta folks. Doesn't look like there's much help there.
>
> So, uh. Blow away and try again? It seems like that's the way to go here.
> If anyone has any suggestions let me know! I think I'll start over at 3 PM
> EST on July 28th. Yes - I did just give you a deadline to recover my data,
> help forum, or I'm blowing it away!

It would be helpful if you provide a little bit more information here: what
OpenSolaris release/build are you running (I suspect something like build
114-118, though I may be wrong), what other commands you tried (zpool
import/export etc.) and what the result was.

You can also explain what you mean here:

> I can force export and import the pool, but I can't seem to get it active
> again.

as the pool status provided before suggests that the pool cannot be imported.

> I can do a zdb to the device and I get some info (well, actually to s0 on
> the disk, which is weird because I think I built the array without
> specifying a slice. Maybe relevant - don't know...)

When you specify a disk without s0 at the end during pool creation, you tell
ZFS to use the whole disk, so it labels it with an EFI label, creates a single
slice 0 covering the whole disk and uses that slice for the pool, as recorded
in the configuration (see the 'path' and 'whole_disk' name-value pairs).

victor
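A few commands whose output would help answer Victor's questions, assuming the
pool is currently exported (all of these are read-only except the forced
import attempt):

  cat /etc/release          # which build is actually running
  zpool import              # is the pool seen as importable, and in what state?
  zpool import -f store     # what exact error does a forced import produce?
  zdb -e store              # read the on-disk config of the exported pool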
[zfs-discuss] Set New File/Folder ZFS ACLs Automatically through Samba?
Do any of you know how to set the default ZFS ACLs for newly created files and
folders when those files and folders are created through Samba?

I want all new files and folders to inherit only the extended (non-trivial)
ACLs that are set on the parent folders. But when a file is created through
Samba on the ZFS file system, it gets mode 744 (trivial) added to it. For
directories, mode 755 gets added.

I've tried everything I could find and think of:

1.) Setting a umask.
2.) Editing /etc/sfw/smb.conf 'force create mode' and 'force directory mode',
    then `svcadm restart samba`.
3.) Adding trivial inheritable ACLs to the parent folder.

Changes 1 and 2 had no effect. With number 3 I got folders to effectively do
what I want, but not files.

I set the ACLs of the parent to:

> drwx--+ 24 AD+administrator AD+records 2132 Jul 28 12:01 records/
> user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:fdi---:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> group:AD+release:r-x---a-R-c---:--:allow
> owner@:rwxp---A-W-Co-:fd:allow
> group@:rwxp--:fd:deny
> everyone@:rwxp---A-W-Co-:fd:deny

Then new directories and files get created like this from a Windows
workstation connected to the server:

> drwx--+ 2 AD+testuser AD+domain users 2 Jul 28 12:01 test
> user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:fdi---:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> owner@:rwxp---A-W-Co-:fdi---:allow
> owner@:---A-W-Co-:--:allow
> group@:rwxp--:fdi---:deny
> group@:--:--:deny
> everyone@:rwxp---A-W-Co-:fdi---:deny
> everyone@:---A-W-Co-:--:deny
> owner@:--:--:deny
> owner@:rwxp---A-W-Co-:--:allow
> group@:-w-p--:--:deny
> group@:r-x---:--:allow
> everyone@:-w-p---A-W-Co-:--:deny
> everyone@:r-x---a-R-c--s:--:allow
>
> -rwxr--r--+ 1 AD+testuser AD+domain users 0 Jul 28 12:01 test.txt
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> owner@:---A-W-Co-:--:allow
> group@:--:--:deny
> everyone@:---A-W-Co-:--:deny
> owner@:--:--:deny
> owner@:rwxp---A-W-Co-:--:allow
> group@:-wxp--:--:deny
> group@:r-:--:allow
> everyone@:-wxp---A-W-Co-:--:deny
> everyone@:r-a-R-c--s:--:allow

I need group "AD+release" to have read-only access to only specific files
within records. I could set that up, but any new files or folders that are
created will be viewable by AD+release. That would not be acceptable.

Do any of you know how to set the Samba file/folder creation ACLs on ZFS file
systems? Or do you have something I could try?

Thank you for your time.

--
Jeff Hulen
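Not a definitive answer, but two ZFS-side properties are worth experimenting
with before fighting smb.conf further; the dataset name and paths below are
made up, and the Samba side may also need 'inherit acls = yes' (and, where
available, the zfsacl VFS module - the bundled Solaris 10 Samba may not ship
it):

  # let inherited ACEs pass through untouched instead of being modified
  # by the requested create mode (dataset name is hypothetical)
  zfs set aclinherit=passthrough tank/records
  zfs set aclmode=passthrough tank/records

  # then create a test file/dir from the Windows client and compare
  # (paths assume the share is mounted at /records; adjust)
  ls -Vd /records/test
  ls -V /records/test.txt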
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:
> Sun has opened internal CR 6859997. It is now in Dispatched state at High
> priority.

CR 6859997 has been accepted and is actively being worked on. The following
info has been added to that CR:

This is a problem with the ZFS file prefetch code (zfetch) in dmu_zfetch.c.
The test script provided by the submitter (thanks Bob!) does no file
prefetching the second time through each file. This problem exists in ZFS in
Solaris 10, Nevada, and OpenSolaris.

This test script creates 3000 files, each 8MB long, so the amount of data
(24GB) is greater than the amount of memory (16GB on a Thumper). With the
default blocksize of 128k, each of the 3000 files has 63 blocks. The first
time through, zfetch ramps up a single prefetch stream normally. But the
second time through, dmu_zfetch() calls dmu_zfetch_find(), which thinks that
the data has already been prefetched, so no additional prefetching is started.

This problem is not seen with 500 files each 48MB in length (still 24GB of
data). In that case there's still only one prefetch stream, but it is
reclaimed when one of the requested offsets is not found. The reason it is not
found is that the stream "strided" the first time through after reaching the
zfetch cap, which is 256 blocks. Files with no more than 256 blocks don't
require a stride. So this problem will only be seen when the data from a file
with no more than 256 blocks is accessed after being tossed from the ARC.

The fix for this problem may be more feedback between the ARC and the zfetch
code. Or it may make sense to restart the prefetch stream after some time has
passed, or perhaps whenever there's a miss on a block that was expected to
have already been prefetched?

On a Thumper running Nevada build 118, the first pass of this test takes 2
minutes 50 seconds and the second pass takes 5 minutes 22 seconds. If
dmu_zfetch_find() is modified to restart the prefetch stream when the
requested offset is 0 and more than 2 seconds have passed since the stream was
last accessed, then the time needed for the second pass is reduced to 2
minutes 24 seconds. Additional investigation is currently taking place to
determine if another solution makes more sense. And more testing will be
needed to see what effect this change has on other prefetch patterns.

6412053 is a related CR which mentions that the zfetch code may not be issuing
I/O at a sufficient pace. This behavior is also seen on a Thumper running the
test script in CR 6859997 since, even when prefetch is ramping up as expected,
less than half of the available I/O bandwidth is being used. Although more
aggressive file prefetching could increase memory pressure, as described in
CRs 6258102 and 6469558.

-- Rich
[zfs-discuss] zfs send/recv syntax
Is it possible to send an entire pool (including all its zfs filesystems) to a
zfs filesystem in a different pool on another host? Or must I send each zfs
filesystem one at a time?

Thanks!
jlc
[zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris is
5/09. After the create, zfs list shows 40.8T, but after creating 4
filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
happened to the 8.8TB? Is this much overhead normal?

zpool create -f zpool1 raidz c1t0d0 c2t0d0 c3t0d0 c5t0d0 c6t0d0 \
  raidz c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
  raidz c6t1d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 \
  raidz c5t2d0 c6t2d0 c1t3d0 c2t3d0 c3t3d0 \
  raidz c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 \
  raidz c3t4d0 c5t4d0 c6t4d0 c1t5d0 c2t5d0 \
  raidz c3t5d0 c4t5d0 c5t5d0 c6t5d0 c1t6d0 \
  raidz c2t6d0 c3t6d0 c4t6d0 c5t6d0 c6t6d0 \
  raidz c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 \
  spare c6t7d0 c4t0d0 c4t4d0

zpool list
NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
zpool1  40.8T   176K  40.8T    0%  ONLINE  -

## create multiple file systems in the pool
zfs create -o mountpoint=/backup1fs zpool1/backup1fs
zfs create -o mountpoint=/backup2fs zpool1/backup2fs
zfs create -o mountpoint=/backup3fs zpool1/backup3fs
zfs create -o mountpoint=/backup4fs zpool1/backup4fs

zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
zpool1             364K  32.1T  28.8K  /zpool1
zpool1/backup1fs  28.8K  32.1T  28.8K  /backup1fs
zpool1/backup2fs  28.8K  32.1T  28.8K  /backup2fs
zpool1/backup3fs  28.8K  32.1T  28.8K  /backup3fs
zpool1/backup4fs  28.8K  32.1T  28.8K  /backup4fs

Thanks,
Glen

(PS. As I said, this is my first time working with ZFS; if this is a dumb
question - just say so.)
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris
> is 5/09. After the create, zfs list shows 40.8T, but after creating 4
> filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
> happened to the 8.8TB? Is this much overhead normal?

IIRC zpool list includes the parity drives in the disk space calculation and
zfs list doesn't. "Terabyte" drives are really 900-something GB drives thanks
to that base-2 vs. base-10 confusion HD manufacturers introduced. Using that
900GB figure I get to both 40TB and 32TB with and without the parity drives.
Spares aren't counted.

Regards,
-mg
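A rough back-of-the-envelope check of those numbers (treating a "1TB" drive as
10^12 bytes, i.e. about 0.909 TiB); the layout above has 9 raidz vdevs of 5
disks plus 3 spares:

  # 45 disks sit in raidz vdevs; 'zpool list' counts parity, spares excluded
  echo '45 * 10^12 / 2^40' | bc -l   # ~40.9, matches the 40.8T from zpool list
  # 36 of those are data disks (9 vdevs x 4); 'zfs list' reports usable space
  echo '36 * 10^12 / 2^40' | bc -l   # ~32.7, close to the 32.1T from zfs list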
Re: [zfs-discuss] zfs send/recv syntax
On Wed 29/07/09 10:09 , "Joseph L. Casale" jcas...@activenetwerx.com sent:
> Is it possible to send an entire pool (including all its zfs filesystems)
> to a zfs filesystem in a different pool on another host? Or must I send
> each zfs filesystem one at a time?

Yes, use -R on the sending side and -d on the receiving side.

--
Ian.
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
Glen Gunselman wrote:
> This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris
> is 5/09. After the create, zfs list shows 40.8T, but after creating 4
> filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
> happened to the 8.8TB? Is this much overhead normal? [...]

Here is the output from my J4500 with 48 x 1 TB disks. It is almost the exact
same configuration as yours. This is used for NetBackup. As Mario just pointed
out, "zpool list" includes the parity drive in the space calculation whereas
"zfs list" doesn't.

[r...@xxx /]#> zpool status
errors: No known data errors

  pool: nbupool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        nbupool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
            c2t45d0  ONLINE       0     0     0
            c2t46d0  ONLINE       0     0     0
        spares
          c2t47d0    AVAIL
          c2t48d0    AVAIL
          c2t49d0    AVAIL

errors: No known data errors

[r...@xxx /]#> zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
NBU                     113G  20.6G   113G  /NBU
nbupool                27.5T  4.58T  30.4K  /nbupool
nbupool/backup1        6.90T  4.58T  6.90T  /backup1
nbupool/backup2        6.79T  4.58T  6.79T  /backup2
nbupool/backup3        7.28T  4.58T  7.28T  /backup3
nbupool/backup4        6.43T  4.58T  6.43T  /backup4
nbupool/nbushareddisk  20.1G  4.58T  20.1G  /nbushareddisk
nbupool/zfscachetest   69.2G  4.58T  69.2G  /nbupool/zfscachetest

[r...@xxx /]#> zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
NBU       136G   113G  22.8G  83%  ONLINE  -
nbupool  40.8T  34.4T  6.37T  84%  ONLINE  -
[r...@solnbu1 /]#>

--
___
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006
Manukau City
Auckland
New Zealand

Phone  : +64 09 968 7611
Fax    : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
Re: [zfs-discuss] zfs send/recv syntax
> Yes, use -R on the sending side and -d on the receiving side.

I tried that first, going from Solaris 10 to osol 0906:

# zfs send -vR mypool@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"

It didn't create any of the zfs filesystems under mypool2?

Thanks!
jlc
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Tue, 28 Jul 2009, Rich Morris wrote:
> 6412053 is a related CR which mentions that the zfetch code may not be
> issuing I/O at a sufficient pace. This behavior is also seen on a Thumper
> running the test script in CR 6859997 since, even when prefetch is ramping
> up as expected, less than half of the available I/O bandwidth is being
> used. Although more aggressive file prefetching could increase memory
> pressure, as described in CRs 6258102 and 6469558.

It is good to see this analysis. Certainly the optimum prefetching required
for an Internet video streaming server (with maybe 300 kilobits/second per
stream) is radically different than what is required for uncompressed 2K
preview (8MB/frame) of motion picture frames (320 megabytes/second per
stream), but zfs should be able to support both.

Besides real-time analysis based on current stream behavior and memory, it
would be useful to maintain some recent history for the whole pool so that a
pool which is usually used for 1000 slow-speed video streams behaves
differently by default than one used for one or two high-speed video streams.
With this bit of hint information, files belonging to a pool recently
producing high-speed streams can be ramped up quickly, while files belonging
to a pool which has recently fed low-speed streams can be ramped up more
conservatively (until proven otherwise) in order to not flood memory and
starve the I/O needed by other streams.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs send/recv syntax
Try send/receive to the same host (ssh localhost). I used this when trying
send/receive, as it removes ssh-between-hosts "problems".

The on-disk format of ZFS has changed (there is something about it in the man
pages, from memory), so I don't think you can go S10 -> OpenSolaris without
doing an upgrade, but I could be wrong!

Joseph L. Casale wrote:
> > Yes, use -R on the sending side and -d on the receiving side.
>
> I tried that first, going from Solaris 10 to osol 0906:
>
> # zfs send -vR mypool@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
>
> It didn't create any of the zfs filesystems under mypool2?
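A small sketch of that local test; the snapshot name comes from the message
above, and the receiving pool name (mypool2 here) is whatever your target pool
is actually called. The -n (dry-run) form is a cheap way to see what the
receive would create before doing it for real:

  zfs snapshot -r mypool@snap
  zfs send -R mypool@snap | zfs recv -dnv mypool2     # local dry run
  zfs send -R mypool@snap | ssh localhost "pfexec /usr/sbin/zfs recv -dF mypool2"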
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Tue, 28 Jul 2009, Rich Morris wrote:
> The fix for this problem may be more feedback between the ARC and the
> zfetch code. Or it may make sense to restart the prefetch stream after some
> time has passed, or perhaps whenever there's a miss on a block that was
> expected to have already been prefetched?

Regarding this approach of waiting for a prefetch miss: this seems like it
would produce an uneven flow of data to the application and not ensure that
data is always available when the application goes to read it. A stutter is
likely to produce at least a 10ms gap (and possibly far greater) while the
application is blocked in read() waiting for data.

Since zfs blocks are large, stuttering becomes expensive, and if the
application itself needs to read ahead 128K in order to avoid the stutter,
then it consumes memory in an expensive non-sharable way. In the ideal case,
zfs will always stay one 128K block ahead of the application's requirement and
the unconsumed data will be cached in the ARC where it can be shared with
other processes. For an application with real-time data requirements, it is
definitely desirable not to stutter at all if possible.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
> > Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds "ok" when the flush hasn't occurred?

I think it would be a shorter list if one were to name the drives/controllers that actually implement a flush properly.

> > Everyone on this list seems to blame lying hardware for ignoring commands, but disks are relatively mature and I can't believe that major OEMs would qualify disks or other hardware that willingly ignore commands.

It seems you have too much faith in major OEMs of storage, considering that 99.9% of the market is personal use, for which a 2% throughput advantage over a competitor can make or break the profit margin on a device. Ignoring cache requests is guaranteed to get the best drive performance benchmarks regardless of what software is driving the device.

For example, it is virtually impossible to find a USB drive that honors cache sync (to do so would require the device to stop completely until a fully synchronous USB transaction had made it to the device and the data had been written). Can you imagine how long a USB drive would sit on store shelves if it actually did do a proper cache sync?

While USB is the extreme case, and it does get better the more expensive the drive, it is still far from a given that any particular device properly handles cache flushes.
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv syntax
On Wed 29/07/09 10:49 , "Joseph L. Casale" jcas...@activenetwerx.com sent:
>> Yes, use -R on the sending side and -d on the receiving side.
>
> I tried that first, going from Solaris 10 to osol 0906:
>
> # zfs send -vR mypo...@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
>
> didn't create any of the zfs filesystems under mypool2?

What happens if you try it on the local host, where you can just pipe from the send to the receive (no need for ssh)?

zfs send -R mypo...@snap | zfs recv -d -n -v newpool/somename

Another thing to try is to use "-n -v" on the receive end to see what would be created if -n were omitted. I find -v more useful on the receiving side than on the send side.

-- 
Ian
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
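To make that concrete, a minimal local dry run could look like the lines below (pool and snapshot names are placeholders, not the poster's actual setup; the snapshot must have been taken recursively so that -R has descendant snapshots to send):

# zfs send -R tank@backup | zfs recv -d -n -v backuppool

With -n the receive changes nothing, and -v lists every dataset and snapshot that would be created; drop the -n once the list matches what you expect.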
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
> This is also (theoretically) why a drive purchased from Sun is more expensive than a drive purchased from your neighbourhood computer shop:

It's more significant than that. Drives aimed at the consumer market are at a competitive disadvantage if they do handle cache flush correctly (since the popular hardware blog of the day will show that the device is far slower than the competitors that throw away the sync requests).

> Sun (and presumably other manufacturers) takes the time and effort to test things to make sure that when a drive says "I've synced the data", it actually has synced the data. This testing is what you're presumably paying for.

It wouldn't cost any more for commercial vendors to implement cache flush properly; it is just that they are penalized by the market for doing so.
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv syntax
I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax for mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties.

What was the original snapshot syntax?

Cindy

- Original Message -
From: Ian Collins
Date: Tuesday, July 28, 2009 5:53 pm
Subject: Re: [zfs-discuss] zfs send/recv syntax
To: "zfs-discuss@opensolaris.org" , "Joseph L. Casale"

> > # zfs send -vR mypo...@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
> > didn't create any of the zfs filesystems under mypool2?
>
> What happens if you try it on the local host where you can just pipe from the send to the receive (no need for ssh)?
>
> zfs send -R mypo...@snap | zfs recv -d -n -v newpool/somename
>
> Another thing to try is use "-n -v" on the receive end to see what would be created if -n were omitted.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
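Putting Cindy's and Ian's suggestions together, the whole round trip would look roughly like this (host, pool, and snapshot names are placeholders rather than the poster's real setup):

# zfs snapshot -r tank@backup
# zfs send -R tank@backup | ssh user@remotehost "pfexec /usr/sbin/zfs recv -dF backuppool"

The -r on the snapshot is what gives -R a snapshot on every descendant dataset to send, and -d on the receive recreates the source dataset names underneath backuppool.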
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
This thread started over in nfs-discuss, as it appeared to be an nfs problem initially, or at the very least an interaction between nfs and the zil.

Just summarising the speeds we have found when untarring something, always into a new/empty directory. We are only looking at write speed; read is always very fast. The reason we started to look at this was that the 7-year-old netapp being phased out could untar the test file in 11 seconds, while the x4500/x4540 Suns took 5 minutes.

For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test: http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz

The command executed, generally, is:

# mkdir .test34 && time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz

Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3  0m11.114s
Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4  5m11.654s
Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3  8m55.911s
Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4  10m32.629s

Just untarring the tarball on the x4500 itself:

: x4500 OpenSolaris svn117 server  0m0.478s
: x4500 Solaris 10 10/08 server  0m1.361s

So ZFS itself is very fast.

Replacing NFS with different protocols: identical setup, just swapping tar for rsync and nfsd for sshd. The baseline test, using "rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX":

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4  3m44.857s
Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh  0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Let's share it with smb, and mount it:

OsX 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar  0m24.480s

Neat, even SMB can beat nfs with default settings. This would then indicate to me that nfsd is broken somehow, but then we try again after only disabling the ZIL:

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE ZIL: nfsv4  0m8.453s 0m8.284s 0m8.264s

Nice, so this is theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, which would probably be safe without the ZIL. The other type of load is FTP/WWW/CGI, which has more active writes/updates, so probably not as good a fit. Comments?

Enable the ZIL, but disable zfscacheflush (just as a test; I have been told disabling the cache flush is far more dangerous):

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE zfscacheflush: nfsv4  0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a whole lot about the slog. First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog boot pool: nfsv4  1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling the ZIL, but it should give me the theoretical maximum speed with the ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog /tmp/junk: nfsv4  0m8.916s

Nice! Same speed as ZIL disabled.

Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4  0m26.566s

Not too bad really. But you have to reboot to see a CF card, fiddle with the BIOS for the boot order, etc., so it is just not an easy addition on a live system. A SATA emulated SSD disk can be hot-swapped. Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk.

I am hoping to pick up an SSD SATA device today and see what speeds we get out of that.

The rsync (1s) vs nfs (8s) difference I can accept as overhead of a much more complicated protocol, but why would it take 3 minutes to write the same data to the same pool with rsync (1s) vs nfs (3m)? The ZIL was on, the slog was the default, and both were writing the same way. Does nfsd add FD_SYNC to every close regardless of whether the application did or not? This I have not yet wrapped my head around. For example, I know rsync and tar do not use fdsync (but dovecot does) on close(), but does NFS make it fdsync anyway?

Sorry for the giant email.

-- 
Jorgen Lundman | Unix Administrator | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3-3375-1767 (home)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
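For anyone who wants to repeat the slog experiments, the file-backed variant described above only takes a couple of commands (pool name and size are placeholders; a slog backed by /tmp is volatile and only suitable for throwaway benchmarking, and on builds of that era a log device could not be removed from the pool again, so use a scratch pool):

# mkfile 2g /tmp/junk
# zpool add tank log /tmp/junk
# zpool status tank

zpool status shows the file under a separate "logs" section once it has been added. The "DISABLE ZIL" runs were presumably done with the old zil_disable tunable (set zfs:zil_disable = 1 in /etc/system, followed by a reboot or a remount of the filesystems), which affects every pool on the machine and is likewise only meant for testing.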
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
> For example, I know rsync and tar does not use fdsync (but dovecot does) on its close(), but does NFS make it fdsync anyway?

NFS is required to do synchronous writes. This is what allows NFS clients to recover seamlessly if the server spontaneously reboots. If the NFS client supports it, it can send substantial data (multiple writes) to the server and then commit it all via an NFS commit. Note that this requires more work by the client, since the NFS client is required to replay the uncommitted writes if the server goes away.

> Sorry for the giant email.

No, thank you very much for the interesting measurements and data.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
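To see where the sync requests are actually coming from during one of these untar runs, DTrace can count them on the server (assuming DTrace is available there; these are generic one-liners, not measurements from the thread):

# dtrace -n 'syscall::fdsync:entry { @[execname] = count(); }'
# dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'

The first shows which local applications call fsync()/fdsync; the second counts ZIL commits in the kernel, which is where the NFS-induced synchronous semantics show up even though the client-side application (tar, rsync) never calls fsync itself.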
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
On Mon, Jul 27 at 13:50, Richard Elling wrote:
> On Jul 27, 2009, at 10:27 AM, Eric D. Mudama wrote:
>> Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds "ok" when the flush hasn't occurred?
>
> two seconds with google shows
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=183771&NewLang=en&Hilite=cache+flush
>
> Give it up. These things happen. Not much you can do about it, other than design around it.
> -- richard

That example is Windows-specific, and it is a software driver where the data integrity feature must be manually disabled by the end user. The default behavior was always maximum data protection.

While perhaps analogous at some level, the perpetual "your hardware must be crappy/cheap/not-as-expensive-as-mine" response doesn't seem to be a sufficient explanation when things go wrong, like the complete loss of a pool.

-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Failing device in a replicated configuration... on a non-replicated pool??
I was greeted by this today. The Sun Message ID page says this should happen when there are errors in a replicated configuration, but clearly there's only one drive here. If there are unrecoverable errors, how can my applications not be affected when there's no mirror or parity to recover from?

# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h3m with 0 errors on Wed Jul 29 18:52:20 2009
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c9d0s0  ONLINE       6     0     2

errors: No known data errors
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Failing device in a replicated configuration... on a non-replicated pool??
On Tue, 28 Jul 2009, fyleow wrote:
> I was greeted by this today. The Sun Message ID page says this should happen when there were errors in a replicated configuration. Clearly there's only one drive here. If there are unrecoverable errors how can my applications not be affected since there's no mirror or parity to recover from?

Metadata is stored redundantly, so some metadata could become corrupted and then be recovered via the redundant copy. In recent zfs you can set copies=2 to store user data redundantly on one drive.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
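To act on that advice, something along these lines would give newly written user data a second copy on the single disk and then reset the error counters once the drive is believed healthy (the dataset name is a placeholder, and copies=2 only applies to blocks written after the property is set):

# zfs set copies=2 rpool/export/home
# zpool clear rpool
# zpool scrub rpool

The scrub afterwards re-reads everything and shows whether the earlier read/checksum errors were a one-off or are still accumulating.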
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could find in the local Bic Camera, which turned out to be a CSSD-SM32NI, with a supposed write speed of 95MB/s. I put it in place and moved the slog over to it:

0m49.173s
0m48.809s

So it is slower than the CF test, which is disappointing. Everyone else seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd generation), so perhaps that is why it works better for them. It is curious that it is slower than the CF card; perhaps because it shares a bus with so many other SATA devices?

Oh, and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund

Jorgen Lundman wrote:
> Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.
> Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4  0m26.566s
> I am hoping to pick up an SSD SATA device today and see what speeds we get out of that.
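When comparing slog devices like this, it also helps to watch the log vdev directly while the untar is running; assuming the pool still has the dedicated log device attached (the pool name below is a placeholder), something like:

# zpool iostat -v tank 1

prints per-vdev activity every second, and the separate "logs" line shows how much of the write traffic is landing on the CF card or SSD and whether that device is the bottleneck.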