[zfs-discuss] 'primarycache' and 'secondarycache'

2010-09-16 Thread Jackie Cheng
My understanding of the read cache is that the L2ARC has a feed thread that
populates it from the ARC.  Hence my questions:

If primarycache is set to 'metadata', will the L2ARC still get to cache user data?
Similarly, what happens if primarycache is set to 'none'?
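
In other words (the pool/filesystem name below is just a placeholder), I'm
asking how these two settings interact:

  # zfs get primarycache,secondarycache tank/fs
  # zfs set primarycache=metadata tank/fs    # ARC keeps metadata only
  # zfs set secondarycache=all tank/fs       # does the L2ARC still see file data?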

Thanks,
--Jackie


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-09-16 Thread erik.ableson

On 15 Sep 2010, at 22:04, Mike Mackovitch wrote:

> On Wed, Sep 15, 2010 at 12:08:20PM -0700, Nabil wrote:
>> any resolution to this issue?  I'm experiencing the same annoying
>> lockd thing with mac osx 10.6 clients.  I am at pool ver 14, fs ver
>> 3.  Would somehow going back to the earlier 8/2 setup make things
>> better?
> 
> As noted in the earlier thread, the "annoying lockd thing" is not a
> ZFS issue, but rather a networking issue.
> 
> FWIW, I never saw a resolution.  But the suggestions for how to debug
> situations like this still stand:

And for reference, I have a number of 10.6 clients using NFS for sharing Fusion 
virtual machines, iTunes library, iPhoto libraries etc. without any issues.

Cheers,

Erik


[zfs-discuss] Replacing a disk never completes

2010-09-16 Thread Ben Miller
I have an X4540 running b134 where I'm replacing 500GB disks with 2TB disks 
(Seagate Constellation), and the pool seems sick now.  The pool has four 
raidz2 vdevs (8+2); the first set of 10 disks was replaced a few 
months ago.  I replaced two disks in the second set (c2t0d0, c3t0d0) a 
couple of weeks ago, but have been unable to get the third disk to finish 
replacing (c4t0d0).


I have tried the resilver for c4t0d0 four times now, and each time the pool 
comes up with checksum errors and a permanent error (<metadata>:<0x0>).  The 
first resilver was from 'zpool replace', which came up with checksum errors.  
I cleared the errors, which triggered the second resilver (same result).  I 
then ran a 'zpool scrub', which started the third resilver and also identified 
three permanent errors (the two additional ones were in files in snapshots that 
I have since destroyed).  I then did a 'zpool clear' and another scrub, which 
started the fourth resilver attempt.  This last attempt identified another file 
with errors in a snapshot that I have now destroyed.
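
For reference, the rough sequence of commands was (reconstructed from memory,
so take the exact invocations as approximate):

  # zpool replace pool2 c4t0d0    # 1st resilver, ended with checksum errors
  # zpool clear pool2             # triggered the 2nd resilver, same result
  # zpool scrub pool2             # 3rd resilver, found 3 permanent errors
  # zpool clear pool2
  # zpool scrub pool2             # 4th resilver attempt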


Any ideas on how to get this disk replacement to finish without rebuilding the 
pool and restoring from backup?  The pool is working, but it is reporting as 
degraded and with checksum errors.


Here is what the pool currently looks like:

 # zpool status -v pool2
  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 33h9m with 4 errors on Thu Sep 16 00:28:14 2010
config:

        NAME              STATE     READ WRITE CKSUM
        pool2             DEGRADED     0     0     8
          raidz2-0        ONLINE       0     0     0
            c0t4d0        ONLINE       0     0     0
            c1t4d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
            c5t4d0        ONLINE       0     0     0
            c2t5d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
          raidz2-1        DEGRADED     0     0    14
            c0t5d0        ONLINE       0     0     0
            c1t5d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c4t1d0        ONLINE       0     0     0
            c5t1d0        ONLINE       0     0     0
            c2t0d0        ONLINE       0     0     0
            c3t0d0        ONLINE       0     0     0
            replacing-8   DEGRADED     0     0     0
              c4t0d0s0/o  OFFLINE      0     0     0
              c4t0d0      ONLINE       0     0     0  268G resilvered
            c5t0d0        ONLINE       0     0     0
          raidz2-2        ONLINE       0     0     0
            c0t6d0        ONLINE       0     0     0
            c1t6d0        ONLINE       0     0     0
            c2t6d0        ONLINE       0     0     0
            c3t6d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
            c5t6d0        ONLINE       0     0     0
            c2t7d0        ONLINE       0     0     0
            c3t7d0        ONLINE       0     0     0
            c4t7d0        ONLINE       0     0     0
            c5t7d0        ONLINE       0     0     0
          raidz2-3        ONLINE       0     0     0
            c0t7d0        ONLINE       0     0     0
            c1t7d0        ONLINE       0     0     0
            c2t3d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c4t3d0        ONLINE       0     0     0
            c5t3d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c4t2d0        ONLINE       0     0     0
            c5t2d0        ONLINE       0     0     0
        logs
          mirror-4        ONLINE       0     0     0
            c0t1d0s0      ONLINE       0     0     0
            c1t3d0s0      ONLINE       0     0     0
        cache
          c0t3d0s7        ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <0x167a2>:<0x552ed>

(This second file was in a snapshot I destroyed after the resilver completed.)


 # zpool list pool2
NAME    SIZE  ALLOC   FREE  CAP  DEDUP    HEALTH  ALTROOT
pool2  31.8T  13.8T  17.9T  43%  1.65x  DEGRADED  -

The slog is a mirror of two SLC SSDs and the L2ARC is an MLC SSD.

thanks,
Ben

Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-16 Thread Wolfraider
We downloaded zilstat from 
http://www.richardelling.com/Home/scripts-and-programs-1 but we never could get 
the script to run. We are not really sure how to debug. :(

./zilstat.ksh 
dtrace: invalid probe specifier 
#pragma D option quiet
 inline int OPT_time = 0;
 inline int OPT_txg = 0;
 inline int OPT_pool = 0;
 inline int OPT_mega = 0;
 inline int INTERVAL = 1;
 inline int LINES = -1;
 inline int COUNTER = -1;
 inline int FILTER = 0;
 inline string POOL = "";
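
We'd be happy to run some basic DTrace checks as root if that helps narrow it
down; this is roughly what we would try (the zil_commit probe name is our guess
from reading the script, not something we have verified):

  # dtrace -n 'BEGIN { trace("dtrace ok"); exit(0); }'
  # dtrace -l -n 'fbt::zil_commit:entry'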


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-09-16 Thread Rich Teer
On Thu, 16 Sep 2010, erik.ableson wrote:

> And for reference, I have a number of 10.6 clients using NFS for
> sharing Fusion virtual machines, iTunes library, iPhoto libraries etc.
> without any issues.

Excellent; what OS is your NFS server running?

-- 
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-16 Thread Wolfraider
We have the following setup configured.  The drives are running on a couple of 
PAC PS-5404s.  Since these units do not support JBOD, we have configured each 
individual drive as a single-disk RAID0 and shared out all 48 RAID0s per box.  
This is connected to the Solaris box through a dual-port 4G Emulex fibrechannel 
card with MPIO enabled (round-robin).  This is configured with the 18 raidz2 
vdevs and 1 big pool.  We currently have 2 zvols created, with the size being 
around 40TB sparse (30T in use).  These are in turn shared out using a 
fibrechannel Qlogic QLA2462 in target mode, using both ports.  We have 1 zvol 
connected to 1 Windows server and the other zvol connected to another Windows 
server, with both Windows servers having a Qlogic 2462 fibrechannel adapter, 
using both ports and MPIO enabled.  The Windows servers are running Windows 
2008 R2.  The zvols are formatted NTFS and used as a staging area and D2D2T 
system for both Commvault and Microsoft Data Protection Manager backup 
solutions.  The SAN system sees mostly writes since it is used for backups.
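
For reference, the zvols were created as sparse volumes, along these lines
(the names here are placeholders, not the exact commands we ran):

  # zfs create -s -V 40T pool/backup-lun1
  # zfs create -s -V 40T pool/backup-lun2
  # zfs get volsize,refreservation,used pool/backup-lun1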

We are using Cisco 9124 fibrechannel switches, and we have recently upgraded to 
Cisco 10G Nexus switches on our Ethernet side.  Fibrechannel support on the 
Nexus is still a few years off for us because of the cost.  We are just trying 
to fine-tune our SAN for the best performance possible, and we don't really 
have any firm expectations right now.  We are always looking to improve 
something. :)


Re: [zfs-discuss] resilver = defrag?

2010-09-16 Thread David Dyer-Bennet

On Wed, September 15, 2010 16:18, Edward Ned Harvey wrote:

> For example, if you start with an empty drive, and you write a large
> amount
> of data to it, you will have no fragmentation.  (At least, no significant
> fragmentation; you may get a little bit based on random factors.)  As life
> goes on, as long as you keep plenty of empty space on the drive, there's
> never any reason for anything to become significantly fragmented.

Sure, if only a single thread is ever writing to the disk store at a time.

This situation doesn't exist with any kind of enterprise disk appliance,
though; there are always multiple users doing stuff.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



[zfs-discuss] recordsize

2010-09-16 Thread Mike DeMarco
What are the ramifications of changing the recordsize of a zfs filesystem that 
already has data on it?

I want to tune down the recordsize to a size that is more in line with the read 
size, in order to speed up very small reads.  Can I do this on a filesystem that 
already has data on it, and how does it affect that data?  The zpool consists of 
8 SAN LUNs.

Thanks
mike


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-09-16 Thread Rich Teer
On Thu, 16 Sep 2010, Erik Ableson wrote:

> OpenSolaris snv129

Hmm, SXCE snv_130 here.  Did you have to do any server-side tuning
(e.g., allowing remote connections), or did it just work out of the
box?  I know that Sendmail needs some gentle persuasion to accept
remote connections out of the box; perhaps lockd is the same?
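
For what it's worth, the sort of thing I would expect to have to poke at on the
server side is the lock manager service and its tunables, presumably something
like this (service name and path from memory):

  # svcs -l network/nfs/nlockmgr
  # grep LOCKD /etc/default/nfs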

-- 
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com


Re: [zfs-discuss] recordsize

2010-09-16 Thread Freddie Cash
On Thu, Sep 16, 2010 at 8:21 AM, Mike DeMarco  wrote:
> What are the ramifications of changing the recordsize of a zfs filesystem
> that already has data on it?
>
> I want to tune down the recordsize to a size that is more in line with the
> read size, to speed up very small reads.
> Can I do this on a filesystem that already has data on it, and how does it
> affect that data?  The zpool consists of 8 SAN LUNs.

Changing any of the zfs properties only affects data written after the
change is made.  Thus, reducing the recordsize for a filesystem will
only affect newly written data.  Any existing data is not affected
until it is re-written or copied.
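
As a sketch (dataset and file names below are hypothetical), the property
change takes effect immediately but only for new writes; existing files pick up
the new recordsize only when their blocks are rewritten, e.g. by copying:

  # zfs set recordsize=8K tank/smallreads
  # zfs get recordsize tank/smallreads
  # cp /tank/smallreads/data.db /tank/smallreads/data.db.new
  # mv /tank/smallreads/data.db.new /tank/smallreads/data.db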


-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-09-16 Thread Mike Mackovitch
On Thu, Sep 16, 2010 at 08:15:53AM -0700, Rich Teer wrote:
> On Thu, 16 Sep 2010, Erik Ableson wrote:
> 
> > OpenSolaris snv129
> 
> Hmm, SXCE snv_130 here.  Did you have to do any server-side tuning
> (e.g., allowing remote connections), or did it just work out of the
> box?  I know that Sendmail needs some gentle persuasion to accept
> remote connections out of the box; perhaps lockd is the same?

So, you've been having this problem since April.
Did you ever try getting packet traces to see where the problem is?

As I previously stated, if you want, you can forward the traces to me to
look at.  Let me know if you need the directions on how to capture them.
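
For example (interface names and hosts below are placeholders), a capture on
the server with snoop and on the client with tcpdump would be enough to compare
both sides:

  server# snoop -d e1000g0 -o /tmp/nfs-lockd.snoop host mac-client
  client# tcpdump -i en0 -s 0 -w /tmp/nfs-lockd.pcap host nfs-server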

--macko


Re: [zfs-discuss] resilver = defrag?

2010-09-16 Thread Miles Nordin
> "dd" == David Dyer-Bennet  writes:

dd> Sure, if only a single thread is ever writing to the disk
dd> store at a time.

video warehousing is a reasonable use case that will have small
numbers of sequential readers and writers to large files.  virtual
tape library is another obviously similar one.  basically, things
which used to be stored on tape.  which are not uncommon.  

AIUI ZFS does not have a fragmentation problem for these cases unless
you fill past 96%, though I've been trying to keep my pool below 80%
because .

dd> This situation doesn't exist with any kind of enterprise disk
dd> appliance, though; there are always multiple users doing
dd> stuff.

the point's relevant, but I'm starting to tune out every time I hear
the word ``enterprise.''  seems it often decodes to: 

 (1) ``fat sacks and no clue,'' or 

 (2) ``i can't hear you i can't hear you i have one big hammer in my
 toolchest and one quick answer to all questions, and everything's
 perfect! perfect, I say.  unless you're offering an even bigger
 hammer I can swap for this one, I don't want to hear it,'' or

 (3) ``However of course I agree that hammers come in different
 colors, and a wise and experienced craftsman will always choose
 the color of his hammer based on the color of the nail he's
 hitting, because the interface between hammers and nails doesn't
 work well otherwise.  We all know here how to match hammer and
 nail colors, but I don't want to discuss that at all because it's
 a private decision to make between you and your salesdroid.  

 ``However, in this forum here we talk about GREEN NAILS ONLY.  If
 you are hitting green nails with red hammers and finding they go
 into the wood anyway then you are being very unprofessional
 because that nail might have been a bank transaction. --posted
 from opensolaris.org''




Re: [zfs-discuss] resilver = defrag?

2010-09-16 Thread Marty Scholes
David Dyer-Bennet wrote:
> Sure, if only a single thread is ever writing to the
> disk store at a time.
> 
> This situation doesn't exist with any kind of
> enterprise disk appliance,
> though; there are always multiple users doing stuff.

Ok, I'll bite.

Your assertion seems to be that "any kind of enterprise disk appliance" will 
always have enough simultaneous I/O requests queued that any sequential read 
from any application will be sufficiently broken up by requests from other 
applications, effectively rendering all read requests as random.  If I follow 
your logic, since all requests are essentially random anyway, then where they 
fall on the disk is irrelevant.

I might challenge a couple of those assumptions.

First, if the data is not fragmented, then ZFS would coalesce multiple 
contiguous read requests into a single large read request, increasing total 
throughput regardless of competing I/O requests (which also might benefit from 
the same effect).

Second, I am unaware of an enterprise requirement that disk I/O run at 100% 
busy, any more than I am aware of the same requirement for full network link 
utilization, CPU utilization or PCI bus utilization.

What appears to be missing from this discussion is any shred of scientific 
evidence that fragmentation is good or bad and by how much.  We also lack any 
detail on how much fragmentation does take place.

Let's see if some people in the community can get some real numbers behind this 
stuff in real world situations.

Cheers,
Marty


Re: [zfs-discuss] Compression block sizes

2010-09-16 Thread Bob Friesenhahn

On Wed, 15 Sep 2010, Brandon High wrote:


> When using compression, are the on-disk record sizes determined before
> or after compression is applied? In other words, if record size is set
> to 128k, is that the amount of data fed into the compression engine,
> or is the output size trimmed to fit? I think it's the former, but I'm
> not certain.


We have been told before that the blocksize is applied to the 
uncompressed data and that when compression is applied, short blocks 
may be written to disk.  This does not mean that the short blocks 
don't start at a particular alignment.  When using raidz, the zfs 
blocks are already broken up into smaller chunks, using a smaller 
alignment than the zfs record size.  For zfs send, the data is 
uncompressed to full records prior to sending.
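
A simple way to see the effect on an existing dataset (the names below are made
up) is to compare a file's logical size with the space actually charged to it
on disk:

  # zfs get recordsize,compression,compressratio tank/data
  # ls -lh /tank/data/bigfile      # logical size
  # du -h /tank/data/bigfile       # allocated size, i.e. after compression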


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


[zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-16 Thread Ray Van Dolson
Best practice in Solaris 10 U8 and older was to use a mirrored ZIL.

With the ability to remove slog devices in Solaris 10 U9, we're
thinking we may get more bang for our buck to use two slog devices for
improved IOPS performance instead of needing the redundancy so much.

Any thoughts on this?

If we lost our slog devices and had to reboot, would the system come up?
(E.g., could we "remove" the failed slog devices from the zpool so the pool
would come online?)
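
To put it concretely (pool and device names below are placeholders), the choice
would be between something like:

  # zpool add tank log mirror c1t0d0 c1t1d0    # mirrored slog (current practice)
  # zpool add tank log c1t0d0 c1t1d0           # two striped slogs for more IOPS
  # zpool remove tank c1t0d0                   # U9: log devices can now be removed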

Thanks,
Ray


Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-16 Thread Bryan Horstmann-Allen
+--
| On 2010-09-16 18:08:46, Ray Van Dolson wrote:
| 
| Best practice in Solaris 10 U8 and older was to use a mirrored ZIL.
| 
| With the ability to remove slog devices in Solaris 10 U9, we're
| thinking we may get more bang for our buck to use two slog devices for
| improved IOPS performance instead of needing the redundancy so much.
| 
| Any thoughts on this?
| 
| If we lost our slog devices and had to reboot, would the system come up
| (eg could we "remove" failed slog devices from the zpool so the zpool
| would come online..)

The ability to remove the slogs isn't really the win here; it's 'import -F'.  The
problem is: if the ZIL dies, you will lose whatever writes were in flight.

I've just deployed some SSD ZIL (on U9), and decided to mirror them. Cut the
two SSDs into 1GB and 31GB partitions, mirrored the two 1GBs as slog and have
the two 31GB as L2ARC.
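
Roughly speaking, that amounts to something like this (device and slice names
here are illustrative, not the exact ones I used):

  # zpool add tank log mirror c2t0d0s0 c2t1d0s0    # the two 1GB slices, mirrored slog
  # zpool add tank cache c2t0d0s1 c2t1d0s1         # the two 31GB slices as L2ARC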

So far extremely happy with it. Running a scrub during production hours,
before, was unheard of. (And, well, "production" for mail storage is basically
all hours, so.)

As for running non-mirrored slogs... dunno. Our customers would be pretty
pissed if we lost any mail, so I doubt I will do so. My SSDs were only $90
each, though, so cost is hardly a factor for us.

Cheers.
-- 
bdha
cyberpunk is dead. long live cyberpunk.