Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread hatish
Ahhh! So that's how the formula works. That makes perfect sense. Let's take my case as a scenario: each of my vdevs is a 10-disk RaidZ2 (8 data + 2 parity). Using a 128K stripe, I'll have 128K/8 = 16K blocks per data drive & 16K blocks per parity drive. That fits both 512B & 4KB sectors. It works in my favo
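
The same division can be checked quickly in a shell (a rough sketch; the 128K recordsize and 8 data disks are the example numbers from this thread):

    # per-disk chunk for a 128K record across 8 data disks
    $ echo $(( 131072 / 8 ))
    16384
    # 16384 is an even multiple of both sector sizes under discussion
    $ echo $(( 16384 % 512 )) $(( 16384 % 4096 ))
    0 0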

Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Bill Sommerfeld
On 09/09/10 20:08, Edward Ned Harvey wrote: Scores so far: 2 No 1 Yes No. Resilver does not re-layout your data or change what's in the block pointers on disk. If it was fragmented before, it will be fragmented after. C) Does zfs send | zfs receive mean it will defrag? Scor
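
Since receive writes the incoming stream as fresh allocations, a send/receive cycle does lay the data out anew. A minimal sketch (pool and dataset names are hypothetical):

    $ zfs snapshot tank/data@migrate
    $ zfs send tank/data@migrate | zfs receive newpool/data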

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Edward Ned Harvey
> From: Haudy Kazemi [mailto:kaze0...@umn.edu] > > There is another optimization in the Best Practices Guide that says the > number of devices in a vdev should be (N+P) with P = 1 (raidz), 2 > (raidz2), or 3 (raidz3) and N equals 2, 4, or 8. > I.e. 2^N + P where N is 1, 2, or 3 and P is the RAIDZ l
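
Enumerated, the rule quoted above yields these vdev widths (a quick sketch of the 2^N + P arithmetic):

    # P parity disks plus 2, 4, or 8 data disks
    $ for n in 2 4 8; do for p in 1 2 3; do echo "raidz$p: $((n+p)) disks ($n data + $p parity)"; done; done
    raidz1: 3 disks (2 data + 1 parity)
    ...
    raidz3: 11 disks (8 data + 3 parity)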

Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Orvar Korvar > > A) Resilver = Defrag. True/false? I think everyone will agree "false" on this question. However, more detail may be appropriate. See below. > B) If I buy larger drives and

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Fei Xu
Just to update the status and findings. I've checked the TLER settings and they are off by default. I moved the source pool to another chassis and did the 3.8TB send again; this time, no problems! The differences: 1. New chassis. 2. Bigger memory, 32GB vs. 12GB. 3. Although wdidle time is dis
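
If memory is the suspect, the ARC is worth watching while the send runs (a sketch; these kstat names are as on Solaris/OpenSolaris):

    # current ARC size and target size, in bytes
    $ kstat -p zfs:0:arcstats:size zfs:0:arcstats:c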

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Haudy Kazemi
Erik Trimble wrote: On 9/9/2010 2:15 AM, taemun wrote: Erik: does that mean that keeping the number of data drives in a raidz(n) to a power of two is better? In the example you gave, you mentioned 14kb being written to each drive. That doesn't sound very efficient to me. (when I say the abo

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Haudy Kazemi
Comment at end... Mattias Pantzare wrote: On Wed, Sep 8, 2010 at 15:27, Edward Ned Harvey wrote: From: pantz...@gmail.com [mailto:pantz...@gmail.com] On Behalf Of Mattias Pantzare It is about 1 vdev with 12 disks or 2 vdevs with 6 disks. If you have 2 vdevs you have to read half the data com

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Tim Cook
On Thu, Sep 9, 2010 at 2:49 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Thu, 9 Sep 2010, Garrett D'Amore wrote: > >> >> True. But, I wonder if the settlement sets a precedent? >> > > No precedent has been set. > > > Certainly the lack of a successful lawsuit has *failed* to s

Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Freddie Cash
On Thu, Sep 9, 2010 at 1:26 PM, Freddie Cash wrote: > On Thu, Sep 9, 2010 at 1:04 PM, Orvar Korvar > wrote: >> A) Resilver = Defrag. True/false? > > False.  Resilver just rebuilds a drive in a vdev based on the > redundant data stored on the other drives in the vdev.  Similar to how > replacing a

Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Marty Scholes
I am speaking from my own observations and nothing scientific such as reading the code or designing the process. > A) Resilver = Defrag. True/false? False > B) If I buy larger drives and resilver, does defrag > happen? No. The first X sectors of the bigger drive are identical to the smaller

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Bob Friesenhahn > > There should be little doubt that NetApp's goal was to make money by > suing Sun. Nexenta does not have enough income/assets to make a risky > lawsuit worthwhile. But in a

Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Freddie Cash
On Thu, Sep 9, 2010 at 1:04 PM, Orvar Korvar wrote: > A) Resilver = Defrag. True/false? False. Resilver just rebuilds a drive in a vdev based on the redundant data stored on the other drives in the vdev. Similar to how replacing a dead drive works in a hardware RAID array. > B) If I buy larger
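
In command terms, that rebuild is what a replace kicks off (pool and device names hypothetical):

    $ zpool replace tank c1t3d0 c1t4d0   # resilver starts automatically
    $ zpool status tank                  # shows resilver progress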

[zfs-discuss] How to migrate to 4KB sector drives?

2010-09-09 Thread Orvar Korvar
ZFS does not handle 4K sector drives well; you need to create a new zpool with the "4K" property (ashift) set. http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html Are there plans to allow resilver to handle 4K sector drives?
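
For reference, an existing pool's ashift can be inspected with zdb; setting it at creation time needs a build whose zpool supports it (treat the -o ashift option as an assumption for your platform):

    $ zdb -C tank | grep ashift     # 9 = 512B sectors, 12 = 4K sectors
    # on builds that support it:
    # zpool create -o ashift=12 tank <disks>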

[zfs-discuss] resilver = defrag?

2010-09-09 Thread Orvar Korvar
A) Resilver = Defrag. True/false? B) If I buy larger drives and resilver, does defrag happen? C) Does zfs send | zfs receive mean it will defrag?

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Bob Friesenhahn
On Thu, 9 Sep 2010, Garrett D'Amore wrote: True. But, I wonder if the settlement sets a precedent? No precedent has been set. Certainly the lack of a successful lawsuit has *failed* to set any precedent conclusively indicating that NetApp has enforceable patents where ZFS is concerned. Ri

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Erik Trimble
On 9/9/2010 11:11 AM, Garrett D'Amore wrote: On Thu, 2010-09-09 at 12:58 -0500, Bob Friesenhahn wrote: On Thu, 9 Sep 2010, Erik Trimble wrote: Yes, it's welcome to get it over with. I do get to bitch about one aspect here of the US civil legal system, though. If you've gone so far as to burn

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Miles Nordin
> "dm" == David Magda writes: dm> http://www.theregister.co.uk/2010/09/09/oracle_netapp_zfs_dismiss/ http://www.groklaw.net/articlebasic.php?story=20050121014650517 says when the MPL was modified to become the CDDL, clauses were removed which would have required Oracle to disclose any p

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Miles Nordin
> "ml" == Mark Little writes: ml> Just to clarify - do you mean TLER should be off or on? It should be set to ``do not have asvc_t 11 seconds and <1 io/s''. ...which is not one of the settings of the TLER knob. This isn't a problem with the TLER *setting*. TLER does not even apply unl

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Bob Friesenhahn
On Thu, 9 Sep 2010, Erik Trimble wrote: Yes, it's welcome to get it over with. I do get to bitch about one aspect here of the US civil legal system, though. If you've gone so far as to burn our (the public's) time and money to file a lawsuit, you shouldn't be able to seal up the court transcri

Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen
Richard Elling wrote: On Sep 9, 2010, at 10:09 AM, Arne Jansen wrote: Hi Neil, Neil Perrin wrote: NFS often demands its transactions are stable before returning. This forces ZFS to do the system call synchronously. Usually the ZIL (code) allocates and writes a new block in the intent log cha

Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Richard Elling
On Sep 9, 2010, at 10:09 AM, Arne Jansen wrote: > Hi Neil, > > Neil Perrin wrote: >> NFS often demands its transactions are stable before returning. This forces ZFS to do the system call synchronously. Usually the ZIL (code) allocates and writes a new block in the intent log chain to a

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Erik Trimble
in 2007 between Sun Microsystems and NetApp. Oracle and NetApp seek to have the lawsuits dismissed without prejudice. The terms of the agreement are confidential. http://tinyurl.com/39qkzgz http://www.netapp.com/us/company/news/news-rel-20100909-oracle-settlement.html A recap of the history at

Re: [zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread Richard Elling
>> Microsystems and NetApp. Oracle and NetApp seek to have the lawsuits >> dismissed without prejudice. The terms of the agreement are confidential. > > http://tinyurl.com/39qkzgz > http://www.netapp.com/us/company/news/news-rel-20100909-oracle-settlement.html >

Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen
Hi Neil, Neil Perrin wrote: NFS often demands its transactions are stable before returning. This forces ZFS to do the system call synchronously. Usually the ZIL (code) allocates and writes a new block in the intent log chain to achieve this. If ever it fails to allocate a block (of the size r

Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Neil Perrin
I should also have mentioned that if the pool has a separate log device then this shouldn't happen. Assuming the slog is big enough, it should have enough blocks to not be forced into using main pool device blocks. Neil. On 09/09/10 10:36, Neil Perrin wrote: Arne, NFS often demands it'
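
Adding a separate log device is a one-liner (pool and device names hypothetical; mirroring the slog is common practice):

    $ zpool add tank log c4t0d0
    # or mirrored: zpool add tank log mirror c4t0d0 c4t1d0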

[zfs-discuss] NetApp/Oracle-Sun lawsuit done

2010-09-09 Thread David Magda
ejudice. The terms of the agreement are confidential. http://tinyurl.com/39qkzgz http://www.netapp.com/us/company/news/news-rel-20100909-oracle-settlement.html A recap of the history at: http://www.theregister.co.uk/2010/09/09/oracle_netapp_zfs_dismiss/

Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Neil Perrin
Arne, NFS often demands its transactions are stable before returning. This forces ZFS to do the system call synchronously. Usually the ZIL (code) allocates and writes a new block in the intent log chain to achieve this. If ever it fails to allocate a block (of the size requested) it is forced

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Erik Trimble
On 9/9/2010 6:19 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Erik Trimble the thing that folks tend to forget is that RaidZ is IOPS limited. For the most part, if I want to reconstruct a single slab (stripe)

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Mark Little
On Thu, 9 Sep 2010 14:05:51, Markus Kovero wrote: > On Sep 9, 2010, at 8:27 AM, Fei Xu wrote: >> This might be the dreaded WD TLER issue. Basically the drive keeps retrying a read operation over and over after a bit error, trying to recover from the read error by itself. With

[zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen
Hi, currently I'm trying to debug a very strange phenomenon on a nearly full pool (96%). Here are the symptoms: over NFS, a find on the pool takes a very long time, up to 30s (!) for each file. Locally, the performance is quite normal. What I found out so far: It seems that every nfs write (rfs3_w
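
For context, the pool's fill level is visible at a glance (pool name hypothetical):

    $ zpool list tank    # the CAP column shows percent of capacity used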

Re: [zfs-discuss] zpool create using whole disk - do I add "p0"? E.g. c4t2d0 or c4t2d0p0

2010-09-09 Thread Cindy Swearingen
Hi-- It might help to review the disk component terminology description: c#t#d#p# = represents the fdisk partition on x86 systems, where you can have up to 4 fdisk partitions, such as one for the Solaris OS or a Windows OS. An fdisk partition is the larger container of the disk or disk slice
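
For a whole-disk pool you give zpool the bare c#t#d# name and ZFS labels the disk itself; no p0 or s0 suffix is needed (device name hypothetical):

    $ zpool create tank c4t2d0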

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Markus Kovero
On Sep 9, 2010, at 8:27 AM, Fei Xu wrote: > This might be the dreaded WD TLER issue. Basically the drive keeps retrying a read operation over and over after a bit error, trying to recover from the read error by itself. With ZFS one really needs to disable this and have the drives fail i

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Ross Walker
On Sep 9, 2010, at 8:27 AM, Fei Xu wrote: >> Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. Disk is probably having a hard time reading the data or som

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Marty Scholes
Erik wrote: > Actually, your biggest bottleneck will be the IOPS limits of the drives. A 7200RPM SATA drive tops out at 100 IOPS. Yup. That's it. > So, if you need to do 62.5e6 I/Os, and the rebuild drive can do just 100 IOPS, that means you will finish (best case) in 62.5e4 seconds
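
Worked through (62.5e6 I/Os is the example figure from this thread):

    $ echo "62500000 / 100" | bc      # seconds at 100 IOPS
    625000
    $ echo "625000 / 86400" | bc -l   # days, best case
    7.23379629629629629629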

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > The characteristic that *really* makes a big difference is the number > of > slabs in the pool. i.e. if your filesystem is composed of mostly small > files or fragments,

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Edward Ned Harvey
> From: Hatish Narotam [mailto:hat...@gmail.com] > > PCI-E 8X 4-port ESata Raid Controller. > 4 x ESata to 5Sata Port multipliers (each connected to an ESata port on > the controller). > 20 x Samsung 1TB HDD's (each connected to a Port Multiplier). Assuming your disks can all sustain 500Mbit/sec,
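
Sanity-checking the shared links (assumes 3Gbit/s eSATA and the 500Mbit/sec per-disk figure above; usable eSATA payload is roughly 2400Mbit/s after 8b/10b encoding):

    # five disks share one eSATA link through each port multiplier
    $ echo $(( 5 * 500 ))   # Mbit/s demanded per link, vs ~2400 usable
    2500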

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Erik Trimble > > the thing that folks tend to forget is that RaidZ is IOPS limited. For > the most part, if I want to reconstruct a single slab (stripe) of data, > I have to issue a read to EA

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Will Murnane
On Thu, Sep 9, 2010 at 09:03, Erik Trimble wrote: > Actually, your biggest bottleneck will be the IOPS limits of the drives. A 7200RPM SATA drive tops out at 100 IOPS. Yup. That's it. > > So, if you need to do 62.5e6 I/Os, and the rebuild drive can do just 100 IOPS, that means you will finis

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Erik Trimble
On 9/9/2010 5:49 AM, hatish wrote: Very interesting... Well, let's see if we can do the numbers for my setup. From a previous post of mine: This is my exact breakdown (cheap disks on cheap bus :P): PCI-E 8X 4-port ESata Raid Controller. 4 x ESata to 5Sata Port multipliers (each connected

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Freddie Cash > > No, it (21-disk raidz3 vdev) most certainly will not resilver in the > same amount of time. In fact, I highly doubt it would resilver at > all. > > My first foray into ZFS re

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread hatish
Very interesting... Well, let's see if we can do the numbers for my setup. From a previous post of mine: This is my exact breakdown (cheap disks on cheap bus :P): PCI-E 8X 4-port ESata Raid Controller. 4 x ESata to 5Sata Port multipliers (each connected to an ESata port on the controller).

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Fei Xu
> Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. Disk is probably having a hard time reading the data or something. Yeah, that should not go over 15ms. I
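
On Solaris, asvc_t is the active service time reported by iostat; healthy disks should stay well under that 15ms mark (the interval is an arbitrary choice):

    $ iostat -xnz 5   # -x extended stats, -n descriptive names, -z skip idle devices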

Re: [zfs-discuss] performance leakage when copy huge data

2010-09-09 Thread Tomas Ögren
On 08 September, 2010 - Fei Xu sent me these 5,9K bytes: > I dug deeper into it and may have found some useful information. > I attached an X25 SSD for ZIL to see if it helps, but no luck. > I ran iostat -xnz for more details and got an interesting result, as > below (maybe too long). > Some explanatio

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Erik Trimble
On 9/9/2010 2:15 AM, taemun wrote: Erik: does that mean that keeping the number of data drives in a raidz(n) to a power of two is better? In the example you gave, you mentioned 14kb being written to each drive. That doesn't sound very efficient to me. (when I say the above, I mean a five dis

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread taemun
Erik: does that mean that keeping the number of data drives in a raidz(n) to a power of two is better? In the example you gave, you mentioned 14kb being written to each drive. That doesn't sound very efficient to me. (when I say the above, I mean a five disk raidz or a ten disk raidz2, etc) Cheer
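
The intuition in numbers: the 14kb-per-drive figure quoted above is consistent with a 128K record split across 9 data disks (an assumption about Erik's example), which does not divide evenly:

    # a 128K record across 9 data disks leaves a remainder
    $ echo $(( 131072 / 9 )) $(( 131072 % 9 ))
    14563 5
    # each ~14.2K chunk then rounds up to a whole number of sectors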

Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-09 Thread Erik Trimble
On 9/8/2010 10:08 PM, Freddie Cash wrote: On Wed, Sep 8, 2010 at 6:27 AM, Edward Ned Harvey wrote: Both of the above situations resilver in equal time, unless there is a bus bottleneck. 21 disks in a single raidz3 will resilver just as fast as 7 disks in a raidz1, as long as you are avoiding