Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Tomas Ögren
On 08 June, 2011 - Donald Stahl sent me these 0,6K bytes:

> >> One day, the write performance of ZFS degraded: sequential write
> >> throughput dropped from 60MB/s to about 6MB/s.
> >>
> >> Command:
> >> date;dd if=/dev/zero of=block bs=1024*128 count=1;date
> 
> See this thread:
> 
> http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45
> 
> And search in the page for:
> "metaslab_min_alloc_size"
> 
> Try adjusting the metaslab size and see if it fixes your performance problem.

And if pool usage is >90%, then there's another problem (the algorithm for
finding free space changes).

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Markus Kovero
Hi, also see;
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45408.html

We hit this with Solaris 11, though; not sure if it's possible with Solaris 10.

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ding Honghui
Sent: 8 June 2011 6:07
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Wired write performance problem

Hi,

I have hit a weird write performance problem and need your help.

One day, the write performance of ZFS degraded: sequential write throughput
dropped from 60MB/s to about 6MB/s.

Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
The OS is Solaris 10U8, zpool version 15 and zfs version 4.

I ran DTrace to trace zfs_write latency:

fbt:zfs:zfs_write:entry
{
 self->ts = timestamp;
}


fbt:zfs:zfs_write:return
/self->ts/
{
 @time = quantize(timestamp-self->ts);
 self->ts = 0;
}

It shows (zfs_write latency in nanoseconds):

           value   count
            8192       0
           16384      16
           32768    3270
           65536     898
          131072     985
          262144      33
          524288       1
         1048576       1
         2097152       3
         4194304       0
         8388608     180
        16777216      33
        33554432       0
        67108864       0
       134217728       0
       268435456       1
       536870912       1
      1073741824       2
      2147483648       0
      4294967296       0
      8589934592       0
     17179869184       2
     34359738368       3
     68719476736       0

For comparison, on a storage system that works well (a single MD3000), the
maximum zfs_write time is about 4294967296 ns (~4.3 s), roughly 10 times
faster than the worst case here.

Any suggestions?

Thanks
Ding

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui



On 06/08/2011 12:12 PM, Donald Stahl wrote:

One day, the write performance of ZFS degraded: sequential write
throughput dropped from 60MB/s to about 6MB/s.

Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

See this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45

And search in the page for:
"metaslab_min_alloc_size"

Try adjusting the metaslab size and see if it fixes your performance problem.

-Don



"metaslab_min_alloc_size" is not in use when block allocator isDynamic block 
allocator[1].
So it is not tunable parameter in my case.

Thanks anyway.

[1] 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c#496


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui
For now, I find that a lot of time is spent in the function
metaslab_block_picker in metaslab.c.

I guess there may be many AVL tree searches.

I am still not sure what causes the AVL searches and whether there is any
parameter to tune for it.


Any suggestions?
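If fbt can see that function on this kernel (it is static in metaslab.c, so
this assumes it has not been inlined), a latency sketch along the lines of the
zfs_write probes earlier in the thread would be:

# dtrace -n '
fbt:zfs:metaslab_block_picker:entry { self->ts = timestamp; }

fbt:zfs:metaslab_block_picker:return
/self->ts/
{
        /* nanoseconds spent per call searching the free-segment AVL tree */
        @picker = quantize(timestamp - self->ts);
        self->ts = 0;
}'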

On 06/08/2011 05:57 PM, Markus Kovero wrote:

Hi, also see;
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45408.html

We hit this with Solaris 11, though; not sure if it's possible with Solaris 10.

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ding Honghui
Sent: 8 June 2011 6:07
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Wired write performance problem

Hi,

I have hit a weird write performance problem and need your help.

One day, the write performance of ZFS degraded: sequential write throughput
dropped from 60MB/s to about 6MB/s.

Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
The OS is Solaris 10U8, zpool version 15 and zfs version 4.

I ran DTrace to trace zfs_write latency:

fbt:zfs:zfs_write:entry
{
  self->ts = timestamp;
}


fbt:zfs:zfs_write:return
/self->ts/
{
  @time = quantize(timestamp-self->ts);
  self->ts = 0;
}

It shows (zfs_write latency in nanoseconds):

           value   count
            8192       0
           16384      16
           32768    3270
           65536     898
          131072     985
          262144      33
          524288       1
         1048576       1
         2097152       3
         4194304       0
         8388608     180
        16777216      33
        33554432       0
        67108864       0
       134217728       0
       268435456       1
       536870912       1
      1073741824       2
      2147483648       0
      4294967296       0
      8589934592       0
     17179869184       2
     34359738368       3
     68719476736       0

For comparison, on a storage system that works well (a single MD3000), the
maximum zfs_write time is about 4294967296 ns (~4.3 s), roughly 10 times
faster than the worst case here.

Any suggestions?

Thanks
Ding

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui

On 06/08/2011 04:05 PM, Tomas Ögren wrote:

On 08 June, 2011 - Donald Stahl sent me these 0,6K bytes:


One day, the write performance of ZFS degraded: sequential write
throughput dropped from 60MB/s to about 6MB/s.

Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

See this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45

And search in the page for:
"metaslab_min_alloc_size"

Try adjusting the metaslab size and see if it fixes your performance problem.

And if pool usage is >90%, then there's another problem (the algorithm for
finding free space changes).

/Tomas


Tomas,

Thanks for your suggestion.

You are right.

I tuned the parameter metaslab_df_free_pct from 35 to 4 some days ago to
reduce this problem.

The performance stayed good for about 1 week and then degraded again.

And I am still not sure how many allocations fall under the best-fit block
allocation policy and how many fall under the first-fit policy in the
current situation.

I would appreciate any help.
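For reference, the sort of mdb commands used for this kind of tuning look like
the following (assuming metaslab_df_free_pct is exported as an int symbol on
the running kernel; 0t marks a decimal value in mdb):

# print the current value in decimal
echo "metaslab_df_free_pct/D" | mdb -k

# set it to 4
echo "metaslab_df_free_pct/W 0t4" | mdb -kw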

Regards,
Ding


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool import crashs SX11 trying to recovering a corrupted zpool

2011-06-08 Thread Stefano Lassi
Hi 

I have the following problem: during a controller (LSI MegaRAID 9261-8i) outage,
a Solaris Express 11 zpool got corrupted.

It is a whole 1.3 TB rpool, a RAID5 volume built by the controller.

After replacing the damaged controller, the new one reports the volume as OPTIMAL.

Importantly, the zpool has dedup enabled.




If I try to:

- boot it normally
- roll back some ZILs (launched from the SX CD in rescue mode)
- start a re-install from the SX CD

I get a system crash and an instant reboot every time.



Then I tried to check/manipulate it from OpenIndiana v148 and v151b CDs in
rescue mode.

Trying to import the pool, it was reported as present, but with a newer
on-disk format version.

Running zdb, it reports that the label and other zpool information seem OK.


But Solaris Express 11 has zpool version 31, while OpenIndiana (even version
151 beta) only reaches version 28.



If standard SX11 crashes as soon as it sees this zpool, and OI can't manipulate
this newer zpool version, how can I try to fix it (with whatever tools are
available)?

Does a newer (patched) version of the SX CD exist, different from the standard
one downloadable from the Oracle web site?

Does any "independent" Solaris distribution implement zpool v31?

Otherwise, have you got some other workaround to fix it? 




Thank you very much 

Stefano
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SE11 express Encryption on - > errors in the pool after Scrub

2011-06-08 Thread Christian Rapp

Ok,

I tested it. I ran two scrubs with the encrypted folders open (keys loaded) and
there are no issues anymore. Thanks for the hint. I hope the fix will be
available for everyone soon.


Cheers


On 06.06.2011, 11:54, Darren J Moffat wrote:



On 06/04/11 13:52, Thomas Hobbes wrote:

I am testing Solaris Express 11 with napp-it on two machines. In both
cases the same problem: Enabling encryption on a folder, filling it with
data will result in errors indicated by a subsequent scrub. I did not
find the topic on the web, but also not experiences shared by people
using encryption on SE11 express. Advice would be highly appreciated.


If you are doing the scrub when the encryption keys are not present it  
is possible you are hitting a known (and very recently fixed in the  
Solaris 11 development gates) bug.
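As a quick sanity check before scrubbing, something along these lines (the
dataset names are only placeholders) shows whether the keys are loaded:

# per-dataset encryption and key status; keystatus should read 'available'
zfs get -r encryption,keystatus tank

# load the wrapping key for an encrypted dataset if it is not yet available
zfs key -l tank/secure

zpool scrub tank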


If you have an operating systems support contract with Oracle you should  
be able to log a support ticket and request a backport of the fix for CR  
6989185.





--
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Marty Scholes
> > Are some of the reads sequential?  Sequential reads
> don't go to L2ARC.
> 
> That'll be it. I assume the L2ARC is just taking
> metadata. In situations 
> such as mine, I would quite like the option of
> routing sequential read 
> data to the L2ARC also.

The good news is that it is almost a certainty that actual iSCSI usage will be
of a (more) random nature than your tests, suggesting higher L2ARC usage in
real-world applications.

I'm not sure how ZFS distinguishes between a random and a sequential read, but
the more you think about it, not caching sequential requests makes sense.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Phil Harman

On 08/06/2011 14:35, Marty Scholes wrote:

Are some of the reads sequential?  Sequential reads

don't go to L2ARC.

That'll be it. I assume the L2ARC is just taking
metadata. In situations
such as mine, I would quite like the option of
routing sequential read
data to the L2ARC also.

The good news is that it is almost a certainty that actual iSCSI usage will be
of a (more) random nature than your tests, suggesting higher L2ARC usage in
real-world applications.

I'm not sure how ZFS distinguishes between a random and a sequential read, but
the more you think about it, not caching sequential requests makes sense.

Yes, in most cases, but I can think of some counter examples ;)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui

On 06/08/2011 09:15 PM, Donald Stahl wrote:

"metaslab_min_alloc_size" is not in use when block allocator isDynamic block
allocator[1].
So it is not tunable parameter in my case.

May I ask where it says this is not a tunable in that case? I've read
through the code and I don't see what you are talking about.

The problem you are describing- including the "long time in function
metaslab_block_picker" exactly matches the block picker trying to find
a large enough block and failing.

What value do you get when you run:
echo "metaslab_min_alloc_size/K" | mdb -kw
?

You can always try setting it via:
echo "metaslab_min_alloc_size/Z 1000" | mdb -kw

and if that doesn't work set it right back.

I'm not familiar with the specifics of Solaris 10u8 so perhaps this is
not a tunable in that version but if it is- I would suggest you try
changing it. If your performance is as bad as you say then it can't
hurt to try it.

-Don


Thanks very much, Don.

In Solaris 10u8:
root@nas-hz-01:~# uname -a
SunOS nas-hz-01 5.10 Generic_141445-09 i86pc i386 i86pc
root@nas-hz-01:~# echo "metaslab_min_alloc_size/K" | mdb -kw
mdb: failed to dereference symbol: unknown symbol name
root@nas-hz-01:~#

The pool version is 15 and zfs version is 4.

And this parameter is valid on my OpenIndiana build 148 box, whose zpool
version is 28 and zfs version is 5.

ops@oi:~$ echo "metaslab_min_alloc_size/Z 1000" | pfexec mdb -kw
metaslab_min_alloc_size:0x1000  =   0x1000
ops@oi:~$

I'm not sure which release introduced the parameter.

Should I run OpenIndiana instead? Any suggestions?

Regards,
Ding
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RealSSD C300 -> Crucial CT064M4SSD2

2011-06-08 Thread Eugen Leitl

Anyone running a Crucial CT064M4SSD2? Any good, or should
I try getting a RealSSD C300, as long as these are still 
available?

-- 
Eugen* Leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Donald Stahl
> In Solaris 10u8:
> root@nas-hz-01:~# uname -a
> SunOS nas-hz-01 5.10 Generic_141445-09 i86pc i386 i86pc
> root@nas-hz-01:~# echo "metaslab_min_alloc_size/K" | mdb -kw
> mdb: failed to dereference symbol: unknown symbol name
Fair enough. I don't have anything older than b147 at this point so I
wasn't sure if that was in there or not.

If you delete a bunch of data (perhaps old files you have laying
around) does your performance go back up- even if temporarily?

The problem we had matches your description word for word. All of a
sudden we had terrible write performance with a ton of time spent in
the metaslab allocator. Then we'd delete a big chunk of data (100 gigs
or so) and poof- performance would get better for a short while.
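If you want to watch that effect while testing, something as simple as the
following is enough (pool and file names are placeholders):

# aggregate pool bandwidth, sampled every 5 seconds
zpool iostat tank 5

# in another terminal, the same kind of sequential-write probe used earlier
# in this thread (about 1 GB written)
dd if=/dev/zero of=/tank/ddtest bs=128k count=8000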

Several people suggested changing the allocation free percent from 30
to 4 but that change was already incorporated into the b147 box we
were testing. The only thing that made a difference (and I mean a
night and day difference) was the change above. That said- I have no
idea how that part of the code works in 10u8.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Bill Sommerfeld

On 06/08/11 01:05, Tomas Ögren wrote:

And if pool usage is>90%, then there's another problem (change of
finding free space algorithm).


Another (less satisfying) workaround is to increase the amount of free 
space in the pool, either by reducing usage or adding more storage. 
Observed behavior is that allocation is fast until usage crosses a 
threshold, then performance hits a wall.


I have a small sample size (maybe 2-3 samples), but the threshold point
varies from pool to pool while tending to be consistent for a given pool.  I
suspect some artifact of layout/fragmentation is at play.  I've seen
things hit the wall at as low as 70% on one pool.


The original poster's pool is about 78% full.  If possible, try freeing 
stuff until usage goes back under 75% or 70% and see if your performance 
returns.
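Checking where a pool sits relative to that threshold is a one-liner; the CAP
column is the percentage of the pool already allocated:

zpool list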




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Richard Elling
On Jun 7, 2011, at 9:12 AM, Phil Harman wrote:

> Ok here's the thing ...
> 
> A customer has some big tier 1 storage, and has presented 24 LUNs (from four 
> RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC bridge 
> (using some of the cool features of ZFS along the way). The OI box currently 
> has 32GB configured for the ARC, and 4x 223GB SSDs for L2ARC. It has a dual 
> port QLogic HBA, and is currently configured to do round-robin MPXIO over two 
> 4Gbps links. The iSCSI traffic is over a dual 10Gbps card (rather like the 
> one Sun used to sell).

The ARC is not big enough to hold the L2ARC headers required for an L2ARC of
that size.

> 
> I've just built a fresh pool, and have created 20x 100GB zvols which are 
> mapped to iSCSI clients. I have initialised the first 20GB of each zvol with 
> random data. I've had a lot of success with write performance (e.g. in 
> earlier tests I had 20 parallel streams writing 100GB each at over 600MB/sec 
> aggregate), but read performance is very poor.
> 
> Right now I'm just playing with 20 parallel streams of reads from the first 
> 2GB of each zvol (i.e. 40GB in all). During each run, I see lots of writes to 
> the L2ARC, but less than a quarter the volume of reads. Yet my FC LUNS are 
> hot with 1000s of reads per second. This doesn't change from run to run. Why?

Writes to the L2ARC devices are throttled to 8 or 16 MB/sec. If the L2ARC fill 
cannot keep up,
the data is unceremoniously evicted.
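The current limits can be inspected, and cautiously raised, via mdb, assuming
the usual arc.c tunables are exported on this build (both are byte counts per
fill interval; the boost value only applies while the cache device is cold):

echo "l2arc_write_max/E" | mdb -k
echo "l2arc_write_boost/E" | mdb -k

# example only: raise the steady-state fill limit to 32 MB (0x2000000 bytes)
echo "l2arc_write_max/Z 0x2000000" | mdb -kw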

> Surely 20x 2GB of data (and it's associated metadata) will sit nicely in 4x 
> 223GB SSDs?

On Jun 7, 2011, at 12:34 PM, Marty Scholes wrote:

> I'll throw out some (possibly bad) ideas.
> 
> Is ARC satisfying the caching needs?  32 GB for ARC should almost cover the 
> 40GB of total reads, suggesting that the L2ARC doesn't add any value for this 
> test.
> 
> Are the SSD devices saturated from an I/O standpoint?  Put another way, can 
> ZFS put data to them fast enough?  If they aren't taking writes fast enough, 
> then maybe they can't effectively load for caching.  Certainly if they are 
> saturated for writes they can't do much for reads.
> 
> Are some of the reads sequential?  Sequential reads don't go to L2ARC.

This is not a true statement. If the primarycache policy is set to the default, 
all data will
be cached in the ARC.
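The cache policies are per-dataset properties and easy to verify (the dataset
name below is a placeholder):

# 'all' caches data and metadata, 'metadata' caches only metadata, 'none' disables
zfs get primarycache,secondarycache tank/vol01

# make the zvol eligible for both ARC and L2ARC
zfs set primarycache=all tank/vol01
zfs set secondarycache=all tank/vol01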

> 
> What does iostat say for the SSD units?  What does arc_summary.pl (maybe 
> spelled differently) say about the ARC / L2ARC usage?  How much of the SSD 
> units are in use as reported in zpool iostat -v?

The ARC statistics are nicely documented in arc.c and available as kstats.
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Donald Stahl
> Another (less satisfying) workaround is to increase the amount of free space
> in the pool, either by reducing usage or adding more storage. Observed
> behavior is that allocation is fast until usage crosses a threshold, then
> performance hits a wall.
We actually tried this solution. We were at 70% usage and performance
hit a wall. We figured it was because of the change of fit algorithm
so we added 16 2TB disks in mirrors. (Added 16TB to an 18TB pool). It
made almost no difference in our pool performance. It wasn't until we
told the metaslab allocator to stop looking for such large chunks that
the problem went away.

> The original poster's pool is about 78% full.  If possible, try freeing
> stuff until usage goes back under 75% or 70% and see if your performance
> returns.
Freeing stuff did fix the problem for us (temporarily) but only in an
indirect way. When we freed up a bunch of space, the metaslab
allocator was able to find large enough blocks to write to without
searching all over the place. This would fix the performance problem
until those large free blocks got used up. Then- even though we were
below the usage problem threshold from earlier- we would still have
the performance problem.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Marty Scholes
> This is not a true statement. If the primarycache
> policy is set to the default, all data will
> be cached in the ARC.

Richard, you know this stuff so well that I am hesitant to disagree with you.  
At the same time, I have seen this myself, trying to load video files into 
L2ARC without success.

> The ARC statistics are nicely documented in arc.c and
> available as kstats.

And I looked in the source.  My C is a little rusty, yet it appears that 
prefetch items are not stored in L2ARC by default.  Prefetches will satisfy a 
good portion of sequential reads but won't go to L2ARC.
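That behaviour is governed by the l2arc_noprefetch tunable in arc.c; a sketch
for checking and, experimentally, flipping it (assuming the symbol is exported
on your build):

# 1 = prefetched (streaming) buffers are kept out of the L2ARC, the default
echo "l2arc_noprefetch/D" | mdb -k

# experiment only: let prefetched buffers be written to the cache devices
echo "l2arc_noprefetch/W 0" | mdb -kw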
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RealSSD C300 -> Crucial CT064M4SSD2

2011-06-08 Thread Tomas Ögren
On 08 June, 2011 - Eugen Leitl sent me these 0,5K bytes:

> 
> Anyone running a Crucial CT064M4SSD2? Any good, or should
> I try getting a RealSSD C300, as long as these are still 
> available?

Haven't tried any of those, but how about one of these:

OCZ Vertex3 (Sandforce SF-2281, sataIII, MLC, to be used for l2arc):
shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F98CEFE5Dd0s0 of=/dev/null bs=1024k 
count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.21005 s, 486 MB/s

OCZ Vertex2 EX (Sandforce SF-1500, sataII, SLC and supercap, to be used for zil)
shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F1471E0A4d0s0 of=/dev/null bs=1024k 
count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.93114 s, 273 MB/s

This is in a x4170m2 with Solaris10.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RealSSD C300 -> Crucial CT064M4SSD2

2011-06-08 Thread Ruschmann, Chris J (DOL)
I am running 4 of the 128GB version in our DR environment as L2ARC. I don't 
have anything bad to say about them. They run quite well.

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tomas Ögren
Sent: Wednesday, June 08, 2011 12:30 PM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] RealSSD C300 -> Crucial CT064M4SSD2

On 08 June, 2011 - Eugen Leitl sent me these 0,5K bytes:

> 
> Anyone running a Crucial CT064M4SSD2? Any good, or should I try 
> getting a RealSSD C300, as long as these are still available?

Haven't tried any of those, but how about one of these:

OCZ Vertex3 (Sandforce SF-2281, sataIII, MLC, to be used for l2arc):
shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F98CEFE5Dd0s0 of=/dev/null bs=1024k 
count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.21005 s, 486 MB/s

OCZ Vertex2 EX (Sandforce SF-1500, sataII, SLC and supercap, to be used for 
zil) shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F1471E0A4d0s0 of=/dev/null bs=1024k 
count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.93114 s, 273 MB/s

This is in a x4170m2 with Solaris10.

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Daniel Carosone
On Wed, Jun 08, 2011 at 11:44:16AM -0700, Marty Scholes wrote:
> And I looked in the source.  My C is a little rusty, yet it appears
> that prefetch items are not stored in L2ARC by default.  Prefetches
> will satisfy a good portion of sequential reads but won't go to
> L2ARC.  

They won't go to L2ARC while they're still speculative reads, maybe.
Once they're actually used by the app to satisfy a good portion of the
actual reads, they'll have hit stats and will.

I suspect the problem is the threshold for l2arc writes.  Sequential
reads can be much faster than this rate, meaning it can take a lot of
effort/time to fill.

You could test by doing slow sequential reads, and see if the l2arc
fills any more for the same reads spread over a longer time.
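The fill, and any later hits, can be watched from the ARC kstats while the
test runs, for example:

# L2ARC size and hit/miss counters, sampled every 10 seconds
kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses 10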

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui


On 06/09/2011 12:23 AM, Donald Stahl wrote:

Another (less satisfying) workaround is to increase the amount of free space
in the pool, either by reducing usage or adding more storage. Observed
behavior is that allocation is fast until usage crosses a threshold, then
performance hits a wall.

We actually tried this solution. We were at 70% usage and performance
hit a wall. We figured it was because of the change of fit algorithm
so we added 16 2TB disks in mirrors. (Added 16TB to an 18TB pool). It
made almost no difference in our pool performance. It wasn't until we
told the metaslab allocator to stop looking for such large chunks that
the problem went away.


The original poster's pool is about 78% full.  If possible, try freeing
stuff until usage goes back under 75% or 70% and see if your performance
returns.

Freeing stuff did fix the problem for us (temporarily) but only in an
indirect way. When we freed up a bunch of space, the metaslab
allocator was able to find large enough blocks to write to without
searching all over the place. This would fix the performance problem
until those large free blocks got used up. Then- even though we were
below the usage problem threshold from earlier- we would still have
the performance problem.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Don,

From your description, my symptoms are almost the same as yours.

We have examined the metaslab layout. When metaslab_df_free_pct is 35,
there are 65 free metaslabs (64G each), the write performance is very low,
and a rough test shows that no new free metaslab gets loaded and activated.

Then we tuned metaslab_df_free_pct to 4; the performance stayed good for
1 week and the number of free metaslabs dropped to 51. But now the write
bandwidth is poor again (maybe I'd better trace the free space of each
metaslab?).

Maybe there is a problem in the metaslab rating score (weight) used to
select a metaslab, or in the block allocator algorithm?

Here is a snapshot of the metaslab layout; the last 51 metaslabs have 64G
of free space each.


vdev       offset        spacemap       free
-------    ----------    ----------     --------

... snip

vdev 3     offset 270     spacemap 440   free 21.0G
vdev 3     offset 280     spacemap  31   free 7.36G
vdev 3     offset 290     spacemap  32   free 2.44G
vdev 3     offset 2a0     spacemap  33   free 2.91G
vdev 3     offset 2b0     spacemap  34   free 3.25G
vdev 3     offset 2c0     spacemap  35   free 3.03G
vdev 3     offset 2d0     spacemap  36   free 3.20G
vdev 3     offset 2e0     spacemap  90   free 3.28G
vdev 3     offset 2f0     spacemap  91   free 2.46G
vdev 3     offset 300     spacemap  92   free 2.98G
vdev 3     offset 310     spacemap  93   free 2.19G
vdev 3     offset 320     spacemap  94   free 2.42G
vdev 3     offset 330     spacemap  95   free 2.83G
vdev 3     offset 340     spacemap 252   free 41.6G
vdev 3     offset 350     spacemap   0   free 64G
vdev 3     offset 360     spacemap   0   free 64G
vdev 3     offset 370     spacemap   0   free 64G
vdev 3     offset 380     spacemap   0   free 64G
vdev 3     offset 390     spacemap   0   free 64G
vdev 3     offset 3a0     spacemap   0   free 64G
vdev 3     offset 3b0     spacemap   0   free 64G
vdev 3     offset 3c0     spacemap   0   free 64G
vdev 3     offset 3d0     spacemap   0   free 64G
vdev 3     offset 3e0     spacemap   0   free 64G
...snip
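For anyone wanting to reproduce this kind of listing, zdb can dump the per-vdev
metaslab space maps; the exact output format differs between builds (pool name
is a placeholder):

# metaslab offsets, space map objects and free space for every vdev
zdb -m tank

# on some builds, repeating the flag also dumps the individual space map entries
zdb -mm tank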
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Ding Honghui

On 06/09/2011 10:14 AM, Ding Honghui wrote:


On 06/09/2011 12:23 AM, Donald Stahl wrote:
Another (less satisfying) workaround is to increase the amount of free space
in the pool, either by reducing usage or adding more storage. Observed
behavior is that allocation is fast until usage crosses a threshold, then
performance hits a wall.

We actually tried this solution. We were at 70% usage and performance
hit a wall. We figured it was because of the change of fit algorithm
so we added 16 2TB disks in mirrors. (Added 16TB to an 18TB pool). It
made almost no difference in our pool performance. It wasn't until we
told the metaslab allocator to stop looking for such large chunks that
the problem went away.


The original poster's pool is about 78% full.  If possible, try freeing
stuff until usage goes back under 75% or 70% and see if your performance
returns.

Freeing stuff did fix the problem for us (temporarily) but only in an
indirect way. When we freed up a bunch of space, the metaslab
allocator was able to find large enough blocks to write to without
searching all over the place. This would fix the performance problem
until those large free blocks got used up. Then- even though we were
below the usage problem threshold from earlier- we would still have
the performance problem.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Don,

From your description, my symptoms are almost the same as yours.

We have examined the metaslab layout. When metaslab_df_free_pct is 35,
there are 65 free metaslabs (64G each), the write performance is very low,
and a rough test shows that no new free metaslab gets loaded and activated.

Then we tuned metaslab_df_free_pct to 4; the performance stayed good for
1 week and the number of free metaslabs dropped to 51. But now the write
bandwidth is poor again (maybe I'd better trace the free space of each
metaslab?).

Maybe there is a problem in the metaslab rating score (weight) used to
select a metaslab, or in the block allocator algorithm?

Here is a snapshot of the metaslab layout; the last 51 metaslabs have 64G
of free space each.


vdev       offset        spacemap       free
-------    ----------    ----------     --------

... snip

vdev 3     offset 270     spacemap 440   free 21.0G
vdev 3     offset 280     spacemap  31   free 7.36G
vdev 3     offset 290     spacemap  32   free 2.44G
vdev 3     offset 2a0     spacemap  33   free 2.91G
vdev 3     offset 2b0     spacemap  34   free 3.25G
vdev 3     offset 2c0     spacemap  35   free 3.03G
vdev 3     offset 2d0     spacemap  36   free 3.20G
vdev 3     offset 2e0     spacemap  90   free 3.28G
vdev 3     offset 2f0     spacemap  91   free 2.46G
vdev 3     offset 300     spacemap  92   free 2.98G
vdev 3     offset 310     spacemap  93   free 2.19G
vdev 3     offset 320     spacemap  94   free 2.42G
vdev 3     offset 330     spacemap  95   free 2.83G
vdev 3     offset 340     spacemap 252   free 41.6G
vdev 3     offset 350     spacemap   0   free 64G
vdev 3     offset 360     spacemap   0   free 64G
vdev 3     offset 370     spacemap   0   free 64G
vdev 3     offset 380     spacemap   0   free 64G
vdev 3     offset 390     spacemap   0   free 64G
vdev 3     offset 3a0     spacemap   0   free 64G
vdev 3     offset 3b0     spacemap   0   free 64G
vdev 3     offset 3c0     spacemap   0   free 64G
vdev 3     offset 3d0     spacemap   0   free 64G
vdev 3     offset 3e0     spacemap   0   free 64G

...snip


I freed up some disk space (about 300GB) and the performance is back again.
I'm sure the performance will degrade again soon.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Wired write performance problem

2011-06-08 Thread Donald Stahl
> There is snapshot of metaslab layout, the last 51 metaslabs have 64G free
> space.
After we added all the disks to our system we had lots of free
metaslabs- but that didn't seem to matter. I don't know if perhaps the
system was attempting to balance the writes across more of our devices
but whatever the reason- the percentage didn't seem to matter. All
that mattered was changing the size of the min_alloc tunable.

You seem to have gotten a lot deeper into some of this analysis than I
did so I'm not sure if I can really add anything. Since 10u8 doesn't
support that tunable I'm not really sure where to go from there.

If you can take the pool offline, you might try connecting it to a
b148 box and see if that tunable makes a difference. Beyond that I
don't really have any suggestions.

Your problem description, including the return of performance when
freeing space is _identical_ to the problem we had. After checking
every single piece of hardware, replacing countless pieces, removing
COMSTAR and other pieces from the puzzle- the only change that helped
was changing that tunable.

I wish I could be of more help but I have not had the time to dive
into the ZFS code with any gusto.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss