>
> What's the earliest build someone has seen this problem? i.e. if we
> binary chop, has anyone seen it in b118?
>
We have used every "stable" build from b118 up, as b118 was the first reliable
one that could be used in a CIFS-heavy environment. The problem occurs on all
of them.
- Adam
> Can folks confirm/deny each of these?
>
> o The problems are not seen with Sun's version of this card
On the Thumper x4540 (which uses 6 of the same LSI 1068E controller chips), we
do not see this problem. Then again, it uses a one-to-one mapping of controller
PHY ports to internal disks;
>
> I thought you had just set
>
> set xpv_psm:xen_support_msi = -1
>
> which is different, because that sets the xen_support_msi variable
> which lives inside the xpv_psm module.
>
> Setting mptsas:* will have no effect on your system if you do not
> have an mptsas card installed. The mpts
> Hi Adam,
> thanks for this info. I've talked with my colleagues in Beijing (since
> I'm in Beijing this week) and we'd like you to try disabling MSI/MSI-X
> for your mpt instances. In /etc/system, add
>
> set mpt:mpt_enable_msi = 0
>
> then regen your boot archive and reboot.
>
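In practice the sequence above boils down to something like the following (a
sketch only; it assumes the card binds to the mpt driver rather than mptsas,
which is worth confirming first):

  # Confirm the card binds to mpt (not mptsas) before adding mpt:* tunables
  prtconf -D | grep -i mpt
  # Add the tunable, regenerate the boot archive, and reboot
  echo 'set mpt:mpt_enable_msi = 0' >> /etc/system
  bootadm update-archive
  init 6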
I had alre
So, while we are working on resolving this issue with Sun, let me approach this
from another perspective: what kind of controller/drive ratio would be the
minimum recommended to support a functional OpenSolaris-based archival
solution? Given the following:
- the vast majority of IO to the s
The controller connects to two disk shelves (expanders), one per port on the
card. If you look back in the thread, you'll see our zpool config has one vdev
per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB
7.2K, firmware rev. 03.00C06. Without actually matching up the
The iostat I posted previously was from a system on which we had already tuned
the zfs:zfs_vdev_max_pending queue depth down to 10 (visible as the maximum of
about 10 in actv per disk).
I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat
output showed busier disks (%b is higher, which
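For anyone wanting to try the same tuning, adjusting that knob on builds of
this vintage looks roughly like this (the value 7 matches the test above; the
live mdb write is a convenience and only lasts until the next reboot):

  # Persistent setting, read from /etc/system at boot
  echo 'set zfs:zfs_vdev_max_pending = 7' >> /etc/system
  # Or poke the running kernel so it takes effect without a reboot
  echo 'zfs_vdev_max_pending/W0t7' | mdb -kw
  # Then watch per-disk queue depth (actv) and busy (%b) while a scrub runs
  iostat -xn 5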
Here is an example of the pool config we use:
# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

        NAME        STATE     READ WRITE CKSUM
        pool002     ONLINE       0     0     0
          raidz2    ONLINE
And therein lies the issue. The excessive load that causes the IO issues is
almost always generated locally from a scrub or a local recursive "ls" used to
warm up the SSD-based zpool cache with metadata. The regular network IO to the
box is minimal and is very read-centric; once we load the box
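For reference, the warm-up itself is nothing more exotic than a throwaway
recursive listing along these lines (the mount point is illustrative):

  # Walk the pool so directory and file metadata is pulled into the ARC/L2ARC;
  # the listing output itself is thrown away
  ls -lR /pool002 > /dev/null 2>&1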
LSI's sales literature on that card specs "128 devices" which I take with a few
hearty grains of salt. I agree that with all 46 drives pumping out streamed
data, the controller would be overworked BUT the drives will only deliver data
as fast as the OS tells them to. Just because the speedometer
I don't think there was any intention on Sun's part to ignore the
problem...obviously their target market wants a performance-oriented box and
the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels
= 1 channel per drive = no contention for channels. The x4540 is a monste
Just submitted the bug yesterday, under advice of James, so I don't have a
number you can refer to...the "change request" number is 6894775 if that
helps or is directly related to the future bugid.
From what I've seen/read, this problem has been around for a while but only
rears its ugly head
Our config is:
OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise
we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has
one ZFS filesyst
I've filed the bug, but was unable to include the "prtconf -v" output as the
comments field only accepted 15000 chars total. Let me know if there is
anything else I can provide/do to help figure this problem out as it is
essentially preventing us from doing any kind of heavy IO to these pools,
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the
recently released Phase 17 but it didn't help. All firmware NVRAM settings are
default. Basically, when we put the disks behind this controller under load
(e.g. scrubbing, recursive ls on large ZFS filesystem) we get th
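A rough recipe for reproducing it and catching the messages as they appear
(pool name and grep pattern are assumptions):

  # Put the disks behind the controller under sustained load
  zpool scrub pool002
  # Watch the system log for mpt timeout/reset warnings while the scrub runs
  tail -f /var/adm/messages | grep -i mpt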
Cindy: How can I view the bug report you referenced? Standard methods show me
the bug number is valid (6694909) but no content or notes. We are having
similar messages appear with snv_118 with a busy LSI controller, especially
during scrubbing, and I'd be interested to see what they mentioned in