> The latter, we run these VMs over NFS anyway and had
> ESXi boxes under test already. We were already
> separating "data" exports from "VM" exports. We use
> an in-house developed configuration management/bare
> metal system which allows us to install new machines
> pretty easily. In this case we
>How did your migration to ESXi go? Are you using it on the same hardware or
>did you just switch that server to an NFS server and run the VMs on another
>box?
The latter, we run these VMs over NFS anyway and had ESXi boxes under test
already. We were already separating "data" exports from "VM"
Travis Tabbal wrote:
I'm running nv126 XvM right now. I haven't tried it
without XvM.
Without XvM we do not see these issues. We're running
the VMs through NFS now (using ESXi)...
Interesting. It sounds like it might be an XvM specific bug. I'm glad I mentioned that in my bug report to Sun.
> > I'm running nv126 XvM right now. I haven't tried it
> > without XvM.
>
> Without XvM we do not see these issues. We're running
> the VMs through NFS now (using ESXi)...
Interesting. It sounds like it might be an XvM specific bug. I'm glad I
mentioned that in my bug report to Sun. Hopefully
> I'm running nv126 XvM right now. I haven't tried it
> without XvM.
Without XvM we do not see these issues. We're running the VMs through NFS now
(using ESXi)...
We see the same issue on an x4540 Thor system with 500G disks:
lots of:
...
Nov 3 16:41:46 encore.science.uva.nl scsi: [ID 107833 kern.warning] WARNING:
/p...@3c,0/pci10de,3...@f/pci1000,1...@0 (mpt5):
Nov 3 16:41:46 encore.science.uva.nl Disconnected command timeout for Target 7
...
This system is run
I am also running 2 of the Supermicro cards. I just upgraded to b126 and it
seems improved. I am running a large file copy locally. I get these warnings in
the dmesg log. When I do, I/O seems to stall for about 60sec. It comes back up
fine, but it's very annoying. Any hints? I have 4 disks per c
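One way to pin down whether the stalls line up with the mpt warnings (standard OpenSolaris commands; the 5-second sampling interval is arbitrary) is to watch the log and the disk queues side by side while the copy runs:

   # follow the kernel log for mpt/scsi complaints as they appear
   tail -f /var/adm/messages | grep -i mpt

   # per-disk service times and queue depths, sampled every 5 seconds
   iostat -xn 5

If the ~60-second stall shows up as frozen actv/wait columns at the same moment the warning is logged, it is likely the same event the rest of this thread is chasing.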
I'm having similar issues with two AOC-USAS-L8i Supermicro 1068e
cards (mpt2 and mpt3), running firmware 1.26.00.00IT.
It seems to only affect a specific revision of disk. (???)
sd67 Soft Errors: 0 Hard Errors: 127 Transport Errors: 3416
Vendor: ATA Product: WDC WD10EACS-00D Revision: 1A01 Seria
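That per-disk error summary looks like "iostat -En" output; the same counters can be pulled on any box to see which drives (and which firmware revision) are racking up transport errors:

   # one block per device: soft/hard/transport error counts plus
   # vendor, product, revision and serial number
   iostat -En

   # or limit it to a single suspect disk (device name here is made up)
   iostat -En c4t7d0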
So, while we are working on resolving this issue with Sun, let me approach this
from another perspective: what kind of controller/drive ratio would be the
minimum recommended to support a functional OpenSolaris-based archival
solution? Given the following:
- the vast majority of IO to the s
The controller connects to two disk shelves (expanders), one per port on the
card. If you look back in the thread, you'll see our zpool config has one vdev
per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB
7.2K, firmware rev. 03.00C06. Without actually matching up the
On Sat, Oct 24, 2009 at 12:30 PM, Carson Gaspar wrote:
>
> I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware
> 1.28.02.00 in IT mode, but I (almost?) always had exactly 1 "stuck" I/O.
> Note that my disks were one per channel, no expanders. I have _not_ seen it
> since rep
On 10/24/09 9:43 AM, Richard Elling wrote:
OK, here we see 4 I/Os pending outside of the host. The host has
sent them on and is waiting for them to return. This means they are
getting dropped either at the disk or somewhere between the disk
and the controller.
When this happens, the sd driver w
more below...
On Oct 24, 2009, at 2:49 AM, Adam Cheal wrote:
The iostat I posted previously was from a system we had already
tuned the zfs:zfs_vdev_max_pending depth down to 10 (as visible by
the max of about 10 in actv per disk).
I reset this value in /etc/system to 7, rebooted, and start
On Sat, Oct 24, 2009 at 11:20 AM, Tim Cook wrote:
>
>
> On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal wrote:
>
>> The iostat I posted previously was from a system we had already tuned the
>> zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
>> in actv per disk).
>>
>> I
On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal wrote:
> The iostat I posted previously was from a system we had already tuned the
> zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
> in actv per disk).
>
> I reset this value in /etc/system to 7, rebooted, and started a sc
From: zfs-discuss-boun...@opensolaris.org
[zfs-discuss-boun...@opensolaris.org] on behalf of Adam Cheal
[ach...@pnimedia.com]
Sent: 24 October 2009 12:49
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warni
The iostat I posted previously was from a system on which we had already tuned
the zfs:zfs_vdev_max_pending depth down to 10 (as visible from the max of
about 10 in actv per disk).
I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat
output showed busier disks (%b is higher, which
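For anyone following along, that tunable lives in /etc/system; the mdb line below is the usual way to change it on a live kernel without the reboot (not something stated in this thread, just the standard technique):

   # /etc/system -- takes effect at the next boot
   set zfs:zfs_vdev_max_pending = 7

   # change the running kernel immediately (0t7 = decimal 7)
   echo zfs_vdev_max_pending/W0t7 | mdb -kw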
on behalf of Richard Elling [richard.ell...@gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile
ok, see below...
On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:
> Here is an example of the pool config
ok, see below...
On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:
Here is an example of the pool config we use:
# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

        NAME        STATE     READ WRITE CKSUM
        poo
Here is an example of the pool config we use:
# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

        NAME        STATE     READ WRITE CKSUM
        pool002     ONLINE       0     0     0
          raidz2    ONLINE
On Oct 23, 2009, at 5:32 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling wrote:
Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven
to be relatively easy to crater performance o
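Spelling out that arithmetic (assuming the default zfs_vdev_max_pending of 35 and the 46-disk pool under discussion):

   46 disks x 35 queued commands/disk = 1,610 outstanding I/Os
   46 disks x 10 queued commands/disk =   460 outstanding I/Os
   46 disks x  7 queued commands/disk =   322 outstanding I/Os

which is why lowering the per-vdev queue depth shrinks the load the controller has to track so sharply.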
And therein lies the issue. The excessive load that causes the IO issues is
almost always generated locally from a scrub or a local recursive "ls" used to
warm up the SSD-based zpool cache with metadata. The regular network IO to the
box is minimal and is very read-centric; once we load the box
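The exact warm-up command isn't shown in the thread; a minimal sketch of that kind of metadata walk (the mount point /pool002 is assumed from the zpool status earlier) would be something like:

   # stat every file and directory so the metadata lands in the SSD-based
   # cache, without sending any file data over the network
   ls -lR /pool002 > /dev/null 2>&1

   # or, equivalently
   find /pool002 -ls > /dev/null 2>&1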
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling wrote:
>
> Tim has a valid point. By default, ZFS will queue 35 commands per disk.
> For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven
> to be relatively easy to crater performance or cause problems with very,
> very, very ex
On Fri, Oct 23, 2009 at 7:17 PM, Adam Cheal wrote:
> LSI's sales literature on that card specs "128 devices" which I take with a
> few hearty grains of salt. I agree that with all 46 drives pumping out
> streamed data, the controller would be overworked BUT the drives will only
> deliver data as
On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal wrote:
I don't think there was any intention on Sun's part to ignore the
problem...obviously their target market wants a performance-oriented
box and the x4540 delivers that. Each 1068E controller chip
LSI's sales literature on that card specs "128 devices" which I take with a few
hearty grains of salt. I agree that with all 46 drives pumping out streamed
data, the controller would be overworked BUT the drives will only deliver data
as fast as the OS tells them to. Just because the speedometer
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal wrote:
> I don't think there was any intention on Sun's part to ignore the
> problem...obviously their target market wants a performance-oriented box and
> the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY
> channels = 1 channel per
I don't think there was any intention on Sun's part to ignore the
problem...obviously their target market wants a performance-oriented box and
the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels
= 1 channel per drive = no contention for channels. The x4540 is a monste
On Fri, Oct 23, 2009 at 3:48 PM, Bruno Sousa wrote:
> Could Sun's reason for having 6 LSI HBAs in the x4540 Thumper be some sort
> of "hidden" problem found by Sun where the HBA resets, and due to
> time-to-market pressure the "quick and dirty" solution was to spread the
> load over multiple HBAs instead of a softw
On Oct 23, 2009, at 1:48 PM, Bruno Sousa wrote:
Could Sun's reason for having 6 LSI HBAs in the x4540 Thumper be some
sort of "hidden" problem found by Sun where the HBA resets, and due to
time-to-market pressure the "quick and dirty" solution was to spread
the load over multiple HBAs instead of a software fix
Hi Cindy,
Thank you for the update, but it seems like I can't see any information
specific to that bug.
I can only see bug numbers 6702538 and 6615564, but according to their
history, they were fixed quite some time ago.
Can you by any chance present the information about bug 6694909?
T
Could Sun's reason for having 6 LSI HBAs in the x4540 Thumper be some
sort of "hidden" problem found by Sun where the HBA resets, and due to
time-to-market pressure the "quick and dirty" solution was to spread the
load over multiple HBAs instead of a software fix?
Just my 2 cents..
Bruno
Adam Cheal wrote:
J
Adam Cheal wrote:
Just submitted the bug yesterday, under the advice of James, so I don't have a
number I can refer you to...the "change request" number is 6894775 if that
helps or is directly related to the future bugid.
From what I've seen/read this problem has been around for a while but only re
Just submitted the bug yesterday, under the advice of James, so I don't have a
number I can refer you to...the "change request" number is 6894775 if that
helps or is directly related to the future bugid.
From what I've seen/read this problem has been around for a while but only
rears its ugly head
Sorry, running snv_123, indiana
On Fri, Oct 23, 2009 at 11:16 AM, Jeremy f wrote:
> What bug# is this under? I'm having what I believe is the same problem. Is
> it possible to just take the mpt driver from a prior build in the meantime?
> The below is from the load the zpool scrub creates. T
What bug# is this under? I'm having what I believe is the same problem. Is
it possible to just take the mpt driver from a prior build in the meantime?
The below is from the load the zpool scrub creates. This is on a Dell T7400
workstation with a 1068E OEMed LSI. I updated the firmware to the newe
Our config is:
OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise
we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has
one ZFS filesyst
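To make that layout concrete, a hypothetical zpool create along those lines would look like the sketch below (device names are invented, and it is scaled down to 6 disks per raidz2 vdev instead of the real 22, purely to keep the example short):

   zpool create pool002 \
       raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
       raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

One raidz2 vdev per JBOD, both vdevs in the same pool, matching the zpool status output earlier in the thread.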
Hi Adam,
How many disks and zpools/zfs filesystems do you have behind that LSI?
I have a system with 22 disks and 4 zpools with around 30 zfs filesystems, and
so far it works like a charm, even during heavy load. The OpenSolaris
release is snv_101b.
Bruno
Adam Cheal wrote:
Cindy: How can I view the bug report you
Hi Cindy,
I have a couple of questions about this issue:
1. I have exactly the same LSI controller in another server running
OpenSolaris snv_101b, and so far no errors like these have been
seen on that system.
2. Up to snv_118 I hadn't seen any problems, only now with snv_125.
3
On 10/22/09 4:07 PM, James C. McPherson wrote:
Adam Cheal wrote:
It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?
...
ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and descript
I've filed the bug, but was unable to include the "prtconf -v" output as the
comments field only accepted 15000 chars total. Let me know if there is
anything else I can provide/do to help figure this problem out as it is
essentially preventing us from doing any kind of heavy IO to these pools,
Adam Cheal wrote:
James: We are running Phase 16 on our LSISAS3801E's, and have also tried
the recently released Phase 17 but it didn't help. All firmware NVRAM
settings are default. Basically, when we put the disks behind this
controller under load (e.g. scrubbing, recursive ls on large ZFS
file
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the
recently released Phase 17 but it didn't help. All firmware NVRAM settings are
default. Basically, when we put the disks behind this controller under load
(e.g. scrubbing, recursive ls on large ZFS filesystem) we get th
Adam Cheal wrote:
Cindy: How can I view the bug report you referenced? Standard methods
show me that the bug number is valid (6694909) but no content or notes. We are
having similar messages appear with snv_118 with a busy LSI controller,
especially during scrubbing, and I'd be interested to see what
Cindy: How can I view the bug report you referenced? Standard methods show me
that the bug number is valid (6694909) but no content or notes. We are having
similar messages appear with snv_118 with a busy LSI controller, especially
during scrubbing, and I'd be interested to see what they mentioned in
Hi Bruno,
I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to be displayed.
According to the 6694909 comments, this issue is documented in the
release notes.
As they are harmless, I wouldn't worry about them.
Maybe someo
Hi all,
Recently I upgraded from snv_118 to snv_125, and suddenly I started to
see these messages in /var/adm/messages:
Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02 mpt_handle_event: IOCStatus=0x8000,