The system I'm using is not that "beefy". It's a 4-core Phenom II with a
server grade hard drive as the system drive and 8 consumer grade drives
for the storage pool, sitting behind an LSI SAS 1068e controller. It has
4GB of RAM.
I have experienced freeze-ups due to failing hard drives in the storage
pool in the past. When they happened, they affected the CIFS connection
(of course) but not the SSH connection. Moreover, I could see the errors
with "iostat -En". I don't know whether you have iostat in Linux, but I'm
afraid you don't.
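For what it's worth, a quick way to pick out only the drives with nonzero
error counters (this assumes the usual per-device summary line that
"iostat -En" prints, i.e. "c0t0d0 Soft Errors: 0 Hard Errors: 0 ..."):

  # show only devices whose summary line has a nonzero error count
  iostat -En | grep "Errors:" | grep -v "Soft Errors: 0 Hard Errors: 0 Transport Errors: 0"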
I experienced a series of shorter freeze-ups today (3-5 seconds long)
while monitoring the system using "System Monitor" through 'vncserver'
and 'top' over SSH. Those freeze-ups affected the CIFS, SSH, and VNC
connections (but did not sever them). They were not long enough for me
to check the RDP connection to the VM.
When those freeze-ups occurred, System Monitor showed them as a dip in
the real-time network history chart, so they don't seem to stall the
operation of the network monitor itself. The CPU utilization was around
10-15% and the memory usage was around 13.5% (540MB) the whole time, so
I don't think capping the ARC would do much good.
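(For reference, the current ARC size in bytes can be read directly from
kstat:

  kstat -p zfs:0:arcstats:size
)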
I looked into /var/adm/messages and found errors like

nwamd[99]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1

during that time. I'll look more carefully next time and see if the
time-stamps of these entries match the times at which I experience the
freeze-ups; I suspect that they do. No errors are found with iostat -E.
I'll also look into the iowait to see if it gives any clues. I'm not
sure, though, how to keep a "history" of iowait the way System Monitor
keeps a history of CPU utilization, memory usage and network activity.
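One way that might work is to timestamp periodic iostat samples into a
file and correlate them with the freeze-ups afterwards, something like:

  # append a timestamp to every line of a 5-second iostat sample;
  # the %w (wait) and %b (busy) columns are the interesting ones
  iostat -xn 5 | while read line; do
      printf '%s %s\n' "$(date '+%F %T')" "$line"
  done >> /var/tmp/iostat-history.log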
It has also been suggested that I try out the prestable version of OI
and see if these freeze-ups occur when using a static IP (i.e. not nwam).
Robin.
On 2012-01-24 06:39, Robbie Crash wrote:
I had problems that sound nearly identical to what you're describing when
running ZFS Native under Ubuntu, but without the VM aspect: SSH would
disconnect and fileshares would become unavailable. They seemed to happen
when the server would begin to flush memory after large reads or writes
to the ZFS pool. How much RAM does your machine have? Have you considered
"evil tuning" your ARC cache for testing?
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
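In short, the guide caps the ARC with a line in /etc/system followed by a
reboot; the 2GB value below is just an example, pick whatever leaves your
VM and services enough headroom:

  # /etc/system -- cap the ZFS ARC at 2GB (value in bytes)
  set zfs:zfs_arc_max = 0x80000000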
What is the rest of the system reporting? CPU? Memory in use? IO wait? Are
you using consumer grade hard drives? Those could be doing their lovely
2-minute read recovery thing and causing headaches with pool access. Does
the host have any CIFS shares that you can attempt to access while the
guest is frozen?
I found that forcing the ZFS ARC to stay 2.5GB under the maximum, rather
than the default(?) 1GB, vastly improved stability.
I haven't had the same issues after moving to OI, but I've also quadrupled
the amount of RAM in my box. Sorry if any of this is horribly off the mark;
most of my ZFS/CIFS/SMB problems happened while running ZFS on Ubuntu, and
I'm pretty new to OI.
On Mon, Jan 23, 2012 at 16:17, Open Indiana <openindi...@out-side.nl> wrote:
What happens if you disable nwam and use the basic/manual ifconfig setup?
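Something along these lines (a sketch; the addresses are examples based on
your ifconfig output, and the router address is a guess):

  # switch from nwam to the manual network configuration (as root)
  svcadm disable svc:/network/physical:nwam
  echo "10.40.137.185" > /etc/hostname.e1000g1
  echo "10.40.137.0 255.255.255.0" >> /etc/netmasks
  echo "10.40.137.1" > /etc/defaultrouter
  svcadm enable svc:/network/physical:default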
-----Original Message-----
From: Robin Axelsson [mailto:gu99r...@student.chalmers.se]
Sent: maandag 23 januari 2012 15:10
To: openindiana-discuss@openindiana.org
Subject: Re: [OpenIndiana-discuss] CIFS performance issues
No, I'm not doing anything in particular in the virtual machine. The media
file is played on another computer in the (physical) network over CIFS.
Over the network I also access the server using Remote Desktop/Terminal
Services to communicate with the virtual machine (using the VirtualBox RDP
interface, i.e. not the guest OS RDP), VNC (to access OI using vncserver)
and SSH (to OI).
I wouldn't say that the entire server stops responding, only the
connections to CIFS and SSH. I wasn't running VNC when it happened
yesterday so I don't know about it, but the RDP connection and the virtual
machine inside the server were unaffected while CIFS and SSH were frozen.
I tried today to start the virtual machine but it failed because it could
not find the connection (e1000g2):

"Error: failed to start machine. Error message: Failed to open/create the
internal network 'HostInterfaceNetworking-e1000g2 - Intel PRO/1000 Gigabit
Ethernet' (VERR_SUPDRV_COMPONENT_NOT_FOUND).
Failed to attach the network LUN (VERR_SUPDRV_COMPONENT_NOT_FOUND).
Unknown error creating VM (VERR_SUPDRV_COMPONENT_NOT_FOUND)"
ifconfig -a returns:
...
e1000g1: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 10.40.137.185 netmask ffffff00 broadcast 10.40.137.255
e1000g2: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet 10.40.137.196 netmask ffffff00 broadcast 10.40.137.255
rge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000
...
i.e. e1000g1 and e1000g2 appear to be running just fine, wtf!?! I found
the following entries in /var/adm/messages:
Jan 23 13:50:49 <computername> nwamd[95]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1
Jan 23 13:56:59 <computername> last message repeated 75 times
Jan 23 13:57:04 <computername> nwamd[95]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1
Jan 23 13:58:19 <computername> last message repeated 15 times
Jan 23 13:58:22 <computername> gnome-session[916]: [ID 702911 daemon.warning] WARNING: Unable to determine session: Unable to lookup session information for process '916'
Jan 23 13:58:24 <computername> nwamd[95]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1
Jan 23 14:03:24 <computername> last message repeated 60 times
Jan 23 14:03:26 <computername> gnome-session[916]: [ID 702911 daemon.warning] WARNING: Unable to determine session: Unable to lookup session information for process '916'
Jan 23 14:03:29 <computername> nwamd[95]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1
Jan 23 14:03:34 <computername> last message repeated 1 time
Jan 23 14:03:39 <computername> nwamd[95]: [ID 234669 daemon.error] 3: nwamd_door_switch: need solaris.network.autoconf.read for request type 1
Some errors here... I looked into the log of the nwam service
(/var/svc/log/network-physical\:nwam.log):
[ Jan 23 13:03:15 Enabled. ]
[ Jan 23 13:03:16 Executing start method ("/lib/svc/method/net-nwam start"). ]
/lib/svc/method/net-nwam[548]: /sbin/ibd_upgrade: not found [No such file or directory]
[ Jan 23 13:03:17 Method "start" exited with status 0. ]
[ Jan 23 13:03:17 Rereading configuration. ]
[ Jan 23 13:03:17 Executing refresh method ("/lib/svc/method/net-nwam refresh"). ]
[ Jan 23 13:03:17 Method "refresh" exited with status 0. ]
Nothing remarkable here... I investigated the issue on the VBox forums,
where it was resolved with the rem_drv/add_drv vboxflt commands (see
below). It's not the first time I've had this issue, and one of the people
at the forums claims that it occurs after every third powercycle/reboot.
It was hinted that VBox doesn't like dynamic IP addresses, so I have also
given e1000g2 a fixed address in the router (I configured the DHCP server
in the router to always hand the same IP to the MAC address of the e1000g2
connection). I had already done this for e1000g1, since otherwise it would
be impossible to ssh to the server from the "outside world".
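For completeness, the fix from the forums amounts to reloading the
VirtualBox network filter driver (as root):

  rem_drv vboxflt
  add_drv vboxflt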
Robin.
On 2012-01-23 11:40, Open Indiana wrote:
Ok,
So if I read it correctly, your virtual machine is playing an audio file
and then the server stops responding. That could mean that the hardware
VirtualBox uses to play the sound file is flooded, or that the drivers
of the sound card in your server/PC are not working very well.
What sound card are you using?
-----Original Message-----
From: Robin Axelsson [mailto:gu99r...@student.chalmers.se]
Sent: zondag 22 januari 2012 23:38
To: openindiana-discuss@openindiana.org
Subject: Re: [OpenIndiana-discuss] CIFS performance issues
I don't understand what you mean by PCI-X settings and where to check
them. The hardware is not PCI-X, it is PCIe. The affected LSI HBA is a
discrete PCIe card that operates in IT mode. By system logs I assume you
mean /var/adm/messages, and I could not find anything there.
If this was only a hard disk controller issue (I made sure that there
are enough lanes for it) then I wouldn't expect applications such as
SSH to be affected by it.
The settings of the Intel NIC are not in the BIOS, at least not that I
can see (i.e. there is no visible BIOS for the discrete NIC during POST
like there is for the LSI SAS controller). So I'm not entirely sure
which settings for the NIC you are referring to.
Robin.
On 2012-01-22 20:28, Open Indiana wrote:
A very stupid answer, but have you looked at the BIOS and inspected
the settings of the network devices and/or PCIx? How is your BIOS
set up (AHCI or RAID or ??)?
Do you see any errors in the system logs?
In my opinion your system is choking on the data transfers, either on
the NIC <-> motherboard side or on the motherboard <-> hard disk
controller side.
Do your extra NICs and the LSI share the same PCI-x settings? Do
they both support all settings?
B,
Roelof
-----Original Message-----
From: Robin Axelsson [mailto:gu99r...@student.chalmers.se]
Sent: zondag 22 januari 2012 19:38
To: OpenIndiana-discuss@openindiana.org
Subject: [OpenIndiana-discuss] CIFS performance issues
In the past, I used OpenSolaris b134, which I then updated to OpenIndiana
b148, and never did I experience performance issues related to the
network connection (and that was when using two of the "infamous"
RTL8111DL onboard ports). Now that I have swapped the motherboard and
the hard drive and later added a 2-port Intel EXPI9402PT NIC (because
of driver issues with the Realtek NIC that weren't there before), I
performed a fresh install of OpenIndiana.
Since then I experience intermittent network freeze-ups that I cannot
link to faults of the storage pool (iostat -E returns 0 errors). I
have had this issue with the dual port Intel controller as well as
with a single port Intel controller (EXPI9400PT) and the Realtek
8111E onboard NIC. The storage pool is behind an LSI MegaRAID 1068e
based controller using no port extenders.
In detail (9400PT+8111E):
-------------------------
I was running a virtual machine with VirtualBox 3.2.14 with (1) a
bridged network connection; the server was accessed over the network
using (2) the VBox RDP connection and (3) a ZFS based CIFS share
accessed from a Windows computer. These applications were administrated
both over (4) SSH (port 2244) and (5) VNC (using vncserver). A typical
start of the VM was done with 'screen VBoxHeadless --startvm ...'
I assigned the network ports the following way:
e1000g: VBox RDP, VNC, SSH
rge0: Virtual Machine Network Connection (Bridged)
I tried various combinations but the connection froze intermittently
for all applications. The bridged network connection was worst. When
I SSHed over rge0, the connection was frequently severed, which it was
not over e1000.
So I pulled the plug on rge0 and let everything go through the e1000
connection. Freeze-ups became more frequent, and it seemed like the
bridged connection was causing the issue, because the connection
didn't freeze like that when the VM wasn't running.
Note that I didn't assign the CIFS share to any particular port, but
calls to <computername> were assigned to the e1000 port in the
/etc/inet/hosts file (illustrated below).
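(i.e. an /etc/inet/hosts entry along these lines, with the address of the
e1000 port; the address here is only an illustration:

  192.168.0.10    computername
)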
-------------------------
In detail (9402PT):
-------------------
In this setup I run essentially the same applications but all through
the 9402PT which has two ports (e1000g1 and e1000g2). So I assign the
applications the following way:
e1000g1: VBox RDP, SSH, <computername> (in /etc/inet/hosts)
e1000g2: Bridged connection to the virtual machine
So while running the virtual machine on the server, having an open
SSH connection to it and a command prompt pointing (cd x:\) at the
CIFS share (which is mapped as a network drive, say "X:"), I started a
media player and played an audio file over the CIFS share, which made
the connection freeze.
The freezing affected the media player and the command prompt, but the
RDP connection worked and access to the internet inside the VM was
flawless. The SSH connection was frozen as well. After a few minutes it
became responsive again, and iostat -E reported no errors. The command
prompt and the media player were still frozen, but "ls <path to CIFS
shared contents>" worked fine over the SSH connection. Shortly after
that the CIFS connection came back and things seemed to run OK.
So in conclusion, the freeze-ups are still there but less frequent. I
have tried VirtualBox 4.1.8, but the ethernet connection is worse with
that version, which is why I downgraded to 3.2.14 (which was published
_after_ 4.1.8).
-------------------
These issues occur on server grade hardware using drivers that
are/were certified by Sun (as I understand it). Moreover, CIFS and
ZFS are core functionality of OpenIndiana, so it is quite essential
that the network works properly and is stable.
I'm sorely tempted to file a bug report, but I would want some advice
on how to troubleshoot and provide relevant bug reports. There are no
entries in /var/adm/messages related to the latest freeze-up mentioned
above, and I couldn't find any when running the prior setups. These
freeze-ups don't happen all the time, so it isn't easy to reproduce
them consistently.
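If nothing else, a snapshot like the following taken around a freeze-up
would probably be worth attaching to such a report (all standard OI
tools):

  dladm show-link                 # link state of every NIC
  kstat -m e1000g | grep -i err   # driver error counters
  svcs -xv                        # services in maintenance, if any
  tail -50 /var/adm/messages      # recent kernel/daemon messages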
Robin.
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss