This is just an update; I still have to try everything that was
suggested earlier.
The issue has finally recurred, and I was able to collect more
information about it:
# uptime
11:46AM up 3 days, 22:50, 1 user, load averages: 1.33, 1.12, 1.10
# ps aux
USER       PID %CPU %MEM   VSZ   RSS TT  STAT STARTED    TIME COMMAND
root         1  0.0  0.2   364   376 ??  Is   Wed12PM 0:00.09 /sbin/init
root     17473  0.0  0.3   412   812 ??  Is   Wed12PM 0:00.09 syslogd: [priv] (syslogd)
_syslogd  4944  0.0  0.3   420   860 ??  S    Wed12PM 1:59.70 syslogd -a /var/www/dev/log -a /var/empty/dev/log
root     17203  0.0  0.2   572   464 ??  Is   Wed12PM 0:00.01 pflogd: [priv] (pflogd)
_pflogd  25836  0.0  0.2   636   384 ??  S    Wed12PM 1:18.70 pflogd: [running] -s 160 -i pflog0 -f /var/log/pflog (pflogd)
root     20453  0.0  0.4   496  1020 ??  Is   Wed12PM 0:02.17 ntpd: [priv] (ntpd)
_ntp     27033  0.0  0.4   548  1092 ??  S    Wed12PM 0:36.73 ntpd: ntp engine (ntpd)
_ntp     30318  0.0  0.4   676  1008 ??  I    Wed12PM 0:00.02 ntpd: dns engine (ntpd)
root     12410  0.0  0.5   616  1384 ??  Is   Wed12PM 0:00.02 /usr/sbin/sshd
root     18650  0.0  0.3   412   832 ??  Is   Wed12PM 0:00.06 inetd
root     13652  0.0  0.4   668   912 ??  Is   Wed12PM 0:04.15 cron
root     12191  0.0  0.8  1216  2116 ??  Ss   Wed12PM 1:36.36 sendmail: accepting connections (sendmail)
root     18822  0.0  1.2  3452  3084 ??  Is   11:22AM 0:00.13 sshd: gene [priv] (sshd)
gene     27682  0.3  0.9  3420  2312 ??  S    11:22AM 0:00.55 sshd: gene@ttyp0 (sshd)
gene     18431  0.0  0.2   616   492 p0  Ss   11:22AM 0:00.14 -ksh (ksh)
root     23079  0.1  0.2   692   536 p0  S    11:46AM 0:00.07 -ksh (ksh)
root     19366  0.0  0.1   516   328 p0  R+   11:47AM 0:00.00 ps -aux
root     17451  0.0  0.3   280   864 C0  Is+  Wed12PM 0:00.02 /usr/libexec/getty std.9600 ttyC0
root     23962  0.0  0.3   324   864 C1  Is+  Wed12PM 0:00.01 /usr/libexec/getty std.9600 ttyC1
root      2571  0.0  0.3   272   860 C2  Is+  Wed12PM 0:00.01 /usr/libexec/getty std.9600 ttyC2
root      9191  0.0  0.3   296   864 C3  Is+  Wed12PM 0:00.02 /usr/libexec/getty std.9600 ttyC3
root      2812  0.0  0.3   416   868 C5  Is+  Wed12PM 0:00.01 /usr/libexec/getty std.9600 ttyC5
# vmstat -i
interrupt                       total     rate
irq0/clock                   34043772       99
irq97/mpi0                     772066        2
irq112/em0                      96237        0
Total                        34912075      102
# systat
1 users Load 1.10 1.07 1.08 PAUSED Sun Oct 23 11:46:02 2011
memory totals (in KB) PAGING SWAPPING Interrupts
real virtual free in out in out 105 total
Active 12420 12420 185072 ops 100 clock
All 55712 55712 447212 pages 4 mpi0
1 em0
Proc:r d s w Csw Trp Sys Int Sof Flt forks
6 21 17 88 4 102 21 fkppw
fksvm
0.0%Int 0.2%Sys 0.4%Usr 0.0%Nic 99.4%Idle pwait
| | | | | | | | | | | 2 relck
2 rlkok
noram
Namei Sys-cache Proc-cache No-cache ndcpy
Calls hits % hits % miss % fltcp
14 14 100 2 zfod
cow
Disks cd0 sd0 fd0 2006 fmin
seeks 2674 ftarg
xfers 4 itarg
speed 67K 1 wired
sec 0.0 pdfre
pdscn
pzidle
10 kmapent
# dmesg | tail
vmware: sending length failed, eax=00000000, ecx=00000000
vmt0: failed to send TCLO outgoing ping
vmware: sending length failed, eax=00000000, ecx=00000000
vmt0: failed to send TCLO outgoing ping
vmware: sending length failed, eax=00000000, ecx=00000000
vmt0: failed to send TCLO outgoing ping
vmware: sending length failed, eax=00000000, ecx=00000000
vmt0: failed to send TCLO outgoing ping
vmware: sending length failed, eax=00000000, ecx=00000000
vmt0: failed to send TCLO outgoing ping
My /var/log/messages* files contain that pair of error messages more
than 16,000 times.
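For anyone who wants to check their own logs for the same messages, a
rough way to get such a count is sketched below. This is not
necessarily how the figure above was obtained. It assumes rotated logs
are gzip-compressed and end in .gz (for example messages.0.gz), and
count_vmt_errors is only a made-up helper name. It counts the vmt0 line,
which equals the number of pairs if the two messages always appear
together.

```shell
# Hypothetical helper: count "vmt0: failed to send TCLO outgoing ping"
# lines across the given log files, decompressing *.gz files on the fly.
count_vmt_errors() {
    pattern='vmt0: failed to send TCLO outgoing ping'
    total=0
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) n=$(gzip -dc "$f" | grep -c "$pattern") ;;
            *)    n=$(grep -c "$pattern" "$f") ;;
        esac
        total=$((total + n))
    done
    echo "$total"
}
```

Usage, as root: count_vmt_errors /var/log/messages*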
I will go through and try what has been suggested, starting with
changing the guest OS type. Unfortunately, it appears the problem can
occur days apart. I'll send an update when I have something more
concrete.
If anyone would like to try reproducing this problem on their own ESXi
host, I can make a .tar.gz of this VM guest available for download.
Thanks again.
-Gene
On Wed, Oct 19, 2011 at 8:23 PM, Gene <[email protected]> wrote:
> I haven't been able to reproduce the problem since this morning.
> Nothing has been changed on the vmhosts so I'm at a bit of a loss at
> the moment.
>
> When the issue reoccurs I'll try everything that has been suggested today.
>
> Thank you very much for your help everyone.
>
> -Gene