A continuation: I hoped it wouldn't come to this, but the server hung again.

The error I saw on the console was:

Sep 10 20:15:39/256 ERROR: svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Sep 10 20:15:39/256: system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

I couldn't do anything on the console and had to restart the server.

The client lost its mounts at 20:08:
Sep 10 20:08:07 station KernelEventAgent[72]: tid 00000000 received event(s) VQ_NOTRESP (1)

The last fmdump entry is from 5 days ago:
Sep 05 2011 14:37:37.325349500 ereport.fs.zfs.vdev.open_failed
nvlist version: 0

So does this confirm either theory, a failing PSU or a bad SSD?

--Roman N

Lucas Van Tol said the following, on 02-09-11 10:12 AM:
You might not want to have any swap enabled on that. SSDs tend to perform
worse when they are full (I'm not sure whether allocating 8G to swap actually
uses up space on the physical device or not), and I have seen other Kingston
SSDs hang for a bit at times, which would probably not be good for swap.

If possible, you might try to redirect some logs off of rpool; it might not be
able to log anything if the rpool is the problem.
Date: Fri, 2 Sep 2011 00:07:23 -0400
From: ro...@naumenko.ca
To: openindiana-discuss@openindiana.org
Subject: Re: [OpenIndiana-discuss] server hangs

It's a Kingston 16GB SSD.

--Roman N

Lucas Van Tol said the following, on 01-09-11 5:34 PM:
What is your rpool like? I saw some bizarre behavior with a CompactFlash-based
rpool: as the CF card got overused and slower and slower, it eventually would
hang without throwing any actual errors (just service times approaching
infinity).
Services that had enough information stored in memory continued to work, but
any time something read from the rpool it would hang, and services slowly died
off. The system never seemed to fault/offline the rpool either...
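If the slowdown described above is happening, it should show up as growing service times well before the hang. A minimal Python sketch (not something posted in the thread), assuming the standard Solaris `iostat -xn` column layout (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device), that picks out devices whose asvc_t (average active service time, in milliseconds) exceeds a threshold. The function name and the 100 ms default are arbitrary assumptions, not a documented limit.

```python
# Minimal sketch: pick out slow devices from Solaris `iostat -xn` output.
# Assumed column layout (the standard -xn one):
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
from typing import Dict


def slow_devices(iostat_text: str, threshold_ms: float = 100.0) -> Dict[str, float]:
    """Return {device: asvc_t in ms} for devices slower than threshold_ms."""
    slow: Dict[str, float] = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        # Data rows have exactly 11 fields; this skips blank lines, the
        # "extended device statistics" banner and the column header row.
        if len(fields) != 11 or fields[0] == "r/s":
            continue
        try:
            asvc_t = float(fields[7])  # average active service time
        except ValueError:
            continue
        if asvc_t > threshold_ms:
            # With several intervals in the input, the last slow reading
            # for a device overwrites earlier ones.
            slow[fields[10]] = asvc_t
    return slow
```

Output from something like `iostat -xn 60`, captured to a log that does not live on rpool, could be fed through `slow_devices()` after a hang to see whether the rpool device was slowing down beforehand.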

Date: Thu, 1 Sep 2011 14:42:54 -0400
From: ro...@naumenko.ca
To: openindiana-discuss@openindiana.org
Subject: Re: [OpenIndiana-discuss] server hangs

I need to dig into the motherboard manual, but it's basically all commodity
hardware (although the motherboard is some server-type Asus).

--Roman N

----- Original Message -----

What about hardware event logs? If you have power fluctuations, they might
show up there.
You can probably pull those out from your service processor, or boot into
the BIOS and read them there.
Sent from Jason's handheld
On Sep 1, 2011, at 8:37 AM, Roman Naumenko <ro...@naumenko.ca> wrote:
That was costly troubleshooting.
All right then, I will wait for the next failure, look through everything
once again, and maybe swap the PSU if I still find nothing.

--Roman N

----- Original Message -----

I burned through about 3 disks before I figured it out. Nothing in the logs
made me think this, but the eventual failure of the disks alerted me that
something hardwarish was happening.
On 08/31/11 11:01 PM, Roman Naumenko wrote:
Well, that might be the reason. Eight drives is certainly pushing the limit
for a stock PSU. But there should be some traces, no?
How did you figure out the cause of the errors on your system?

--Roman

Daniel Kjar said the following, on 31-08-11 9:43 PM:
Careful... are you overtaxing your power supply? My 148 system was behaving
like that when I put too many drives in an Ultra 20.

On 8/31/2011 7:48 PM, Roman Naumenko wrote:
Hi,

I have SunOS 5.11 oi_148 installed on my storage server with 8 disks in a
raidz2 pool.
It hangs about once a week and I have to restart it.
Can you help me troubleshoot it?

It has some ZFS volumes shared over NFS and afpd (afpd is unfortunately a
development version, needed to satisfy OS X Lion).

roks@data:~$ afpd -V
afpd 2.2.0 - Apple Filing Protocol (AFP) daemon of Netatalk

afpd has been compiled with support for these features:

AFP3.x support: Yes
TCP/IP Support: Yes
DDP(AppleTalk) Support: No
CNID backends: dbd last tdb
SLP support: No
Zeroconf support: Yes
TCP wrappers support: Yes
Quota support: Yes
Admin group support: Yes
Valid shell checks: Yes
cracklib support: No
Dropbox kludge: No
Force volume uid/gid: No
ACL support: Yes
EA support: ad | sys
LDAP support: Yes

It also has time-slider enabled, which is a pretty buggy piece of hmmm
software, but it shouldn't cause the server to crash or hang.

So the problems start with NFS and/or afpd timeouts on clients, but I can
still ssh to the server. I can't read any files or logs, though.
Then the network service disappears within a minute or a few minutes, the
console freezes, and I have to do a hard restart at that point.

Where should I look to understand what is causing this?
Since I can't reproduce the problem, I'd like to be prepared when it happens
next time.
I couldn't find anything unusual in the logs after the restart.
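One way to be better prepared next time is a periodic probe that records when the pool stops responding. A minimal Python sketch, not something from the thread: it times a small synchronous write (write plus `os.fsync`) rather than a read, because, as Lucas describes, reads of data already held in memory can keep succeeding while the device is stuck. The function name, the probe file location and the 5-second timeout are all hypothetical choices.

```python
# Minimal sketch: measure how long a small synchronous write to the suspect
# pool takes, and report a timeout instead of hanging along with the pool.
import os
import threading
import time
from typing import List, Optional


def probe_sync_write(path: str, timeout_s: float = 5.0) -> Optional[float]:
    """Write and fsync a tiny file at `path`.

    Returns the elapsed time in seconds, or None if the write failed or did
    not finish within `timeout_s`.
    """
    elapsed: List[float] = []

    def worker() -> None:
        start = time.monotonic()
        try:
            with open(path, "wb") as f:
                f.write(b"probe\n")
                f.flush()
                # fsync pushes the data out to the device, so a cached
                # copy in memory cannot hide a device that stopped responding.
                os.fsync(f.fileno())
        except OSError:
            return  # leave `elapsed` empty: reported as None
        elapsed.append(time.monotonic() - start)

    # A daemon thread: an I/O call hung in the kernel cannot be cancelled,
    # but it will not keep the interpreter from exiting.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    return elapsed[0] if elapsed else None
```

Called every minute or so against a hypothetical file such as /rpool/probe, with results written somewhere other than rpool, a None result or steadily climbing latencies would at least timestamp when the pool stopped responding.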

For some reason, time-slider complains about space on rpool:
Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.notice] No more hourly snapshots left
Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.warning] rpool exceeded 80% capacity. Hourly and daily automatic snapshots were destroyed

Where does it see 80%?

$ df -h

Filesystem                      Size  Used Avail Use% Mounted on
rpool/ROOT/solaris              5.5G  3.0G  2.6G  54% /
swap                            1.4G  396K  1.4G   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1  5.5G  3.0G  2.6G  54% /lib/libc.so.1
swap                            1.4G  8.0K  1.4G   1% /tmp
swap                            1.4G   52K  1.4G   1% /var/run
rpool/export                    2.6G   32K  2.6G   1% /export
rpool/export/home               2.6G   33K  2.6G   1% /export/home
rpool/export/home/usr1          2.6G   38K  2.6G   1% /export/home/usr1
rpool/export/home/usr2          3.0G  385M  2.6G  13% /export/home/usr2
rpool                           2.6G   48K  2.6G   1% /rpool
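The df figures and time-slider's 80% warning can both be approached with simple arithmetic. In the minimal Python sketch below, `capacity_pct` uses used / (used + avail), rounded up, which is how GNU df computes Use%; it reproduces the 54% and 13% shown by df. Whether time-sliderd applies the same ratio to the top-level rpool dataset is an assumption here. If it does, that dataset's "used" also counts zvols such as swap and dump, which df does not list because they are not mounted file systems. The helper names are hypothetical.

```python
# Minimal sketch of the capacity arithmetic. Sizes use df -h style suffixes
# with binary (1024-based) multipliers.
import math

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a df -h style size such as '2.6G' or '396K' to bytes."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in _UNITS else ""
    number = text[: len(text) - len(suffix)]
    return int(round(float(number) * _UNITS[suffix]))


def capacity_pct(used: int, avail: int) -> int:
    """Percent in use as used / (used + avail), rounded up like GNU df."""
    return math.ceil(100 * used / (used + avail))
```

Under that assumption, with 2.6G available the ratio passes 80% once rpool's total "used" exceeds 10.4G (u / (u + 2.6) > 0.8 gives u > 10.4). The file systems above hold only about 3.4G, so adding an 8G swap volume like the one Lucas refers to would be enough to cross the threshold. `zfs list -o space rpool` should show how much of rpool's usage comes from its children and reservations.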


--Roman

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
--
Dr. Daniel Kjar
Assistant Professor of Biology
Division of Mathematics and Natural Sciences
Elmira College
1 Park Place
Elmira, NY 14901
607-735-1826
http://faculty.elmira.edu/dkjar
"...humans send their young men to war; ants send their old ladies"
-E. O. Wilson