Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0

2013-01-19 Thread Marin Atanasov Nikolov
Hi,

Re-sending this one, as I had attached an image which was too large to
pass through the mailing list; sorry about that :)

After starting the system last night I kept monitoring the memory usage in
case anything strange showed up, and I noticed a significant drop in free
memory between 03:00am and 03:05am. I've taken a screenshot of the graph,
which you can also see at the link below:

* http://users.unix-heaven.org/~dnaeon/memory-usage.jpg

At 03:00am I can see that periodic(8) runs, but I don't see what could
have taken so much of the free memory. I'm also running this system on ZFS
with daily rotating ZFS snapshots - currently the number of ZFS snapshots
is > 1000, and I'm not sure if that could be causing this. Here's a list
of the periodic(8) daily scripts that run at 03:00am.

% ls -1 /etc/periodic/daily
100.clean-disks
110.clean-tmps
120.clean-preserve
130.clean-msgs
140.clean-rwho
150.clean-hoststat
200.backup-passwd
210.backup-aliases
220.backup-pkgdb
300.calendar
310.accounting
330.news
400.status-disks
404.status-zfs
405.status-ata-raid
406.status-gmirror
407.status-graid3
408.status-gstripe
409.status-gconcat
420.status-network
430.status-rwho
440.status-mailq
450.status-security
460.status-mail-rejects
470.status-named
480.status-ntpd
490.status-pkg-changes
500.queuerun
800.scrub-zfs
999.local

% ls -1 /usr/local/etc/periodic/daily
402.zfSnap
403.zfSnap_delete
411.pkg-backup
smart
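For reference, the snapshot count and the space held by snapshots can be
checked with zfs(8) directly (a small sketch using standard zfs list
options):

```shell
# Count all snapshots on the system (-H: no header, -t: dataset type)
zfs list -H -t snapshot -o name | wc -l

# Space consumed by snapshots, per dataset
zfs list -o name,usedbysnapshots
```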

I'll keep monitoring the memory usage to see if the free memory drops
again by more than 50% on the next periodic(8) daily run. If the drop
keeps its current trend, the system should crash within the next 1-2 days;
if that happens and memory was low at the time, I'll start debugging the
periodic(8) scripts to see which one might be causing this.
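One way to narrow this down (a rough sketch, assuming the stock
periodic(8) layout; the log path is just an example, and periodic(8)
normally runs these scripts itself rather than via a wrapper like this)
is to log the free-page counter around each daily script:

```shell
#!/bin/sh
# Sketch: run each periodic daily script and log free memory before/after
# (log path is arbitrary; intended for one-off debugging, not production)
LOG=/var/log/periodic-mem.log

for script in /etc/periodic/daily/* /usr/local/etc/periodic/daily/*; do
    before=$(sysctl -n vm.stats.vm.v_free_count)
    sh "$script" > /dev/null 2>&1
    after=$(sysctl -n vm.stats.vm.v_free_count)
    echo "$(basename "$script"): free pages $before -> $after" >> "$LOG"
done
```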

Thanks and regards,
Marin


On Fri, Jan 18, 2013 at 10:23 PM, Warren Block  wrote:

> On Fri, 18 Jan 2013, kpn...@pobox.com wrote:
>
>  On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
>>
>>> I tend to agree, a machine that starts rebooting spontaneously when
>>> nothing significant changed and it used to be stable is usually a sign
>>> of a failing power supply or memory.
>>>
>>
>> Agreed.
>>
>>  But I disagree about memtest86.  It's probably not completely without
>>> value, but to me its value is only negative:  if it tells you memory is
>>> bad, it is.  If it tells you it's good, you know nothing.  Over the
>>> years I've had 5 dimms fail.  memtest86 found the error in one of them,
>>> but said all the others were fine in continuous 48-hour tests.  I even
>>> tried running the tests on multiple systems.
>>>
>>> The thing that always reliably finds bad memory for me
>>> is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
>>> or more hours of runtime, but it will find your bad memory.
>>>
>>
>> I've had "good" luck with gcc showing bad memory. If compiling a new
>> kernel produces seg faults then I know I have a hardware problem. I've
>> seen compilers at work failing due to bad memory as well.
>>
>> Some problems only happen with particular access patterns.  So if a
>> compiler works fine then, like memtest86, it doesn't say anything about
>> the health of the hardware.
>>
>
> Most test tools are like that.  They might diagnose something as bad, but
> they often can't prove it is good.  SMART has a reputation for not finding
> any problems on disks that are failing, and capacitors that aren't swollen
> or leaking still may not be working.
>
> But diagnostic tools can at least give a hint.  In my case, memtest
> indicated a problem--a big problem.  I removed one DIMM at random (there
> were only two) and the problems and memtest errors both went away. Replace
> the DIMM, and both came back.
>
>



-- 
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Failed to attach P_CNT - FreeBSD 9.1 RC3

2013-01-19 Thread John Baldwin
On Sunday, November 04, 2012 05:56:33 AM Shiv. Nath wrote:
> Dear FreeBSD Community Friends,
> 
> It is FreeBSD 9.1 RC3, and I get the following warning in the message
> log file. I need help understanding the meaning of this error - how
> serious is it?
> 
> acpi_throttle23: failed to attach P_CNT

On newer CPUs that use est you don't want to use acpi_throttle anyway, so
you can ignore the error.  (est gives you power savings when it lowers your
CPU speed; acpi_throttle generally does not - it only helps with lowering
the temperature.)
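If the message itself is a nuisance, the driver can also be disabled with
a loader hint (a sketch; this is the standard hint form, though the unit
number may differ on your system):

```shell
# /boot/loader.conf (sketch; unit number may vary)
hint.acpi_throttle.0.disabled="1"
```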

-- 
John Baldwin


Re: Startup lapic messages

2013-01-19 Thread John Baldwin
On Tuesday, December 18, 2012 06:28:25 AM S.N.Grigoriev wrote:
> Hi list,
> 
> I've installed FreeBSD 9.1R amd64 on a new Intel server.
> The following lapic messages appear during system startup:
> 
> lapic18: Forcing LINT1 to edge trigger
> SMP: AP CPU #2 Launched!
> lapic50: Forcing LINT1 to edge trigger
> SMP: AP CPU #6 Launched!
> lapic20: Forcing LINT1 to edge trigger
> SMP: AP CPU #3 Launched!
> lapic32: Forcing LINT1 to edge trigger
> SMP: AP CPU #4 Launched!
> lapic2: Forcing LINT1 to edge trigger
> SMP: AP CPU #1 Launched!
> lapic34: Forcing LINT1 to edge trigger
> SMP: AP CPU #5 Launched!
> lapic52: Forcing LINT1 to edge trigger
> SMP: AP CPU #7 Launched!
> 
> I've never seen such messages in the past.
> Does it mean I have some hardware problem/misconfiguration?

Your BIOS is slightly buggy, but in a harmless way.  You can ignore these.

-- 
John Baldwin


Re: 9-STABLE -> NFS -> NetAPP:

2013-01-19 Thread John Baldwin
On Tuesday, December 18, 2012 11:58:36 PM Hub- Marketing wrote:
> I'm running a few servers sitting on top of a NetAPP file server …
> everything runs great, but periodically I'm getting:
> 
> nfs_getpages: error 13
> vm_fault: pager read error, pid 11355 (https)

Are you using interruptible mounts ("intr" mount option)?

Also, can you get ps output that includes the 'l' flag to show what
the processes are stuck on?
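For anyone following along, a BSD-style long listing shows the
wait-channel column being asked about (a sketch; grepping for the stuck
https processes is just an example):

```shell
# 'l' selects the long format; the MWCHAN/WCHAN column shows what each
# process is sleeping on (e.g. an NFS request for a hung mount)
ps axl | grep https
```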

-- 
John Baldwin


Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0

2013-01-19 Thread Ian Lepore
On Sat, 2013-01-19 at 12:30 +0200, Marin Atanasov Nikolov wrote:
> Hi,
> 
> Re-sending this one, as I had attached an image which was too large to
> pass through the mailing list; sorry about that :)
> 
> After starting the system last night I kept monitoring the memory usage in
> case anything strange showed up, and I noticed a significant drop in free
> memory between 03:00am and 03:05am. I've taken a screenshot of the graph,
> which you can also see at the link below:
> 
> * http://users.unix-heaven.org/~dnaeon/memory-usage.jpg
> 
> At 03:00am I can see that periodic(8) runs, but I don't see what could
> have taken so much of the free memory. I'm also running this system on ZFS
> with daily rotating ZFS snapshots - currently the number of ZFS snapshots
> is > 1000, and I'm not sure if that could be causing this. Here's a list
> of the periodic(8) daily scripts that run at 03:00am.
> 
[...]
> 

What exactly is that graph displaying for "available memory?"  That is,
what is the source of info on the graph?

If it's just showing memory that appears in top as "Free" that's not the
whole picture; the memory in the Inactive category is also available. 

-- Ian




Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0

2013-01-19 Thread Marin Atanasov Nikolov
>
> What exactly is that graph displaying for "available memory?"  That is,
> what is the source of info on the graph?
>
>
Hi Ian,



> If it's just showing memory that appears in top as "Free" that's not the
> whole picture; the memory in the Inactive category is also available.
>
>
The graph takes into account the sum of the memory as displayed by top(1)
in the "Free", "Cache" and "Inactive" categories.
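In case it's useful, that sum can also be taken straight from the VM
counters rather than parsing top(1) output (a sketch; sysctl names as on
FreeBSD 9.x):

```shell
#!/bin/sh
# Available memory = (free + cache + inactive pages) * page size
pagesize=$(sysctl -n hw.pagesize)
free=$(sysctl -n vm.stats.vm.v_free_count)
cache=$(sysctl -n vm.stats.vm.v_cache_count)
inact=$(sysctl -n vm.stats.vm.v_inactive_count)
echo "available: $(( (free + cache + inact) * pagesize / 1048576 )) MB"
```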

Regards,
Marin




> -- Ian
>
>
>



-- 
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/


Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0

2013-01-19 Thread John
>At 03:00am I can see that periodic(8) runs, but I don't see what could
>have taken so much of the free memory. I'm also running this system on ZFS
>with daily rotating ZFS snapshots - currently the number of ZFS snapshots
>is > 1000, and I'm not sure if that could be causing this. Here's a list
>of the periodic(8) daily scripts that run at 03:00am.
>
>% ls -1 /etc/periodic/daily
>800.scrub-zfs
>
>% ls -1 /usr/local/etc/periodic/daily
>402.zfSnap
>403.zfSnap_delete

On a couple of my ZFS machines, I've found running a scrub alongside other
heavy file system activity to be a problem.  I therefore run scrub from
cron and schedule it so it doesn't overlap with periodic.

I also found on a machine with an i3 and 4 GB of RAM that overlapping
scrubs and snapshot destroys would cause the machine to grind to the point
of being non-responsive. This was not a problem when the machine was new,
but became one as the pool got larger (dedup is off and the pool is at 45%
capacity).

I use my own ZFS management script that prevents snapshot destroys from
overlapping scrubs, and uses a lockfile to prevent a new destroy from
being initiated while an old one is still running.
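A minimal version of that interlock might look like the following (a rough
sketch, not the actual script described above; the pool name and lockfile
path are placeholders):

```shell
#!/bin/sh
# Sketch: skip snapshot destroys while a scrub is running, and use a
# lockfile so destroys never overlap (pool/lockfile names are examples)
POOL=tank
LOCK=/var/run/zfs-destroy.lock

# Bail out quietly if a scrub is in progress on the pool
zpool status "$POOL" | grep -q "scrub in progress" && exit 0

# noclobber (set -C) makes lockfile creation atomic: the redirect
# fails if the file already exists
if ( set -C; echo $$ > "$LOCK" ) 2>/dev/null; then
    trap 'rm -f "$LOCK"' EXIT
    # ... destroy expired snapshots here ...
else
    exit 0   # a previous destroy is still running
fi
```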

zfSnap has its -S switch to prevent actions during a scrub, which you
should use if you haven't already.

Since making these changes, a machine that would have to be rebooted several
times a week has now been up 61 days.

John Theus
TheUs Group


Re: 9-STABLE -> NFS -> NetAPP:

2013-01-19 Thread Hub- Marketing

On 2013-01-19, at 4:57 AM, John Baldwin  wrote:

> On Tuesday, December 18, 2012 11:58:36 PM Hub- Marketing wrote:
>> I'm running a few servers sitting on top of a NetAPP file server …
>> everything runs great, but periodically I'm getting:
>> 
>> nfs_getpages: error 13
>> vm_fault: pager read error, pid 11355 (https)
> 
> Are you using interruptible mounts ("intr" mount option)?

192.168.1.253:/vol/vol1 /vm nfs rw,intr,soft,nolockd  0   0

I just added the 'soft' option to the mix … nolockd is enabled since I know
for a fact that it's not possible for two processes to access the same file
on both mounts at the same time …

> Also, can you get ps output that includes the 'l' flag to show what
> the processes are stuck on?

I will send a follow-up the next time this happens, so it may be a few days …

> 