Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-11 Zach van Rijn via cfarm-users
On Sun, 2022-12-11 at 09:09 -0600, Segher Boessenkool wrote:
> Hi Zach,
>
> On Fri, Dec 09, 2022 at 09:12:06AM -0600, Zach van Rijn via cfarm-users wrote:
> > On Fri, 2022-12-09 at 15:42 +0100, Pierre Muller via cfarm-users wrote:
> > > ...
> > >
> > > It still seems that there are C

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-11 Segher Boessenkool via cfarm-users
Hi Zach,
On Fri, Dec 09, 2022 at 09:12:06AM -0600, Zach van Rijn via cfarm-users wrote:
> On Fri, 2022-12-09 at 15:42 +0100, Pierre Muller via cfarm-users wrote:
> > ...
> >
> > It still seems that there are CPU lockup :-(
*Soft* lockups. Tasks that were unresponsive for more than 20s. Thi
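For reference, the "more than 20s" figure matches the Linux default: the soft-lockup detector reports a CPU that has looped in kernel mode without letting other tasks run for longer than twice the `kernel.watchdog_thresh` sysctl, whose default is 10 seconds. The sketch below uses two hypothetical helpers (not part of any tool mentioned in this thread) to read that sysctl and derive the reporting threshold; it assumes a Linux system where `/proc/sys/kernel/watchdog_thresh` exists.

```c
#include <stdio.h>

/* Hypothetical helper: the soft-lockup reporting threshold is
 * 2 * kernel.watchdog_thresh seconds (default 10, giving 20 s). */
static int softlockup_threshold(int watchdog_thresh)
{
    return 2 * watchdog_thresh;
}

/* Hypothetical helper: read an integer sysctl value such as
 * /proc/sys/kernel/watchdog_thresh. Returns -1 if the file is
 * missing or does not start with an integer. */
static int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

On a default configuration, `softlockup_threshold(read_sysctl_int("/proc/sys/kernel/watchdog_thresh"))` gives 20; check the read result for -1 before using it.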

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-09 Zach van Rijn via cfarm-users
On Fri, 2022-12-09 at 15:42 +0100, Pierre Muller via cfarm-users wrote:
> ...
>
> It still seems that there are CPU lockup :-(
>
> Pierre
>
> make[1]: Leaving directory '/home/muller/pas/trunk-svn-github/svn-fpcsrc/compiler'
> muller@gcc102:~/pas/trunk/svn-fpcsrc/compiler$
> Message from sy

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-09 Pierre Muller via cfarm-users
On 09/12/2022 at 02:38, Zach van Rijn via cfarm-users wrote:
On Fri, 2022-12-02 at 12:38 -0600, Zach van Rijn via cfarm-users wrote:
On Thu, 2022-12-01 at 23:22 -0600, Jacob Bachmeyer wrote:
...
16 cores and 64GB memory is probably OK for the farm for now, and I will replace the defective

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-08 Zach van Rijn via cfarm-users
On Fri, 2022-12-02 at 12:38 -0600, Zach van Rijn via cfarm-users wrote:
> On Thu, 2022-12-01 at 23:22 -0600, Jacob Bachmeyer wrote:
> > ...
>
> 16 cores and 64GB memory is probably OK for the farm for now,
> and I will replace the defective module(s) early next year.
Next year came early. New mem

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-02 Zach van Rijn via cfarm-users
On Thu, 2022-12-01 at 23:22 -0600, Jacob Bachmeyer wrote:
> Zach van Rijn wrote:
> > On Wed, 2022-11-30 at 21:21 -0600, Jacob Bachmeyer wrote:
> >
> > > ...
> ...those panics during early boot, strongly suggest bad RAM as
> Bruno Haible suggested.
I agree it is likely a hardware issue. The sy

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-12-01 Jacob Bachmeyer via cfarm-users
Zach van Rijn wrote: On Wed, 2022-11-30 at 21:21 -0600, Jacob Bachmeyer wrote: ... Do you have logs farther back? Yes; I've attached some going back about ten days. Thank you for the analysis, by the way. It is an interesting theory. I would tend to agree with Bruno that memory should

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Bruno Haible via cfarm-users
Jacob Bachmeyer wrote:
> Speculation: (based on a bug I encountered long ago in the ext3 driver)
There are also other causes of "BUG: soft lockup - CPU#n stuck". One of them is bad RAM. You can diagnose bad RAM more reliably through 'memtester' [1]. That did it in my case (on an x86_64 machine)
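To illustrate the idea behind pattern-based RAM testing, here is a much simpler sketch than memtester: write known patterns (solid bits, alternating bits, walking ones, and each word's own index) into a buffer and count the words that read back differently. `simple_memtest` and its helpers are hypothetical, not memtester's code; they only exercise whatever pages the kernel happens to give this process, so they do not replace the real tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Write one pattern to every word, then count words that read back wrong.
 * volatile keeps the compiler from optimizing the reads away. */
static size_t pattern_pass(volatile uint64_t *buf, size_t n, uint64_t pattern)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != pattern)
            bad++;
    return bad;
}

/* Store each word's index in it; if two addresses alias the same cell,
 * the later write overwrites the earlier one and the readback differs. */
static size_t address_pass(volatile uint64_t *buf, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        buf[i] = (uint64_t)i;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != (uint64_t)i)
            bad++;
    return bad;
}

/* Hypothetical helper: test n 64-bit words. Returns the number of
 * mismatches, or SIZE_MAX if the buffer could not be allocated. */
static size_t simple_memtest(size_t n)
{
    static const uint64_t solid[] = {
        0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
        0xAAAAAAAAAAAAAAAAULL, 0x5555555555555555ULL,
    };
    volatile uint64_t *buf = malloc(n * sizeof *buf);
    size_t bad = 0;

    if (buf == NULL)
        return SIZE_MAX;
    for (size_t k = 0; k < sizeof solid / sizeof solid[0]; k++)
        bad += pattern_pass(buf, n, solid[k]);
    for (int bit = 0; bit < 64; bit++)   /* walking ones */
        bad += pattern_pass(buf, n, 1ULL << bit);
    bad += address_pass(buf, n);
    free((void *)buf);
    return bad;
}
```

On healthy memory every pass reads back exactly what it wrote, so the result is 0; any nonzero count points at a faulty cell or address line.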

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Jacob Bachmeyer via cfarm-users
Zach van Rijn via cfarm-users wrote:
On Wed, 2022-11-30 at 11:35 +0100, Pierre Muller via cfarm-users wrote:
Just got this:
Message from syslogd@gcc102 at Nov 30 04:31:20 ...
kernel:[47393.509723] watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [ppc2:203070]
Can I do anything to help fig

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Zach van Rijn via cfarm-users
On Wed, 30 Nov 2022 08:45:21 -0600 Pierre Muller wrote ---
>
> At least 'ppc2' and 'fpmake' are most probably executable on
> my user account that are generated by my cron jobs.
>
> Maybe it would be wise to check if the machine is stable if my cron jobs are
> disabled.
I

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Pierre Muller via cfarm-users
At least 'ppc2' and 'fpmake' are most probably executables under my user account, generated by my cron jobs. Maybe it would be wise to check whether the machine is stable with my cron jobs disabled. I am currently unable to log in to gcc102. If you restart the machine, please also disable

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Zach van Rijn via cfarm-users
On Wed, 2022-11-30 at 11:35 +0100, Pierre Muller via cfarm-users wrote:
> Just got this:
> Message from syslogd@gcc102 at Nov 30 04:31:20 ...
> kernel:[47393.509723] watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [ppc2:203070]
>
> Can I do anything to help figuring out the problem?
Not sur

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-30 Pierre Muller via cfarm-users
Just got this:
Message from syslogd@gcc102 at Nov 30 04:31:20 ...
kernel:[47393.509723] watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [ppc2:203070]
Can I do anything to help figure out the problem?
Pierre
On 29/11/2022 at 22:30, Zach van Rijn via cfarm-users wrote:
On Sun, 2022-11-27
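When going through older logs, it can help to pull the CPU number, stall duration, task name, and PID out of each soft-lockup report. The hypothetical helper below parses a line like the one above, assuming the report keeps the layout shown there (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`); it is a triage sketch, not a tool anyone on the list used.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one soft-lockup report (hypothetical struct). */
struct softlockup {
    int cpu;        /* CPU that was stuck */
    int seconds;    /* how long it was stuck */
    char comm[64];  /* task running on that CPU */
    int pid;        /* its PID */
};

/* Find the report anywhere in a syslog/dmesg line and parse it.
 * Returns 1 on success, 0 if the line is not a soft-lockup report. */
static int parse_softlockup(const char *line, struct softlockup *out)
{
    const char *p = strstr(line, "BUG: soft lockup - CPU#");

    if (p == NULL)
        return 0;
    /* %63[^:] reads the task name up to the ':' before the PID. */
    return sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds! [%63[^:]:%d",
                  &out->cpu, &out->seconds, out->comm, &out->pid) == 4;
}
```

For the line Pierre received, this yields CPU 2, 48 seconds, task `ppc2`, PID 203070; tallying these per task or per CPU across days of logs shows whether lockups follow a particular workload or a particular CPU.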

Re: [cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-29 Zach van Rijn via cfarm-users
On Sun, 2022-11-27 at 17:09 -0600, Zach van Rijn via cfarm-users wrote:
> Please be advised that gcc102 (sparc64) is, once again, offline
> and there is no estimate for when it will be available.
It's online with a new kernel, but I cannot make promises about stability. It seemed OK today compilin

[cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

2022-11-27 Zach van Rijn via cfarm-users
Please be advised that gcc102 (sparc64) is, once again, offline and there is no estimate for when it will be available.
The current situation is:
* "watchdog: BUG: soft lockup" at idle, same kernel as had been somewhat stable before. Once enough CPUs are locked, there is no possibility