IPMI hardware watchdogs Re: dell r420/r320 stable/9
On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
> For the time being I had to revert the following from my stable/9 tree.
> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>
> http://svnweb.freebsd.org/base?view=revision&revision=237839
> http://svnweb.freebsd.org/base?view=revision&revision=221121

On a somewhat related note: We noticed recently that you can't pet or disable the IPMI hardware watchdog once SCHEDULER_STOPPED() is true. This means it can fire unexpectedly while you're dumping core or rebooting, depending on how long the timeout was on the pet before the panic. The ipmi driver will need to process the command differently if the scheduler is stopped. I haven't had time to look at a fix yet.

-Andrew

--
Andrew Boyer	abo...@averesystems.com

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
On Fri, Jul 27, 2012 at 3:33 PM, Andrew Boyer wrote:
> On a somewhat related note: We noticed recently that you can't pet or disable
> the IPMI hardware watchdog once SCHEDULER_STOPPED() is true. This means it
> can fire unexpectedly while you're dumping core or rebooting, depending on
> how long the timeout was on the pet before the panic. The ipmi driver will
> need to process the command differently if the scheduler is stopped. I
> haven't had time to look at a fix yet.

I recall I fixed that internally for SV, but the key here is that we need to find a unified (or default) policy.
More specifically, do we want the watchdog to also cover the kernel dump part (because of possible deadlocks when dumping)? If the answer is yes, we likely need to pat the watchdog from within the dumping cycle itself. If the answer is no, then we can just disable it when entering the panic path. Either way, we need to identify a default policy that makes sense first.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
On Jul 27, 2012, at 10:42 AM, Attilio Rao wrote:
> I recall I fixed that internally for SV, but the key here is that we
> need to find a unified (or default) policy.
> More specifically, do we want the watchdog to also cover the kernel dump
> part (because of possible deadlocks when dumping)? If the answer is
> yes, we likely need to pat the watchdog from within the dumping cycle
> itself. If the answer is no, then we can just disable it when entering
> the panic path. Either way, we need to identify a default policy that
> makes sense first.
>
> Attilio

For our use case, we need the system to reset if the dump hangs.

As the code stands now, you can't disable the HW watchdog from the panic path. Prior to stopping the scheduler early in panic(), you don't know the lock state, so you can't safely initiate the IPMI command. (It hung the first time I tried it.) After stopping the scheduler, you can't pet it to turn it off.

-Andrew

--
Andrew Boyer	abo...@averesystems.com
Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
On Fri, Jul 27, 2012 at 3:55 PM, Andrew Boyer wrote:
> For our use case, we need the system to reset if the dump hangs.

This means we will likely have to control the watchdog patting by hand in the panic path, and more specifically I guess this reduces to patting the watchdog from within the dumping cycle (there could be other expensive points we could consider, but nothing pops into my head right now).
Maybe Ryan can tell us whether SV can contribute the code for that specific part back.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
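The "pat the watchdog from within the dumping cycle" option discussed above can be sketched roughly as follows. This is a hypothetical user-space model, not FreeBSD code: `dump_write_chunk()`, `wd_pet()`, `dump_with_watchdog()`, `PET_EVERY_CHUNKS` and the chunk counts are all illustrative assumptions. The idea is that the dump loop itself re-arms the watchdog every few chunks, so a dump that keeps making progress keeps the machine alive, while a dump that hangs stops petting and lets the watchdog reset the system, which is the behaviour Andrew asks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: pet the watchdog after every PET_EVERY_CHUNKS chunks. */
#define PET_EVERY_CHUNKS 16

/* Stand-in for the hardware watchdog: counts how often it was re-armed. */
unsigned wd_pets;

/* Hypothetical pet routine. A real driver would re-arm the timer here
 * using a path that is safe with the scheduler stopped. */
static void
wd_pet(void)
{
	wd_pets++;
}

/* Hypothetical per-chunk write; returns 0 on success. A dump that hangs
 * would never return from here, so the watchdog stops being petted. */
static int
dump_write_chunk(size_t chunk)
{
	(void)chunk;
	return (0);
}

/* Write nchunks chunks, petting the watchdog periodically from inside
 * the dump loop itself. Returns 0 on success, or the first error. */
int
dump_with_watchdog(size_t nchunks)
{
	size_t i;
	int error;

	assert(PET_EVERY_CHUNKS > 0);
	wd_pet();			/* re-arm before starting the dump */
	for (i = 0; i < nchunks; i++) {
		error = dump_write_chunk(i);
		if (error != 0)
			return (error);
		if ((i + 1) % PET_EVERY_CHUNKS == 0)
			wd_pet();	/* progress made: push the timeout out */
	}
	return (0);
}
```

The pet interval is the tuning knob in this model: it has to be short enough, relative to the watchdog timeout and the time one chunk takes to write, that a healthy dump never lets the timer expire.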
AHCI Timeout errors on Intel Patsburg
We're seeing some strange timeout errors on some new Supermicro X9DRT-HF motherboards we have here when they are combined with KINGSTON HyperX 3K SSDs. It seems that when a drive is connected to the second channel, reads often time out, stalling all I/O under 8.3-RELEASE-p3.

When this happens we see:

Jul 27 14:35:59 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:35:59 lon059 kernel: ahcich1: is cs ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:37:41 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:37:41 lon059 kernel: ahcich1: is cs ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:38:35 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:38:35 lon059 kernel: ahcich1: is cs ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:05 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:05 lon059 kernel: ahcich1: is cs ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:39 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:39 lon059 kernel: ahcich1: is cs ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 13:58:06 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 13:58:06 lon059 kernel: ahcich1: is cs ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:21:17 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 14:21:17 lon059 kernel: ahcich1: is cs ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:29:16 lon059 kernel: ahcich1: Timeout on slot 7 port 0
Jul 27 14:29:16 lon059 kernel: ahcich1: is cs ss 0080 rs 0080 tfd 40 serr 0088 cmd 0004c717
Jul 27 14:31:43 lon059 kernel: ahcich1: Timeout on slot 12 port 0
Jul 27 14:31:43 lon059 kernel: ahcich1: is cs ss 1000 rs 1000 tfd 40 serr 0088 cmd 0004cc17

The disk on ahcich0 is identical but doesn't seem to exhibit the same problem. We thought it might be a disk issue, even though they are brand new, but 2 out of the 3 machines tested have the same problem.
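The slot numbers in these messages can be cross-checked against the register dumps. As a minimal sketch, assuming the `ss` field is the channel's SActive (PxSACT) bitmask, where bit n corresponds to command slot n, and that `cmd` is the AHCI PxCMD register, whose Current Command Slot (CCS) field occupies bits 12:8, the helpers below (the function names are illustrative) recover a slot number from each field. For example, `ss 4000` and `cmd 0004ce17` both give slot 14, and `ss 0080` with `cmd 0004c717` both give slot 7, matching the "Timeout on slot" lines.

```c
#include <stdint.h>

/* Index of the lowest set bit in an AHCI slot bitmask (e.g. PxSACT),
 * or -1 if no slot is marked active. */
int
ahci_first_slot(uint32_t slot_mask)
{
	int slot;

	for (slot = 0; slot < 32; slot++) {
		if (slot_mask & (UINT32_C(1) << slot))
			return (slot);
	}
	return (-1);
}

/* Current Command Slot (CCS) field: bits 12:8 of the PxCMD register. */
int
ahci_cmd_ccs(uint32_t pxcmd)
{
	return ((int)((pxcmd >> 8) & 0x1f));
}
```

In every logged pair the active-slot bit and the CCS field agree, which suggests each timeout involves a single outstanding command rather than a deep NCQ queue.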
In addition, I've not managed to reproduce the issue if I force SATA to revision 2 with:

hint.ahcich.1.sata_rev=2

The machine is running the latest SSD and machine firmware / BIOS. Could this be an ahci bug?

dmesg and camcontrol output:

ahci0: port 0x9050-0x9057,0x9040-0x9043,0x9030-0x9037,0x9020-0x9023,0x9000-0x901f mem 0xdfa22000-0xdfa227ff irq 18 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4: at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5: at channel 5 on ahci0
ahcich5: [ITHREAD]

camcontrol identify ada1
pass1: ATA-8 SATA 3.x device
pass1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 3.x
device model          KINGSTON SH103S3120G
firmware revision     501ABBF0
serial number         50026B7223027059
WWN                   50026b7223027059
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                        Support  Enabled  Value      Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes               32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      yes      254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              yes      no       0/0x0
unload                         yes      yes
free-fall                      no       no
data set management (DSM/TRIM) yes
DSM - max 512byte blocks       yes               8
DSM - deterministic read       yes               any value

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed.
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or retu
9.1 beta powerpc64 installer bugs
I have two Power Mac G5s and was testing out FreeBSD 9.1 beta. I noticed some small problems in the BSD installer:

1. When using guided partitioning with the entire disk selected, if you delete the pre-made "/" partition and create a new one (to give it a different size, for example), then after hitting OK the installer asks you to create an Apple boot partition even though one already exists. If I select yes, I end up with two Apple boot partitions.

2. I also noticed that the numbering gets a little out of order if you don't delete ALL existing partitions first. At some point I ended up with the numbering starting at 5, or skipping 3, for example.

3. If the installer fails at some point and I restart it, the DHCP option comes back with an error saying it was not able to assign an IP via DHCP. I can continue without DHCP and the IP address is pre-configured; however, DNS is not.

4. I wasn't able to create a ZFS partition (for example /data). I was able to add it from the guided partitioning screen, but when it came time to format and install, I received a mount error and the installer failed.

Other than that, the powerpc64 install on an Apple Power Mac G5 was successful.
Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
on 27/07/2012 17:33 Andrew Boyer said the following:
> On a somewhat related note: We noticed recently that you can't pet or disable
> the IPMI hardware watchdog once SCHEDULER_STOPPED() is true. This means it
> can fire unexpectedly while you're dumping core or rebooting, depending on
> how long the timeout was on the pet before the panic. The ipmi driver will
> need to process the command differently if the scheduler is stopped. I
> haven't had time to look at a fix yet.

Yeah, I noticed that unlike most (all?) other watchdog drivers, where re-arming the watchdog is a very basic operation like doing one I/O, the IPMI watchdog does some more complex stuff which involves waiting on another thread. I think that this may be a little bit too much for a reliable watchdog driver. At least, as you note, this definitely won't work for the panic case, where only one thread is left running. I guess that the driver should check for that case and do a direct operation instead of enqueueing a request and waiting for another thread to execute it.

--
Andriy Gapon
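The suggestion above, checking for the panic case and issuing the command directly instead of enqueueing it for another thread, might look roughly like the user-space model below. Everything here is hypothetical: `scheduler_stopped` stands in for the kernel's SCHEDULER_STOPPED() check, and `struct ipmi_request`, `ipmi_submit()`, `ipmi_poll_command()`, `ipmi_worker_step()` and the one-slot queue are illustrative names, not the actual ipmi(4) interfaces. A real driver would additionally need a polled path that talks to the BMC without taking locks or sleeping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for SCHEDULER_STOPPED(): true once panic() has stopped the
 * scheduler and only the panicking thread is left running. */
static bool scheduler_stopped;

/* Hypothetical request, serviced through a one-slot queue by a worker. */
struct ipmi_request {
	int	cmd;		/* e.g. a watchdog reset command */
	int	completed;	/* set once the BMC has answered */
};

static struct ipmi_request *pending;

enum submit_result { SUBMIT_DIRECT, SUBMIT_QUEUED, SUBMIT_BUSY };

/* Hypothetical polled path: a real driver would talk to the BMC
 * synchronously, spinning on its status registers without sleeping or
 * locking. Modeled here as completing immediately. */
static void
ipmi_poll_command(struct ipmi_request *req)
{
	req->completed = 1;
}

/* Submit a request. With the scheduler stopped the worker thread will
 * never run again, so execute the command in the caller's context;
 * otherwise hand it to the worker (a real driver would then sleep
 * until the worker marks it complete). */
enum submit_result
ipmi_submit(struct ipmi_request *req)
{
	if (scheduler_stopped) {
		ipmi_poll_command(req);
		return (SUBMIT_DIRECT);
	}
	if (pending != NULL)
		return (SUBMIT_BUSY);
	pending = req;
	return (SUBMIT_QUEUED);
}

/* Body of the hypothetical worker thread: service one queued request. */
void
ipmi_worker_step(void)
{
	if (pending != NULL) {
		ipmi_poll_command(pending);
		pending = NULL;
	}
}
```

In this model the normal path is unchanged; only the panic case bypasses the queue, which is the one case where waiting on another thread can never succeed.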
Reg: Not properly dismounted
Dear All,

I have FreeNAS 0.7.2 (Sabanda) with a 1 TB hard disk mounted. For the past 2 years the NAS server was working fine. Now I get the following error:

GEOM: da0: The primary GPT table is corrupt or invalid

and my data (on the 1 TB hard disk) is now displayed as numbered entries:

freenas:~# ls -l /mnt/nas/lost+found/#00987
freenas:~# ls -l /mnt/nas/lost+found/#00100
freenas:~# ls -l /mnt/nas/lost+found/#00200

Kindly advise me how to solve the issue.

Regards,
Rajamani
Re: Regression in stable for ThinkPad T520 with Intel GPU (Sandybridge) between June 22 and July 18
On Tue, 2012-07-24 at 18:35 -0700, Kevin Oberman wrote:
> I will shortly spend a bit of time tracking down the breakage more
> closely, but my 9-Stable system of June 22 runs fine. After an update
> on about July 10, I noted that it would hang after Xorg was started,
> but usually worked. After an upgrade to July 18, my system could no
> longer start Gnome. It would start Xorg and Gnome would start
> normally, getting many apps started, but about 10 seconds after the
> wallpaper loaded, the system would freeze solid. No network access and
> no response to mouse or keyboard.
>
> I have looked into commits to 9-STABLE during the time at issue and
> very little seems to have changed due to the pre-9.1 freeze.
> Similarly, nothing much has changed in any of the X11 ports.
>
> This really smells a lot like a race condition. I can trigger the same
> behavior by enabling VT-x (not VT-d) in BIOS. In all cases where it
> was intermittent, if my desktop completed startup, the system runs
> fine until re-booted. This is probably the primary reason I might not
> have realized that there was a problem as I don't boot the system
> often except when traveling, which I was between July 1 and July 6 and
> again July 18 when the system died.
>
> Any idea what I might try looking at?

Oh good, it's not just me. I note that this happens when I'm not hardwired in at my docking station, as the system doesn't get a routable IP address until much later if on wireless. When watching the system boot, I think I might ctrl-c the sendmail startup or something when it starts to keep this from happening.

Sean