IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-27 Thread Andrew Boyer

On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:

> For the time being I had to revert the following from my stable/9 tree.
> Otherwise I would get a kernel panic on shutdown from ipmi(4).
> 
> http://svnweb.freebsd.org/base?view=revision&revision=237839
> http://svnweb.freebsd.org/base?view=revision&revision=221121
> 


On a somewhat related note: We noticed recently that you can't pet or disable 
the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This means it can 
fire unexpectedly while you're dumping core or rebooting, depending on how long 
the timeout was on the pet before the panic.  The ipmi driver will need to 
process the command differently if the scheduler is stopped.  I haven't had 
time to look at a fix yet.

-Andrew

--
Andrew Boyer    abo...@averesystems.com




___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-27 Thread Attilio Rao
On Fri, Jul 27, 2012 at 3:33 PM, Andrew Boyer  wrote:
>
> On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
>
>> For the time being I had to revert the following from my stable/9 tree.
>> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>>
>> http://svnweb.freebsd.org/base?view=revision&revision=237839
>> http://svnweb.freebsd.org/base?view=revision&revision=221121
>>
>
>
> On a somewhat related note: We noticed recently that you can't pet or disable 
> the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This means it 
> can fire unexpectedly while you're dumping core or rebooting, depending on 
> how long the timeout was on the pet before the panic.  The ipmi driver will 
> need to process the command differently if the scheduler is stopped.  I 
> haven't had time to look at a fix yet.

I recall I fixed that internally for SV, but the key here is that we
need to find a unified (or default) policy.
More specifically, do we want the watchdog to also cover the kernel dump
part (because of possible deadlocks when dumping)? If the answer is
yes, we likely need to pat the watchdog from within the dumping cycle
itself. If the answer is no, then we can just disable it when entering
the panic path. But anyway, we need to identify a default policy that
makes sense first.
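
To make the policy choice concrete, here is a toy timeline model in
Python (not kernel code). It assumes a countdown watchdog that was last
patted at the moment of the panic and a dump that is written in chunks.
The function name, the policy labels and the numbers are all
hypothetical, chosen only for illustration.

```python
import math


def dump_outcome(timeout, chunk_times, policy):
    """Model a kernel dump racing a hardware watchdog.

    timeout      seconds the watchdog allows between pats
    chunk_times  seconds each dump chunk takes (math.inf models a hang)
    policy       'ignore'  - never touch the watchdog after the panic
                 'disable' - turn the watchdog off on entering the panic path
                 'pat'     - pat the watchdog after every dump chunk
    """
    now = 0.0
    # The last pat is assumed to have happened at t=0, the panic itself.
    deadline = math.inf if policy == "disable" else timeout
    for duration in chunk_times:
        if now + duration > deadline:
            return f"reset at t={deadline:g}"   # watchdog fired mid-chunk
        if math.isinf(duration):
            return "hung"                       # nothing left to recover us
        now += duration
        if policy == "pat":
            deadline = now + timeout            # re-arm from inside the dump
    return "dumped"


chunks = [10, 10, 10, 10, 10]                        # a 50 second dump
print(dump_outcome(30, chunks, "ignore"))            # reset at t=30
print(dump_outcome(30, chunks, "pat"))               # dumped
print(dump_outcome(30, [10, math.inf], "disable"))   # hung
print(dump_outcome(30, [10, math.inf], "pat"))       # reset at t=40
```

In this model only the 'pat' policy both lets a long dump finish and
still resets the machine when a dump chunk hangs.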

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-27 Thread Andrew Boyer

On Jul 27, 2012, at 10:42 AM, Attilio Rao wrote:

> On Fri, Jul 27, 2012 at 3:33 PM, Andrew Boyer  wrote:
>> 
>> On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
>> 
>>> For the time being I had to revert the following from my stable/9 tree.
>>> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>>> 
>>> http://svnweb.freebsd.org/base?view=revision&revision=237839
>>> http://svnweb.freebsd.org/base?view=revision&revision=221121
>>> 
>> 
>> On a somewhat related note: We noticed recently that you can't pet or 
>> disable the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This 
>> means it can fire unexpectedly while you're dumping core or rebooting, 
>> depending on how long the timeout was on the pet before the panic.  The ipmi 
>> driver will need to process the command differently if the scheduler is 
>> stopped.  I haven't had time to look at a fix yet.
> 
> I recall I fixed that internally for SV, but the key here is that we
> need to find an unified (or a default policy).
> More specifically, do we want the watchdog also covers the kernel dump
> part (because of possible deadlocks when dumping). If the answer is
> yes, we likely need pat the watchdog from within the dumping cycle
> itself. If the answer is no, then we can just disable it when entering
> the panic path. But anyway, we need to identify a default policy that
> makes sense first.
> 
> Attilio
> 

For our use case, we need the system to reset if the dump hangs.

As the code stands now, you can't disable the HW watchdog from the panic path.  
Prior to stopping the scheduler early in panic(), you don't know the lock 
state, so you can't safely initiate the IPMI command.  (It hung the first time 
I tried it.)  After stopping the scheduler, you can't pet it to turn it off.

-Andrew

--
Andrew Boyer    abo...@averesystems.com






Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-27 Thread Attilio Rao
On Fri, Jul 27, 2012 at 3:55 PM, Andrew Boyer  wrote:
>
> On Jul 27, 2012, at 10:42 AM, Attilio Rao wrote:
>
>> On Fri, Jul 27, 2012 at 3:33 PM, Andrew Boyer wrote:
>>>
>>> On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
>>>
>>>> For the time being I had to revert the following from my stable/9 tree.
>>>> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>>>>
>>>> http://svnweb.freebsd.org/base?view=revision&revision=237839
>>>> http://svnweb.freebsd.org/base?view=revision&revision=221121
>>>>
>>>
>>> On a somewhat related note: We noticed recently that you can't pet or 
>>> disable the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This 
>>> means it can fire unexpectedly while you're dumping core or rebooting, 
>>> depending on how long the timeout was on the pet before the panic.  The 
>>> ipmi driver will need to process the command differently if the scheduler 
>>> is stopped.  I haven't had time to look at a fix yet.
>>
>> I recall I fixed that internally for SV, but the key here is that we
>> need to find an unified (or a default policy).
>> More specifically, do we want the watchdog also covers the kernel dump
>> part (because of possible deadlocks when dumping). If the answer is
>> yes, we likely need pat the watchdog from within the dumping cycle
>> itself. If the answer is no, then we can just disable it when entering
>> the panic path. But anyway, we need to identify a default policy that
>> makes sense first.
>>
>> Attilio
>>
>
> For our use case, we need the system to reset if the dump hangs.

This means we will likely have to control the watchdog patting by hand
in the panic path, and more specifically I guess this reduces to
patting the watchdog from within the dumping cycle (there could be
other expensive points we can consider, but nothing that pops into my
head right now). Maybe Ryan can tell us whether SV can contribute the
code for that specific part back.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


AHCI Timeout errors on Intel Patsburg

2012-07-27 Thread Steven Hartland

We're seeing some strange timeout errors on some new Supermicro
X9DRT-HF motherboards we have here when combined with KINGSTON HyperX
3K SSDs.

When a disk is connected to the second channel, reads often time out,
stalling all I/O under 8.3-RELEASE-p3.

When this happens we see:-
Jul 27 14:35:59 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:35:59 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:37:41 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:37:41 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:38:35 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:38:35 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:05 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:05 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:39 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:39 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 13:58:06 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 13:58:06 lon059 kernel: ahcich1: is  cs  ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:21:17 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 14:21:17 lon059 kernel: ahcich1: is  cs  ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:29:16 lon059 kernel: ahcich1: Timeout on slot 7 port 0
Jul 27 14:29:16 lon059 kernel: ahcich1: is  cs  ss 0080 rs 0080 tfd 40 serr 0088 cmd 0004c717
Jul 27 14:31:43 lon059 kernel: ahcich1: Timeout on slot 12 port 0
Jul 27 14:31:43 lon059 kernel: ahcich1: is  cs  ss 1000 rs 1000 tfd 40 serr 0088 cmd 0004cc17
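
As a reading aid for the log above, the following Python sketch relates
the "slot N" in each timeout message to the register values printed on
the line after it. It assumes, as the field names suggest, that "ss"
and "rs" are per-slot bitmasks (bit N set means slot N is outstanding)
and that bits 12:8 of "cmd" are the AHCI port command register's
Current Command Slot field. The helper names are made up for this
example, and the hex values are copied as they appear in the log.

```python
def slots_in_mask(mask_hex: str) -> list[int]:
    """Return the slot numbers whose bits are set in a hex bitmask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def current_command_slot(cmd_hex: str) -> int:
    """Extract bits 12:8 (assumed to be Current Command Slot) from cmd."""
    return (int(cmd_hex, 16) >> 8) & 0x1F


# (slot from "Timeout on slot N", ss value, cmd value), taken from the log.
SAMPLES = [
    (0, "0001", "0004c017"),
    (14, "4000", "0004ce17"),
    (7, "0080", "0004c717"),
    (12, "1000", "0004cc17"),
]

for slot, ss, cmd in SAMPLES:
    # Each line prints the reported slot, the slots set in ss, and the
    # slot number decoded from cmd; all three agree for every sample.
    print(slot, slots_in_mask(ss), current_command_slot(cmd))
```

Under those assumptions, every sample shows exactly one outstanding
command, in the slot that timed out, so the timeouts here do not appear
to depend on several commands being queued at once.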

The disk in ahcich0 is identical but doesn't seem to exhibit the
same problem. Thought it may be a disk issue even though they
are brand new but 2 out of the 3 machines tested have the same
problem.

In addition I've not managed to reproduce the issue if I force
sata to rev 2 with: hint.ahcich.1.sata_rev=2

Machine is running with the latest SSD and machine firmware / bios.

Could this be an ahci bug?

dmesg and camcontrol output:-

ahci0:  port 0x9050-0x9057,0x9040-0x9043,0x9030-0x9037,0x9020-0x9023,0x9000-0x901f mem 
0xdfa22000-0xdfa227ff irq 18 at device 31.2 on pci0

ahci0: [ITHREAD]
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0:  at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1:  at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2:  at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3:  at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4:  at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5:  at channel 5 on ahci0
ahcich5: [ITHREAD]

camcontrol identify ada1
pass1:  ATA-8 SATA 3.x device
pass1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 3.x
device model          KINGSTON SH103S3120G
firmware revision     501ABBF0
serial number         50026B7223027059
WWN                   50026b7223027059
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      yes     254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              yes      no      0/0x0
unload                         yes      yes
free-fall                      no       no
data set management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              any value

   Regards
   Steve 





9.1 beta powerpc64 installer bugs

2012-07-27 Thread Daniel Morante
I have 2 Mac Pro G5's and was testing out FreeBSD 9.1 beta.  I noticed 
some small problems in the BSD installer:


1. When using guided partitioning with the entire disk selected, if you
   delete the pre-made "/" and create a new one (to make it a different
   size, for example), then after hitting OK the installer asks you to
   create an Apple boot partition even though one already exists.  If I
   select yes, I end up with two Apple boot partitions.
2. I also noticed that the numbering gets a little out of order if you
   don't delete ALL existing partitions first.  At some point I ended
   up with the numbering starting at 5 or skipping a 3 for example.
3. If the installer fails at some point and I restart it, the DHCP
   option comes back with an error saying it was not able to assign
   an IP via DHCP.  I can continue without DHCP and the IP address is
   pre-configured, however DNS is not.
4. I wasn't able to create a ZFS partition (for example /data). I was
   able to add it from the guided partitioning screen, but when it came
   time to format and install, I received a mount error and the
   installer failed.


Other than that, the powerpc64 install on an Apple Mac Pro G5 was
successful.




Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-27 Thread Andriy Gapon
on 27/07/2012 17:33 Andrew Boyer said the following:
> 
> On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
> 
>> For the time being I had to revert the following from my stable/9 tree. 
>> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>> 
>> http://svnweb.freebsd.org/base?view=revision&revision=237839 
>> http://svnweb.freebsd.org/base?view=revision&revision=221121
>> 
> 
> 
> On a somewhat related note: We noticed recently that you can't pet or disable
> the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This means it
> can fire unexpectedly while you're dumping core or rebooting, depending on
> how long the timeout was on the pet before the panic.  The ipmi driver will
> need to process the command differently if the scheduler is stopped.  I
> haven't had time to look at a fix yet.

Yeah, I noticed that unlike most (all?) other watchdog drivers, where watchdog
re-arming is a very basic operation like doing one I/O, the IPMI watchdog does
some more complex stuff which involves waiting on another thread.  I think that
this may be a little bit too much for a reliable watchdog driver.  At least, as
you note, this definitely won't work for the panic case where only one thread is
left running.  I guess that the driver should check for that case and do a
direct operation instead of enqueueing a request and waiting for another thread
to execute it.
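
A toy user-space model of that idea in Python (this is not the ipmi(4)
driver; the class, its methods and the direct_when_stopped switch are
invented for illustration): requests normally go through a queue that a
worker drains, and the direct path is taken only when the scheduler is
stopped.

```python
from collections import deque


class WatchdogModel:
    def __init__(self, direct_when_stopped):
        self.direct_when_stopped = direct_when_stopped
        self.scheduler_stopped = False   # stands in for SCHEDULER_STOPPED()
        self.queue = deque()
        self.pats = 0                    # pats that reached the "hardware"

    def _send_to_hardware(self):
        # Hypothetical stand-in for issuing the command by polled I/O.
        self.pats += 1

    def _worker_run(self):
        # The worker thread can only run while the scheduler is running.
        if self.scheduler_stopped:
            return
        while self.queue:
            self.queue.popleft()
            self._send_to_hardware()

    def pat(self):
        """Return True if the pat reached the hardware."""
        if self.scheduler_stopped and self.direct_when_stopped:
            self._send_to_hardware()     # no other thread is needed
            return True
        self.queue.append("pat")
        self._worker_run()
        # A real caller would sleep here waiting for completion; after a
        # panic that wait never ends, which the model reports as False.
        return not self.queue


queued_only = WatchdogModel(direct_when_stopped=False)
queued_only.scheduler_stopped = True
print(queued_only.pat())    # False: the request is stuck in the queue

with_direct = WatchdogModel(direct_when_stopped=True)
with_direct.scheduler_stopped = True
print(with_direct.pat())    # True: handled on the calling thread
```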

-- 
Andriy Gapon




Reg:Not proberly dismount

2012-07-27 Thread Raja Mani
Dear All,


I have FreeNAS 0.7.2 (Sabanda) with a 1 TB hard disk mounted. For the
past 2 years the NAS server has been working fine. Now I get the following
error:

GEOM: da0: the primary GPT table is corrupt or invalid

And my data (on the 1 TB hard disk) now shows up as numbered entries:

freenas:~# ls -l /mnt/nas/lost+found/#00987
freenas:~# ls -l /mnt/nas/lost+found/#00100
freenas:~# ls -l /mnt/nas/lost+found/#00200


Kindly advise me how to solve the issue.


Regards,
Rajamani


Re: Regression in stable for ThinkPad T520 with Intel GPU (Sandybridge) between June 22 and July 18

2012-07-27 Thread Sean Bruno
On Tue, 2012-07-24 at 18:35 -0700, Kevin Oberman wrote:
> I will shortly spend a bit of time tracking down the breakage more
> closely, but my 9-Stable system of June 22 runs fine. After an update
> on about July 10, I noted that it would hang after Xorg was started,
> but usually worked. After an upgrade to July 18, my system could no
> longer start Gnome. It would start Xorg and Gnome would start
> normally, getting many apps started, but about 10 seconds after the
> wallpaper loaded, the system would freeze solid. No network access and
> no response to mouse or keyboard.
> 
> I have looked into commits to 9-STABLE during the time at issue and
> very little seems to have changed due to the pre-9.1 freeze.
> Similarly, nothing much has changed in any of the X11 ports.
> 
> This really smells a lot like a race condition. I can trigger the same
> behavior by enabling VT-x (not VT-d) in BIOS. In all cases where it
> was intermittent, if my desktop completed startup, the system runs
> fine until re-booted. This is probably the primary reason I might not
> have realized that there was a problem as I don't boot the system
> often except when traveling, which I was between July 1 and July 6 and
> again July 18 when the system died.
> 
> Any idea what I might try looking at?


Oh good, it's not just me.  I note that this happens when I'm not
hardwired in at my docking station, as the system doesn't get a routable
IP address until much later if on wireless.

When watching the system boot, I think I might ctrl-c the sendmail
startup or something when it starts, to keep this from happening.

Sean
