Re: [ATA] and re(4) stability issues

2008-12-10 Thread Andrey V. Elsukov

Victor Balada Diaz wrote:

Digging at linux source code i've found that they do some special things
for this chipset that i've been unable to find on our code. This is
linux code for my chipset:

AHCI_HFLAGS (AHCI_HFLAG_IGN_SERR_INTERNAL |
             AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_MSI |
             AHCI_HFLAG_SECT255),

File and the rest of the code in here[3].

As i saw AHCI_HFLAG_NO_MSI i tried doing the easiest thing i could
think of, switching MSI and MSI-x off for the whole system, so
i added to /boot/loader.conf this tunables:


FreeBSD's ata(4) driver doesn't support MSI. In Linux's libata this flag is
used as follows:

if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
        pci_intx(pdev, 1);

In FreeBSD's code we have the same:

/* enable PCI interrupt */
pci_write_config(dev, PCIR_COMMAND,
 pci_read_config(dev, PCIR_COMMAND, 2) & ~0x0400, 2);
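
To make the bit manipulation above easier to follow, here is a minimal
sketch in plain C. It only models the 16-bit PCI command register value;
the helper names are hypothetical and are not part of either kernel.
Bit 10 (0x0400) of the command register is the "Interrupt Disable" bit,
so clearing it is what enables legacy INTx delivery.

```c
#include <stdint.h>

/* Bit 10 of the PCI command register: "Interrupt Disable". */
#define PCI_CMD_INTX_DISABLE 0x0400u

/* Hypothetical helper: return the command value with legacy INTx enabled,
 * i.e. with the Interrupt Disable bit cleared and all other bits kept. */
static uint16_t pci_cmd_enable_intx(uint16_t cmd)
{
    return (uint16_t)(cmd & ~PCI_CMD_INTX_DISABLE);
}

/* Hypothetical helper: non-zero when legacy INTx delivery is enabled. */
static int pci_cmd_intx_enabled(uint16_t cmd)
{
    return (cmd & PCI_CMD_INTX_DISABLE) == 0;
}
```

In a driver the value would come from pci_read_config() and be written back
with pci_write_config(), as in the snippet quoted above.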

The AHCI_HFLAG_IGN_SERR_INTERNAL flag is meant to ignore SERR_INTERNAL errors.
FreeBSD's ata(4) driver ignores them too.

The AHCI_HFLAG_32BIT_ONLY flag restricts the controller to 32-bit DMA.
In FreeBSD, if the AHCI CAP register reports that the controller supports
64-bit DMA, the driver will use 64-bit.
So I think a quirk could be added for your controller, but I'm not sure that
the problem is here.

The AHCI_HFLAG_SECT255 flag limits an I/O operation to 255 sectors; FreeBSD
uses a limit of 128 by default.
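
As a rough illustration of what a per-command sector cap means in practice,
the following sketch splits a request into commands under a given limit.
The helper names are hypothetical, and reading the "128-limit" as a count
of sectors is an assumption made for this example only.

```c
#include <stdint.h>

/* Per-command caps mentioned in the thread. */
#define MAX_SECTORS_SECT255          255u
#define MAX_SECTORS_FREEBSD_DEFAULT  128u  /* assumption: "128-limit" in sectors */

/* Hypothetical helper: how many commands are needed to transfer `total`
 * sectors when one command may carry at most `max_per_cmd` sectors. */
static uint32_t commands_needed(uint32_t total, uint32_t max_per_cmd)
{
    if (max_per_cmd == 0)
        return 0;
    return total / max_per_cmd + (total % max_per_cmd != 0);
}

/* Hypothetical helper: size of the next command for `remaining` sectors. */
static uint32_t next_chunk(uint32_t remaining, uint32_t max_per_cmd)
{
    return remaining < max_per_cmd ? remaining : max_per_cmd;
}
```

Under this reading, a 256-sector request is split into two commands with
either cap, so the FreeBSD default already stays below the 255-sector limit.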

--
WBR, Andrey V. Elsukov

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
> On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
>  > Hello,
>  > 
>  > I got various machines[1] at hetzner.de and I've been having problems
>  > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
>  > been trying to narrow the problem so someone more knowledgeable than me
>  > is able to fix it. This mail is an other attempt to ask a question
>  > with regards ATA code to see if this time i got something.
>  > 
>  > For the ones that don't actually know what happened:
>  > 
>  > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
>  > the system shared re0 interrupt with OHCI and this caused
>  > re(4) to corrupt packets and create interrupt storms. Tried
> 
> re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
> triggered on systems with > 4GB memory. But I dont' know whether
> this is related with interrupt storms.
> 
>  > updating to 7.1 -BETA2 and still had some problems with it.
>  > 
>  > I've opened the PR kern/128287[2] and Remko quickly answered
>  > with a workaround: that workaround was removing USB support from
>  > my kernel. I did it and re(4) wasn't sharing interrupts anylonger,
>  > and the interrupt storms were gone. Now sometime later the interface
>  > goes up and down from time to time, but less often. Also sometimes
>  > the machine losts the network interface but continues to work.
>  > 
> 
> It seems that your controller supports MSI so you can set a tunable
> hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
> interrupt sharing(e.g. add hw.re.msi_disable="0" to
> /boot/loader.conf file.) However there were several issues on re(4)
> w.r.t MSI so it was off by default.

This is undocumented, and I can't find the tunable with sysctl -a. Is this
a HEAD feature, or is it also in 7.1 -BETA2? Should I add
hw.re_msi_disable="0" to /boot/loader.conf?

This was sharing an interrupt with USB. Does USB need any special MSI handling,
or is re using MSI enough to stop sharing the interrupt?


> 
>  > I know it continues to work because some days later i can see that
>  > it tried to deliver the status reports but was unable to resolve the
>  > aliases hostnames. I can't ping the machine and i know the network
>  > is OK. If i reboot the machine everything is working again.
>  > 
> 
> Recently I've made small changes to re(4) which may help to detect
> link state change event. Would you try re(4) in HEAD?

Can i just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that
or do i need to test the whole HEAD kernel?

Regards.

-- 
The most convincing proof that intelligent life exists on other
planets is that they have not tried to contact us.


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 11:58:12AM +0300, Andrey V. Elsukov wrote:
> Victor Balada Diaz wrote:
> >Digging at linux source code i've found that they do some special things
> >for this chipset that i've been unable to find on our code. This is
> >linux code for my chipset:
> >
> >AHCI_HFLAGS (AHCI_HFLAG_IGN_SERR_INTERNAL |
> >             AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_MSI |
> >             AHCI_HFLAG_SECT255),
> >
> >File and the rest of the code in here[3].
> >
> >As i saw AHCI_HFLAG_NO_MSI i tried doing the easiest thing i could
> >think of, switching MSI and MSI-x off for the whole system, so
> >i added to /boot/loader.conf this tunables:
> 
> FreeBSD's ata(4) driver doesn't support MSI. This flag in linux's libata 
> used in
> 
> if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
> pci_intx(pdev, 1);
> 
> In FreeBSD's code we have the same:
> 
> /* enable PCI interrupt */
> pci_write_config(dev, PCIR_COMMAND,
>  pci_read_config(dev, PCIR_COMMAND, 2) & ~0x0400, 2);
> 
> AHCI_HFLAG_IGN_SERR_INTERNAL flag targeted to ignore SERR_INTERNAL errors.
> FreeBSD's ata(4) driver ignores they too.
> 
> AHCI_HFLAG_32BIT_ONLY flag limits to use 32-bit DMA only.
> If AHCI CAP register reports that controller supports 64-bit DMA driver 
> will use 64-bit.
> So i think there can be added one quirk for you, but i'm not sure that 
> problem is here..
> 
> AHCI_HFLAG_SECT255 flag limits I/O operation to 255 sectors, FreeBSD uses 
> 128-limit
> by default.

Thanks for explaining what the flags do. I'm not skilled enough to create
the DMA quirks, but if you could give me some patches I'll test them. Also,
if you have any other ideas on what I could test or how I can debug this,
they would be more than welcome.

Thanks.
Regards.

-- 
The most convincing proof that intelligent life exists on other
planets is that they have not tried to contact us.


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Søren Schmidt

On 10Dec, 2008, at 10:11 , Victor Balada Diaz wrote:


Thanks for explaining me what the flags do. I'm not skilled enough to create
the DMA quirks but if you could give me some patches i'll test them. Also
if you have any other idea on what could i test or how can i debug this
it would be more than welcome.



Comment out the following two lines in ata_ahci_dmainit():

if (ATA_INL(ctlr->r_res2, ATA_AHCI_CAP) & ATA_AHCI_CAP_64BIT)
        ch->dma->max_address = BUS_SPACE_MAXADDR;

And you will not use 64bit DMA even if the chipset supports it.  
However I have not seen any chipsets supporting this fail, YMMV as  
usual :)
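
The decision those two lines make can be sketched in isolation as below.
This is only an illustration, not the driver's code: the helper, the
force_32bit parameter and the two limit constants are hypothetical names
standing in for a chipset quirk and for the bus address limits. CAP bit 31
is the "supports 64-bit addressing" capability bit.

```c
#include <stdint.h>

/* AHCI CAP bit 31: the controller supports 64-bit addressing. */
#define AHCI_CAP_S64A    0x80000000u

/* Hypothetical stand-ins for the 32-bit and full bus address limits. */
#define DMA_LIMIT_32BIT  0xFFFFFFFFull
#define DMA_LIMIT_64BIT  0xFFFFFFFFFFFFFFFFull

/* Hypothetical helper: pick the highest address usable for DMA.
 * `force_32bit` models a quirk like Linux's AHCI_HFLAG_32BIT_ONLY. */
static uint64_t ahci_dma_max_address(uint32_t cap, int force_32bit)
{
    if (!force_32bit && (cap & AHCI_CAP_S64A))
        return DMA_LIMIT_64BIT;
    return DMA_LIMIT_32BIT;
}
```

Commenting out the two lines above has the same effect as always passing
force_32bit = 1 here: the 64-bit capability is never acted upon.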


-Søren








Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 07:28:00PM +0900, Pyun YongHyeon wrote:
> On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
>  > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
>  > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
>  > >  > Hello,
>  > >  > 
>  > >  > I got various machines[1] at hetzner.de and I've been having problems
>  > >  > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
>  > >  > been trying to narrow the problem so someone more knowledgeable than me
>  > >  > is able to fix it. This mail is an other attempt to ask a question
>  > >  > with regards ATA code to see if this time i got something.
>  > >  > 
>  > >  > For the ones that don't actually know what happened:
>  > >  > 
>  > >  > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
>  > >  > the system shared re0 interrupt with OHCI and this caused
>  > >  > re(4) to corrupt packets and create interrupt storms. Tried
>  > > 
>  > > re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
>  > > triggered on systems with > 4GB memory. But I dont' know whether
>  > > this is related with interrupt storms.
>  > > 
>  > >  > updating to 7.1 -BETA2 and still had some problems with it.
>  > >  > 
>  > >  > I've opened the PR kern/128287[2] and Remko quickly answered
>  > >  > with a workaround: that workaround was removing USB support from
>  > >  > my kernel. I did it and re(4) wasn't sharing interrupts anylonger,
>  > >  > and the interrupt storms were gone. Now sometime later the interface
>  > >  > goes up and down from time to time, but less often. Also sometimes
>  > >  > the machine losts the network interface but continues to work.
>  > >  > 
>  > > 
>  > > It seems that your controller supports MSI so you can set a tunable
>  > > hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
>  > > interrupt sharing(e.g. add hw.re.msi_disable="0" to
>  > > /boot/loader.conf file.) However there were several issues on re(4)
>  > > w.r.t MSI so it was off by default.
>  > 
>  > This is undocumented and with sysctl -a i can't find the tunable. Is this
>  > a HEAD feature or it's also in 7.1 -BETA2? Should i add
> 
> Yeah it's an undocmented feature. But most drivers written by me
> have similar kobs. Both HEAD and stable/7 including 7.1 BETA2 have
> the tunable.

I think it would be great if you could document it, or at least
have sysctl -ad show it by default with a short description.

> 
>  > hw.re_msi_disable="0" to /boot/loader.conf?
>^
>Shoule be hw.re.msi_disable="0"
>  > 
> 
> Yes, just add it to /boot/loader.conf. Note, you should not disable
> system-wide MSI control(e.g. hw.pci.enable_msi == 1).
> 
>  > This was sharing interrupt with USB, does USB need any special MSI handling
>  > or with re using MSI is enough to not share the interrupt?
> 
> If re(4) can use MSI, you don't need to worry about interrupt
> sharing with USB. Check the output of "vmstat -i". You normally get
> an irq256 or higher for MSI enabled driver.
> 
>  > 
>  > 
>  > > 
>  > >  > I know it continues to work because some days later i can see that
>  > >  > it tried to deliver the status reports but was unable to resolve the
>  > >  > aliases hostnames. I can't ping the machine and i know the network
>  > >  > is OK. If i reboot the machine everything is working again.
>  > >  > 
>  > > 
>  > > Recently I've made small changes to re(4) which may help to detect
>  > > link state change event. Would you try re(4) in HEAD?
>  > 
>  > Can i just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that
> 
> Yes, you can. It should build without problems. Just replace re(4) on
> stable/7 with HEAD version.
> 
>  > or do i need to test the whole HEAD kernel?
>  > 
> 
> No you don't have to that.

While backporting the changes I found that it didn't compile, so in
the end I took the following files from HEAD:

base/head/sys/dev/re/if_re.c
base/head/sys/pci/if_rl.c
base/head/sys/pci/if_rlreg.h

After that i've recompiled 7.1 -BETA2 GENERIC kernel and enabled
the knob you suggested in /boot/loader.conf.

With the new kernel and MSI the interrupts are like this:

# vmstat -i
interrupt                          total       rate
irq9: acpi0                            1          0
irq16: ohci0                           1          0
irq17: ohci1 ohci3                     1          0
irq18: ohci2 ohci4                     1          0
irq22: atapci0                     19215         15
cpu0: timer                      2502718       1998
irq256: re0                      4967726       3967
cpu1: timer                      2502525       1998
Total                            9992188       7980

The high interrupt numbers are because I've been running iperf to
check that everything is fine, not because of interrupt storms. So far
I haven't found any interrupt storms related to USB or the re(4) driver,
but while doing the tests I found this error:

re0: watchdo

Re: [ATA] and re(4) stability issues

2008-12-10 Thread Pyun YongHyeon
On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
 > On Wed, Dec 10, 2008 at 07:28:00PM +0900, Pyun YongHyeon wrote:
 > > On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
 > >  > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
 > >  > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
 > >  > >  > Hello,
 > >  > >  > 
 > >  > >  > I got various machines[1] at hetzner.de and I've been having problems
 > >  > >  > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
 > >  > >  > been trying to narrow the problem so someone more knowledgeable than me
 > >  > >  > is able to fix it. This mail is an other attempt to ask a question
 > >  > >  > with regards ATA code to see if this time i got something.
 > >  > >  > 
 > >  > >  > For the ones that don't actually know what happened:
 > >  > >  > 
 > >  > >  > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
 > >  > >  > the system shared re0 interrupt with OHCI and this caused
 > >  > >  > re(4) to corrupt packets and create interrupt storms. Tried
 > >  > > 
 > >  > > re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
 > >  > > triggered on systems with > 4GB memory. But I dont' know whether
 > >  > > this is related with interrupt storms.
 > >  > > 
 > >  > >  > updating to 7.1 -BETA2 and still had some problems with it.
 > >  > >  > 
 > >  > >  > I've opened the PR kern/128287[2] and Remko quickly answered
 > >  > >  > with a workaround: that workaround was removing USB support from
 > >  > >  > my kernel. I did it and re(4) wasn't sharing interrupts anylonger,
 > >  > >  > and the interrupt storms were gone. Now sometime later the interface
 > >  > >  > goes up and down from time to time, but less often. Also sometimes
 > >  > >  > the machine losts the network interface but continues to work.
 > >  > >  > 
 > >  > > 
 > >  > > It seems that your controller supports MSI so you can set a tunable
 > >  > > hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
 > >  > > interrupt sharing(e.g. add hw.re.msi_disable="0" to
 > >  > > /boot/loader.conf file.) However there were several issues on re(4)
 > >  > > w.r.t MSI so it was off by default.
 > >  > 
 > >  > This is undocumented and with sysctl -a i can't find the tunable. Is this
 > >  > a HEAD feature or it's also in 7.1 -BETA2? Should i add
 > > 
 > > Yeah it's an undocmented feature. But most drivers written by me
 > > have similar kobs. Both HEAD and stable/7 including 7.1 BETA2 have
 > > the tunable.
 > 
 > I think it could be great if you could document it or at least
 > show it by default when you do sysctl -ad with a small description.
 > 

If MSI worked as expected I would have documented it as I did
in msk(4)/nfe(4)/ale(4)/age(4)/jme(4) etc.
Using MSI on RealTek does not seem to be stable. I tried hard to fix
that but some users still reported watchdog timeouts. Working
without documentation and hardware also made it hard to complete
the work. This was the main reason why MSI was disabled on re(4).

 > > 
 > >  > hw.re_msi_disable="0" to /boot/loader.conf?
 > >^
 > >Shoule be hw.re.msi_disable="0"
 > >  > 
 > > 
 > > Yes, just add it to /boot/loader.conf. Note, you should not disable
 > > system-wide MSI control(e.g. hw.pci.enable_msi == 1).
 > > 
 > >  > This was sharing interrupt with USB, does USB need any special MSI handling
 > >  > or with re using MSI is enough to not share the interrupt?
 > > 
 > > If re(4) can use MSI, you don't need to worry about interrupt
 > > sharing with USB. Check the output of "vmstat -i". You normally get
 > > an irq256 or higher for MSI enabled driver.
 > > 
 > >  > 
 > >  > 
 > >  > > 
 > >  > >  > I know it continues to work because some days later i can see that
 > >  > >  > it tried to deliver the status reports but was unable to resolve the
 > >  > >  > aliases hostnames. I can't ping the machine and i know the network
 > >  > >  > is OK. If i reboot the machine everything is working again.
 > >  > >  > 
 > >  > > 
 > >  > > Recently I've made small changes to re(4) which may help to detect
 > >  > > link state change event. Would you try re(4) in HEAD?
 > >  > 
 > >  > Can i just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that
 > > 
 > > Yes, you can. It should build without problems. Just replace re(4) on
 > > stable/7 with HEAD version.
 > > 
 > >  > or do i need to test the whole HEAD kernel?
 > >  > 
 > > 
 > > No you don't have to that.
 > 
 > Backporting the changes i've found that it didn't compile so in
 > the end i got from HEAD the following files:
 > 
 > base/head/sys/dev/re/if_re.c
 > base/head/sys/pci/if_rl.c
 > base/head/sys/pci/if_rlreg.h
 > 

Ah, sorry about that. There were some changes recently. I forgot
about that.

 > After that i've recompiled 7.1 -BETA2 GENERIC kernel and enabled
 > the knob you suggested in /boot/loader.conf.
 > 
 > With the new kernel 

Re: visibility of release process

2008-12-10 Thread Ken Smith
On Tue, 2008-12-09 at 15:19 -0600, Kevin Day wrote:

[ Other points were not ignored, just nothing to really say about them
other than "Yes" and/or "Will try", etc. ]

> * More notice to hubs@ before the release notes are generated. The  
> releases always come with a "At the time of this writing, these  
> mirrors have the full distribution" list. If it was announced to us  
> mirror operators before that list is made, we could make sure we were  
> synced in time to be included. Maybe even a semi-shaming of "These  
> mirrors do not appear to have the required bits:". The difference in  
> bandwidth we see on our public mirror (ftp3.us) is pretty extreme if  
> we're listed there or not, which seems to be a 50/50 coin-toss on the  
> last few releases. I'm honestly not sure why, since we can easily pull  
>  >50mbps from ftp-master.

I have absolutely no clue how to fairly handle what needs to be done
with this so, as you noted, it's something of a coin toss at the moment.

The issue is that the Release Announcement needs to include a list of
FTP sites, but the list can't be "too long" (as in can't be every mirror
site we've got).  The Release Announcement should be relatively short
and to the point.  An exhaustive list of every mirror is a bit too much
in that regard.  Ideally we'd just say "It's available on a mirror site,
get it from there.".  But people want easy so we need to include
something to click on.  With the 7.0 release I tried giving just the URL
of the primary site (ftp.freebsd.org) but that proved people don't just
want easy - they're lazy.  For the most part they just clicked on that
and didn't look around for a mirror.  Hence your observation about the
difference in bandwidth when you're listed versus when you're not
listed.

Since we don't have any sort of "click here and automagically land on a
nice fast mirror real close to you" I basically make a quick survey of
some FTP sites shooting for having several of the primary mirror sites
(ftpX.freebsd.org) and a sampling of geographically diverse country
mirrors (ftpX.au.freebsd.org, ftpX.ru.freebsd.org, etc.).  If you're one
of the ones I check and if you've got the right sparc64 checksum file
(I'm looking for sites that carry everything, and since sparc64 is
usually the last to get loaded on ftp-master ...) you make the list.

Sorry, I know it sucks.  Until we've got something automagic I'm not
quite sure how to fairly handle having a list that's not "too long" for
a release announcement but still providing a reasonable starting point
for people who want something to click on in the release announcement.

-- 
Ken Smith
- From there to here, from here to  |   [EMAIL PROTECTED]
  there, funny things are everywhere.   |
  - Theodore Geisel |





Re: [ATA] and re(4) stability issues

2008-12-10 Thread Arnaud Houdelette

Victor Balada Diaz wrote:

Hello,

I got various machines[1] at hetzner.de and I've been having problems
with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
been trying to narrow the problem so someone more knowledgeable than me
is able to fix it. This mail is an other attempt to ask a question
with regards ATA code to see if this time i got something.

For the ones that don't actually know what happened:

With FreeBSD 7.0 -RELEASE for amd64 and default kernel
the system shared re0 interrupt with OHCI and this caused
re(4) to corrupt packets and create interrupt storms. Tried
updating to 7.1 -BETA2 and still had some problems with it.

I've opened the PR kern/128287[2] and Remko quickly answered
with a workaround: that workaround was removing USB support from
my kernel. I did it and re(4) wasn't sharing interrupts any longer,
and the interrupt storms were gone. Now, some time later, the interface
goes up and down from time to time, but less often. Also sometimes
the machine loses the network interface but continues to work.

I know it continues to work because some days later i can see that
it tried to deliver the status reports but was unable to resolve the
aliases hostnames. I can't ping the machine and i know the network
is OK. If i reboot the machine everything is working again.

When I switched from 7.0 to 7.1 BETA2 I also found that under load,
after some hours, the machine created interrupt storms on ATA disks.

Digging at linux source code i've found that they do some special things
for this chipset that i've been unable to find on our code. This is
linux code for my chipset:

AHCI_HFLAGS (AHCI_HFLAG_IGN_SERR_INTERNAL |
             AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_MSI |
             AHCI_HFLAG_SECT255),

File and the rest of the code in here[3].

As I saw AHCI_HFLAG_NO_MSI, I tried doing the easiest thing I could
think of, switching MSI and MSI-X off for the whole system, so
I added these tunables to /boot/loader.conf:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

And then I rebooted the machine. After several hours of doing almost nothing
I found that the machine answered ping but was unable to answer any other
request (e.g. ssh, nagios nrpe, etc.). The machine recovered by itself after
some minutes, and when I was able to ssh in I saw the following in dmesg:

ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1463123158

and a lot more errors like that. I didn't get these errors with MSI enabled.
I see WRITE_DMA48, and in the Linux code I saw AHCI_HFLAG_32BIT_ONLY, which is
later used for DMA-related things. Could someone who is more knowledgeable
check if we're doing the right thing?

I've attached a verbose dmesg of a machine that's like this one with
7.1 -BETA2, MSI enabled and a GENERIC kernel minus USB and firewire.

Also, please, could someone give me a hand with how I could continue debugging
these interrupt issues? I'm a bit lost, and digging through code and posting
each time I think I've found something is not going to get anywhere.

I would also like to say that I've seen reports of this kind of problem
on amd64 machines on the lists for several years, so I don't think
this is just a problem with this BIOS/motherboard (MSI K9AG Neo2 Digital).


Thanks in advance for any help.
Regards.


[1]: http://www.hetzner.de/hosting/produkte_rootserver/ds7000/
[2]: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/128287
[3]: http://fxr.watson.org/fxr/source/drivers/ata/ahci.c?v=linux-2.6#L369
  


Sorry, I didn't take the time to read the whole thread, but I had a similar
problem with the same IXP600 chipset.
Only it wasn't with a Realtek NIC (re) but with a Ralink wireless one.
The symptoms were similar: interrupt 22 was shared between the SATA
controller and the wireless card, and I got interrupt storms at random
times when using the wireless network.


No problems since I removed the ral(4) NIC (I have a real access point now).
You might not want to point the finger at the re(4) driver too fast.

Arnaud Houdelette




Re: [ATA] and re(4) stability issues

2008-12-10 Thread Pyun YongHyeon
On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
 > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
 > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
 > >  > Hello,
 > >  > 
 > >  > I got various machines[1] at hetzner.de and I've been having problems
 > >  > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
 > >  > been trying to narrow the problem so someone more knowledgeable than me
 > >  > is able to fix it. This mail is an other attempt to ask a question
 > >  > with regards ATA code to see if this time i got something.
 > >  > 
 > >  > For the ones that don't actually know what happened:
 > >  > 
 > >  > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
 > >  > the system shared re0 interrupt with OHCI and this caused
 > >  > re(4) to corrupt packets and create interrupt storms. Tried
 > > 
 > > re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
 > > triggered on systems with > 4GB memory. But I dont' know whether
 > > this is related with interrupt storms.
 > > 
 > >  > updating to 7.1 -BETA2 and still had some problems with it.
 > >  > 
 > >  > I've opened the PR kern/128287[2] and Remko quickly answered
 > >  > with a workaround: that workaround was removing USB support from
 > >  > my kernel. I did it and re(4) wasn't sharing interrupts anylonger,
 > >  > and the interrupt storms were gone. Now sometime later the interface
 > >  > goes up and down from time to time, but less often. Also sometimes
 > >  > the machine losts the network interface but continues to work.
 > >  > 
 > > 
 > > It seems that your controller supports MSI so you can set a tunable
 > > hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
 > > interrupt sharing(e.g. add hw.re.msi_disable="0" to
 > > /boot/loader.conf file.) However there were several issues on re(4)
 > > w.r.t MSI so it was off by default.
 > 
 > This is undocumented and with sysctl -a i can't find the tunable. Is this
 > a HEAD feature or it's also in 7.1 -BETA2? Should i add

Yeah, it's an undocumented feature. But most drivers written by me
have similar knobs. Both HEAD and stable/7, including 7.1 BETA2, have
the tunable.

 > hw.re_msi_disable="0" to /boot/loader.conf?
        ^
        Should be hw.re.msi_disable="0"
 > 

Yes, just add it to /boot/loader.conf. Note that you should not disable
the system-wide MSI control (i.e. keep hw.pci.enable_msi set to 1).

 > This was sharing interrupt with USB, does USB need any special MSI handling
 > or with re using MSI is enough to not share the interrupt?

If re(4) can use MSI, you don't need to worry about interrupt
sharing with USB. Check the output of "vmstat -i". You normally get
irq256 or higher for an MSI-enabled driver.
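
The check described above can be automated with a few lines of C. This is
a minimal sketch: the helper name is hypothetical, and the threshold of 256
is simply the rule of thumb given in this mail, not something the sketch
verifies against the kernel.

```c
#include <stdio.h>

/* Rule of thumb from the thread: MSI vectors appear as irq256 or higher. */
#define FIRST_MSI_IRQ 256

/* Hypothetical helper: classify one line of "vmstat -i" output.
 * Returns 1 if the line names an irq at or above FIRST_MSI_IRQ,
 * 0 if it names a lower irq, and -1 if it is not an "irqN:" line. */
static int vmstat_line_uses_msi(const char *line)
{
    int irq;

    if (sscanf(line, "irq%d:", &irq) != 1)
        return -1;
    return irq >= FIRST_MSI_IRQ;
}
```

Fed a line such as "irq256: re0 ..." it reports MSI, while "irq22: atapci0 ..."
is classified as a legacy interrupt and "cpu0: timer ..." is skipped.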

 > 
 > 
 > > 
 > >  > I know it continues to work because some days later i can see that
 > >  > it tried to deliver the status reports but was unable to resolve the
 > >  > aliases hostnames. I can't ping the machine and i know the network
 > >  > is OK. If i reboot the machine everything is working again.
 > >  > 
 > > 
 > > Recently I've made small changes to re(4) which may help to detect
 > > link state change event. Would you try re(4) in HEAD?
 > 
 > Can i just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that

Yes, you can. It should build without problems. Just replace re(4) on
stable/7 with HEAD version.

 > or do i need to test the whole HEAD kernel?
 > 

No, you don't have to do that.

-- 
Regards,
Pyun YongHyeon


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Oliver Peter
On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
> Hello,
> 
> I got various machines[1] at hetzner.de and I've been having problems
> with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
> been trying to narrow the problem so someone more knowledgeable than me
> is able to fix it. This mail is an other attempt to ask a question
> with regards ATA code to see if this time i got something.

Just want to add a quick note and say that I'm having the same problem
with my 7.0-RELEASE-p6/amd64 hetzner machine:

 http://lists.freebsd.org/pipermail/freebsd-acpi/2008-September/005095.html

I would be happy to test patches as well.  Thanks.

-- 
Oliver PETER, email: [EMAIL PROTECTED], ICQ# 113969174
"If it feels good, you're doing something wrong."
  -- Coach McTavish


Re: visibility of release process

2008-12-10 Thread Skip Ford
Ken Smith wrote:
> With the 7.0 release I tried giving just the URL
> of the primary site (ftp.freebsd.org) but that proved people don't just
> want easy - they're lazy.  For the most part they just clicked on that
> and didn't look around for a mirror.  Hence your observation about the
> difference in bandwidth when you're listed versus when you're not
> listed.

Any idea if most of those ISO downloaders are really installing a fresh
system or are just updating from a previous release by reinstalling?

It seems to me many more people could be using freebsd-update(8) so
the announcement really could focus on upgrades rather than fresh
installs.  I obviously like FreeBSD myself, but how many new users
who need to download ISOs really come on board with each new release?
The freebsd-update(8) portion of "Updating existing systems"
could be the main focus of the announcement, and the "Availability"
section and "updating existing systems from source" sections could
just contain a link pointing to the web site since (I believe) the number
of users needing those should be limited.  No FTP listing in the
announcement at all.

I guess freebsd-update(8) currently has some limitations that make it
not so cut-and-dry.  But I'm a little confused anyway at this point as
to what the long-term plans are.  There's a CVS repo, SVN repo which
appears to be the way things will be, a "projects" svn repo, a
"projects" p4 repo, cvs(1) in base, csup(1) in base which is still
being worked on even though there appears to be a slow migration to
svn, svn(1) is in ports, there's no SVN repo for the ports tree but
there is for src, freebsd-update(8) exists for binary upgrades which
seems to be the way of the future for a huge majority of end-users, and
yet the official mirrors are missing both the SVN src repo and binary
update files.  It seems to me the mirrors and release announcement are
behind the times by pointing to source upgrades and ISO downloads,
or maybe I'm just a little too early.  I hope core has a plan for
all of this. :)

-- 
Skip


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 12:08:40PM +, Oliver Peter wrote:
> On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
> > Hello,
> > 
> > I got various machines[1] at hetzner.de and I've been having problems
> > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
> > been trying to narrow the problem so someone more knowledgeable than me
> > is able to fix it. This mail is an other attempt to ask a question
> > with regards ATA code to see if this time i got something.
> 
> Just want to add a quick note and say that I'm having the same problem
> with my 7.0-RELEASE-p6/amd64 hetzner machine:
> 
>  
> http://lists.freebsd.org/pipermail/freebsd-acpi/2008-September/005095.html
> 
> I would be happy to test patches as well.  Thanks.

Hello Oliver,

What I have done so far, which improved things a lot, is:

1) Upgrade at least the if_re code to RELENG_7. This fixes the
   packet corruption seen in ssh sessions.

2) Remove USB and firewire from your kernel config. This prevents
   the Realtek interrupt from being shared.

After this, with 7.1 -BETA2 the systems are more or less stable, but
after a while the ATA controller starts to create interrupt storms.
I wasn't able to find out why.

With the help that I've received in this thread from Pyun
YongHyeon (thanks!!) I'm also trying these suggestions:

3) Backport these 3 files from current to 7.1 -BETA2:

base/head/sys/dev/re/if_re.c
base/head/sys/pci/if_rl.c
base/head/sys/pci/if_rlreg.h

You can fetch them from http://svn.freebsd.org/. With them, and
with this tunable added to /boot/loader.conf:

hw.re.msi_disable="0"

I can use the GENERIC kernel again (i.e., USB enabled) and so far
I haven't found any problems. No more interface up/down problems
and no more interrupt storms. I must say that I haven't tested
this enough, because the interrupt storms in the ATA code start to
happen after a few days of uptime under load, but at least the problems
with the Realtek seem to be gone.
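
For anyone who wants to script the check, here is a minimal sketch. It
relies on the hint given elsewhere in this thread that an MSI-enabled
driver normally shows up in "vmstat -i" on irq256 or higher. The helper
name uses_msi is made up for illustration, and the exact layout of the
vmstat output is an assumption.

```sh
# Hypothetical helper: succeed if the named device appears in
# "vmstat -i"-style input (read from stdin) on a vector numbered
# 256 or higher, which is where MSI vectors are normally listed.
uses_msi() {
    awk -v dev="$1" '
        # Assumed line layout: "irq256: re0   12345   10"
        $1 ~ /^irq[0-9]+:$/ {
            n = $1
            sub(/^irq/, "", n)
            sub(/:$/, "", n)
            for (i = 2; i <= NF; i++)
                if ($i == dev && n + 0 >= 256)
                    found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Typical use would be something like
"vmstat -i | uses_msi re0 && echo 're0 is on MSI'".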

If you upgrade to 7.1 -BETA2 you'll also get SATA support for
the IXP card. With 7.0 it will work as ATA 33 in compatibility mode.

Maybe someone with write access to the wiki could add it somewhere
so that other hetzner users that are having the same problems
could use the same workarounds :)

I hope this helps you.

Regards.

-- 
La prueba más fehaciente de que existe vida inteligente en otros
planetas, es que no han intentado contactar con nosotros. 


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Gary Jennejohn
On Wed, 10 Dec 2008 21:07:19 +0900
Pyun YongHyeon <[EMAIL PROTECTED]> wrote:
> On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:

>  > As these seems to improve the current situation, is there any
>  > chance of merging -current driver in 7.1 before release?
>  >
>
> I think re(4) in HEAD needs more testing. As you might know RealTek
> produced too many chipsets. :-(
>

FYI I've now turned MSI on in HEAD and will see what happens.  Before
my re0 was sharing interrupts with 3 USB controllers.  Now it's all
by itself on irq256.

I'm running amd64 with

re0:  port 0xde00-0xdeff mem 0xfdaff000-0xfdaf,
0xfdae-0xfdae irq 18 at device 0.0 on pci2

---
Gary Jennejohn


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 09:07:19PM +0900, Pyun YongHyeon wrote:
> On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
>  > On Wed, Dec 10, 2008 at 07:28:00PM +0900, Pyun YongHyeon wrote:
>  > > On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
>  > >  > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
>  > >  > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
>  > >  > >  > Hello,
>  > >  > >  > 
>  > >  > >  > I got various machines[1] at hetzner.de and I've been having problems
>  > >  > >  > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
>  > >  > >  > been trying to narrow the problem so someone more knowledgeable than me
>  > >  > >  > is able to fix it. This mail is an other attempt to ask a question
>  > >  > >  > with regards ATA code to see if this time i got something.
>  > >  > >  > 
>  > >  > >  > For the ones that don't actually know what happened:
>  > >  > >  > 
>  > >  > >  > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
>  > >  > >  > the system shared re0 interrupt with OHCI and this caused
>  > >  > >  > re(4) to corrupt packets and create interrupt storms. Tried
>  > >  > > 
>  > >  > > re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
>  > >  > > triggered on systems with > 4GB memory. But I don't know whether
>  > >  > > this is related with interrupt storms.
>  > >  > > 
>  > >  > >  > updating to 7.1 -BETA2 and still had some problems with it.
>  > >  > >  > 
>  > >  > >  > I've opened the PR kern/128287[2] and Remko quickly answered
>  > >  > >  > with a workaround: that workaround was removing USB support from
>  > >  > >  > my kernel. I did it and re(4) wasn't sharing interrupts anylonger,
>  > >  > >  > and the interrupt storms were gone. Now sometime later the interface
>  > >  > >  > goes up and down from time to time, but less often. Also sometimes
>  > >  > >  > the machine losts the network interface but continues to work.
>  > >  > >  > 
>  > >  > > 
>  > >  > > It seems that your controller supports MSI so you can set a tunable
>  > >  > > hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
>  > >  > > interrupt sharing(e.g. add hw.re.msi_disable="0" to
>  > >  > > /boot/loader.conf file.) However there were several issues on re(4)
>  > >  > > w.r.t MSI so it was off by default.
>  > >  > 
>  > >  > This is undocumented and with sysctl -a i can't find the tunable. Is this
>  > >  > a HEAD feature or it's also in 7.1 -BETA2? Should i add
>  > > 
>  > > Yeah it's an undocumented feature. But most drivers written by me
>  > > have similar knobs. Both HEAD and stable/7 including 7.1 BETA2 have
>  > > the tunable.
>  > 
>  > I think it could be great if you could document it or at least
>  > show it by default when you do sysctl -ad with a small description.
>  > 
> 
> If MSI worked as expected I would have documented it as I did
> in msk(4)/nfe(4)/ale(4)/age(4)/jme(4) etc.
> Using MSI on RealTek does not seem to be stable. I tried hard to fix
> that but some users still reported watchdog timeouts. Working
> without documentation and hardware also made it hard to complete
> the work. This was the main reason why MSI was disabled on re(4).

What do you think about adding a note to the man page saying that
it's experimental, and that in some cases it could improve the situation
but in others it will give errors?

> 
>  > > 
>  > >  > hw.re_msi_disable="0" to /boot/loader.conf?
>  > >^
>  > >Should be hw.re.msi_disable="0"
>  > >  > 
>  > > 
>  > > Yes, just add it to /boot/loader.conf. Note, you should not disable
>  > > system-wide MSI control(e.g. hw.pci.enable_msi == 1).
>  > > 
>  > >  > This was sharing interrupt with USB, does USB need any special MSI handling
>  > >  > or with re using MSI is enough to not share the interrupt?
>  > > 
>  > > If re(4) can use MSI, you don't need to worry about interrupt
>  > > sharing with USB. Check the output of "vmstat -i". You normally get
>  > > an irq256 or higher for MSI enabled driver.
>  > > 
>  > >  > 
>  > >  > 
>  > >  > > 
>  > >  > >  > I know it continues to work because some days later i can see that
>  > >  > >  > it tried to deliver the status reports but was unable to resolve the
>  > >  > >  > aliases hostnames. I can't ping the machine and i know the network
>  > >  > >  > is OK. If i reboot the machine everything is working again.
>  > >  > >  > 
>  > >  > > 
>  > >  > > Recently I've made small changes to re(4) which may help to detect
>  > >  > > link state change event. Would you try re(4) in HEAD?
>  > >  > 
>  > >  > Can i just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that
>  > > 
>  > > Yes, you can. It should build without problems. Just replace re(4) on
>  > > stable/7 with HEAD version.
>  > > 
>  > >  > or do i need to test the whole HEAD kernel?
>  > >  > 
>  > > 
>  > > No you don't have to that.
>  > 
>  > Backpor

iir(4) support under 7.1

2008-12-10 Thread Heinrich Rebehn

Hi list,

I am planning to upgrade our main server from 6.1 to 7.1.
The machine has an ICP Vortex GDT8524RZ RAID controller, which is
handled by the iir(4) driver.


Since I have seen various discussions in the past about Adaptec no
longer supporting these controllers, the driver reaching EOE, data
corruption in 64-bit configurations with > 4G RAM and so on, I just
wanted to ask what the current state of the driver is.


Thanks for your help,


Heinrich Rebehn

University of Bremen
Physics / Electrical and Electronics Engineering
- Department of Telecommunications -

Phone : +49/421/218-4664
Fax   :-3341



Re: [ATA] and re(4) stability issues

2008-12-10 Thread Oliver Peter
On Wed, Dec 10, 2008 at 03:01:30PM +0100, Victor Balada Diaz wrote:
> On Wed, Dec 10, 2008 at 12:08:40PM +, Oliver Peter wrote:
> > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
...
> I can use GENERIC kernel again (ie, USB enabled) and so far
> i didn't find any problem yet. No more interface up/down problems
> and no more interrupt storms. I must say that i haven't tested
> this enough, because the interrupt storms in ATA code start to
> happen after a few days of uptime load, but at least the problems
> with the realtek seem to be gone. 

I found out that I'm able to 'force' the interrupt storm by provoking
higher disk I/O.  Just let dd write to a file in a loop for some hours
and watch vmstat:

while true; do dd if=/dev/zero of=BLA bs=1M count=1000; done

First you'll see that the throughput will decrease, and a few
hours later you'll have /var/log/messages / dmesg full of
interrupt storm messages.
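
A bounded variant of the loop above, as a rough sketch only: the
function names io_load and count_storms are invented for this example,
and the log pattern assumes the kernel messages contain the words
"interrupt storm". The block size is spelled out in bytes so the same
line works with different dd implementations.

```sh
# Hypothetical helper: rewrite FILE a fixed number of times with
# MEGABYTES of zeroes per pass, instead of looping forever.
# Usage: io_load FILE PASSES MEGABYTES
io_load() {
    file=$1
    passes=$2
    mb=$3
    i=0
    while [ "$i" -lt "$passes" ]; do
        dd if=/dev/zero of="$file" bs=1048576 count="$mb" 2>/dev/null
        i=$((i + 1))
    done
}

# Hypothetical helper: count lines on stdin that mention an
# interrupt storm (feed it dmesg output or /var/log/messages).
count_storms() {
    # grep exits non-zero when nothing matches; keep the "0" it prints.
    grep -ci 'interrupt storm' || true
}
```

For example, "io_load BLA 100 1000" followed by "dmesg | count_storms"
would show whether a run of 100 passes was enough to trigger the problem.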
 
> If you upgrade to 7.1 -BETA2 you'll also get SATA support for
> the IXP card. With 7.0 it will work as ATA 33 in compatibility mode.

Wow!  That's good to hear as well.  I'll definitely switch to
-STABLE or 7.1-PRERELEASE sooner or later.  I'll just give it a try
on my other machines at first.
 
> I hope this helps you.

Absolutely, cheers mate.  I owe you one!

~ollie

-- 
Oliver PETER, email: [EMAIL PROTECTED], ICQ# 113969174
"If it feels good, you're doing something wrong."
  -- Coach McTavish


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 01:18:00PM +0100, Arnaud Houdelette wrote:
> Victor Balada Diaz a écrit :
> >Hello,
> >
> >I got various machines[1] at hetzner.de and I've been having problems
> >with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
> >been trying to narrow the problem so someone more knowledgeable than me
> >is able to fix it. This mail is an other attempt to ask a question
> >with regards ATA code to see if this time i got something.
> >
> >[...] 
> 
> Sorry I didn't take the time to read the whole thread, but I had a similar
> problem with the same IXP600 chipset.
> Only it wasn't with a Realtek NIC (re) but with a Ralink wireless one.
> The symptoms were similar: interrupt 22 was shared between the SATA
> controller and the wireless card. And I got interrupt storms at random
> times when using the wireless network.
> 
> No problem since I removed the ral(4) NIC (got a real access point now).
> You might not want to point the finger at the re(4) driver too fast.
> 
> Arnaud Houdelette
Hello Arnaud,

I didn't say the problem was just because of re(4). Actually I think
there were two problems, one with re(4) and another with ata(4). The reason
why I talked about both of them in the same mail is that I thought
that, as two drivers were affected, maybe the problem was in another part
of the operating system and that could help the developers debug the
problem.

My re(4) card isn't sharing the interrupt with the IXP600; it's sharing
the interrupt with the USB controller. In this case I think the problem
is fixed by the advice from Pyun YongHyeon (backporting the driver
from HEAD and using MSI for interrupts).

I think the problems with the ata(4) code will appear again after a few
days of load, as they always do, so I'll keep trying to debug them.

Regards.

-- 
La prueba más fehaciente de que existe vida inteligente en otros
planetas, es que no han intentado contactar con nosotros. 


bce(4) and rx errors

2008-12-10 Thread Vlad GALU
  Hello. Sorry for crossposting, but I wasn't sure which mailing list
was the most appropriate for this email.
I have an application pulling about 220Kpps from a bce(4) card
(details below). At what seems to be random times, errors start
showing up on that interface (I'm watching it with netstat -w1 -I), so
about 10% of the initial 220Kpps is reported as errors. Bringing the
interface down and then back up clears the errors, but they do
reappear at a later time. Before they reappear, the system manages to
pull the full 220Kpps as before.
  This is a temporary setup, we'll very soon use an Intel fiber card,
but I thought this issue was worth mentioning, as I don't think it's a
hardware problem (the switch also reports no errors).
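
To put a number on the "about 10%" figure, a small sketch: the helper
err_pct is hypothetical, and it leaves it to the caller to decide which
counters make up the total (whether the per-second packet count already
includes the errored packets is an assumption to verify on your system).

```sh
# Hypothetical helper: print ERRS as a percentage of TOTAL,
# with one decimal place. Usage: err_pct ERRS TOTAL
err_pct() {
    awk -v e="$1" -v t="$2" 'BEGIN {
        # Avoid dividing by zero on an idle interface.
        if (t + 0 == 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * e / t
    }'
}
```

With roughly 22000 errors per second against 220000 packets per second,
"err_pct 22000 220000" gives the rate described above.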

  The system is running a fresh (yesterday's) RELENG_7. The card is
onboard, on a HP DL380 G5. Here's the pciconf output:

-- cut here --
[EMAIL PROTECTED]:2:0:0:  class=0x060400 card=0x chip=0x01031166
rev=0xc3 hdr=0x01
vendor = 'ServerWorks (Was: Reliance Computer Corp)'
device = 'BCM5715 Broadcom dual gigabit, pci bridge'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:3:0:0:class=0x02 card=0x7038103c chip=0x164c14e4
rev=0x12 hdr=0x00
vendor = 'Broadcom Corporation'
device = '5708C Broadcom NetXtreme II Gigabit Ethernet Adapter'
class  = network
subclass   = ethernet
-- and here --

  Regards,
  Vlad

-- 
~/.signature: no such file or directory


7.0 unusual performance issue - vmdaemon hang?

2008-12-10 Thread Steven Hartland

Just had one of our webservers flag as down here and on
investigation the machine seems to be struggling due to
a hung vmdaemon process.

top is reporting vmdaemon as using a constant 55.57% CPU
yet CPU time is not increasing:-

last pid: 36492;  load averages:  0.04,  0.05,  .11   up 89+19:53:21  14:36:08
223 processes: 9 running, 201 sleeping, 13 waiting
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 644M Active, 2780M Inact, 480M Wired, 249M Cache, 214M Buf, 3759M Free
Swap: 4096M Total, 537M Used, 3559M Free, 13% Inuse

 PID USERNAME THR PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
  11 root   1 171 ki31 0K16K CPU7   7 2116.4 100.00% idle: cpu7
  12 root   1 171 ki31 0K16K CPU6   6 2059.5 100.00% idle: cpu6
  13 root   1 171 ki31 0K16K CPU5   5 2029.3 100.00% idle: cpu5
  14 root   1 171 ki31 0K16K CPU4   4 1977.8 100.00% idle: cpu4
  15 root   1 171 ki31 0K16K CPU3   3 1912.0 100.00% idle: cpu3
  16 root   1 171 ki31 0K16K CPU2   2 1835.2 100.00% idle: cpu2
  17 root   1 171 ki31 0K16K CPU1   1 1763.1 100.00% idle: cpu1
  18 root   1 171 ki31 0K16K RUN0 1727.6 100.00% idle: cpu0
  37 root   1  20- 0K16K psleep 5   0:56 55.57% vmdaemon
60198 www1   4098M 13516K sbwait 2  35:21  1.46% httpd
60264 www1   40   133M  9248K sbwait 0  21:21  0.39% httpd
  30 root   1 -68- 0K16K -  7  18.3H  0.00% em1 taskq
  29 root   1 -68- 0K16K -  6 330:21  0.00% em0 taskq
  41 root   1  20- 0K16K syncer 1 212:42  0.00% syncer
  21 root   1 -44- 0K16K WAIT   0 201:02  0.00% swi1: net
  19 root   1 -32- 0K16K WAIT   0 120:15  0.00% swi4: clock
  22 root   1  44- 0K16K -  5  73:00  0.00% yarrow

I've tried to ktrace the process and it produced nothing, also tried
gdb and it failed to attach. Is there anything else I can try before
we reboot the machine to help determine what the problem is?

   Regards
   Steve




zfs panics

2008-12-10 Thread Danny Braniss
Hi,
From a Solaris or Linux client, doing an ls(1) of an NFS-exported ZFS
file, for example: ls /net/zfs-server/h/.zfs/snapshot,
panics the server. The server is running the latest 7.1-prerelease.
When the client is FreeBSD, it mostly works, but in a few cases
the server just goes into a coma.
BTW, the server is running vanilla ZFS, no tuning, and the server is
64-bit with 8 GB of memory and a quad core (dell-pe2950).

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x168
fault code  = supervisor write data, page not present
instruction pointer = 0x8:0x804a9175
stack pointer   = 0x10:0xb71fc550
frame pointer   = 0x10:0xb71fc560
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 802 (nfsd)
[thread pid 802 tid 100185 ]
Stopped at  _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x50(%rdi)
db> tr
Tracing pid 802 tid 100185 td 0xff0004d576e0
_mtx_lock_flags() at _mtx_lock_flags+0x15
vput() at vput+0x45
nfsrv_readdirplus() at nfsrv_readdirplus+0x83e
nfssvc() at nfssvc+0x400
syscall() at syscall+0x1bb
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (155, FreeBSD ELF64, nfssvc), rip = 0x8006885cc, rsp = 
0x7fffea2




Re: zfs panics

2008-12-10 Thread Jaakko Heinonen

Hi,

On 2008-12-10, Danny Braniss wrote:
>   from a solaris or linux client, doing a ls(1) of a nfs exported zfs 
> file,
> for example: ls /net/zfs-server/h/.zfs/snapshot,
> panics the server. The server is running latest 7.1-prerelease.

This has been reported as PR kern/125149. I have described the problem
in this message:

http://lists.freebsd.org/pipermail/freebsd-fs/2008-October/005217.html

See the PR for RELENG_7 patches.
(http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/125149)

-- 
Jaakko


Re: bce(4) and rx errors

2008-12-10 Thread Jeff Blank
On Wed, Dec 10, 2008 at 04:59:26PM +0200, Vlad GALU wrote:
> I have an application pulling about 220Kpps from a bce(4) card
> (details below). At what seems to be random times, errors start
> showing up on that interface (I'm watching it with netstat -w1 -I), so
> about 10% of the initial 220Kpps is reported as errors.

I'm also seeing a pretty steady stream of errors on both bce
interfaces in a Dell PowerEdge 2950 III.  In my case, the source
is RELENG_7_1 from ~14:00 UTC yesterday (9 Dec).  Throughput does not
seem to be affected.  "sysctl -a | egrep -i 'bce.*err'" yields all
zeroes, for whatever that's worth.

[EMAIL PROTECTED]:0:0:0:class=0x06 card=0x80868086 chip=0x25c08086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000X Chipset Memory Controller Hub'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:2:0:class=0x060400 card=0x chip=0x25e28086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x4 Port 2'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:3:0:class=0x060400 card=0x chip=0x25e38086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x4 Port 3'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:4:0:class=0x060400 card=0x chip=0x25f88086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x8 Port 4-5'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:5:0:class=0x060400 card=0x chip=0x25e58086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x4 Port 5'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:6:0:class=0x060400 card=0x chip=0x25f98086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x8 Port 6-7'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:7:0:class=0x060400 card=0x chip=0x25e78086 
rev=0x12 hdr=0x01
vendor = 'Intel Corporation'
device = '5000 Series Chipset PCIe x4 Port 7'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:16:0:   class=0x06 card=0x01b21028 chip=0x25f08086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset Error Reporting Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:16:1:   class=0x06 card=0x01b21028 chip=0x25f08086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset Error Reporting Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:16:2:   class=0x06 card=0x01b21028 chip=0x25f08086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset Error Reporting Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:17:0:   class=0x06 card=0x80868086 chip=0x25f18086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset Reserved Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:19:0:   class=0x06 card=0x80868086 chip=0x25f38086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset Reserved Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:21:0:   class=0x06 card=0x80868086 chip=0x25f58086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset FBD Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:22:0:   class=0x06 card=0x80868086 chip=0x25f68086 
rev=0x12 hdr=0x00
vendor = 'Intel Corporation'
device = '5000 Series Chipset FBD Registers'
class  = bridge
subclass   = HOST-PCI
[EMAIL PROTECTED]:0:28:0:   class=0x060400 card=0x01b21028 chip=0x26908086 
rev=0x09 hdr=0x01
vendor = 'Intel Corporation'
device = '631xESB/632xESB/3100 PCIe Root Port 1'
class  = bridge
subclass   = PCI-PCI
[EMAIL PROTECTED]:0:29:0:   class=0x0c0300 card=0x01b21028 chip=0x26888086 
rev=0x09 hdr=0x00
vendor = 'Intel Corporation'
device = '631xESB/632xESB/3100 Chipset USB Universal Host Controller'
class  = serial bus
subclass   = USB
[EMAIL PROTECTED]:0:29:1:   class=0x0c0300 card=0x01b21028 chip=0x26898086 
rev=0x09 hdr=0x00
vendor = 'Intel Corporation'
device = '631xESB/632xESB/3100 Chipset USB Universal Host Controller'
class  = serial bus
subclass   = USB
[EMAIL PROTECTED]:0:29:2:   class=0x0c0300 card=0x01b21028 chip=0x268a8086 
rev=0x09 hdr=0x00
vendor = 'Intel Corporation'
device = '631xESB/632xESB/3100 Chipset USB Universal Host Controller'
class  =

Re: RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-10 Thread Mike Jakubik
On Mon, December 8, 2008 5:22 pm, Mike Jakubik wrote:
> On Mon, December 8, 2008 5:12 pm, Xin LI wrote:
>
>> Which version are you currently using?  My previous commit only fixes
>> the excessive interrupt issue, I think this could be a different
>> problem, I'm taking a look at the code to see if I can have something
>> for you.
>
> I was running on the version just prior to the latest interrupt commit. I
> have now updated to the one with the interrupt fix. Will let you know if
> things change.
>
> Thank You.

The interrupt rate has decreased significantly, however I am still
having problems with applications that hold stateful connections. The rx
errors are also still showing; I suspect this is related to the problem.
How can I roll back this driver to the last known good version?

Thanks.




Re: bce(4) and rx errors

2008-12-10 Thread Mike Jakubik
On Wed, December 10, 2008 11:03 am, Jeff Blank wrote:
> On Wed, Dec 10, 2008 at 04:59:26PM +0200, Vlad GALU wrote:
>> I have an application pulling about 220Kpps from a bce(4) card
>> (details below). At what seems to be random times, errors start
>> showing up on that interface (I'm watching it with netstat -w1 -I), so
>> about 10% of the initial 220Kpps is reported as errors.
>
> I'm also seeing a pretty steady stream of errors on both bce
> interfaces in a Dell PowerEdge 2950 III.  In my case, the source
> is RELENG_7_1 from ~14:00 UTC yesterday (9 Dec).  Throughput does not
> seem to be affected.  "sysctl -a | egrep -i 'bce.*err'" yields all
> zeroes, for whatever that's worth.

See the "RELENG_7_1: bce driver change generating too much interrupts ?"
thread. This problem has surfaced since the recent bce driver changes.



Re: RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-10 Thread Xin LI

Mike Jakubik wrote:
> On Mon, December 8, 2008 5:22 pm, Mike Jakubik wrote:
>> On Mon, December 8, 2008 5:12 pm, Xin LI wrote:
>>
>>> Which version are you currently using?  My previous commit only fixes
>>> the excessive interrupt issue, I think this could be a different
>>> problem, I'm taking a look at the code to see if I can have something
>>> for you.
>> I was running on the version just prior to the latest interrupt commit. I
>> have now updated to the one with the interrupt fix. Will let you know if
>> things change.
>>
>> Thank You.
> 
> The interrupt rate has decreased significantly, however i am still having
> having problem with applications that hold stateful connections. The rx
> errors are also still showing, i suspect this is related to the problem.
> How can i roll back this driver to the last known good version?

If you are using CVS to track the -stable tree:

cd /usr/src/sys/dev/bce
cvs -q up -rRELENG_7 -D2008/11/01

If not, then the process would be a bit complicated.  You need to
checkout from anoncvs, e.g.:

cvs -q -d [EMAIL PROTECTED]:/home/ncvs login
cvs -q -d [EMAIL PROTECTED]:/home/ncvs co -rRELENG_7 -D2008/11/01 sys/dev/bce
cd sys/dev/bce
cp * /sys/dev/bce
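
Because the cp step overwrites the driver sources in place, it may be
worth keeping a copy of the current files first. A minimal sketch, with
a made-up helper name (backup_dir) and a backup location of your own
choosing:

```sh
# Hypothetical helper: copy directory SRC to DEST, refusing to
# overwrite an existing backup. Usage: backup_dir SRC DEST
backup_dir() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "backup_dir: $dest already exists" >&2
        return 1
    fi
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src" "$dest"
}
```

For instance, "backup_dir /sys/dev/bce /var/tmp/bce.orig" (the target
path is only an example) before running the cp above.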

Cheers,
-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Peter Jeremy
On 2008-Dec-10 10:55:35 +0100, Søren Schmidt <[EMAIL PROTECTED]> wrote:
>And you will not use 64bit DMA even if the chipset supports it.  
>However I have not seen any chipsets supporting this fail, YMMV as  
>usual :)

There's a reference in wikipedia pointing to
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06694.html
that claims the AMD/ATI SB600 lies about supporting 64-bit DMA in AHCI
mode.  I have a SB600 but it doesn't have >4GB to test on.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: bce(4) and rx errors

2008-12-10 Thread Vlad GALU
On 12/10/08, Mike Jakubik <[EMAIL PROTECTED]> wrote:
> On Wed, December 10, 2008 11:03 am, Jeff Blank wrote:
>  > On Wed, Dec 10, 2008 at 04:59:26PM +0200, Vlad GALU wrote:
>  >> I have an application pulling about 220Kpps from a bce(4) card
>  >> (details below). At what seems to be random times, errors start
>  >> showing up on that interface (I'm watching it with netstat -w1 -I), so
>  >> about 10% of the initial 220Kpps is reported as errors.
>  >
>  > I'm also seeing a pretty steady stream of errors on both bce
>  > interfaces in a Dell PowerEdge 2950 III.  In my case, the source
>  > is RELENG_7_1 from ~14:00 UTC yesterday (9 Dec).  Throughput does not
>  > seem to be affected.  "sysctl -a | egrep -i 'bce.*err'" yields all
>  > zeroes, for whatever that's worth.
>
>
> See the "RELENG_7_1: bce driver change generating too much interrupts ?"
>  thread. This problem as surfaced since the recent bce driver changes.

Thanks Mike, I'll give it a shot.

-- 
~/.signature: no such file or directory


Re: bce(4) and rx errors

2008-12-10 Thread Vlad GALU
On 12/10/08, Vlad GALU <[EMAIL PROTECTED]> wrote:
> On 12/10/08, Mike Jakubik <[EMAIL PROTECTED]> wrote:
>  > On Wed, December 10, 2008 11:03 am, Jeff Blank wrote:
>  >  > On Wed, Dec 10, 2008 at 04:59:26PM +0200, Vlad GALU wrote:
>  >  >> I have an application pulling about 220Kpps from a bce(4) card
>  >  >> (details below). At what seems to be random times, errors start
>  >  >> showing up on that interface (I'm watching it with netstat -w1 -I), so
>  >  >> about 10% of the initial 220Kpps is reported as errors.
>  >  >
>  >  > I'm also seeing a pretty steady stream of errors on both bce
>  >  > interfaces in a Dell PowerEdge 2950 III.  In my case, the source
>  >  > is RELENG_7_1 from ~14:00 UTC yesterday (9 Dec).  Throughput does not
>  >  > seem to be affected.  "sysctl -a | egrep -i 'bce.*err'" yields all
>  >  > zeroes, for whatever that's worth.
>  >
>  >
>  > See the "RELENG_7_1: bce driver change generating too much interrupts ?"
>  >  thread. This problem as surfaced since the recent bce driver changes.
>
>
> Thanks Mike, I'll give it a shot.

   Indeed, the errors seem to have gone away after rolling back the
driver, as Xin Li suggested in another thread. Sorry for the noise!

-- 
~/.signature: no such file or directory


why can't multiple programs listen to cuaXYZ anymore? (7-stable)

2008-12-10 Thread Steve Franks
Before my last cvsup, I could have cutecom & a custom configuration
app (i.e. gpsd) running at the same time on the same serial port.  Both
would echo any incoming data, and as long as only one was outputting
data, that worked fine too.  Now it's 'broke'.  I hear noise about TTY
changes; I assume that changed the underlying devices as well?

This used to be a major perk over Windows for embedded systems guys
like me: it makes debugging serial devices a snap.

Steve


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Pyun YongHyeon
On Wed, Dec 10, 2008 at 03:08:24PM +0100, Victor Balada Diaz wrote:
 > On Wed, Dec 10, 2008 at 09:07:19PM +0900, Pyun YongHyeon wrote:

[...]

 > >  > >  > > It seems that your controller supports MSI so you can set a
 > >  > >  > > tunable hw.re.msi_disable to 0 to enable MSI. With MSI you can
 > >  > >  > > remove interrupt sharing (e.g. add hw.re.msi_disable="0" to
 > >  > >  > > /boot/loader.conf file.) However there were several issues on
 > >  > >  > > re(4) w.r.t MSI so it was off by default.
 > >  > >  > 
 > >  > >  > This is undocumented and with sysctl -a i can't find the tunable.
 > >  > >  > Is this a HEAD feature or is it also in 7.1 -BETA2? Should i add
 > >  > > 
 > >  > > Yeah it's an undocumented feature. But most drivers written by me
 > >  > > have similar knobs. Both HEAD and stable/7 including 7.1 BETA2 have
 > >  > > the tunable.
 > >  > 
 > >  > I think it could be great if you could document it or at least
 > >  > show it by default when you do sysctl -ad with a small description.
 > >  > 
 > > 
 > > If MSI worked as expected I would have documented it as I did
 > > in msk(4)/nfe(4)/ale(4)/age(4)/jme(4) etc.
 > > Using MSI on RealTek does not seem to be stable. I tried hard to fix
 > > that but some users still reported watchdog timeouts. Working
 > > without documentation and hardware also made it hard to complete
 > > the work. This was the main reason why MSI was disabled on re(4).
 > 
 > What do you think about adding a note in the man page saying that
 > it's experimental and in some cases it could improve the situation
 > but in others it will give errors?

Based on your testing I have an idea of how to mitigate the missing
Tx completion interrupt. If all goes well, re(4) could reliably take
advantage of MSI on RealTek controllers. If that fails miserably, I
will do as you suggested.
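
For anyone trying the tunable quoted above, here is a minimal sketch (not
part of re(4) or the loader) of checking what a loader.conf-style text sets
hw.re.msi_disable to. The helper name loader_tunable is hypothetical, and
the sketch assumes simple name="value" lines where a later assignment
overrides an earlier one and "#" starts a comment.

```python
import re

# Matches simple loader.conf-style assignments such as: name="value"
ASSIGNMENT = re.compile(r'^([A-Za-z0-9_.]+)\s*=\s*"?([^"]*)"?$')


def loader_tunable(conf_text, name):
    """Return the last value assigned to `name`, or None if it is not set.

    Hypothetical helper: assumes one assignment per line and that a later
    assignment overrides an earlier one.
    """
    value = None
    for line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace (assumes "#"
        # never appears inside a value).
        line = line.split("#", 1)[0].strip()
        match = ASSIGNMENT.match(line)
        if match and match.group(1) == name:
            value = match.group(2)
    return value


sample = 'hw.re.msi_disable="0"\n'
print(loader_tunable(sample, "hw.re.msi_disable"))
```

A result of "0" would mean MSI is enabled for re(4) at the next boot, as
described in the quoted text; None means the driver default applies.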

 > > 
 > > I think re(4) in HEAD needs more testing. As you might know RealTek
 > > produced too many chipsets. :-(
 > 
 > Ok, i'll use the backported driver as it works better for me :-)
 > 
 > If i can help you test any patches i'm more than happy to do it.
 > 
 > Thanks a lot for your help Pyun YongHyeon.
 > 

You're welcome.
-- 
Regards,
Pyun YongHyeon


Re: visibility of release process

2008-12-10 Thread Thomas Hurst
* Patrick Lamaizière ([EMAIL PROTECTED]) wrote:

> Ruben van Staveren <[EMAIL PROTECTED]> wrote:
> > Though experimental, I'm greatly enjoying
> > http://www.secnetix.de/olli/FreeBSD/svnews/?p=/stable/7
> 
> Nice. There is also http://freshbsd.org/ (really cool IMHO). 

Thanks; I write/run that.  I'm currently developing version 2, which
should bring much better performance and more powerful filtering; e.g.
by filename, multiple committers, branches, etc, as well as history all
the way back to r1.

Since it's working off a local SVN mirror rather than commit emails,
it's also feasible to support things like local copies of diffs.

Of course, now I need to generalize it back to the other BSDs.

-- 
Thomas 'Freaky' Hurst
http://hur.st/


Re: [ATA] and re(4) stability issues

2008-12-10 Thread Victor Balada Diaz
On Wed, Dec 10, 2008 at 09:07:19PM +0900, Pyun YongHyeon wrote:
> On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
>  > Also i didn't see any problem with interfaces going up and down,
>  > but that usually happens after some hours of uptime, so i'll let
>  > you know if the error happens again.
>  > 

After writing to the HD with dd for a few hours and using
stress -i 10 -d 10 the machine lost connectivity. I waited until
today to be sure whether the machine hung, panicked or just lost
network connectivity. I don't have local access or serial access, so
this is the only way i could do it. I've seen in the logs during the
night various messages of:


Dec 10 00:33:49 yac kernel: re0: watchdog timeout
Dec 10 00:33:49 yac kernel: re0: link state changed to DOWN
Dec 10 00:33:52 yac kernel: re0: link state changed to UP

The interface never recovered and i wasn't able to ping the machine
until i rebooted. Nagios was checking all the time and no recovery
happened.
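
To see how often this happens over a longer period, a minimal sketch like
the one below could tally the watchdog messages per interface. It assumes
the syslog line format shown above; the function name
count_watchdog_timeouts is made up for illustration and is not an existing
tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Dec 10 00:33:49 yac kernel: re0: watchdog timeout
WATCHDOG = re.compile(r"kernel: (\w+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Count "watchdog timeout" messages per interface name."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


log = [
    "Dec 10 00:33:49 yac kernel: re0: watchdog timeout",
    "Dec 10 00:33:49 yac kernel: re0: link state changed to DOWN",
    "Dec 10 00:33:52 yac kernel: re0: link state changed to UP",
]
print(count_watchdog_timeouts(log))
```

Feeding it the lines of the system log instead of the three sample lines
would give a per-interface count for the whole log.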

The netstat -i in the daily scripts shows just one Oerr. I'm used to
having a lot of them, but it seems this time the card didn't recover
from the only one. I also want to say that this is not a regression, as
it happened before with 7.1 -BETA 2 code.

Is there anything more i can try?

Regards.
-- 
The most convincing proof that intelligent life exists on other
planets is that they have not tried to contact us.