Re: Default route changes unexpectedly

2013-03-06 Thread Andre Oppermann

On 05.03.2013 18:39, Nick Rogers wrote:

> Hello,
>
> I am attempting to create awareness of a serious issue affecting users
> of FreeBSD 9.x and PF. There appears to be a bug that allows the
> kernel's routing table to be corrupted by traffic routing through the
> system. Under heavy traffic load, the default route can seemingly
> randomly change to an IP address that is not directly connected to the
> network (i.e., is not configured anywhere). Dhclient is not in the
> mix, nor is routed, bgpd, etc. Running `route monitor` shows no
> evidence of the change in the default route. The one commonality
> between all the systems experiencing this problem seems to be the use
> of PF.
>
> Obviously this is a serious problem as it causes all Internet-bound
> traffic to stop routing until the default route is corrected. Some
> users, including myself, are working around this problem by installing
> a script that runs multiple times a second to check if the default
> route is incorrect and fixing it if necessary, which mitigates the
> amount of downtime caused by the bug.
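
[Editor's sketch] The workaround script itself was not posted, so the following is only a minimal illustration of the idea, assuming FreeBSD's route(8) (`route -n get default` to read the route, `route change default` to restore it). The gateway 192.0.2.1 and all function names are hypothetical placeholders.

```python
import subprocess
import time

# Hypothetical value: replace with the gateway you actually expect.
EXPECTED_GATEWAY = "192.0.2.1"


def parse_gateway(route_output):
    """Return the 'gateway:' value from `route -n get default` output, or None."""
    for line in route_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "gateway":
            return value.strip()
    return None


def current_gateway():
    """Ask route(8) for the default route and extract its gateway."""
    result = subprocess.run(
        ["route", "-n", "get", "default"],
        capture_output=True, text=True, check=False,
    )
    return parse_gateway(result.stdout)


def check_and_repair(expected=EXPECTED_GATEWAY):
    """Restore the default route if it has drifted; return True if a repair ran."""
    found = current_gateway()
    if found == expected:
        return False
    print(f"default route is {found!r}, expected {expected!r}; restoring")
    # Assumes a default route still exists; if it is missing entirely,
    # `route add default` would be needed instead of `route change`.
    subprocess.run(["route", "change", "default", expected], check=False)
    return True


def watch(interval=0.25):
    """Poll several times per second, as the message describes."""
    while True:
        check_and_repair()
        time.sleep(interval)
```

Calling `watch()` as root would start the polling loop. This only shortens outages; it does nothing about the underlying bug.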


Can you describe your traffic forwarding setup in more detail?
Is it only pf, or do you run netgraph, or other things as well?
Do you use flow routing?

How frequently does this happen?

I'm trying to create a stack graph to see which parts of the network
stack are involved in handling your packet.

--
Andre


> Please refer to these past posts for more examples and evidence of
> other users experiencing this problem:
>
> http://forums.freebsd.org/showthread.php?p=211610#post211610
>
> http://freebsd.1045724.n5.nabble.com/Default-route-quot-random-quot-gateway-modification-bug-td5750820.html
>
> http://lists.freebsd.org/pipermail/freebsd-net/2012-March/031879.html
>
> http://lists.freebsd.org/pipermail/freebsd-ipfw/2010-September/004361.html
>
> There is also a PR that was incorrectly labeled as an IPFW issue.
> Others and I believe this issue is not restricted to the use of
> IPFW and that the PR should be relabeled. I am inclined to think it is
> strictly a PF issue since I am not using IPFW; however, there is
> evidence of the default route changing for people using IPFW on past
> versions of FreeBSD (7.x/8.x), so perhaps this is related.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/174749
>
> Another PR for the same problem, but specific to IPFW and 8.2-RELEASE:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=157796
>
> I am hoping someone reading this can give the problem the attention it
> deserves. Thank you.
>
> -Nick
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"






Re: Default route changes unexpectedly

2013-03-06 Thread Krzysztof Barcikowski

On 2013-03-06 09:25, Andre Oppermann wrote:

> Can you describe your traffic forwarding setup in more detail?
> Is it only pf, or do you run netgraph, or other things as well?
> Do you use flow routing?
>
> How frequently does this happen?
>
> I'm trying to create a stack graph to see which parts of the network
> stack are involved in handling your packet.



Hi,
In my case, I do use PF for filtering and NAT (without routing options 
like 'route-to' or 'reply-to') together with ALTQ (PRIQ).

I also use IPFW+Dummynet combo for shaping.

net.inet.ip.sourceroute: 0
net.inet.ip.accept_sourceroute: 0

Router traffic peaks at about 300 Mb/s.

Frequency:
Wed Oct 3 14:19:15 CEST 2012
Thu Dec 13 04:39:43 CET 2012
Thu Dec 13 04:39:46 CET 2012
Thu Dec 13 04:39:47 CET 2012
Thu Dec 13 04:39:50 CET 2012
Thu Dec 13 04:39:53 CET 2012
Thu Dec 13 04:39:59 CET 2012
Thu Dec 13 04:40:11 CET 2012
Fri Jan 4 07:47:00 CET 2013
Mon Jan 28 18:35:43 CET 2013
Sat Feb 2 22:43:01 CET 2013

I only monitor changes to the default route, but this bug also affects static
routes (i.e., I have one static route, and it changes more frequently than
the default route).
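
[Editor's sketch] The timestamps above can be grouped into separate incidents to see how the logged changes cluster. This is only an illustration: the 5-minute grouping threshold and the function names are assumptions, and the CET/CEST zone names are ignored, which is harmless here because entries from different zones are weeks apart.

```python
from datetime import datetime, timedelta

LOG = """\
Wed Oct 3 14:19:15 CEST 2012
Thu Dec 13 04:39:43 CET 2012
Thu Dec 13 04:39:46 CET 2012
Thu Dec 13 04:39:47 CET 2012
Thu Dec 13 04:39:50 CET 2012
Thu Dec 13 04:39:53 CET 2012
Thu Dec 13 04:39:59 CET 2012
Thu Dec 13 04:40:11 CET 2012
Fri Jan 4 07:47:00 CET 2013
Mon Jan 28 18:35:43 CET 2013
Sat Feb 2 22:43:01 CET 2013
"""


def parse_stamp(line):
    """Parse a date(1)-style line, dropping the zone name (fifth field)."""
    parts = line.split()
    del parts[4]  # CET/CEST; not reliably parsed by strptime's %Z
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def group_bursts(stamps, gap=timedelta(minutes=5)):
    """Group timestamps so that consecutive entries within `gap` share a burst."""
    bursts = []
    for stamp in sorted(stamps):
        if bursts and stamp - bursts[-1][-1] <= gap:
            bursts[-1].append(stamp)
        else:
            bursts.append([stamp])
    return bursts


stamps = [parse_stamp(line) for line in LOG.splitlines() if line.strip()]
bursts = group_bursts(stamps)
for burst in bursts:
    print(burst[0], "-", len(burst), "change(s)")
```

With this threshold the eleven logged changes fall into five incidents, one of which (13 December) accounts for seven changes within half a minute.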


Please let me know if I can provide any more feedback.

Krzysiek






Re: Default route changes unexpectedly

2013-03-06 Thread Daniel Hartmeier
On Wed, Mar 06, 2013 at 09:25:21AM +0100, Andre Oppermann wrote:

> I'm trying to create a stack graph to see which parts of the network
> stack are involved in handling your packet.

Ask people if they're using multiple pfil hooks (even just having
ipfilter loaded counts, for instance).

If that's a common factor, see
http://marc.info/?l=freebsd-net&m=133888532814565&w=2

Daniel


Re: Default route changes unexpectedly

2013-03-06 Thread Ermal Luçi
On Wed, Mar 6, 2013 at 9:38 AM, Krzysztof Barcikowski <
krzys...@airnet.opole.pl> wrote:

> On 2013-03-06 09:25, Andre Oppermann wrote:
>
>> Can you describe your traffic forwarding setup in more detail?
>> Is it only pf, or do you run netgraph, or other things as well?
>> Do you use flow routing?
>>
>> How frequently does this happen?
>>
>> I'm trying to create a stack graph to see which parts of the network
>> stack are involved in handling your packet.
>
> Hi,
> In my case, I do use PF for filtering and NAT (without routing options
> like 'route-to' or 'reply-to') together with ALTQ (PRIQ).
> I also use IPFW+Dummynet combo for shaping.
>
> net.inet.ip.sourceroute: 0
> net.inet.ip.accept_sourceroute: 0
>
> Router traffic is about 300Mb/s in peak.
>
> Frequency:
> Wed Oct 3 14:19:15 CEST 2012
> Thu Dec 13 04:39:43 CET 2012
> Thu Dec 13 04:39:46 CET 2012
> Thu Dec 13 04:39:47 CET 2012
> Thu Dec 13 04:39:50 CET 2012
> Thu Dec 13 04:39:53 CET 2012
> Thu Dec 13 04:39:59 CET 2012
> Thu Dec 13 04:40:11 CET 2012
> Fri Jan 4 07:47:00 CET 2013
> Mon Jan 28 18:35:43 CET 2013
> Sat Feb 2 22:43:01 CET 2013
>
> I do only monitor default route change, but this bug also affects static
> routes (i.e. I have one static route and it changes more frequently that
> default route).
>
> Please let me know if I can provide any more feedback.
>
> Krzysiek
>
>
>
>
Do you have flowtable support in your kernel?
Can you try without it enabled?





-- 
Ermal


Re: Default route changes unexpectedly

2013-03-06 Thread Krzysztof Barcikowski
I believe I don't have flowtable support in my kernel (no FLOWTABLE option),
and there are no flowtable-related sysctls.


How can I check whether I'm using multiple pfil hooks?

Best regards!
Krzysiek



Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread Eugene M. Zheganin
Hi.

On 06.03.2013 12:26, YongHyeon PYUN wrote:
> If you were using latest stable/8, the result would be same on
> CURRENT.
> How frequently do you see the watchdog timeouts? Is there way to
> reproduce it?
> Would you show me the output of dmesg (bge(4) and brgphy(4) only)
> and "pciconf -lcbv"?
I upgraded one of my routers to 8.3-STABLE two days ago, and today it
froze. Uptime was less than a day.
I have dozens of these IBM System x3250 machines, all of them running
various 8.2-STABLE builds, which is why I am so worried. I don't know whether
this is triggered by something I did. These routers run gre/ipsec, various
routing software (quagga, bird), proxies and pf. In 2011/early 2012 I saw
similar watchdog issues on these machines, and I disabled TSO on
them. I don't know whether it is a coincidence or whether it really helps, but
after that I didn't see these watchdog issues until today.

I've also discovered that this particular server is running old
BIOS and firmware versions, and that it is missing some NetXtreme
updates available from IBM. Would applying these updates resolve the
problem?

I am OK with the fact that I cannot run IPMI/SOL on these machines, but
it would be nice if this watchdog issue could be resolved somehow.
Furthermore, I have some spare machines that I can provide full access
to, including IP KVM. Since the machine only partially freezes,
I cannot even rely on ichwd and watchdogd to reboot it.

pciconf (there's two controllers in this server, I use the first, but
anyway):

bge0@pci0:2:0:0: class=0x02 card=0x03781014 chip=0x165a14e4 rev=0x00 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'Broadcom NetXtreme BCM5722 Gigabit (94309)'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xe820, size 65536, enabled
    cap 01[48] = powerspec 3  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 09[58] = vendor (length 120)
    cap 05[e8] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[d0] = PCI-Express 1 endpoint max data 128(128) link x1(x1)
                 speed 2.5(2.5)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 2 corrected
    ecap 0002[13c] = VC 1 max VC0
    ecap 0003[160] = Serial 1 001a64fffe21962d
    ecap 0004[16c] = Power Budgeting 1
bge1@pci0:3:1:0: class=0x02 card=0x026f1014 chip=0x16c714e4 rev=0x10 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM5703A3 NetXtreme Gigabit Ethernet'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xe840, size 65536, enabled
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit

dmesg:

bge0:  mem 0xe820-0xe820 irq 16 at device 0.0 on pci2
bge0: CHIP ID 0xa200; ASIC REV 0x0a; CHIP REV 0xa2; PCI-E
miibus0:  on bge0
bge0: Ethernet address: 00:1a:64:21:96:2d
bge0: [FILTER]
bge1:  mem 0xe840-0xe840 irq 21 at device 1.0 on pci3
bge1: CHIP ID 0x1100; ASIC REV 0x01; CHIP REV 0x11; PCI on PCI-X 33
MHz; 32bit
miibus1:  on bge1
bge1: Ethernet address: 00:1a:64:21:96:2e
bge1: [ITHREAD]
[emz@omega:~]# cat /var/run/dmesg.boot | egrep 'bge|brg'
bge0:  mem 0xe820-0xe820 irq 16 at device 0.0 on pci2
bge0: CHIP ID 0xa200; ASIC REV 0x0a; CHIP REV 0xa2; PCI-E
miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:1a:64:21:96:2d
bge0: [FILTER]
bge1:  mem 0xe840-0xe840 irq 21 at device 1.0 on pci3
bge1: CHIP ID 0x1100; ASIC REV 0x01; CHIP REV 0x11; PCI on PCI-X 33
MHz; 32bit
miibus1:  on bge1
brgphy1:  PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Ethernet address: 00:1a:64:21:96:2e
bge1: [ITHREAD]


Thanks.
Eugene.


Re: how to get mac address info in kernel code?

2013-03-06 Thread h bagade
On Tue, Mar 5, 2013 at 7:23 PM, George Neville-Neil wrote:

>
> On Mar 5, 2013, at 08:54 , h bagade  wrote:
>
> > Hi all,
> >
> > I need to get interface MAC address within the kernel code and I couldn't
> > use "getifaddrs" because it's user-mode. How can I have the MAC address
> > information within kernel code?
> >
> > Any hints or comments are really appreciated.
>
> If you have access to the struct ifnet you can look at the if_addr member,
> which is
> a struct ifaddr, defined in if_var.h .
>
> Best,
> George
>

Thanks for your suggestion. I will give it a try.


Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread Eugene M. Zheganin
Hi.

On 06.03.2013 12:26, YongHyeon PYUN wrote:
> If you were using latest stable/8, the result would be same on
> CURRENT.
> How frequently do you see the watchdog timeouts? Is there way to
> reproduce it?
> Would you show me the output of dmesg (bge(4) and brgphy(4) only)
> and "pciconf -lcbv"?
I just thought of something: I have never seen a watchdog timeout on
i386 (on the same System x3250 with the same controllers; these servers are
from the same batch). However, all of my i386 machines run less recent
versions of FreeBSD.
Could this be relevant, i.e., could the problem be specific to amd64?

Thanks
Eugene.


Re: kernel: arpresolve: can't allocate llinfo for 65.59.233.102

2013-03-06 Thread Courtland
Has there been any progress on resolving this problem? Does anyone have a
better idea as to where it is breaking down?

I am experiencing the same problem under FreeBSD 9.1-RELEASE. I use PF for
NAT, ALTQ, and RDR/filter rules. I'm not using PPPoE or dhclient. The
default gateway changes to an IP that is not on my network when under heavy
network load.

The last time this happened I had a stream of arpresolve messages in the
kernel log for the IP that the default route was changed to:
Mar  5 19:12:53  kernel: arpresolve: can't allocate llinfo for 50.142.201.101
The default route was changed to 50.142.201.101 after these messages.




--
View this message in context: 
http://freebsd.1045724.n5.nabble.com/kernel-arpresolve-can-t-allocate-llinfo-for-65-59-233-102-tp5742320p5793139.html
Sent from the freebsd-net mailing list archive at Nabble.com.


Default route changes unexpectedly #2 (was Re: kernel: arpresolve: can't allocate llinfo for 65.59.233.102)

2013-03-06 Thread Adrian Chadd
Another instance of it..



Adrian


On 6 March 2013 07:21, Courtland  wrote:
> Has there been any progress on resolving this problem. Does anyone have a
> better idea as to where it is breaking down?
>
> I am experiencing the same problem under FreeBSD 9.1-RELEASE. I use PF for
> NAT, ALTQ, and RDR/filter rules. I'm not using PPPoE or dhclient. The
> default gateway changes to an IP that is not on my network when under heavy
> network load.
>
> The last time this happened I had a stream of arpresolve messages in the
> kernel for the IP that the default route was changed to.
> Mar  5 19:12:53  kernel: arpresolve: can't allocate llinfo for
> 50.142.201.101
> The default route was changed to 50.142.201.101 after these messages.
>
>
>
>
> --
> View this message in context: 
> http://freebsd.1045724.n5.nabble.com/kernel-arpresolve-can-t-allocate-llinfo-for-65-59-233-102-tp5742320p5793139.html
> Sent from the freebsd-net mailing list archive at Nabble.com.


Re: Default route changes unexpectedly #2 (was Re: kernel: arpresolve: can't allocate llinfo for 65.59.233.102)

2013-03-06 Thread Andre Oppermann

Courtland,

the arpresolve observation is very important.  Do you have flowtable
enabled in your kernel?

--
Andre



Implementing IP6 in 8.3

2013-03-06 Thread freebsd-net
Greetings,
 I'm evaluating an ISP for the sake of building BSD operating systems on
hardware that they use (DSL modems, in this case). When I had my old NEC
server, I had a MIPS environment to develop in. I managed a 28k kernel. In
any case, I'm back at it for use in a lot of hardware I have lying around.
In my current situation, I'm using a ZYXEL Q1000Z modem to connect to their
service. While it's a relatively new modem, it doesn't support IP6. It is my
hope to replace the OS with one that does. :)
I leased a /48 of IP4's from them, which /also/ came with as many IP6's.
So, not having implemented IP6 on any of my boxes (except by way of tunnel
brokers), I'm wondering 2 things:
If my underlying OS (FreeBSD-8.3) can support IP6, will it still function,
even though my gateway (modem) doesn't?
Am I /correctly/ attempting to use it?
I'm answering authoritatively for the many domains I own. They have all
functioned well for many years via IP4. I have added the requisite AAAA
records in all the zones, as well as the associated RR's.
While the gateway (modem) /does/ have an IP6 address, I can't "speak" for it
out of DNS, because it would be an "out of zone" record, even though I'm the
RP for the /48. So it's up to the modem to answer accordingly.
BUT, I'm not sure I'm initiating any of this correctly via rc(8), or more
specifically, via rc.conf(5). While I've read as much as I can find on the
topic related to BSD, boot messages indicate at least "IP6 gateway
unreachable".
I'm currently using:
rc.conf(5):
ipv6_ifconfig_re0="2602:00d1:b4d6:e100::::"
ipv6_defaultrouter="2602:00d1:b4d6:e600::::"
I also have the corresponding host IP in hosts(5).
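
[Editor's sketch] Two things can be checked offline with Python's standard `ipaddress` module: whether an address literal parses at all, and whether the default router lies inside the prefix configured on the interface (a next hop normally has to be on a directly connected network, or be a link-local address). The literals as they appear in this archive end in "::::", which does not parse; digits may have been lost in archiving, so that is not necessarily what was in the real rc.conf. The completed addresses and the /64 prefix length below are hypothetical, chosen only to illustrate the check.

```python
import ipaddress


def is_valid_ipv6(text):
    """True if `text` parses as a single IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False


def router_is_on_link(interface_cidr, router):
    """True if `router` lies inside the prefix configured on the interface."""
    iface = ipaddress.IPv6Interface(interface_cidr)
    return ipaddress.IPv6Address(router) in iface.network


# The literal exactly as shown in the message is not a valid address.
print(is_valid_ipv6("2602:00d1:b4d6:e100::::"))

# Hypothetical completions, for illustration only.
IFACE = "2602:d1:b4d6:e100::2/64"
ROUTER = "2602:d1:b4d6:e600::1"
print(router_is_on_link(IFACE, ROUTER))
```

If the interface really sits in an e100 /64 while the router is in e600, the router is off-link, which would be one possible explanation for the "gateway unreachable" boot message; the message does not give enough detail to confirm this.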

Any help, pointers, guidance, answers /greatly/ appreciated.

Thank you for all your time, and consideration.

--Chris



Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread YongHyeon PYUN
On Wed, Mar 06, 2013 at 04:00:34PM +0600, Eugene M. Zheganin wrote:
> Hi.
> 
> On 06.03.2013 12:26, YongHyeon PYUN wrote:
> > If you were using latest stable/8, the result would be same on
> > CURRENT.
> > How frequently do you see the watchdog timeouts? Is there way to
> > reproduce it?
> > Would you show me the output of dmesg (bge(4) and brgphy(4) only)
> > and "pciconf -lcbv"?
> I upgraded one of my routers 2 days ago to 8.3-STABLE, and got today a
> freeze. Uptime was less than a day.
> I have like dozens of these IBM system x3250, all of them run various
> 8.2-STABLE's, that's why I worry that much. I don't know if this is

What was previous SVN revision number on that machine?
The support for 5718/5719/5720 was merged to stable/8 about 3
months ago.

> triggered by some of my actions. These routers run gre/ipsec, different
> routing stuff (quagga, bird), proxies and pf. In 2011/early 2012 I saw
> similar watchdog issues on these machines, and I disabled the tso on
> them. I don't know whether this is a coincidence or it really helps, but
> after that I didn't see these watchdog issues until today.

I'm not aware of any TSO issue on your controller. pf(4) had a TSO issue,
but I believe it was fixed a long time ago.

> 
> I've also discovered that this particular server is running some old
> bioses/firmwares including the fact that it misses some NetXtreme
> updates available from IBM. Would applying such updates resolve the
> situation ?
> 

Updating the Ethernet controller firmware is always a good idea, but I'm
not sure whether it would address this issue.

> I am ok with that fact that I cannot run ipmi/sol on these machines, but
> it would be nice if this watchdog issue could be somehow resolved.

Actually, this is the first report since the merge suggesting that it
broke bge(4).

> Furthermore, I have some spare machines that I can provide full access
> to, including ipkvm stuff. Since the machine is only partially freezing,
> I cannot even rely on the ichwd and watchdogd to reboot it.

Sorry no clue yet.

> 
> pciconf (there's two controllers in this server, I use the first, but
> anyway):

Thanks for the info.

[...]


Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread Eugene M. Zheganin

Hi.

On 07.03.2013 8:24, YongHyeon PYUN wrote:

What was previous SVN revision number on that machine?
The support for 5718/5719/5720 was merged to stable/8 about 3
months ago.

It was definitely older than "months". It was running something similar 
to  "FreeBSD 8.2-STABLE #0: Mon Sep 19 08:10:00 YEKST 2011", this is the 
uname from a neighbor machine.


I have, as I said, identical servers running FreeBSD. Here are some of 
the unames that I don't see timeouts on:


8.3-STABLE #2: Wed Aug 29 13:00:02 YEKT 2012 (up 187 days)
8.3-PRERELEASE #1: Thu Mar 29 16:14:11 MSK 2012 (up 15 days, previous 
uptime around 180 days)

8.2-STABLE #0: Wed Dec 14 16:56:11 YEKT 2011 (up 99 days)

One more question: could this be a ZFS-related issue, such as some
kernel-level locking? All of these machines also run ZFS (no UFS at all).


Thanks.
Eugene.



Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread YongHyeon PYUN
On Thu, Mar 07, 2013 at 11:08:50AM +0600, Eugene M. Zheganin wrote:
> Hi.
> 
> On 07.03.2013 8:24, YongHyeon PYUN wrote:
> >What was previous SVN revision number on that machine?
> >The support for 5718/5719/5720 was merged to stable/8 about 3
> >months ago.
> >
> It was definitely older than "months". It was running something similar 
> to  "FreeBSD 8.2-STABLE #0: Mon Sep 19 08:10:00 YEKST 2011", this is the 
> uname from a neighbor machine.
> 
> I have, as I said, identical servers running FreeBSD. Here are some of 
> the unames that I don't see timeouts on:
> 
> 8.3-STABLE #2: Wed Aug 29 13:00:02 YEKT 2012 (up 187 days)
> 8.3-PRERELEASE #1: Thu Mar 29 16:14:11 MSK 2012 (up 15 days, previous 
> uptime around 180 days)

These servers do not have 5718/5719/5720 changes.

> 8.2-STABLE #0: Wed Dec 14 16:56:11 YEKT 2011 (up 99 days)

This server has the bge(4) change but it didn't trigger watchdog
timeouts.  Does this server use the same controller? If yes, the
issue didn't come from bge(4) change.
> 
> One more question:  could it be a zfs-related issue ? Some kernel-level 
> locking ? All of those run zfs also (no ufs at all).

Sorry, I have no idea about ZFS.


Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread Zeus Panchenko
Hi,

here is my situation, much like the issue

On 06.03.2013 12:26, YongHyeon PYUN wrote:
> If you were using latest stable/8, the result would be same on
> CURRENT.

I use FreeBSD 9.1-RELEASE #0 r243825: amd64 + ZFS
on an HP ProLiant DL360e Gen8.

The box has two four-port cards, an igb(4) I350 and a bge(4) NetXtreme BCM5719,
according to the pciconf data.

> How frequently do you see the watchdog timeouts? Is there way to
> reproduce it?

I noticed that after activation, bge(4) stops responding and the interface
becomes useless, while igb(4) works fine after some sysctl tuning.

For now I'm forced not to use bge(4) at all :(

> Would you show me the output of dmesg (bge(4) and brgphy(4) only)
> and "pciconf -lcbv"?

> grep "bge\|brgphy" dmesg.boot
bge0:  mem 0xfa3f-0xfa3f,0xfa3e-0xfa3e,0xfa3d-0xfa3d irq 40 at device 0.0 on pci6
bge0: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus0:  on bge0
bge0: Ethernet address: ac:16:2d:83:ec:2c
bge1:  mem 0xfa3c-0xfa3c,0xfa3b-0xfa3b,0xfa3a-0xfa3a irq 44 at device 0.1 on pci6
bge1: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus1:  on bge1
bge1: Ethernet address: ac:16:2d:83:ec:2d
bge2:  mem 0xfa39-0xfa39,0xfa38-0xfa38,0xfa37-0xfa37 irq 40 at device 0.2 on pci6
bge2: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus2:  on bge2
bge2: Ethernet address: ac:16:2d:83:ec:2e
bge3:  mem 0xfa36-0xfa36,0xfa35-0xfa35,0xfa34-0xfa34 irq 44 at device 0.3 on pci6
bge3: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus3:  on bge3
bge3: Ethernet address: ac:16:2d:83:ec:2f

brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
brgphy1:  PHY 2 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
brgphy2:  PHY 3 on miibus2
brgphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
brgphy3:  PHY 4 on miibus3
brgphy3:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow


> pciconf -lcbv
hostb0@pci0:0:0:0:  class=0x06 card=0x18a8103c chip=0x3c008086 rev=0x07 
hdr=0x00
vendor = 'Intel Corporation'
device = 'Sandy Bridge DMI2'
class  = bridge
subclass   = HOST-PCI
cap 10[90] = PCI-Express 2 root port max data 128(128) link x0(x4)
cap 01[e0] = powerspec 3  supports D0 D3  current D0
ecap 000b[100] = unknown 1
ecap 000b[144] = unknown 1
ecap 000b[1d0] = unknown 1
ecap 000b[280] = unknown 1
pcib1@pci0:0:1:0:   class=0x060400 card=0x18a8103c chip=0x3c028086 rev=0x07 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Sandy Bridge IIO PCI Express Root Port 1a'
class  = bridge
subclass   = PCI-PCI
cap 0d[40] = PCI Bridge card=0x18a8103c
cap 05[60] = MSI supports 2 messages, vector masks 
cap 10[90] = PCI-Express 2 root port max data 256(256) link x0(x4)
cap 01[e0] = powerspec 3  supports D0 D3  current D0
ecap 000b[100] = unknown 1
ecap 000d[110] = unknown 1
ecap 0001[148] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 000b[1d0] = unknown 1
ecap 0019[250] = unknown 1
ecap 000b[280] = unknown 1
pcib2@pci0:0:1:1:   class=0x060400 card=0x18a8103c chip=0x3c038086 rev=0x07 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Sandy Bridge IIO PCI Express Root Port 1b'
class  = bridge
subclass   = PCI-PCI
cap 0d[40] = PCI Bridge card=0x18a8103c
cap 05[60] = MSI supports 2 messages, vector masks 
cap 10[90] = PCI-Express 2 root port max data 256(256) link x0(x4)
cap 01[e0] = powerspec 3  supports D0 D3  current D0
ecap 000b[100] = unknown 1
ecap 000d[110] = unknown 1
ecap 0001[148] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 000b[1d0] = unknown 1
ecap 0019[250] = unknown 1
ecap 000b[280] = unknown 1
pcib3@pci0:0:3:0:   class=0x060400 card=0x18a8103c chip=0x3c088086 rev=0x07 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Sandy Bridge IIO PCI Express Root Port 3a in PCI Express Mode'
class  = bridge
subclass   = PCI-PCI
cap 0d[40] = PCI Bridge card=0x18a8103c
cap 05[60] = MSI supports 2 messages, vector masks 
cap 10[90] = PCI-Express 2 root port max data 256(256) link x4(x16)
cap 01[e0] = powerspec 3  supports D0 D3  current D0
ecap 000b[100] = unknown 1
ecap 000d[110] = unknown 1
ecap 0001[148] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 000b[1d0] = unknown 1
ecap 0019[250] = unknown 1
ecap 000b[280] = unknown 1
pcib4@pci0:0:3:1:   class=0x060400 card=0x18a8103c chip=0x3c098086 rev=0x07 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Sandy Bridge IIO PCI Express Root Port 3b'
cl

[patch] interface routes

2013-03-06 Thread Alexander V. Chernikov

Hello list!

There is a known, long-standing issue with adding and deleting interface routes:

ifconfig iface inet 1.2.3.4/24 can fail if the given prefix is already in the
kernel routing table (for example, advertised by an IGP such as OSPF).


An interface route can be deleted via route(8) or by any routing socket user
(this sometimes happens with popular open source daemons such as bird and quagga).


The problem is reported in at least kern/106722 and kern/155772.

It can be fixed in the following way:
The immutable route flag (RTF_PINNED, added in 19995 with a 'for future use'
comment) is used to mark a route as immutable.
rtrequest1_fib refuses to delete routes that carry this flag unless
RTF_PINNED is set in rti_flags.


Every interface address manipulation is done via rtinit[1], so
rtinit1() sets this flag (and behavior does not change here).

Adding an interface address is handled by atomically deleting the old prefix
and adding the interface route.
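
[Editor's sketch] The intended behavior can be modeled in user space as follows. This is a toy model of the description above, not the kernel code: the `RouteTable` class, its method names and the flag value are hypothetical, and the retry path follows the prose description because the diff below is cut off in this archive.

```python
import errno

# Illustrative flag value for this model only, not the kernel's definition.
RTF_PINNED = 0x1


class RouteTable:
    """Toy routing table: prefix -> (gateway, flags)."""

    def __init__(self):
        self.routes = {}

    def add(self, prefix, gateway, flags=0):
        if prefix in self.routes:
            raise OSError(errno.EEXIST, "route already exists")
        self.routes[prefix] = (gateway, flags)

    def delete(self, prefix, flags=0):
        """Refuse to delete a pinned route unless the caller passes RTF_PINNED."""
        if prefix not in self.routes:
            raise OSError(errno.ESRCH, "no such route")
        _, route_flags = self.routes[prefix]
        if (route_flags & RTF_PINNED) and not (flags & RTF_PINNED):
            raise OSError(errno.EPERM, "route is pinned")
        del self.routes[prefix]

    def add_interface_route(self, prefix, ifaddr):
        """Install a pinned interface route, replacing any existing prefix."""
        try:
            self.add(prefix, ifaddr, RTF_PINNED)
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise
            # Delete the old prefix and retry with the interface route.
            self.delete(prefix, RTF_PINNED)
            self.add(prefix, ifaddr, RTF_PINNED)


table = RouteTable()
table.add("1.2.3.0/24", "10.0.0.1")                 # e.g. learned from an IGP
table.add_interface_route("1.2.3.0/24", "1.2.3.4")  # ifconfig now succeeds
try:
    table.delete("1.2.3.0/24")                      # a routing daemon tries to delete it
except OSError as exc:
    print("delete refused:", errno.errorcode[exc.errno])
```

In this model an ordinary delete of the interface route fails with EPERM, while the interface code, which passes RTF_PINNED, can still remove it.
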
Index: sys/net/if.c
===
--- sys/net/if.c(revision 247623)
+++ sys/net/if.c(working copy)
@@ -1357,7 +1357,8 @@ if_rtdel(struct radix_node *rn, void *arg)
return (0);
 
err = rtrequest_fib(RTM_DELETE, rt_key(rt), rt->rt_gateway,
-   rt_mask(rt), rt->rt_flags|RTF_RNH_LOCKED,
+   rt_mask(rt),
+   rt->rt_flags|RTF_RNH_LOCKED|RTF_PINNED,
(struct rtentry **) NULL, rt->rt_fibnum);
if (err) {
log(LOG_WARNING, "if_rtdel: error %d\n", err);
Index: sys/net/route.c
===
--- sys/net/route.c (revision 247842)
+++ sys/net/route.c (working copy)
@@ -1112,6 +1112,16 @@ rtrequest1_fib(int req, struct rt_addrinfo *info,
error = 0;
}
 #endif
+   if ((flags & RTF_PINNED) == 0) {
+   /*
+* Check if can delete target route.
+*/
+   rt = (struct rtentry *)rnh->rnh_lookup(dst,
+   netmask, rnh);
+   if ((rt != NULL) && (rt->rt_flags & RTF_PINNED))
+   senderr(EPERM);
+   }
+
/*
 * Remove the item from the tree and return it.
 * Complain if it is not there and do no more processing.
@@ -1430,6 +1440,7 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
int didwork = 0;
int a_failure = 0;
static struct sockaddr_dl null_sdl = {sizeof(null_sdl), AF_LINK};
+   struct radix_node_head *rnh;
 
if (flags & RTF_HOST) {
dst = ifa->ifa_dstaddr;
@@ -1488,7 +1499,6 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 */
for ( fibnum = startfib; fibnum <= endfib; fibnum++) {
if (cmd == RTM_DELETE) {
-   struct radix_node_head *rnh;
struct radix_node *rn;
/*
 * Look up an rtentry that is in the routing tree and
@@ -1538,7 +1548,8 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 */
bzero((caddr_t)&info, sizeof(info));
info.rti_ifa = ifa;
-   info.rti_flags = flags | (ifa->ifa_flags & ~IFA_RTSELF);
+   info.rti_flags = flags |
+   (ifa->ifa_flags & ~IFA_RTSELF) | RTF_PINNED;
info.rti_info[RTAX_DST] = dst;
/* 
 * doing this for compatibility reasons
@@ -1550,6 +1561,32 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
info.rti_info[RTAX_GATEWAY] = ifa->ifa_addr;
info.rti_info[RTAX_NETMASK] = netmask;
error = rtrequest1_fib(cmd, &info, &rt, fibnum);
+
+   if ((error == EEXIST) && (cmd == RTM_ADD)) {
+   /*
+* Interface route addition failed.
+* Note we probably already checked
+* other interface addresses if given prefix exists.
+* Atomically delete current prefix generating
+* RTM_DELETE message, and retry adding
+* interface address.
+*/
+   rnh = rt_tables_get_rnh(fibnum, dst->sa_family);
+   RADIX_NODE_HEAD_LOCK(rnh);
+   /* Delete old prefix */
+   info.rti_ifa = NULL;
+   info.rti_flags = RTF_RNH_LOCKED;
+   error = rtrequest1_fib(RTM_DELETE, &info, &rt, fibnum);
+   if (error == 0) {
+   info.rti_ifa = ifa;
+   info.rti_flags = flags | RTF_RNH_LOCKED |
+

Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread YongHyeon PYUN
On Thu, Mar 07, 2013 at 08:22:51AM +0300, Zeus Panchenko wrote:
> Hi,
> 
> here is my situation, much like the issue
> 

No, your issue is a completely different one.

> On 06.03.2013 12:26, YongHyeon PYUN wrote:
> > If you were using the latest stable/8, the result would be the same on
> > CURRENT.
> 
> I use FreeBSD 9.1-RELEASE #0 r243825: amd64 + ZFS
> on HP ProLiant DL360e Gen8 
> 
> the box has two 4-headed cards, igb(4) I350 and bge(4) NetXtreme BCM5719,
> according to the pciconf data
> 
> > How frequently do you see the watchdog timeouts? Is there a way to
> > reproduce it?
> 
> I noticed that after activation, bge(4) stops responding and the interface
> becomes useless, while igb(4) works fine after some sysctl-ing
> 
> for now I'm forced not to use bge(4) at all :(

9.1-RELEASE does not have the code required to support your controller.
Use stable/9.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: FreeBSD 9.1-RELEASE + bge0 == watchdog timeout

2013-03-06 Thread Eugene M. Zheganin
Hi.

On 07.03.2013 12:23, YongHyeon PYUN wrote:
> On Thu, Mar 07, 2013 at 11:08:50AM +0600, Eugene M. Zheganin wrote:
>> It was definitely older than "months". It was running something similar 
>> to  "FreeBSD 8.2-STABLE #0: Mon Sep 19 08:10:00 YEKST 2011", this is the 
>> uname from a neighbor machine.
>>
>> I have, as I said, identical servers running FreeBSD. Here are some of 
>> the unames that I don't see timeouts on:
>>
>> 8.3-STABLE #2: Wed Aug 29 13:00:02 YEKT 2012 (up 187 days)
>> 8.3-PRERELEASE #1: Thu Mar 29 16:14:11 MSK 2012 (up 15 days, previous 
>> uptime around 180 days)
> These servers do not have the 5718/5719/5720 changes.
>
>> 8.2-STABLE #0: Wed Dec 14 16:56:11 YEKT 2011 (up 99 days)
> This server has the bge(4) change, but it didn't trigger watchdog
> timeouts.  Does this server use the same controller? If so, the
> issue didn't come from the bge(4) change.
>
How's that? It's running an even older version than the previous two. I guess
you misread the year.

Eugene.


Re: [patch] interface routes

2013-03-06 Thread Andre Oppermann

On 07.03.2013 07:34, Alexander V. Chernikov wrote:

Hello list!

There is a known, long-standing issue with adding and deleting interface routes:

ifconfig iface inet 1.2.3.4/24 can fail if the given prefix is already in the
kernel routing table (for example, advertised by an IGP such as OSPF).

An interface route can be deleted via route(8) or by any routing socket user
(this sometimes happens with popular open-source daemons such as bird/quagga).

The problem is reported in at least kern/106722 and kern/155772.


Your patch is a welcome addition.


This can be fixed the following way:
The immutable route flag (RTF_PINNED, added in 1995 with a 'for future use'
comment) is used to mark a route as immutable.
rtrequest1_fib() refuses to delete routes with this flag unless
RTF_PINNED is also set in rti_flags.


How do the routing daemons react to being unable to change/delete
such a route?

Wouldn't EADDRINUSE be a more descriptive error than EPERM?


Every interface address manipulation is done via rtinit[1], so
rtinit1() sets this flag (and behavior does not change here).

Adding an interface address is handled by atomically deleting the old prefix
and adding the interface one.


This brings up a long-standing sore point of our routing code,
which this patch makes more pronounced.  When an interface's link
state is down, I don't want the route to it to persist but to
become inactive so another path can be chosen.  This is the very
point of running a routing daemon.  So on the link-down event
the installed interface routes should be removed from the routing
table.  The configured addresses, though, should persist and the
interface routes should be re-installed on a link-up event.  What's
your opinion on it?
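
The link-state behaviour proposed above can be sketched as a small
user-space model. This is only an illustration of the idea, not
existing kernel behaviour or an interface from the patch: the ls_*
names and structures are hypothetical, and "installing" a route is
reduced to a per-address flag.

```c
#include <stddef.h>

#define LS_MAX_ADDRS	4	/* hypothetical limit for this sketch */

struct ls_addr {
	unsigned int prefix;	/* network prefix, host byte order */
	int masklen;
	int route_installed;	/* 1 if the interface route is in the table */
};

struct ls_iface {
	int link_up;
	int naddrs;
	struct ls_addr addr[LS_MAX_ADDRS];
};

/* Configure an address; its route is installed only while the link is up. */
int
ls_add_addr(struct ls_iface *ifp, unsigned int prefix, int masklen)
{
	struct ls_addr *a;

	if (ifp->naddrs >= LS_MAX_ADDRS)
		return (-1);
	a = &ifp->addr[ifp->naddrs++];
	a->prefix = prefix;
	a->masklen = masklen;
	a->route_installed = ifp->link_up;
	return (0);
}

/* Link event: addresses stay configured, their routes follow the link. */
void
ls_link_event(struct ls_iface *ifp, int up)
{
	int i;

	ifp->link_up = up;
	for (i = 0; i < ifp->naddrs; i++)
		ifp->addr[i].route_installed = up;
}

/* Count the interface routes currently installed. */
int
ls_installed_routes(const struct ls_iface *ifp)
{
	int i, n = 0;

	for (i = 0; i < ifp->naddrs; i++)
		n += ifp->addr[i].route_installed;
	return (n);
}
```

With one address configured on an interface whose link is up, a
link-down event leaves naddrs at 1 while ls_installed_routes() drops
to 0, and a later link-up event brings it back to 1.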

Other than these points I think your code is fine and can go
into the tree.

--
Andre
