Re: cxgbe's native netmap support broken since r307394

2016-12-19 Thread Vincenzo Maffione
Hi Navdeep,

  Indeed, we have reviewed the code, and we think it is ok to
implement nm_os_ifnet_lock() with IFNET_RLOCK(), instead of using
IFNET_WLOCK().
Since IFNET_RLOCK() results in sx_slock(), a sleepable lock, this should
fix the issue.

On FreeBSD, this locking is needed to protect a flag read by nm_iszombie().
However, on Linux the same lock is also needed to protect the call to
the nm_hw_register() callback, so we prefer to have a "unified"
locking scheme, i.e. always calling nm_hw_register under the lock.

Does this make sense to you? Would it be easy for you to make a quick
test by replacing IFNET_WLOCK with IFNET_RLOCK?

Thanks,
  Vincenzo

2016-12-17 23:28 GMT+01:00 Navdeep Parhar:
> Luigi, Vincenzo,
>
> The last major update to netmap (r307394 and followups) broke cxgbe's
> native netmap support.  The problem is that netmap_hw_reg now holds an
> rw_lock around the driver's netmap_on/off routines.  It has always been
> safe for the driver to sleep during these operations but now it panics
> instead.
>
> Why is IFNET_WLOCK needed here?  It seems like a regression to disallow
> sleep on the control path.
>
> Regards,
> Navdeep
>
> begin_synchronized_op with the following non-sleepable locks held:
> exclusive rw ifnet_rw (ifnet_rw) r = 0 (0x8271d680) locked @
> /root/ws/head/sys/dev/netmap/netmap_freebsd.c:95
> stack backtrace:
> #0 0x810837a5 at witness_debugger+0xe5
> #1 0x81084d88 at witness_warn+0x3b8
> #2 0x83ef2bcc at begin_synchronized_op+0x6c
> #3 0x83f14beb at cxgbe_netmap_reg+0x5b
> #4 0x809846f1 at netmap_hw_reg+0x81
> #5 0x809806de at netmap_do_regif+0x19e
> #6 0x8098121d at netmap_ioctl+0x7ad
> #7 0x8098682f at freebsd_netmap_ioctl+0x5f



-- 
Vincenzo Maffione
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Avoid using RFC3927 outside of the link

2016-12-19 Thread Alarig Le Lay
Hi,

I have a router that is multi-homed with BGP. One of my peers is using
an RFC3927 address for the connection. If I traceroute to a host behind
that router when the return path goes via this peer, the ICMP reply
shows that link-local IP:

     ASN      Host                                              Loss%  Snt  Last   Avg  Best  Wrst StDev
  1. AS12876  195-154-86-1.rev.poneytelecom.eu (195.154.86.1)    0.0%   10   0.9   0.8   0.6   1.1   0.0
  2. AS12876  a9k2-49e-s46-3.dc3.poneytelecom.eu (195.154.1.86)  0.0%   10   0.8   1.1   0.7   2.2   0.5
  3. AS12876  pni-th2-a9k2.th2.poneytelecom.eu (195.154.1.75)    0.0%   10   1.2   1.5   1.1   3.5   0.6
  4. AS???    equinix-th2.quantic-telecom.net (195.42.144.192)   0.0%   10   1.1   1.0   0.9   1.2   0.0
  5. AS198507 185.132.75.33                                      0.0%   10   7.3   7.4   7.2   7.9   0.0
  6. AS???    169.254.1.2                                        0.0%   10   7.6   7.6   7.4   7.9   0.0
  7. AS204092 kaiminus.swordarmor.fr (89.234.186.26)             0.0%   10   8.1  11.4   7.8  41.5  10.6

Is it possible to avoid this behaviour and reply with the public IP
(89.234.186.1) instead? What I am looking for is an equivalent of `ip
addr change 169.254.1.2/30 scope link` on Linux.
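
As a rough illustration of the symptom, the following Python sketch (not part of the original thread) uses the standard ipaddress module to flag hops whose address falls in the RFC 3927 range 169.254.0.0/16. The hop list is copied from the trace above; the function and constant names are hypothetical.

```python
import ipaddress

# RFC 3927 reserves 169.254.0.0/16 for IPv4 link-local addressing.
RFC3927_NET = ipaddress.ip_network("169.254.0.0/16")


def link_local_hops(hops):
    """Return the (hop number, address) pairs that lie in 169.254.0.0/16."""
    return [(n, addr) for n, addr in hops
            if ipaddress.ip_address(addr) in RFC3927_NET]


# Hop addresses copied from the trace in this message.
HOPS = [
    (1, "195.154.86.1"),
    (2, "195.154.1.86"),
    (3, "195.154.1.75"),
    (4, "195.42.144.192"),
    (5, "185.132.75.33"),
    (6, "169.254.1.2"),
    (7, "89.234.186.26"),
]

if __name__ == "__main__":
    for n, addr in link_local_hops(HOPS):
        print(f"hop {n}: {addr} is link-local and should not appear off-link")
```

Only the address seen at hop 6 is in the link-local range, which is the reply the question is about.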

Thanks,
-- 
alarig




Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Eugene Grosbein

20.12.2016 1:46, Alarig Le Lay wrote:

> Is it possible to avoid this behaviour and reply with the public IP
> (89.234.186.1) instead?

try: sysctl net.inet.icmp.reply_from_interface=1


Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Alarig Le Lay
On Tue Dec 20 01:51:17 2016, Eugene Grosbein wrote:
> 20.12.2016 1:46, Alarig Le Lay wrote:
> 
> > Is it possible to avoid this behaviour and reply with the public IP
> > (89.234.186.1) instead?
> 
> try: sysctl net.inet.icmp.reply_from_interface=1

If an AS chooses to reach us through this peer, packets will come in on
this interface, so our router will continue to reply with the APIPA IP
for those ASes.

-- 
alarig




Re: cxgbe's native netmap support broken since r307394

2016-12-19 Thread Navdeep Parhar
IFNET_RLOCK will work, thanks.

Navdeep

On Mon, Dec 19, 2016 at 3:21 AM, Vincenzo Maffione wrote:
> Hi Navdeep,
>
>   Indeed, we have reviewed the code, and we think it is ok to
> implement nm_os_ifnet_lock() with IFNET_RLOCK(), instead of using
> IFNET_WLOCK().
> Since IFNET_RLOCK() results in sx_slock(), this should fix the issue.
>
> On FreeBSD, this locking is needed to protect a flag read by nm_iszombie().
> However, on Linux the same lock is also needed to protect the call to
> the nm_hw_register() callback, so we prefer to have a "unified"
> locking scheme, i.e. always calling nm_hw_register under the lock.
>
> Does this make sense to you? Would it be easy for you to make a quick
> test by replacing IFNET_WLOCK with IFNET_RLOCK?
>
> Thanks,
>   Vincenzo
>
> 2016-12-17 23:28 GMT+01:00 Navdeep Parhar:
>> Luigi, Vincenzo,
>>
>> The last major update to netmap (r307394 and followups) broke cxgbe's
>> native netmap support.  The problem is that netmap_hw_reg now holds an
>> rw_lock around the driver's netmap_on/off routines.  It has always been
>> safe for the driver to sleep during these operations but now it panics
>> instead.
>>
>> Why is IFNET_WLOCK needed here?  It seems like a regression to disallow
>> sleep on the control path.
>>
>> Regards,
>> Navdeep
>>
>> begin_synchronized_op with the following non-sleepable locks held:
>> exclusive rw ifnet_rw (ifnet_rw) r = 0 (0x8271d680) locked @
>> /root/ws/head/sys/dev/netmap/netmap_freebsd.c:95
>> stack backtrace:
>> #0 0x810837a5 at witness_debugger+0xe5
>> #1 0x81084d88 at witness_warn+0x3b8
>> #2 0x83ef2bcc at begin_synchronized_op+0x6c
>> #3 0x83f14beb at cxgbe_netmap_reg+0x5b
>> #4 0x809846f1 at netmap_hw_reg+0x81
>> #5 0x809806de at netmap_do_regif+0x19e
>> #6 0x8098121d at netmap_ioctl+0x7ad
>> #7 0x8098682f at freebsd_netmap_ioctl+0x5f
>
>
>
> --
> Vincenzo Maffione


Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Eugene Grosbein

20.12.2016 2:05, Alarig Le Lay wrote:

> On Tue Dec 20 01:51:17 2016, Eugene Grosbein wrote:
>> 20.12.2016 1:46, Alarig Le Lay wrote:
>>
>>> Is it possible to avoid this behaviour and reply with the public IP
>>> (89.234.186.1) instead?
>>
>> try: sysctl net.inet.icmp.reply_from_interface=1
>
> If an AS chooses to reach us through this peer, packets will come in on
> this interface, so our router will continue to reply with the APIPA IP
> for those ASes.

Well, you can always use brute force instead:

ipfw nat 169 config reset ip 89.234.186.1 && \
ipfw add 60 nat 169 ip from 169.254.0.0/16 to any out xmit igb0

That's ugly but works.


Re: sonewconn: pcb [...]: Listen queue overflow to human-readable form

2016-12-19 Thread hiren panchasara
On 12/16/16 at 11:20P, Andrey V. Elsukov wrote:
> On 15.12.2016 20:51, hiren panchasara wrote:
> > On 12/15/16 at 05:23P, Eugene M. Zheganin wrote:
> >> Hi.
> >>
> >> Sometimes on one of my servers I got dmesg full of
> >>
> >> sonewconn: pcb 0xf80373aec000: Listen queue overflow: 49 already in
> >> queue awaiting acceptance (6 occurrences)
> > [skip]
> >>
> >> but at the time of investigation the socket is already closed and lsof
> >> cannot show me the owner. I wonder if the kernel can itself decode this
> >> output and write it in the human-readable form ?
> > 
> > I have this not-quite-correct patch that may help you. (If you follow the
> > discussion there, you'd know why it's not complete.)
> > 
> > https://lists.freebsd.org/pipermail/freebsd-net/2014-March/038074.html
> 
> Hi Hiren,
> 
> I think the check for socket's domain should be enough?
> 
> 
> -- 
> WBR, Andrey V. Elsukov

> Index: sys/kern/uipc_socket.c
> ===
> --- sys/kern/uipc_socket.c(revision 309834)
> +++ sys/kern/uipc_socket.c(working copy)
> @@ -139,6 +139,7 @@ __FBSDID("$FreeBSD$");
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -577,10 +578,15 @@ sonewconn(struct socket *head, int connstatus)
>   overcount++;
>  
>   if (ratecheck(&lastover, &overinterval)) {
> - log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
> - "%i already in queue awaiting acceptance "
> - "(%d occurrences)\n",
> - __func__, head->so_pcb, head->so_qlen, overcount);
> + if (INP_CHECK_SOCKAF(head, AF_INET) ||
> + INP_CHECK_SOCKAF(head, AF_INET6))
> + over = ntohs(sotoinpcb(head)->inp_lport);
> + else
> + over = 0;
> + log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow on "
> + "port %d: %i already in queue awaiting acceptance "
> + "(%d occurrences)\n", __func__, head->so_pcb,
> + over, head->so_qlen, overcount);
>  
>   overcount = 0;
>   }

Andrey,
Thanks, this seems correct to me. :-)

Cheers,
Hiren
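
For readers who want to see the shape of the proposed message, here is a small userland Python model of the patched log line (not the kernel code). It assumes, as the ntohs() call in the patch implies, that the local port is held in network byte order; the function name and the bytes argument are hypothetical.

```python
import struct
from typing import Optional


def format_overflow(pcb: int, lport_net: Optional[bytes],
                    qlen: int, overcount: int) -> str:
    """Build the patched message.

    lport_net is the 16-bit local port in network byte order, or None
    for a socket that is not AF_INET/AF_INET6 (the patch logs port 0
    in that case).
    """
    # "!H" decodes a big-endian (network order) unsigned short,
    # which is what ntohs() achieves in the patch.
    port = struct.unpack("!H", lport_net)[0] if lport_net is not None else 0
    return ("sonewconn: pcb %#x: Listen queue overflow on port %d: "
            "%d already in queue awaiting acceptance (%d occurrences)"
            % (pcb, port, qlen, overcount))


if __name__ == "__main__":
    # Port 8080 is an arbitrary example value.
    print(format_overflow(0xf80373aec000, struct.pack("!H", 8080), 49, 6))
```

With the port in the message, the listener can be identified from configuration even after the socket has been closed.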




Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Alarig Le Lay
On Tue Dec 20 02:34:29 2016, Eugene Grosbein wrote:
> Well, you can always use brute force instead:
> 
> ipfw nat 169 config reset ip 89.234.186.1 && \
> ipfw add 60 nat 169 ip from 169.254.0.0/16 to any out xmit igb0
> 
> That's ugly but works.

It will work only as a side effect: by doing this, I will send BGP packets
from 89.234.186.1, which is an IP that the peer learned by BGP. This will
create a recursive loop, and the session will be shut down. So, no more
traffic will transit through this interface, and this IP will not be
displayed anymore :p
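
To make the objection concrete: the quoted rule matches on source network and outgoing interface only, so it cannot tell BGP traffic apart from ICMP replies. The following is a simplified Python model of that match, not of ipfw itself; the function name and the port argument are hypothetical.

```python
import ipaddress

# Values taken from the quoted ipfw commands.
NAT_SOURCE_NET = ipaddress.ip_network("169.254.0.0/16")
NAT_PUBLIC_IP = "89.234.186.1"


def translate_source(src: str, dst_port: int) -> str:
    """Return the source address a packet would leave with.

    dst_port is accepted only to show that it plays no part in the
    decision: the modelled rule looks at the source network alone.
    """
    if ipaddress.ip_address(src) in NAT_SOURCE_NET:
        return NAT_PUBLIC_IP
    return src


if __name__ == "__main__":
    # TCP port 179 is BGP; the session's packets are rewritten as well.
    print(translate_source("169.254.1.2", 179))
```

Because the BGP session itself is sourced from the link-local address, its packets are rewritten along with everything else.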

-- 
alarig




Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Joe Holden

On 19/12/2016 21:01, Alarig Le Lay wrote:

> On Tue Dec 20 02:34:29 2016, Eugene Grosbein wrote:
>> Well, you can always use brute force instead:
>>
>> ipfw nat 169 config reset ip 89.234.186.1 && \
>> ipfw add 60 nat 169 ip from 169.254.0.0/16 to any out xmit igb0
>>
>> That's ugly but works.
>
> It will work only as a side effect: by doing this, I will send BGP packets
> from 89.234.186.1, which is an IP that the peer learned by BGP. This will
> create a recursive loop, and the session will be shut down. So, no more
> traffic will transit through this interface, and this IP will not be
> displayed anymore :p

Use valid addressing and, optionally, a working config; there is no
problem here.



Re: Avoid using RFC3927 outside of the link

2016-12-19 Thread Eugene Grosbein

20.12.2016 4:01, Alarig Le Lay wrote:

> On Tue Dec 20 02:34:29 2016, Eugene Grosbein wrote:
>> Well, you can always use brute force instead:
>>
>> ipfw nat 169 config reset ip 89.234.186.1 && \
>> ipfw add 60 nat 169 ip from 169.254.0.0/16 to any out xmit igb0
>>
>> That's ugly but works.
>
> It will work only as a side effect: by doing this, I will send BGP packets
> from 89.234.186.1, which is an IP that the peer learned by BGP. This will
> create a recursive loop, and the session will be shut down. So, no more
> traffic will transit through this interface, and this IP will not be
> displayed anymore :p

You could also use another public IP as the primary address for the
interface in question and an address from 169.254.0.0/16 as the secondary
one. BGP will still work, and the kernel/ICMP will use the public IP.
