[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

--- Comment #7 from Alexey  ---
I have retested on:
FreeBSD nas1 12.0-STABLE FreeBSD 12.0-STABLE r343868 NAS  amd64
and this behavior is also present on this revision.

Below is another server with the same hardware configuration; it does not
have this problem:

FreeBSD nas2 12.0-STABLE FreeBSD 12.0-STABLE r341831 NAS  amd64

ix1: flags=8843 metric 0 mtu 1500
description: -=INTERFACE-TO-CORE2-SWITCH=-
options=e000bb
ether 00:e0:ed:2e:14:e1
media: Ethernet autoselect (10Gbase-Twinax )
status: active
nd6 options=29

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

Eugene Grosbein  changed:

   What|Removed |Added

 CC||ma...@freebsd.org

--- Comment #9 from Eugene Grosbein  ---
The if_vlan(4) driver changed in stable/12 between revisions 341831 and 343868,
so I'm CC'ing markj@, who made that change, to give him a chance to look at this.



[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

--- Comment #10 from Alexey  ---
ix0@pci0:2:0:0: class=0x02 card=0x17d3103c chip=0x10fb8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
    class      = network
    subclass   = ethernet
ix1@pci0:2:0:1: class=0x02 card=0x17d3103c chip=0x10fb8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
    class      = network
    subclass   = ethernet



[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

Eugene Grosbein  changed:

   What|Removed |Added

 CC||mar...@freebsd.org

--- Comment #8 from Eugene Grosbein  ---
It seems the ixgbe(4) driver code did not change significantly in stable/12
between revisions 341831 and 343868, but iflib did. CC'ing marius@, who might
have a clue.

Alexey, please show the relevant part of the "pciconf -lvvv" output for your
ix0 device.



Re: use of #ifdef INET and #ifdef INET6 in the kernel sources

2019-03-01 Thread Rodney W. Grimes
> Bjoern A. Zeeb wrote:
> [stuff snipped]
> I wrote:
> >> So, is this still recommended for blocks of code that only execute for
> >> the version of IP, but will build for kernels that do not have the
> >> particular "options INET{6}" in the kernel config?
> >
> >Yes.
> Ok, I'll do it.
Thank you 
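The convention agreed on above can be sketched in userland C. In this sketch the `INET` and `INET6` macros stand in for the kernel's `options INET` / `options INET6` (in the kernel they come from the generated `opt_inet.h` and `opt_inet6.h` headers), and `fill_loopback_addr()` is a hypothetical helper, not code from the NFS sources. Each address-family case is compiled only when its option is defined, which is where both the size saving and the "documents where the code is" benefit come from.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Stand-ins for the kernel config "options INET" / "options INET6".
 * Remove either line and that family's code below is not compiled at all.
 */
#define INET
#define INET6

/*
 * Hypothetical helper: fill *ss with the loopback address of the given
 * family.  Returns 0 on success, or EAFNOSUPPORT if the family was
 * compiled out or is not an IP family.  memset() leaves the BSD
 * sin_len/sin6_len fields zeroed so this also builds where they don't exist.
 */
static int
fill_loopback_addr(int family, struct sockaddr_storage *ss)
{
	memset(ss, 0, sizeof(*ss));
	switch (family) {
#ifdef INET
	case AF_INET: {
		struct sockaddr_in *sin = (struct sockaddr_in *)ss;

		sin->sin_family = AF_INET;
		sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		return (0);
	}
#endif
#ifdef INET6
	case AF_INET6: {
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

		sin6->sin6_family = AF_INET6;
		sin6->sin6_addr = in6addr_loopback;
		return (0);
	}
#endif
	default:
		return (EAFNOSUPPORT);
	}
}
```

Building without `#define INET6`, for instance, turns `fill_loopback_addr(AF_INET6, &ss)` into an `EAFNOSUPPORT` return with no IPv6 code in the object file.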

> >> If it is still recommended, I will do it, but I'll admit I don't
> >> understand why it should be done? (All it does is reduce the size of
> >> the executable by a small amount and that doesn't seem significant to me.)
> >
> >That small amount is still relevant on some devices where people go to
> >great lengths to fit our constantly growing base into a tiny small
> >thingy.
> I doubt NFS gets squeezed into such devices and, yes, it is a small amount.
> Using source line counts via "wc" (it includes comments, etc):
> - This will reduce the # of lines by about 6 for a module of about 7700 lines
>   which is loaded when either the nfscl or nfsserver modules are loaded.
>   (These are both about 25000 lines and require the krpc, which is another
>   1.
>   I haven't included the Kerberos stuff, because I can't remember if that
>   gets loaded unless Kerberos mounts get used.)
> --> A savings of 6 lines in something like 43000.

That means that nfsuserd is an extremely well-behaved IPv4/IPv6
agnostic daemon that only takes a small change to make it able
to run as either v4 or v6 single stack, or dual stacked, at a cost
that also sounds minimal: even if it took an #ifdef for each of these
lines, that is only 6 in 43000 lines of code, which is a small cost.

The same analysis on other code would probably come out nowhere near
this.

Also, didn't this use to use a unix domain socket?  Could the unix domain
socket be put back and knobbed so that I could actually run this without
it doing the localnet thing at all?  I know that it had issues,
as the socket is in /tmp and /tmp might not be the right type of file system,
etc...  But some of us do know that and do run with a /tmp that
would support an AF_UNIX nfsuserd.

If it takes 6 lines of ifdef to do v4 vs v6, how many lines of
ifdef would it take to add AF_UNIX back and make it a run-time choice?

(Goes looking for more Nomex clothing :-)

> >And it allows you to lose code from your kernel that you don't
> >need/want, such as if you'd want to rip out all INET sources from a
> >tree.
> Ok, I can buy into this argument. I doubt I'll see IPv4 removed in my
> lifetime, but it does document where the code is.
> (In Canada, network providers only give out IPv4 addresses to end users, from
>  what I've seen.)
> 
> >I know both of these groups still do exist.
> >
> >Also every code not compiled in is not an attack surface, whether you
> >think it's executed or not.
> 
> rick
-- 
Rod Grimes rgri...@freebsd.org


Re: [Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread Oleg Bulyzhin


JFYI: https://lists.freebsd.org/pipermail/freebsd-net/2018-November/052184.html

-- 
Oleg.


=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- o...@rinet.ru ===




[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

Oleg Bulyzhin  changed:

   What|Removed |Added

 CC||o...@freebsd.org

--- Comment #11 from Oleg Bulyzhin  ---
JFYI: https://lists.freebsd.org/pipermail/freebsd-net/2018-November/052184.html



[Bug 236102] When create or destroy vlan, the physical interface is flapping

2019-03-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236102

--- Comment #12 from Alexey  ---
I rolled the current server back to revision r341831 and it did not help.
Then I remembered that on the second server I had used the driver installed
from ports (/usr/ports/net/intel-ix-kmod/).
Now that this server uses that driver too, there are no problems and the
physical interface does not flap.
FreeBSD nas1 12.0-STABLE FreeBSD 12.0-STABLE r341831 NAS  amd64

Mar  1 13:38:29 nas1 ntpd[1127]: Listen normally on 8 vlan111 192.168.0.8:123
Mar  1 13:38:32 nas1 ntpd[1127]: Deleting interface #8 vlan111, 192.168.0.8#123, interface stats: received=0, sent=0, dropped=0, active_time=3 secs

[root@nas1 /home/pautina]#  cat /etc/rc.conf |egrep "vlan111|ix1"
cloned_interfaces="vlan111 vlan780"
ifconfig_ix1="up -tso4 -tso6 -lro -vlanhwtso description -=INTERFACE-TO-APHRODITE="
ifconfig_vlan111="inet 192.168.0.8 netmask 255.255.255.224 vlan 111 vlandev ix1 description -=NAS-WORLD=-"
[root@nas1 /home/pautina]#

[root@nas1 /home/pautina]# ifconfig vlan111
vlan111: flags=8843 metric 0 mtu 1500
description: -=NAS-WORLD=-
options=63
ether e4:11:5b:9b:72:b4
inet 192.168.0.8 netmask 0xffe0 broadcast 192.168.0.31
groups: vlan
vlan: 111 vlanpcp: 0 parent interface: ix1
media: Ethernet autoselect (10Gbase-Twinax )
status: active
nd6 options=29
[root@nas1 /home/pautina]#

[root@nas1 /home/pautina]# ifconfig ix1
ix1: flags=8843 metric 0 mtu 1500
description: -=INTERFACE-TO-APHRODITE=
options=e000bb
ether e4:11:5b:9b:72:b4
media: Ethernet autoselect (10Gbase-Twinax )
status: active
nd6 options=29
[root@nas1 /home/pautina]#

[root@nas1 /home/pautina]# grep if_ix_updated /boot/loader.conf
if_ix_updated_load="YES"
[root@nas1 /home/pautina]#

[root@nas1 /home/pautina]# kldstat
Id Refs Address    Size     Name
 1    9 0x8020     12f04e0  kernel
 2    1 0x814f1000 562f0    if_ix_updated.ko
 3    1 0x81621000 acf      mac_ntpd.ko
[root@nas1 /home/pautina]#

[root@nas1 /home/pautina]# dmesg |grep ix1
ix1:  port 0xe000-0xe01f mem 0xdf98-0xdf9f,0xdfb0-0xdfb03fff irq 17 at device 0.1 on pci2
ix1: Using MSI-X interrupts with 5 vectors
ix1: Ethernet address: e4:11:5b:9b:72:b4
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: link state changed to UP
[root@nas1 /home/pautina]#

ix0@pci0:2:0:0: class=0x02 card=0x17d3103c chip=0x10fb8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
    class      = network
    subclass   = ethernet
ix1@pci0:2:0:1: class=0x02 card=0x17d3103c chip=0x10fb8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
    class      = network
    subclass   = ethernet



[Differential] D19422: if_vxlan(4) Allow set MTU more than 1500 bytes.

2019-03-01 Thread aleksandr.fedorov_itglobal.com (Aleksandr Fedorov)
aleksandr.fedorov_itglobal.com created this revision.
aleksandr.fedorov_itglobal.com added reviewers: bryanv, hrs, network.
Herald added a subscriber: ae.

REVISION SUMMARY
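  There seems to be no reason to prevent setting an MTU larger than 1500 bytes.
  An MTU greater than 1500 gives a significant increase in throughput: in the
  tests below, about 1.61 Gbits/sec with a vxlan MTU of 1500 versus 5.44 Gbits/sec
  with a vxlan MTU of 8900, both over a 9000-byte physical network.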
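  The test plan pairs a vxlan MTU of 8900 with a 9000-byte physical network. A
  small sketch of the encapsulation arithmetic shows why that fits. The header
  sizes are generic protocol facts (RFC 7348 for the 8-byte VXLAN header), not
  values taken from sys/net/if_vxlan.c, and the sketch assumes an untagged
  inner Ethernet frame and an outer IP header without options; the function
  names are hypothetical.

```c
#include <stdbool.h>

/*
 * Header sizes (bytes) added by VXLAN encapsulation, assuming an untagged
 * inner Ethernet frame and an outer IP header without options.
 */
#define INNER_ETHER_HDR_LEN	14
#define VXLAN_HDR_LEN		8
#define UDP_HDR_LEN		8
#define IPV4_HDR_LEN		20
#define IPV6_HDR_LEN		40

/* Bytes added on top of the inner IP packet (whose max size is the vxlan MTU). */
static int
vxlan_overhead(bool outer_ipv6)
{
	return (INNER_ETHER_HDR_LEN + VXLAN_HDR_LEN + UDP_HDR_LEN +
	    (outer_ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN));
}

/*
 * Largest vxlan MTU whose encapsulated packets still fit the physical
 * (underlay) MTU without fragmenting the outer packet.
 */
static int
vxlan_max_unfragmented_mtu(int phys_mtu, bool outer_ipv6)
{
	return (phys_mtu - vxlan_overhead(outer_ipv6));
}
```

  With a 9000-byte underlay this gives 8950 for an IPv4 outer header and 8930
  for IPv6, so 8900 fits either way; on a 1500-byte underlay the IPv4 limit is
  1450.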

TEST PLAN
  iperf3 tests between two machines using vxlan over a 10Gbit network with
  various MTUs.
  
  **Test 1. vxlan MTU 1500, physical network MTU 9000**
  
# iperf3 -c 192.168.248.1
Connecting to host 192.168.248.1, port 5201
[  5] local 192.168.248.2 port 1050 connected to 192.168.248.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   175 MBytes  1.46 Gbits/sec    0   1.27 MBytes
[  5]   1.00-2.01   sec   194 MBytes  1.63 Gbits/sec    0   1.27 MBytes
[  5]   2.01-3.00   sec   195 MBytes  1.64 Gbits/sec    0   1.27 MBytes
[  5]   3.00-4.01   sec   196 MBytes  1.64 Gbits/sec    0   1.27 MBytes
[  5]   4.01-5.00   sec   195 MBytes  1.64 Gbits/sec    0   1.27 MBytes
[  5]   5.00-6.00   sec   195 MBytes  1.63 Gbits/sec    0   1.27 MBytes
[  5]   6.00-7.00   sec   196 MBytes  1.64 Gbits/sec    0   1.27 MBytes
[  5]   7.00-8.00   sec   194 MBytes  1.64 Gbits/sec    0   1.27 MBytes
[  5]   8.00-9.00   sec   191 MBytes  1.60 Gbits/sec  493    968 KBytes
[  5]   9.00-10.00  sec   193 MBytes  1.62 Gbits/sec    0   1.15 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.88 GBytes  1.61 Gbits/sec  493             sender
[  5]   0.00-10.00  sec  1.88 GBytes  1.61 Gbits/sec                  receiver

iperf Done.
  
  **Test 2. vxlan MTU 8900, physical network MTU 9000**
  
# iperf3 -c 192.168.248.1
Connecting to host 192.168.248.1, port 5201
[  5] local 192.168.248.2 port 1052 connected to 192.168.248.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   585 MBytes  4.90 Gbits/sec    0   1.28 MBytes
[  5]   1.00-2.00   sec   655 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   2.00-3.00   sec   656 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   3.00-4.00   sec   655 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   4.00-5.00   sec   656 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   5.00-6.00   sec   655 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   6.00-7.00   sec   656 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   7.00-8.00   sec   658 MBytes  5.51 Gbits/sec    0   1.28 MBytes
[  5]   8.00-9.00   sec   655 MBytes  5.50 Gbits/sec    0   1.28 MBytes
[  5]   9.00-10.00  sec   655 MBytes  5.50 Gbits/sec    0   1.28 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.33 GBytes  5.44 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  6.33 GBytes  5.44 Gbits/sec                  receiver

iperf Done.
  
  **Test 3. vxlan MTU 8900, physical network MTU 1500**
  
# iperf3 -c 192.168.248.1
Connecting to host 192.168.248.1, port 5201
[  5] local 192.168.248.2 port 1055 connected to 192.168.248.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   301 MBytes  2.52 Gbits/sec    0   1.28 MBytes
[  5]   1.00-2.00   sec   335 MBytes  2.81 Gbits/sec    0   1.28 MBytes
[  5]   2.00-3.00   sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
[  5]   3.00-4.00   sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
[  5]   4.00-5.00   sec   335 MBytes  2.81 Gbits/sec    0   1.28 MBytes
[  5]   5.00-6.00   sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
[  5]   6.00-7.00   sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
[  5]   7.00-8.00   sec   335 MBytes  2.81 Gbits/sec    0   1.28 MBytes
[  5]   8.00-9.00   sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
[  5]   9.00-10.00  sec   336 MBytes  2.82 Gbits/sec    0   1.28 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.25 GBytes  2.79 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  3.25 GBytes  2.79 Gbits/sec                  receiver

iperf Done.
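  In Test 3 the vxlan MTU (8900) is far above what a 1500-byte underlay can
  carry in one frame, so each full-size encapsulated packet must be split into
  fragments somewhere; the review does not say where. A sketch of the packet
  count, assuming an IPv4 outer header without options and ordinary IPv4
  fragmentation (every fragment but the last carries a multiple of 8 data
  bytes). The function names are hypothetical, and this only shows packet
  arithmetic; it does not explain the measured throughput.

```c
/*
 * Assumed header sizes: IPv4 outer header without options, untagged inner
 * Ethernet frame, 8-byte UDP and VXLAN headers.
 */
#define IPV4_HDR_LEN		20
#define UDP_HDR_LEN		8
#define VXLAN_HDR_LEN		8
#define INNER_ETHER_HDR_LEN	14

/* Outer IPv4 payload (UDP + VXLAN + inner frame) for a full-size inner packet. */
static int
vxlan_outer_ipv4_payload(int vxlan_mtu)
{
	return (UDP_HDR_LEN + VXLAN_HDR_LEN + INNER_ETHER_HDR_LEN + vxlan_mtu);
}

/* Number of IPv4 fragments needed to carry payload_len bytes on link_mtu. */
static int
ipv4_fragment_count(int payload_len, int link_mtu)
{
	/* Data per fragment, rounded down to a multiple of 8 bytes. */
	int per_frag = ((link_mtu - IPV4_HDR_LEN) / 8) * 8;

	/* Ceiling division: the last fragment may be shorter. */
	return ((payload_len + per_frag - 1) / per_frag);
}
```

  With the Test 3 numbers, `ipv4_fragment_count(vxlan_outer_ipv4_payload(8900), 1500)`
  is 7: 8930 bytes of outer payload at 1480 bytes per fragment.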

REVISION DETAIL
  https://reviews.freebsd.org/D19422

AFFECTED FILES
  sys/net/if_vxlan.c

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: aleksandr.fedorov_itglobal.com, bryanv, hrs, #network
Cc: ae, freebsd-net-list, krzysztof.galazka_intel.com
diff --git a/sys/net/if_vxlan.c b/sys/net/if_vxlan.c
--- a/sys/net/if_vxlan.c
+++ b/sys/net/if_vxlan.c
@@ -2248,10 +2248,11 @@
 	ifr = (struct ifreq *) data;
 	i

[Differential] D19422: if_vxlan(4) Allow set MTU more than 1500 bytes.

2019-03-01 Thread rgrimes
rgrimes accepted this revision as: rgrimes.
This revision is now accepted and ready to land.

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D19422/new/

REVISION DETAIL
  https://reviews.freebsd.org/D19422

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: aleksandr.fedorov_itglobal.com, bryanv, hrs, #network, rgrimes
Cc: evgueni.gavrilov_itglobal.com, olevole_olevole.ru, ae, freebsd-net-list, 
krzysztof.galazka_intel.com


Re: use of #ifdef INET and #ifdef INET6 in the kernel sources

2019-03-01 Thread Rick Macklem
Rodney W. Grimes wrote:
>Rick Macklem wrote:
>> I doubt NFS gets squeezed into such devices and, yes, it is a small amount.
>> Using source line counts via "wc" (it includes comments, etc):
>> - This will reduce the # of lines by about 6 for a module of about 7700 lines
>>   which is loaded when either the nfscl or nfsserver modules are loaded.
>>   (These are both about 25000 lines and require the krpc, which is another
>>   1.
>>   I haven't included the Kerberos stuff, because I can't remember if that
>>   gets loaded unless Kerberos mounts get used.)
>> --> A savings of 6 lines in something like 43000.
>
>That means that nfsuserd is an extremely well-behaved IPv4/IPv6
>agnostic daemon that only takes a small change to make it able
>to run as either v4 or v6 single stack, or dual stacked, at a cost
>that also sounds minimal: even if it took an #ifdef for each of these
>lines, that is only 6 in 43000 lines of code, which is a small cost.
Just FYI, the above referred to the kernel changes and not nfsuserd.c, but it
doesn't really matter.
I accept the argument that it documents where the INET and INET6 code is.

>The same analysis on other code would probably come out nowhere near
>this.
>
>Also, didn't this use to use a unix domain socket?  Could the unix domain
>socket be put back and knobbed so that I could actually run this without
>it doing the localnet thing at all?  I know that it had issues,
>as the socket is in /tmp and /tmp might not be the right type of file system,
>etc...  But some of us do know that and do run with a /tmp that
>would support an AF_UNIX nfsuserd.
>
>If it takes 6 lines of ifdef to do v4 vs v6, how many lines of
>ifdef would it take to add AF_UNIX back and make it a run-time choice?
>
>(Goes looking for more Nomex clothing :-)
The AF_LOCAL code was in head for a short period of time before a vnode lock panic()
issue was reported and I reverted the patch.

Here is the commit log message for that reversion:
PR#230752 shows a panic where an nfsd thread tries to do soconnect() on
the AF_LOCAL socket used by the nfsuserd while already holding an
exclusive lock on it. I am not 100% sure how this happens, but since an
AF_LOCAL socket is in the file system namespace it is conceivable that it
could lock it and then attempt an upcall to the nfsuserd.
However, reverting r320757 stops the nfsuserd from using an AF_LOCAL
socket, so it should avoid any such panic().
r320757 did fix a problem with running the nfsuserd when jails were
enabled, but that can be dealt with less elegantly by allowing the
use of an alternate address instead of 127.0.0.1.
The gssd daemon also uses an AF_LOCAL socket, but it will do upcalls
before the nfsd thread processes the RPC, so I think it should not
be susceptible to this problem.

As you can see from the above, I wasn't 100% sure what caused the vnode lock panic().
It might only occur for the AF_LOCAL socket being on an NFS mount, but I am not sure.
I also don't like the idea of code that depends on the kind of underlying file system to
function correctly. (The VFS/VOP interface is meant to make "type of file system"
transparent to userland as far as possible, as I see it.)

So, I don't feel comfortable with enabling AF_LOCAL for certain file system types,
since I know using an INET/INET6 socket address avoids the problem.

I could make an argument for some other "namespace" for local sockets other than
the file system directory structure, but that sounds like way too much work for me...
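The file system dependency Rick describes comes from how AF_LOCAL sockets are named: bind() on a path creates a socket node in whatever file system holds that directory, and the node persists until it is unlinked. A small userland sketch, assuming a writable /tmp; the helper names are hypothetical and this is not nfsuserd code.

```c
#define _DEFAULT_SOURCE		/* S_ISSOCK on glibc; harmless elsewhere */
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind an AF_LOCAL stream socket to path.  Returns the bound descriptor,
 * or -1 with errno set.  The bind() creates a node at path in the file
 * system that contains it.
 */
static int
bind_local_socket(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	fd = socket(AF_LOCAL, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_LOCAL;
	strcpy(addr.sun_path, path);	/* length checked above */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/* True if path exists and is a socket node. */
static bool
is_socket_node(const char *path)
{
	struct stat sb;

	return (stat(path, &sb) == 0 && S_ISSOCK(sb.st_mode));
}
```

If the directory sits on a file system that nfsd also serves, the nfsd threads and the daemon end up touching the same vnode, which is the kind of interaction the reverted change ran into.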

rick
ps: I think the above answers hrs@'s comment about AF_LOCAL as well.


Re: use of #ifdef INET and #ifdef INET6 in the kernel sources

2019-03-01 Thread Rick Macklem
Rick Macklem wrote:
[stuff snipped]
>The AF_LOCAL code was in head for a short period of time before a vnode lock panic()
>issue was reported and I reverted the patch.
>
>Here is the commit log message for that reversion:
>PR#230752 shows a panic where an nfsd thread tries to do soconnect() on
>the AF_LOCAL socket used by the nfsuserd while already holding an
>exclusive lock on it. I am not 100% sure how this happens, but since an
>AF_LOCAL socket is in the file system namespace it is conceivable that it
>could lock it and then attempt an upcall to the nfsuserd.
>However, reverting r320757 stops the nfsuserd from using an AF_LOCAL
>socket, so it should avoid any such panic().
>r320757 did fix a problem with running the nfsuserd when jails were
>enabled, but that can be dealt with less elegantly by allowing the
>use of an alternate address instead of 127.0.0.1.
>The gssd daemon also uses an AF_LOCAL socket, but it will do upcalls
>before the nfsd thread processes the RPC, so I think it should not
>be susceptible to this problem.

Oops. Duh. I should have read my own commit message more carefully...
It was an nfsd thread, so the file system wasn't an NFS mount, it was a file
system exported to NFS client(s).

It is possible to test to see if a file system is exported, but that can change
at any time, so even if it isn't exported when the nfsuserd daemon is started,
it could be exported later.

So, I don't see any way AF_LOCAL can be safely used here.

Oh, and the kernel RPC expects to be able to do a soconnect() using a
sockaddr, so socketpair() won't do the trick.
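Rick's last point can be checked from userland: socketpair() returns two already-connected descriptors but never gives either end a name, so there is no sockaddr that another party (such as the kernel RPC, which connects by address) could later pass to connect(). A minimal sketch; `socketpair_is_unnamed()` is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create a connected AF_LOCAL pair and report whether the first end has
 * a path name.  Returns 1 if unnamed, 0 if named, -1 on error.
 */
static int
socketpair_is_unnamed(void)
{
	struct sockaddr_un addr;
	socklen_t len = sizeof(addr);
	int sv[2], unnamed;

	if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	memset(&addr, 0, sizeof(addr));
	if (getsockname(sv[0], (struct sockaddr *)&addr, &len) < 0)
		unnamed = -1;
	else
		/* An unbound AF_LOCAL socket reports an empty sun_path. */
		unnamed = (addr.sun_path[0] == '\0');
	close(sv[0]);
	close(sv[1]);
	return (unnamed);
}
```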

rick



Re: use of #ifdef INET and #ifdef INET6 in the kernel sources

2019-03-01 Thread Hiroki Sato
Rick Macklem wrote:

rm> Rick Macklem wrote:
rm> [stuff snipped]
rm> >The AF_LOCAL code was in head for a short period of time before a vnode lock panic()
rm> >issue was reported and I reverted the patch.
rm> >
rm> >Here is the commit log message for that reversion:
rm> >PR#230752 shows a panic where an nfsd thread tries to do soconnect() on
rm> >the AF_LOCAL socket used by the nfsuserd while already holding an
rm> >exclusive lock on it. I am not 100% sure how this happens, but since an
rm> >AF_LOCAL socket is in the file system namespace it is conceivable that it
rm> >could lock it and then attempt an upcall to the nfsuserd.
rm> >However, reverting r320757 stops the nfsuserd from using an AF_LOCAL
rm> >socket, so it should avoid any such panic().
rm> >r320757 did fix a problem with running the nfsuserd when jails were
rm> >enabled, but that can be dealt with less elegantly by allowing the
rm> >use of an alternate address instead of 127.0.0.1.
rm> >The gssd daemon also uses an AF_LOCAL socket, but it will do upcalls
rm> >before the nfsd thread processes the RPC, so I think it should not
rm> >be susceptible to this problem.
rm>
rm> Oops. Duh. I should have read my own commit message more carefully...
rm> It was an nfsd thread, so the file system wasn't an NFS mount, it was a file
rm> system exported to NFS client(s).
rm>
rm> It is possible to test to see if a file system is exported, but that can change
rm> at any time, so even if it isn't exported when the nfsuserd daemon is started,
rm> it could be exported later.
rm>
rm> So, I don't see any way AF_LOCAL can be safely used here.
rm>
rm> Oh, and the kernel RPC expects to be able to do a soconnect() using a
rm> sockaddr, so socketpair() won't do the trick.

 Thank you for your clarification.  I agree that an AF_INET/AF_INET6
 loopback address is a much easier way to avoid the file system
 dependency.  For the original question about whether we should use a
 hard-coded address or not, my suggestion is that we do not need to use
 the name "localhost" or rely on any Internet domain resolver code such
 as DNS, as long as the hard-coded addresses cover all of the supported
 address families (we have only two in practice).

 However, an option to specify which loopback address to use might
 mitigate issues with multiple loopback addresses in classic jails
 without VNET, or with multiple supported address families.  Is there
 any problem with using NFSSVC_NFSUSERDPORT to pass a struct sockaddr
 (i.e. udptransp->xp_ltaddr, not only xp_port) directly?  I think that
 checking on the kernel side that (sa_family == AF_INET or AF_INET6) and
 that the address is already bound, before the newnfs_connect() call in
 nfsrv_nfsuserdport(), guarantees that the specified sockaddr is a
 loopback address.
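 Hiroki's proposal has the kernel validate a caller-supplied sockaddr
 before connecting to it. The sketch below uses a different, simpler
 technique than the one he describes: an explicit family and
 loopback-address test (127.0.0.0/8 for IPv4, ::1 for IPv6) instead of
 relying on the address already being bound. It is userland C;
 `sockaddr_is_loopback()` is a hypothetical name, not existing nfsd code.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Accept only AF_INET/AF_INET6 loopback addresses; reject everything else. */
static bool
sockaddr_is_loopback(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;

		/* 127.0.0.0/8: top octet of the host-order address is 127. */
		return ((ntohl(sin->sin_addr.s_addr) >> 24) == 127);
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;

		return (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr));
	}
	default:
		return (false);
	}
}
```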

 Anyway, we might want an AF_LOCAL socket with a namespace that does not
 depend on any file system, for communication between the kernel and
 userland.  Linux has had this for a long time (the "abstract namespace",
 selected by putting '\0' at the head of an AF_LOCAL address), though I
 am still not sure it is the best way to implement it.  A new protocol
 family (as PF_KEY is used for IPsec, for example) would also work for
 this purpose, but it is probably a sledgehammer to crack a nut.

-- Hiroki

