kernel crash: devctl set driver -f mlx5_core6 ppt

2023-09-28 Thread John

Hi Folks,

   Working against 13.2-STABLE.

   I have a chance to get some bhyve VMs running on new hardware
with Mellanox 100Gb/s cards. After creating VF entries with iovctl
at boottime, a devctl command to detach the mlx5 driver and attach
the ppt driver causes the kernel to crash:

 # devctl set driver -f mlx5_core6 ppt

backtrace here: https://people.freebsd.org/~jwd/mlx5.dump.txt

   If I create the VF in ppt mode I can correctly detach
the ppt driver and attach the mlx5 driver.

   Also of note, if multiple VFs are created and a single
VF is targeted for the detach operation, all VFs are operated
on. It seems the VFs are not seen as individual entities
but a group of children in detach_method().

   Thoughts?

Thanks,
John

Re: kernel crash: devctl set driver -f mlx5_core6 ppt

2023-09-30 Thread John

- Konstantin Belousov's Original Message -
> On Fri, Sep 29, 2023 at 04:32:30AM +0000, John wrote:
> > Hi Folks,
> > 
> >Working against 13.2-STABLE.
> > 
> >I have a chance to get some bhyve VMs running on new hardware
> > with Mellanox 100Gb/s cards. After creating VF entries with iovctl
> > at boottime, a devctl command to detach the mlx5 driver and attach
> > the ppt driver causes the kernel to crash:
> > 
> >  # devctl set driver -f mlx5_core6 ppt
> > 
> > backtrace here: https://people.freebsd.org/~jwd/mlx5.dump.txt
> What is the line number for pci_iov_detach_method+0x5e?
> Better, load vmcore into debugger and get the backtrace from kgdb.

Took a bit to get a netdump off the system. Results:

#12 0x810e0a89 in trap_fatal (frame=0xfe278b8e9860, eva=2016) at /usr/src/sys/amd64/amd64/trap.c:940
#13 0x810e0adf in trap_pfault (frame=0xfe278b8e9860, usermode=false, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:759
#14
#15 0x80860a0d in PCI_IOV_UNINIT (dev=0xfa0085704400) at ./pci_iov_if.h:44
#16 pci_iov_delete_iov_children (dinfo=0xfa0086e3b300) at /usr/src/sys/dev/pci/pci_iov.c:873
#17 0x808607ce in pci_iov_detach_method (bus=, dev=0xfa0085704400) at /usr/src/sys/dev/pci/pci_iov.c:208
#18 0x80ea64a1 in PCI_IOV_DETACH (dev=0xf80110ffb400, child=0xfa0085704400) at ./pci_if.h:510
#19 pci_iov_detach (dev=0xfa0085704400) at /usr/src/sys/dev/pci/pci_iov.h:47
#20 remove_one (pdev=0xf8011000c180) at /usr/src/sys/dev/mlx5/mlx5_core/mlx5_main.c:1739
#21 0x80e82666 in linux_pci_detach_device (pdev=pdev@entry=0xf8011000c180) at /usr/src/sys/compat/linuxkpi/common/src/linux_pci.c:524
#22 0x80e84e74 in linux_pci_detach (dev=0xfa0085704400) at /usr/src/sys/compat/linuxkpi/common/src/linux_pci.c:514
#23 0x80c4d9f6 in DEVICE_DETACH (dev=0xfa0085704400) at ./device_if.h:234
#24 device_detach (dev=dev@entry=0xfa0085704400) at /usr/src/sys/kern/subr_bus.c:3093
#25 0x80c54f7b in devctl2_ioctl (cdev=, cmd=, data=0xf8014bcc8500 "mlx5_core6", fflag=, td=) at /usr/src/sys/kern/subr_bus.c:5949
#26 0x80aa376c in devfs_ioctl (ap=0xfe278b8e9ba8) at /usr/src/sys/fs/devfs/devfs_vnops.c:942
#27 0x80d0c7b8 in vn_ioctl (fp=0xf80150025730, com=18446744071589646072, data=0xf8014bcc8500, active_cred=0xf80166137300, td=0x0) at /usr/src/sys/kern/vfs_vnops.c:1701
#28 0x80aa3e3e in devfs_ioctl_f (fp=0x6, com=18446744071589646072, data=0xfc, cred=0x0, td=0x0) at /usr/src/sys/fs/devfs/devfs_vnops.c:873

Moving up the stack frames:

41  static __inline void PCI_IOV_UNINIT(device_t dev)
42  {
43  kobjop_t _m;
44  KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_iov_uninit);
45  ((pci_iov_uninit_t *) _m)(dev);
46  }

 p ((kobj_t)dev)->ops
$8 = (kobj_ops_t) 0x0

#define KOBJOPLOOKUP(OPS,OP) do {   \
kobjop_desc_t _desc = &OP##_##desc; \
kobj_method_t **_cep =  \
&OPS->cache[_desc->id & (KOBJ_CACHE_SIZE-1)];   \

Leading to OPS == NULL
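
A minimal sketch of that failure mode, using toy types rather than the real kobj structures (every name below is hypothetical). The dispatch helper checks for a missing ops table before looking up a method, which is the check the KOBJOPLOOKUP excerpt above does not make before dereferencing OPS. Where such a check would belong in the real detach path, if anywhere, is not something this thread establishes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for a kobj-style object; NOT the FreeBSD definitions. */
typedef int (*toy_uninit_fn)(void *self);

struct toy_ops {
	toy_uninit_fn uninit;		/* NULL if the method is not implemented */
};

struct toy_obj {
	const struct toy_ops *ops;	/* NULL when no driver is attached */
};

/* Example method used to exercise the dispatcher. */
int
toy_uninit_ok(void *self)
{
	(void)self;
	return (0);
}

/* Dispatch the uninit method, refusing objects whose ops table is gone. */
int
toy_uninit(struct toy_obj *obj)
{
	assert(obj != NULL);
	if (obj->ops == NULL)
		return (ENXIO);		/* nothing attached, nothing to call */
	if (obj->ops->uninit == NULL)
		return (ENOSYS);	/* table present, method missing */
	return (obj->ops->uninit(obj));
}
```

With an object whose ops pointer is NULL, toy_uninit() returns ENXIO instead of faulting.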

> >If I create the VF in ppt mode I can correctly detach
> > the ppt driver and attach the mlx5 driver.
> > 
> >Also of note, if multiple VFs are created and a single
> > VF is targeted for the detach operation, all VFs are operated
> > on. It seems the VFs are not seen as individual entities
> > but a group of children in detach_method().

9-STABLE: Chelsio t4nex0: failed to pre-process config file: 2.

2013-06-02 Thread John

Hi Folks,

   I have a pair of Chelsio T4 cards installed in a new HP DL380
system. The driver does not load at boot time, failing with the
message:

t4nex0: failed to pre-process config file: 2.

   After the system has finished booting, if I then issue a 
'kldload if_cxgbe' command, the driver loads correctly. Note,
the driver loads correctly from the command prompt with or
without the if_cxgbe_load in /boot/loader.conf.

   The message is coming from t4_main.c:partition_resources().
I don't see anything obvious that would cause this:

rc = cfg ? upload_config_file(sc, cfg, &mtype, &maddr) : ENOENT;
if (rc != 0) {
mtype = FW_MEMTYPE_CF_FLASH;
maddr = t4_flash_cfg_addr(sc);
}

bzero(&caps, sizeof(caps));
caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_READ);
caps.cfvalid_to_len16 = htobe32(F_FW_CAPS_CONFIG_CMD_CFVALID |
V_FW_CAPS_CONFIG_CMD_MEMTYPE_CF(mtype) |
V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(maddr >> 16) | FW_LEN16(caps));
rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
if (rc != 0) {
device_printf(sc->dev,
"failed to pre-process config file: %d.\n", rc);
return (rc);
}
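
For reading the number in that message: assuming the printed rc follows the usual errno numbering, 2 is ENOENT, which would fit a config file that was never supplied (the cfg == NULL branch above also starts from ENOENT). A small sketch for decoding such a code follows. The helper names are made up for illustration and are not part of the cxgbe driver.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Return the libc description for a non-negative errno-style code. */
const char *
rc_text(int rc)
{
	assert(rc >= 0);
	return (strerror(rc));
}

/* True when the code says "no such file or entry". */
int
rc_means_not_found(int rc)
{
	return (rc == ENOENT);
}
```

Calling rc_means_not_found(2) is true on FreeBSD, where ENOENT is defined as 2.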

   Has anyone run into this?

Thanks,
John

ps: And the output from loading the driver module by hand:

t4nex0:  mem 0xf7cc-0xf7cf,0xf700-0xf77f,0xf6ff-0xf6ff1fff irq 26 at device 0.4 on pci7
t4nex0: installing firmware 1.8.4.0 on card.
cxgbe0:  on t4nex0
cxgbe0: Ethernet address: 00:07:43:11:e9:00
cxgbe0: 16 txq, 8 rxq
cxgbe1:  on t4nex0
cxgbe1: Ethernet address: 00:07:43:11:e9:08
cxgbe1: 16 txq, 8 rxq
cxgbe2:  on t4nex0
cxgbe2: Ethernet address: 00:07:43:11:e9:10
cxgbe2: 16 txq, 8 rxq
cxgbe3:  on t4nex0
cxgbe3: Ethernet address: 00:07:43:11:e9:18
cxgbe3: 16 txq, 8 rxq
t4nex0: PCIe x8, 4 ports, 34 MSI-X interrupts, 101 eq, 33 iq
t4nex1:  mem 0xfbcc-0xfbcf,0xfb00-0xfb7f,0xfaff-0xfaff1fff irq 58 at device 0.4 on pci36
t4nex1: installing firmware 1.8.4.0 on card.
cxgbe4:  on t4nex1
cxgbe4: Ethernet address: 00:07:43:11:e6:a0
cxgbe4: 16 txq, 8 rxq
cxgbe5:  on t4nex1
cxgbe5: Ethernet address: 00:07:43:11:e6:a8
cxgbe5: 16 txq, 8 rxq
cxgbe6:  on t4nex1
cxgbe6: Ethernet address: 00:07:43:11:e6:b0
cxgbe6: 16 txq, 8 rxq
cxgbe7:  on t4nex1
cxgbe7: Ethernet address: 00:07:43:11:e6:b8
cxgbe7: 16 txq, 8 rxq
t4nex1: PCIe x8, 4 ports, 34 MSI-X interrupts, 101 eq, 33 iq




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: 9-STABLE: Chelsio t4nex0: failed to pre-process config file: 2.

2013-06-02 Thread John

- Alfred Perlstein's Original Message -
> This looks like the result of forgetting to include the actual
> firmware in the kernel config and/or the firmware device itself.
> 
> Can you check if you've included all the needed extra modules in the
> kernel config such as firmware(4) and the module for the card
> firmware itself?

   Thank you for the hint. I tracked down t4fw_cfg via the Makefile
in the modules area.

   However, I'm not actually sure how it works at this point. When I
kldload if_cxgbe and check for t4fw_cfg it does not appear to be
loaded (kldstat -v | grep t4).

   Moving on, adding t4fw_cfg_load to loader.conf we end up with:

Id Refs Address    Size     Name
 1   17 0x8020     15652a8  kernel
 2    1 0x81766000 4820     coretemp.ko
 3    1 0x8176b000 797b0    t4fw_cfg.ko
 4    1 0x817e5000 45b38    if_cxgbe.ko
 5    1 0x8182b000 11b78    ipmi.ko
 6    2 0x8183d000 2a30     smbus.ko

   and everything seems to work. I think this is worth a patch
to the man page at least:

--- cxgbe.4.orig	2012-09-13 08:57:44.0 -0400
+++ cxgbe.4	2012-09-13 08:59:43.0 -0400
@@ -46,9 +46,10 @@
 .Ed
 .Pp
 To load the driver as a
-module at boot time, place the following line in
+module at boot time, place the following lines in
 .Xr loader.conf 5 :
 .Bd -literal -offset indent
+t4fw_cfg_load="YES"
 if_cxgbe_load="YES"
 .Ed
 .Sh DESCRIPTION

   Thoughts?

Cheers,
John

> A trick you can use is to run "kldstat" after loading the module,
> you'll see which additional modules were needed for the device to
> work.  Unfortunately the kernel can't autoload those modules while
> booting.
> 
> I'm not sure if loader(8) picks up the deps either.
> 
> -Alfred

0 frame length?

2012-08-01 Thread John

Hi Folks,

   On a Dell R610 system, I've been tracking down some network issues
and ran across this in a tcpdump:

0000  84 2b 2b fd be 2e f0 4d  a2 08 c4 13 08 00 45 00   .++M ..E.
0010  00 00 f0 2a 40 00 40 06  00 00 0a 18 09 ee 0a 18   ...*@.@. 
0020  1e 08 58 57 14 02 5e 30  ea dc 61 84 62 b3 80 18   ..XW..^0 ..a.b...
0030  08 00 42 10 00 00 01 01  08 0a 1e 67 b2 58 dc 56   ..B. ...g.X.V
0040  81 12 4e 45 54 50 41 43  4b 04 00 00 00 00 00 00   ..NETPAC K...
  additional packet data

The two bytes at offset 0x10 are the IP total length field and should not
be 0. This only seems to happen in packets being sent from this
system/interface.
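
As a cross-check on the dump, here is a small sketch that pulls the IPv4 total-length field out of a captured frame. It assumes an untagged Ethernet frame (14-byte header), which puts the field at byte offsets 16 and 17, i.e. the first two bytes of the 0x10 row. The function and macro names are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_ETHER_HDR_BYTES 14	/* dst MAC + src MAC + ethertype, no VLAN tag */

/*
 * Read the 16-bit big-endian IPv4 total-length field from an Ethernet frame.
 * Returns -1 if the frame is too short to contain the field.
 */
int
ipv4_total_length(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);
	if (len < FRAME_ETHER_HDR_BYTES + 4)
		return (-1);
	return ((frame[FRAME_ETHER_HDR_BYTES + 2] << 8) |
	    frame[FRAME_ETHER_HDR_BYTES + 3]);
}
```

Fed the first 20 bytes of the dump above, it returns 0, matching what Wireshark reports.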

Corresponding interface:

bce0: flags=8943 metric 0 mtu 1500
        options=c01bb
        ether f0:4d:a2:08:c4:13
        inet 10.24.9.238 netmask 0xffff0000 broadcast 10.24.255.255
        media: Ethernet autoselect (1000baseT )
        status: active


8.2-RELEASE

Wireshark reports:

Bogus IP len (0, less than header length 20)


   In googling, I've seen comments about lro/tso & the *csum options.
I have those set to off for the next time the systems get rebooted.
The firmware is also slightly out-of-date and I'll probably update
that also after finding out what the updates target.


   Does anyone have any idea what the problem might be, or where
I might want to look?  I have a 9.1 system being put in to run some
comparison benchmarks against, to see if the same problem occurs.

Thanks!
John


Dell PowerEdge R820 Broadcom BCM57800 support

2012-08-16 Thread John

Hi Folks,

   I have an R820 I'm testing. The system seems to boot up fine, but
no network adapters show up. From pciconf -l :

none4@pci0:1:0:0:   class=0x02 card=0x1f5c1028 chip=0x168a14e4 rev=0x10 hdr=0x00
none5@pci0:1:0:1:   class=0x02 card=0x1f5c1028 chip=0x168a14e4 rev=0x10 hdr=0x00
none6@pci0:1:0:2:   class=0x02 card=0x1f671028 chip=0x168a14e4 rev=0x10 hdr=0x00
none7@pci0:1:0:3:   class=0x02 card=0x1f671028 chip=0x168a14e4 rev=0x10 hdr=0x00

which appears to be these:

Broadcom BCM57800 NetXtreme II 10 GigE   1f5c
Broadcom BCM57800 NetXtreme II 1 GigE    1f67

Does anyone have any experience with these?

Thanks,
John


Production use of carp?

2011-06-02 Thread John

Hi Folks,

   Posting to -net & -fs to hopefully catch the right folks.
A similar posting to -current didn't seem to catch anyone's
interest.  Please respond as appropriate.

   I'm in the process of setting up HA/Failover ZFS server
systems using carp. I seem to be running into some issues that
may simply be misunderstandings, or actual support issues. I'm
curious to hear what you think.

   First off, when using carp, one must use a unique vhid in
the configuration line for each system. If not, systems using
the same vhid but different passwords will see a serious amount
of jitter and/or delay on their carp'd interface. This
means a unique set of vhid values would need to be assigned and
kept track of for every system put in place. Not something I
want to do. I've already run into this problem with another
group that was using carp on external interfaces to control
an HA nagios setup.

   Instead of running carp on the external interfaces as below:

ifconfig_cxgb0="inet 10.24.99.11 netmask 255.255.0.0"  # System 1 physical ip
ifconfig_cxgb0="inet 10.24.99.12 netmask 255.255.0.0"  # System 2 physical ip
ifconfig_carp1="vhid 1 pass zfscarp1 advbase 1 advskew 100 10.24.99.13 netmask 255.255.0.0" # HA ip used by clients
  
   ... we instead connect a direct cross-over cable between the two systems
providing HA/Failover and use a private (backside) network:

ifconfig_cxgb1="inet 192.168.0.1 netmask 255.255.255.0"  # System 1 private ip
ifconfig_cxgb1="inet 192.168.0.2 netmask 255.255.255.0"  # System 2 private ip
ifconfig_carp1="vhid 1 pass zfscarp1 advbase 1 advskew 100 192.168.0.3 netmask 255.255.255.0"

   If system A is the MASTER, and I issue a 'ifconfig carp1 down'
command, system B becomes the MASTER as one would expect (using
scripts connected up through devd). So far, things are great. A
filesystem resource can be shifted to either A or B with no
impact on the clients.  Other scripts hooked up via devd monitor
the outgoing link and issue ifconfig carp1 up/down commands as
needed (for instance if the networking cable is unplugged on
head B).

   However, if system A is the MASTER, and system B is rebooted,
the carp interface on system A will flip/flop going down and
coming back up which is not what I want.

   This leads to my question, am I missing something simple about
using carp?  Should I implement my own control interface on the
private network and not use carp? What are other folks doing?

Thanks,
John


ifconfig -alias with duplicate netmasks work?

2011-08-22 Thread John

Fellow Net'ers

   Debugging an nfs locking problem to a linux host, I accidentally
issued some ifconfig commands on the bsd server (9-current) and
found that duplicate netmasks seem to work fine. For instance:

bce0: flags=8843 metric 0 mtu 1500

options=c01bb
ether d4:85:64:66:2a:14
inet6 fe80::d685:64ff:fe66:2a14%bce0 prefixlen 64 scopeid 0x1 
inet 10.24.99.127 netmask 0xffff0000 broadcast 10.24.255.255
inet 10.24.99.128 netmask 0xffff0000 broadcast 10.24.255.255
inet 10.24.99.126 netmask 0xffff0000 broadcast 10.24.255.255
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active

via the commands:

ifconfig bce0 inet 10.24.99.127 netmask 0xffff0000 broadcast 10.24.255.255
ifconfig bce0 inet 10.24.99.128 netmask 0xffff0000 broadcast 10.24.255.255 alias
ifconfig bce0 inet 10.24.99.126 netmask 0xffff0000 broadcast 10.24.255.255 alias

The man page for ifconfig says one 'must' use a different netmask,
typically 0xffffffff. However, everything still seems to work ok.

Has something changed, is the manpage wrong, am I totally missing
something?

Thanks,
John

man ifconfig

If the address
is on the same subnet as the first network address for this
interface, a non-conflicting netmask must be given.  Usually
0xffffffff is most appropriate.
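
A small sketch of the rule the man page describes, assuming the primary address uses a /16 mask (0xffff0000, which would match the 10.24.255.255 broadcast shown above) and that the all-ones host mask is what is meant for same-subnet aliases. The helper names are hypothetical, and this only models the documented recommendation, not what the kernel enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four octets into a host-order IPv4 address. */
uint32_t
ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a < 256 && b < 256 && c < 256 && d < 256);
	return (((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d);
}

/* True when both addresses fall in the same network under mask. */
int
same_subnet(uint32_t x, uint32_t y, uint32_t mask)
{
	return ((x & mask) == (y & mask));
}

/*
 * Netmask the man page would suggest for an alias: the host mask when the
 * alias shares the primary's subnet, otherwise the alias's own mask.
 */
uint32_t
alias_netmask(uint32_t primary, uint32_t primary_mask, uint32_t alias,
    uint32_t alias_mask)
{
	if (same_subnet(primary, alias, primary_mask))
		return (0xffffffffu);
	return (alias_mask);
}
```

For 10.24.99.127 as primary and 10.24.99.128 as alias, both under 0xffff0000, alias_netmask() returns 0xffffffff.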

problems compiling raw socket program

2001-01-25 Thread John


Hi

Can anyone enlighten me on why I can't compile?

I use gcc -o rawsocket rawsocket.c

and I get:

bash-2.03$ gcc -o rawsocket  rawsocket.c
In file included from rawsocket.c:7:
/usr/include/netinet/ip.h:152: parse error before `n_long'
/usr/include/netinet/ip.h:152: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:152: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:155: parse error before `n_long'
/usr/include/netinet/ip.h:155: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:156: warning: data definition has no type or storage class
/usr/include/netinet/ip.h:157: parse error before `}'
/usr/include/netinet/ip.h:157: warning: data definition has no type or storage class
/usr/include/netinet/ip.h:158: parse error before `}'
bash-2.03$ 


thanks

john


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define PORTNUMBER 80

int main(void)
{
int n, s;
char buf[4096];
char hostname[64];
//struct hostent *hp;
struct sockaddr_in name;
struct ip *iph = (struct ip *) buf;
struct tcphdr *tcph = (struct tcphdr *) buf + sizeof (struct ip);

memset (buf, 0, 4096);  /* zero out the buffer */

/* we'll now fill in the ip/tcp header values, see above for explanations */
iph->ip_hl = 5;
iph->ip_v = 4;
iph->ip_tos = 0;
iph->ip_len = sizeof (struct ip) + sizeof (struct tcphdr);  /* no payload 
*/
iph->ip_id = htonl (54321); /* the value doesn't matter here */
iph->ip_off = 0;
iph->ip_ttl = 255;
iph->ip_p = 6;
iph->ip_sum = 0;/* set it to 0 before computing the actual 
checksum later */
iph->ip_src.s_addr = inet_addr ("1.2.3.4");/* SYN's can be blindly spoofed */
iph->ip_dst.s_addr = name.sin_addr.s_addr;
tcph->th_sport = htons (1234);  /* arbitrary port */
tcph->th_dport = htons (PORTNUMBER);
tcph->th_seq = random ();/* in a SYN packet, the sequence is a random */
tcph->th_ack = 0;/* number, and the ack sequence is 0 in the 1st packet */
tcph->th_x2 = 0;
tcph->th_off = 0;   /* first and only tcp segment */
tcph->th_flags = TH_SYN;/* initial connection request */
tcph->th_win = htonl (65535);   /* maximum allowed window size */
tcph->th_sum = 0;/* if you set a checksum to zero, your kernel's IP stack
  should fill in the correct checksum during transmission */
tcph->th_urp = 0;

iph->ip_sum = csum ((unsigned short *) buf, iph->ip_len >> 1);

if ((s = socket(AF_INET, SOCK_RAW, IPPROTO_UDP)) < 0) {
perror("socket");
exit(1);
}

/* finally, it is very advisable to do a IP_HDRINCL call, to make sure
   that the kernel knows the header is included in the data, and doesn't
   insert its own header into the packet before our data */

  { /* lets do it the ugly way.. */
int one = 1;
const int *val = &one;
if (setsockopt (s, IPPROTO_IP, IP_HDRINCL, val, sizeof (one)) < 0)
  printf ("Warning: Cannot set HDRINCL!\n");
  }

//create the address of the server, also local
memset(&name, 0, sizeof(struct sockaddr_in));

name.sin_family = AF_INET;
name.sin_port = htons(PORTNUMBER);
name.sin_addr.s_addr = inet_addr ("127.0.0.1");

  if (sendto (s,/* our socket */
  buf,  /* the buffer containing headers and data */
  iph->ip_len,  /* total length of our datagram */
  0,/* routing flags, normally always 0 */
  (struct sockaddr *) &name,/* socket addr, just like in */
  sizeof (name)) < 0)   /* a normal send() */
printf ("error\n");
  else
printf (".");

close(s);
exit(0);
}
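
Separate from the compile error: the tcph assignment above casts buf to a struct pointer before adding sizeof (struct ip), so the addition is scaled by the size of the struct being pointed to. A sketch with 20-byte stand-in structs (hypothetical names, sized like the usual option-less IP and TCP headers) shows the difference between the two forms:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 20-byte stand-ins for struct ip and struct tcphdr. */
struct ip_like  { unsigned char bytes[20]; };
struct tcp_like { unsigned char bytes[20]; };

/*
 * Byte offset when the cast happens first: the addition then counts in
 * units of struct tcp_like. buf must be at least 400 bytes long.
 */
size_t
offset_cast_then_add(char *buf)
{
	assert(buf != NULL);
	return ((size_t)((char *)((struct tcp_like *)buf +
	    sizeof(struct ip_like)) - buf));
}

/* Byte offset when the addition is done on the char pointer first. */
size_t
offset_add_then_cast(char *buf)
{
	assert(buf != NULL);
	return ((size_t)((char *)(struct tcp_like *)(buf +
	    sizeof(struct ip_like)) - buf));
}
```

With these sizes the first form lands 400 bytes into the buffer and the second lands 20 bytes in, directly after the IP header.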

Re: problems compiling raw socket program

2001-01-25 Thread John


Hi

I tried that as you had advised, but the error remains
the same.

bash-2.03$ gcc -o rawsocket  rawsocket.c
In file included from rawsocket.c:7:
/usr/include/netinet/ip.h:152: parse error before `n_long'
/usr/include/netinet/ip.h:152: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:152: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:155: parse error before `n_long'
/usr/include/netinet/ip.h:155: warning: no semicolon at end of struct or union
/usr/include/netinet/ip.h:156: warning: data definition has no type or storage class
/usr/include/netinet/ip.h:157: parse error before `}'
/usr/include/netinet/ip.h:157: warning: data definition has no type or storage class
/usr/include/netinet/ip.h:158: parse error before `}'
bash-2.03$ 



--- Wilbert de Graaf <[EMAIL PROTECTED]> wrote:
> > Can anyone enlighten me on why I can't compile?
> 
> It tells you some types are missing so you need to
> add one or more headers.
> This will probably do it: #include
> 
> 
> Wilbert
> 
> 
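
For what it's worth, n_long is one of the types declared in <netinet/in_systm.h>, and on BSD-derived systems <netinet/ip.h> expects that header to have been included already. A sketch of an include order that normally works follows. The to_n_long helper is hypothetical and only there to show that the type is visible after these includes.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <netinet/in_systm.h>	/* declares n_long, n_short, n_time */
#include <netinet/in.h>
#include <netinet/ip.h>		/* uses n_long in struct ip_timestamp */
#include <arpa/inet.h>

/* Convert a host-order 32-bit value to network order, typed as n_long. */
n_long
to_n_long(unsigned long host)
{
	assert(host <= 0xffffffffUL);
	return ((n_long)htonl((uint32_t)host));
}
```

If this compiles, the parse errors around n_long in ip.h should be gone as well.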



Re: Sympatico ADSL connection through a hub

2002-10-07 Thread John


Try these links:
http://free.mine.nu/~squirrel/PPPoE/FreeBSD%20PPPoE%20Howto.htm
http://www.freebsddiary.org/pppoe.php
http://renaud.waldura.com/doc/freebsd/pppoe/
JT.
- Original Message -
From: "alexis georges" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, October 07, 2002 12:55 PM
Subject: Sympatico ADSL connection through a hub


> Hello,
> This is my first message posted to the list..hehehe
> anyways, i used to have a simple cable connection, which would be
> automatically configured by FreeBSD upon installation..I just moved and now
> we have an ADSL connection (from Sympatico).. The connection is not
> configured automatically..so i looked on the web for a solution. I found a
> few pages explaining that i had to recompile my kernel with a few lines such
> as 'options NETGRAPH'..and that i had to write some info into the ppp.conf
> file..i did those things but apparently it doesn't work..I am not sure how
> to work this out..cuz all the solutions i found were the same..
> So i will explain exactly how i am connected to the internet..
>
> 1. I have an ethernet card connected to a hub.
> 2. This hub is itself connected to the actual modem that we received from
> Sympatico.
>
> On Windows, the connection won't work unless the connection type is set to
> 10Mb Half-Duplex..so i am guessing it should be the same on FreeBSD..
> also, the address is supposed to be obtained dynamically..(on windows it
> says the address has been obtained by DHCP, however, during FreeBSD
> installation, it fails to find the parameter for the connection) does that
> mean that instead of the 10.0.0.1 address the solution from the web gave me,
> i should put the address that i am using on windows (even though it should
> be a dynamic address?)
> Anyways, i would really like to be able to fix this..if anyone could help,
> that would be great..i wanna get rid of windows
>
> Thank You
> Alexis

Re: Intermittent problems with LAN transfer speeds

2004-01-09 Thread John

Long -
Adam wrote:
> Any suggestions on how to test this effectively?
Since this is very intermittent, and bad enough that you have to reboot, my
first suggestion is:
Run top and check for a memory leak from a process. I had this once where
apache would slowly use up the memory, then all the swap space, until after a
month the box was thrashing and a restart of apache (or the box) was required.
In top, the more 'Free' on the Mem and Swap lines the better. Let top
run and/or check daily.
I also recently discovered that 'screen' when installed using pkg_add is a
CPU pig but runs fine when installed from ports.

I've had my share of mismatched switches, usually on the more expensive
brands that can't seem to handle auto-negotiate very well (but for the $$$
they should :) ), and usually on a co-lo's gear or some peering point down at
the ISP, so I had to go out of my way to prove the problem was on their side.
The first tests I run are pings with a large packet size, to emulate file
transfers rather than Internet browsing. This was a common problem I had with
ISP techs: they would say "Oh it pings fine, 0% loss" and then I had to try
and get them to use large packets, because file transfers and ftp's are not
the same as browsing.

Network Testing:
I'll assume neither box is under heavy load prior to testing.
192.168.0.10 = W2K box, so from the BSD box:
ping -i 0.01 -D -p ff -s1472 -c 1000 192.168.0.10
See man ping for the options, but a quick rundown:
-i 0.01 = interval between pings, in seconds.
-D = set the Don't Fragment bit on the packets.
-p ff = fill the packets with the pattern byte 0xff.
-s 1472 = the size of each packet's data. If you get a "ping: sendto: Message
too long" then reduce this number in steps of 5. 1472 was as high as I could
go on my LAN. There is some overhead within the 1500-byte MTU, so you won't
get the full 1500. If you have to reduce this by a lot to get proper output,
that would be strange.
-c 1000 = number of pings (side bar: I dropped in at a remote site and
turned on the console and found I'd left a regular ping running to the next
hop. I hadn't visited the site or rebooted the box in 6 months -doh- so I
use the '-c' flag now in case I get sidetracked)
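
The 1472 figure is just arithmetic: a 1500-byte Ethernet MTU minus a 20-byte IPv4 header (no options) and an 8-byte ICMP echo header. A small sketch of that calculation, with made-up names and assuming IPv4 without options:

```c
#include <assert.h>

#define IPV4_HEADER_BYTES 20	/* IPv4 header without options */
#define ICMP_HEADER_BYTES 8	/* ICMP echo request header */

/*
 * Largest ping -s value that still fits in one unfragmented IPv4 packet
 * for the given MTU, or -1 if the MTU cannot even hold the headers.
 */
int
max_ping_payload(int mtu)
{
	assert(mtu >= 0);
	if (mtu < IPV4_HEADER_BYTES + ICMP_HEADER_BYTES)
		return (-1);
	return (mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES);
}
```

For an MTU of 1500 this gives 1472. A path with a smaller MTU (a PPPoE link, for example) would need a correspondingly smaller -s value.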

Now try
ping -f -D -p ff -s1472 -c 250000 192.168.0.10
Replace '-i 0.01' with '-f' to do a flood ping. BTW only do this on your
own network, never a public one, or you may get a visit from your ISP.
Bump up the count to -c 250000 or more.
For reference my results:
ping -f -D -p ff -s1472 -c 250000 192.168.0.10
PATTERN: 0xff
PING 192.168.0.10 (192.168.0.10): 1472 data bytes
...
--- 192.168.0.10 ping statistics ---
250000 packets transmitted, 249994 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.610/0.785/54.674/0.264 ms

If you get 100% packet loss then your firewall is in the way.
If you get any other packet loss (well, I dropped a few on the flood ping but
not enough to bump the %) then it could be hardware on any of the 3 devices
or some kind of load on one of them.
On mismatched duplexes I've seen between 10-25% packet loss. Another
symptom (on a synchronous link) would be regular speed ftp transfers in
one direction and very slow in the other direction.

Some inexpensive hardware checks:
Put in a crossover cable between the 2 systems, thus eliminating the switch,
and retest.
Borrow another NIC and retest in both boxes.
Probably won't help, and only if you are comfortable doing it: enter the
BIOS on each box and change the settings for IRQs and see if you can get
the NICs on separate IRQ numbers. Try to keep the video IRQ off the NICs. On
my BSD servers I also always do this, plus disable the USB, printer and
serial ports if not in use.
Could it be hard disk errors/retries on either box? If the room is quiet
you might be able to hear it. Check the logs.

Software tools for the BSD side.
Install the port (or package) 'trafshow' and then have it running on
consoles for each card.
trafshow -nifxp0
trafshow -nidc0
When you get the slowdowns you can see if something other than your transfer
is going on.
Check to see if someone is bashing on your public connection, but for 2
years probably not :)
Take the public NIC down during these slowdowns to see what happens.

Good luck let us know how you make out.
Regards, JT

Adam's History:
> Since I first installed FreeBSD 2 years ago, I have intermittent
> problems with my LAN transfer speeds. It doesn't happen often, but when
> it does, I've not found any solution other than rebooting the server.
>
> My network configuration looks like this:
> cable modem --> freebsd 5.1-R --> dlink switch --> win2k workstation
>
> I normally get appx 10MB/s between my two machines. However,
> occasionally I'll fire up a transfer and only get 50-200KB/s, which is
> really awful.
>
> I've tried rebooting the Win2k machine, disconnecting the ethernet
> cables, even power cycling the switch; nothing helps.
>
> The only thing that seems to help is rebooting the server, which I
> really hate to do.
> Here's the output of ifconfig:
> -$ ifconfig
> fxp0: flags=8843 mtu 1500
>

ring buffer in freebsd (for bpf sniffing)

2004-01-14 Thread John

I've been talking with Luca Deri about a paper he wrote (
http://luca.ntop.org/Ring.pdf). In it he says he plans to port
this to FreeBSD. I was just wondering if anyone has looked at
his work. I'd help him, but seeing as this is way over my head (my
skills are in Perl), I thought I would post over here about it.

The code is over here.

http://prdownloads.sourceforge.net/ntop/ring-1.0.tar.gz?download

If you haven't read the paper you should. It shows how much polling
helps bpf capturing on FreeBSD, and compares this with linux.
The patches add code to bpf and nic drivers from the looks of it.

Relative merits of different approaches (ipf, ipfw, ipnat, natd, etc)

2004-01-22 Thread John

I have looked at the FAQ, the handbook, The Complete FreeBSD, and haven't
found anything like what I'm looking for.

There seem to be 2-3 implementations of access control lists and
2-3 implementations of network address translation that apply to
FreeBSD.

Is there anywhere that discusses the relative strengths and weaknesses
of these different implementations, and why you might want to use
one rather than another?
-- 

John Lind
[EMAIL PROTECTED]

IPFW and NAT - blocking RFC 1918 ("unregistered") network that matches my own

2004-02-05 Thread John

I am up and running with ipfw 2 and natd, but not all is quite well.

I can't figure out how to block "spoofed" packets from the outside
that use the same RFC 1918 network as the one I'm translating to.
When I try to put that rule on the exterior interface, it ends up
blocking the packets after they are translated.

Specifically, the network I am using falls in the 192.168.0.0/16 range.
(I won't publish exactly which one: you only have 254 to try...)
If, however, I put in
${fwcmd} add deny ip from any to 192.168.0.0/16 via ${oif}
then I cut off my interior network entirely, due, presumably, to
the pass through the rules after translation.

I suspect that I need some combination of "in" or "recv," but I
would like to actually UNDERSTAND what I'm doing instead of just
trying combinations 'til it works.  On the other hand, there are
sysctl kernel parameters that might affect this behavior, or maybe
other natd parameters - so maybe that's not even the ticket.

Another thing I would like to understand better is how to make a
wise choice as to where the divert rule should be.

Can someone point me to a resource?  I spent an hour at Barnes & Noble
last night looking through various firewalling books that were long
on theory, or even examples, but not examples for an ipfw / natd
situation.

Thanks!
-- 

John Lind
[EMAIL PROTECTED]

Adding a new member to m_pkthdr

2022-05-27 Thread John Baldwin


For NIC offload of kernel TLS on the receive side, the kernel needs to know
the "leaf" interface that packets arrive on up in the socket buffer layer
when appending received packet data to a socket using KTLS.  rcvif does not
fully work since connections that transit a virtual interface like if_vlan or
if_lagg rewrite m_rcvif to be the virtual interface.  For KTLS transmit we
are able to follow the transmit path down to configure KTLS on the leaf
interface.  However, while the receive path is usually a mirror of the
transmit path, it is not always.  In particular, when using a lagg(4) with
LACP, the other end of the lagg is free to use whatever hash it chooses to
distribute traffic across the lagg ports such that the receive and transmit
sides of a connection may transit different ports within a lagg.

To provide a leaf interface, I have a patch that adds a new field to m_pkthdr
to track the leaf receive interface.  It is optional and the only use
currently anticipated is KTLS.  In the current KTLS patches it is set
on received packets by the mlx5 driver.  Possibly it could be set more
generically in ether_input instead of in individual drivers.  It is
serialized to an index and generation count while packets are deferred to
a netisr similar to rcvif except that it is non-fatal if the ifp cannot
be re-materialized when the mbuf is dequeued.  Instead, the pointer is
simply left as NULL.
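
As a rough illustration of the index-plus-generation scheme described above,
here is a standalone sketch. It is not the code in D35339: the table, the
struct names, and the functions `slot_attach()`, `ifref_save()` and
`ifref_restore()` are all invented for this example. The point it shows is
that a stale reference resolves to NULL instead of causing a fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NIFSLOTS	8		/* hypothetical table size */

struct ifslot {
	void		*ifp;		/* NULL while the slot is empty */
	uint16_t	 gen;		/* bumped each time the slot is reused */
};

struct ifref {				/* what travels with the queued packet */
	uint16_t	 idx;
	uint16_t	 gen;
};

static struct ifslot iftable[NIFSLOTS];

/* Attach an interface to a slot; every new occupant gets a new generation. */
void
slot_attach(uint16_t idx, void *ifp)
{
	assert(idx < NIFSLOTS && ifp != NULL);
	iftable[idx].ifp = ifp;
	iftable[idx].gen++;
}

/* Detach the interface; the generation is left alone until the next attach. */
void
slot_detach(uint16_t idx)
{
	assert(idx < NIFSLOTS);
	iftable[idx].ifp = NULL;
}

/* Serialize: record index and generation when the packet is deferred. */
struct ifref
ifref_save(uint16_t idx)
{
	struct ifref r;

	assert(idx < NIFSLOTS);
	r.idx = idx;
	r.gen = iftable[idx].gen;
	return (r);
}

/* Re-materialize: NULL if the interface went away or the slot was reused. */
void *
ifref_restore(struct ifref r)
{
	if (r.idx >= NIFSLOTS || iftable[r.idx].gen != r.gen)
		return (NULL);
	return (iftable[r.idx].ifp);
}
```

A consumer that gets NULL back from `ifref_restore()` would simply treat the
packet as having no known leaf interface.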

However, using more space in m_pkthdr is a non-trivial thing, so it's worth
raising the conversation more broadly.  The change to add this field is in
https://reviews.freebsd.org/D35339.  Drew has tested this isolated change
under load at Netflix and found no impact on performance.

--
John Baldwin

Re: what to check? no IPV6 pings between nodes on the same switch

2022-08-15 Thread John Hay

Hi Benoit,

It will allow multicast packets to go through, which IPv6 depends on. Maybe
there is a problem setting up the multicast filter for that driver / card.
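
To make the multicast dependency concrete: neighbor discovery for a unicast
address is sent to that address's solicited-node multicast group (RFC 4291),
which in turn maps to a 33:33:xx:xx:xx:xx Ethernet address (RFC 2464). If the
card's multicast filter does not accept that MAC, neighbor solicitations only
arrive while the interface is promiscuous. A minimal sketch of the mapping,
with function names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Solicited-node group for a unicast address: ff02::1:ffXX:XXXX, where
 * XX:XXXX are the low 24 bits of the unicast address.
 */
void
solicited_node_group(const uint8_t addr[16], uint8_t group[16])
{
	assert(addr != NULL && group != NULL);
	memset(group, 0, 16);
	group[0] = 0xff;
	group[1] = 0x02;
	group[11] = 0x01;
	group[12] = 0xff;
	memcpy(&group[13], &addr[13], 3);
}

/*
 * Ethernet address for an IPv6 multicast group: 33:33 followed by the
 * low 32 bits of the group address.
 */
void
ipv6_mcast_to_ether(const uint8_t group[16], uint8_t mac[6])
{
	assert(group != NULL && mac != NULL);
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &group[12], 4);
}
```

For node 1's link-local address fe80::b67a:f1ff:fe7a:9c10 this yields the
group ff02::1:ff7a:9c10 and the MAC 33:33:ff:7a:9c:10. A global address with
different low 24 bits lands in a different group, so it needs its own filter
entry.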

Regards

John


On Mon, 15 Aug 2022 at 12:08, Benoit Chesneau 
wrote:

> So I noticed that tcpdump was enabling the "promiscuous" mode  to the
> interface. So I tried to do it manually: `ifconfig ql0 promisc` and ping
> worked even after disabling this mode `ifconfig ql0 -promisc`.
>
> What does happen when the promiscuous mode is enabled? I'm not sure to
> understand what is the issue :/
>
> Benoît
> --- Original Message ---
> On Monday, August 15th, 2022 at 11:53, Benoit Chesneau <
> beno...@enki-multimedia.eu> wrote:
>
> Unfortunately I get the same results with rtsold enabled and the interface
> up.  It doesn't seem related to the switch since link-local ping works :/
>
>
> Benoît
> --- Original Message ---
> On Monday, August 15th, 2022 at 11:41, Ronald Klop 
> wrote:
>
> Set rtsold_enable="YES" in rc.conf and restart.
> Does that help?
>
> "
> DESCRIPTION
>  rtsold is the daemon program to send ICMPv6 Router Solicitation
> messages
>  on the specified interfaces.  If a node (re)attaches to a link, rtsold
>  sends some Router Solicitations on the link destined to the link-local
>  scope all-routers multicast address to discover new routers and to get
>  non link-local addresses.
>
>  rtsold should be used on IPv6 hosts (non-router nodes) only.
> "
>
> Btw: accept_rtadv makes "rtsol" to run once on startup if you set it in
> rc.conf and use it to boot the machine. (BTW: for me this does not work
> well enough, so I run rtsold explicitly.) Setting accept_rtadv by ifconfig
> will not run rtsol.
>
> Regards,
> Ronald.
>
>
>
> *Van:* Benoit Chesneau 
> *Datum:* maandag, 15 augustus 2022 11:25
> *Aan:* Benoit Chesneau 
> *CC:* Ronald Klop , "freebsd-net@FreeBSD.org" <
> freebsd-net@freebsd.org>
> *Onderwerp:* Re: what to check? no IPV6 pings between nodes on the same
> switch
>
> OK here is the weird but interesting thing. When I start  to capture icmp6
> packets using tcpdump `tcpdump -i ql0 icmp6` then ping6 starts to work.
> Even after stopping the capture. Any idea what could it be ?
>
> Benoît
> --- Original Message ---
> On Monday, August 15th, 2022 at 10:50, Benoit Chesneau <
> beno...@enki-multimedia.eu> wrote:
>
>
> Hi,
>
> Thanks for the help :) The nodes can indeed ping each others using the
> link-local address. What does it means? I tested to set `accept_rtadv`
> using the ifconfig command without much success.
>
>
> Here are the ifconfigs, the prefix is the same for all To be sure, I
>  replaced the content by  using sed.
>
> node 1:
>
> ```
>  $ ifconfig ql0
> ql0: flags=8843 metric 0 mtu 1500
>
> options=507bb
> ether b4:7a:f1:7a:9c:10
> inet6 ::11 prefixlen 64
> inet6 fe80::b67a:f1ff:fe7a:9c10%ql0 prefixlen 64 scopeid 0x1
> media: Ethernet autoselect (25GBase-SR )
> status: active
> nd6 options=21
> ```
>
> node 2:
>
> ```
>  $ ifconfig ql0
> ql0: flags=8843 metric 0 mtu 1500
>
> options=507bb
> ether b4:7a:f1:7a:99:52
> inet6 ::12 prefixlen 64
> inet6 fe80::b67a:f1ff:fe7a:9952%ql0 prefixlen 64 scopeid 0x1
> media: Ethernet autoselect (25GBase-SR )
> status: active
> nd6 options=21
> ```
>
> node 3
> ```
> ifconfig ql0
> ql0: flags=8843 metric 0 mtu 1500
>
> options=507bb
> ether b4:7a:f1:18:ff:d8
> inet6 ::13 prefixlen 64
> inet6 fe80::b67a:f1ff:fe18:ffd8%ql0 prefixlen 64 scopeid 0x1
> media: Ethernet autoselect (25GBase-SR )
> status: active
> nd6 options=21
> ```
>
>
> --- Original Message ---
> On Monday, August 15th, 2022 at 10:29, Ronald Klop 
> wrote:
>
>
> Hi,
>
> My rc.conf config has:
> ifconfig_genet0_ipv6="inet6 accept_rtadv"
>
> Can you post the output of "ifconfig" and "ipfw show"?
> Can you ping the link-local address of the other hosts?
>
> Regards.
> Ronald.
>
>
>
> *Van:* Benoit Chesneau 
> *Datum:* maandag, 15 augustus 2022 08:59
> *Aan:* "freebsd-net@FreeBSD.org" 
> *Onderwerp:* what to check? no IPV6 pings between nodes on the same switch
>
>
> I have setup 3 nodes on a fresh Freebsd 13.1-RELEASE-p1. They have the
> same gateway and IPS are in same /64. All 3 nodes are on the same switch
> (mikrotik) and same vlan untagged.
>
> I can ping them from an external machine through the router/gateway but the
> nodes can't ping each other. When I run `ndp -a` it only returns the
> gateway and the

crash and panic using pfsync on 13.1-RELEASE (Bug 268246)

2022-12-08 Thread John Jasen

Hi folks -- I opened this on FreeBSD 13.1.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268246

I'm stumped, as I have about half a dozen other systems just like this one,
which do not exhibit this condition.

Don't know if it matters, but this is the backup firewall in a carp
configuration.


kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80cadb90
stack pointer   = 0x28:0xfe0204794bc0
frame pointer   = 0x28:0xfe0204794c20
code segment    = base rx0, limit 0xf, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: pfsync)
trap number = 12
panic: page fault
cpuid = 0
time = 1670433489
KDB: stack backtrace:
#0 0x80c694a5 at kdb_backtrace+0x65
#1 0x80c1bb5f at vpanic+0x17f
#2 0x80c1b9d3 at panic+0x43
#3 0x810afdf5 at trap_fatal+0x385
#4 0x810afe4f at trap_pfault+0x4f
#5 0x810875b8 at calltrap+0x8
#6 0x80dca82f at ip_fragment+0x24f
#7 0x80dca1e3 at ip_output+0x1163
#8 0x8225a851 at pfsyncintr+0x151
#9 0x80bdbcfa at ithread_loop+0x25a
#10 0x80bd8a9e at fork_exit+0x7e
#11 0x8108862e at fork_trampoline+0xe
Uptime: 43m36s
Dumping 7356 out of 130983 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Re: vxlan with IPv6 underlay ?

2023-12-04 Thread John Nielsen

On Dec 4, 2023, at 3:26 AM, Benoit Chesneau wrote:

> Is IPv6 underlay fully supported with FreeBSD? I have created a tunnel and
> associated an IPv6 address to each side. I'm able to ping between each
> device. But when I want to curl from the remote side it times out. Locally
> on the remote side it is OK. Is this expected? Should I rather create a
> bridge with vxlan as a member and bind nginx to it?

I think you’ve answered your own question and demonstrated that it works as
expected. Pinging the inside address would not work at all if the tunnel and
outer transport weren’t working.

As to why your curl test doesn’t work, we’d need more information. Make sure
that nginx is in fact listening on the vxlan IP and is not being blocked by a
firewall. You may also want to do a packet capture of the inside interfaces
to see what is and isn’t going through.

JN

> ```
> $ ifconfig vxlan0 create vxlanid 108 vxlanlocal ::110b:102::100 vxlanremote ::110b:102::12
> $ ifconfig vxlan0
> vxlan0: flags=1008843 metric 0 mtu 1430
> 	options=80020
> 	ether 58:9c:fc:10:ff:eb
> 	groups: vxlan
> 	vxlan vni 108 local [::102::100]:4789 remote [::110b:102::12]:4789
> 	media: Ethernet autoselect (autoselect )
> 	status: active
> 	nd6 options=29
> $ ifconfig vxlan0 inet6 ::110b:300::1/64
> ```
>
> Ping from remote is ok:
>
> ```
> ifconfig vxlan0
> vxlan0: flags=1008843 metric 0 mtu 1430
> 	options=680323
> 	ether 58:9c:fc:10:df:1f
> 	inet6 fe80::5a9c:fcff:fe10:df1f%vxlan0 prefixlen 64 scopeid 0xf
> 	inet6 ::110b:300::2 prefixlen 64
> 	groups: vxlan
> 	vxlan vni 108 local [:110b:102::12]:4789 remote [::110b:102::100]:4789
> 	media: Ethernet autoselect (autoselect )
> 	status: active
> 	nd6 options=21
> $ ping6 ::110b:300::1
> PING6(56=40+8+8 bytes) ::110b:300::2 --> :::110b:300::1
> 16 bytes from 2a0e:e701:110b:300::1, icmp_seq=0 hlim=64 time=0.071 ms
> 16 bytes from 2a0e:e701:110b:300::1, icmp_seq=1 hlim=64 time=0.078 ms
> 16 bytes from 2a0e:e701:110b:300::1, icmp_seq=2 hlim=64 time=0.076 ms
> 16 bytes from 2a0e:e701:110b:300::1, icmp_seq=3 hlim=64 time=0.104 ms
> 16 bytes from 2a0e:e701:110b:300::1, icmp_seq=4 hlim=64 time=0.077 ms
> ^C
> ```
>
> But when I run `curl -6 -v 'http://[::110b:300::1]'` it times out.


Benoît Chesneau, Enki Multimedia—t. +33608655490 



Sent with Proton Mail secure email.

Re: removing RIP/RIPng (routed/route6d)

2024-05-15 Thread John Howie

I use RIP all the time. Removing it would be a pain. What is the justification? 
Moving it to ports is an option, but now we have to compile, distribute, and 
install it.

Sent from my iPhone

> On May 15, 2024, at 07:40, Tomek CEDRO  wrote:
> 
> On Wed, May 15, 2024 at 4:20 PM Scott  wrote:
>>> On Mon, Apr 15, 2024 at 09:49:27PM +0100, Lexi Winter wrote:
>>> (..)
>>> i'd like to submit a patch to remove both of these daemons from src.  if
>>> there's some concern that people still want to use the BSD
>>> implementation of routed/route6d, i'm also willing to submit a port such
>>> as net/freebsd-routed containing the old code, in a similar way to how
>>> the removal of things like window(1) and telnetd(8) were handled.
>> 
>> I use RIPv2 for its simplicity and small memory and CPU requirements.  It
>> has its place and shouldn't be considered "legacy" despite its shortcomings.
>> It's not uncommon for vendors like Cisco to produce "basic" feature sets of
>> IOS that do not include any link-state protocols.
>> 
>> Anyway, I'm a user, albeit a small user, of RIP and wouldn't object to its
>> removal from FreeBSD if there were a small footprint alternative.  I've used
>> FRR and VyOS a bit and they are overkill as replacements.
>> 
>> Your email doesn't justify its removal other than to say you are unconvinced
>> of the value of shipping it.  As a user I definitely see the value.  I
>> understand that there is always a cost to providing code, but that wasn't
>> suggested as a reason.  All APIs, modules, utilities, etc. need to regularly
>> justify their presence in the OS.
>> 
>> If it must be removed, is there any way to fork the FreeBSD routed and
>> route6d to a port?  Or would that defeat the purpose of removing it in the
>> first place?
> 
> Yeah, where did that recent trend came to FreeBSD to remove perfectly
> working code??
> 
> There are more and more ideas in recent times like this.
> 
> Architectures removal, drivers removal, backward compatibility
> removal. While basic functions become unstable and unreliable. Looks
> more like diversion and sabotage than progress.
> 
> If anything is about to be moved out from SRC for a really good reason
> it should be available in ports and not in /dev/null.
>

Re: removing RIP/RIPng (routed/route6d)

2024-05-15 Thread John Howie

FreeBSD (and BSD Unix in general) has a rich history of being a “complete” OS – 
kernel and userland. If there was really a demand for a minimalist version of 
FreeBSD, why have people not forked FreeBSD and created it by now? There is 
also nanobsd, as an option, for those that want minimalist installs (yes, I 
know it is meant for embedded systems, but it works).

I think we need to stop trying to find solutions for non-existent problems.

From: owner-freebsd-...@freebsd.org  on behalf 
of Marek Zarychta 
Date: Wednesday, May 15, 2024 at 11:19 AM
To: freebsd-net@freebsd.org 
Subject: Re: removing RIP/RIPng (routed/route6d)
Today Michael Sierchio wrote:
There is an argument to be made that all such components of the "base" system 
should be packages, and managed that way.  That would facilitate removal or 
addition of things like MTAs, Route daemons for various protocols, etc.  and 
permit them to be updated independent of the base system.  Too much is included 
by default in Base.

FreeBSD is a comprehensive OS, and most users still do appreciate this feature.
I remember that we had also RCS tools in the base system, they got purged 
(moved to the ports tree really), most users are fine with it, but for managing 
single config files RCS is still the best-suited versioning system. We still 
have ftpd(8), but it was almost removed, there was a strong battle on the 
mailing list to preserve it. FTP protocol is as old as BSD, but it's still 
valid and, so far not deprecated. A similar story was with smbfs(5). The same 
probably applies to RIP/RIPng.
What if we would better remove LLVM from the base if the system is bloated ? 
LLVM needs frequent updates and keeping it in base is far more risky in terms 
of system security than keeping RIP daemons. Why do we still have odd tools 
like biff(1) in the base ?

On the other hand, for a significant share of the user base, the more tiny the 
OS is, the better. The transition to PkgBase should fulfill user needs, 
especially those, who want a minimalist OS. So please, go ahead and switch to 
PkgBase if your FreeBSD system contains undesired software.

Cheers

Marek

On Wed, May 15, 2024 at 1:01 PM John Howie wrote:
I use RIP all the time. Removing it would be a pain. What is the justification? 
Moving it to ports is an option, but now we have to compile, distribute, and 
install it.

Sent from my iPhone

> On May 15, 2024, at 07:40, Tomek CEDRO wrote:
>
> On Wed, May 15, 2024 at 4:20 PM Scott wrote:
>>> On Mon, Apr 15, 2024 at 09:49:27PM +0100, Lexi Winter wrote:
>>> (..)
>>> i'd like to submit a patch to remove both of these daemons from src.  if
>>> there's some concern that people still want to use the BSD
>>> implementation of routed/route6d, i'm also willing to submit a port such
>>> as net/freebsd-routed containing the old code, in a similar way to how
>>> the removal of things like window(1) and telnetd(8) were handled.
>>
>> I use RIPv2 for its simplicity and small memory and CPU requirements.  It
>> has its place and shouldn't be considered "legacy" despite its shortcomings.
>> It's not uncommon for vendors like Cisco to produce "basic" feature sets of
>> IOS that do not include any link-state protocols.
>>
>> Anyway, I'm a user, albeit a small user, of RIP and wouldn't object to its
>> removal from FreeBSD if there were a small footprint alternative.  I've used
>> FRR and VyOS a bit and they are overkill as replacements.
>>
>> Your email doesn't justify its removal other than to say you are unconvinced
>> of the value of shipping it.  As a user I definitely see the value.  I
>> understand that there is always a cost to providing code, but that wasn't
>> suggested as a reason.  All APIs, modules, utilities, etc. need to regularly
>> justify their presence in the OS.
>>
>> If it must be removed, is there any way to fork the FreeBSD routed and
>> route6d to a port?  Or would that defeat the purpose of removing it in the
>> first place?
>
> Yeah, where did that recent trend came to FreeBSD to remove perfectly
> working code??
>
> There are more and more ideas in recent times like this.
>
> Architectures removal, drivers removal, backward compatibility
> removal. While basic functions become unstable and unreliable. Looks
> more like diversion and sabotage than progress.
>
> If anything is about to be moved out from SRC for a really good reason
> it should be available in ports and not in /dev/null.
>

Re: networking in 14.1 release notes

2024-05-19 Thread John Hay

Hi Mike,

The ice(4) driver for Intel E800 Ethernet controllers has been in the tree
since May 2020, but it seems it was never added to the release notes. It
also does not have a man page. There is a bug report for the missing man
page:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262892

Regards

John

On Sat, 18 May 2024 at 16:50, Mike Karels  wrote:

> I have no networking changes at all in the 14.1 release notes.  Is there
> anything that should be mentioned?  Feel free to reply to me individually.
>
> Thanks,
> Mike
>
>

Re: Driver patch to look at...

2013-02-04 Thread John Baldwin

On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote:
> All:
> 
> I have been working with TCP in gigabit networks (igb driver actually) and have
> found a very nasty problem with the way the driver is doing its put back when
> it fills the out-bound transmit queue.
> 
> Basically it has taken a packet from the head of the ring buffer, and then 
> realizes it can't fit it into the transmit queue. So it just re-enqueues it
> into the ring buffer. What's wrong with that? Well most of the time there
> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are
> operating at full speed (or trying to). This means you will see 10 duplicate
> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually.
> 
> The patch I have attached makes it so that
> 
> 1) There are ways to swap back.
> 2) Use the peek in the ring buffer and only
>dequeue the packet if we put it into the transmit ring
> 3) If something goes wrong and the transmit frees the packet we dequeue it.
> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that
>has taken its place.
> 
> I have fixed the four intel drivers that had this systemic issue, but there
> are still more to fix.
> 
> Comments/review .. rotten egg's etc.. would be most welcome before
> I commit this..

Does this only happen in drivers that use bufring?  I seem to recall that
drivers using IFQ would just stuff the packet at the head of the IFQ via
IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit.
See, for example, this bit in dc_start_locked():

	for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
		/*
		 * If there's no way we can send any packets, return now.
		 */
		if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}
		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
		if (m_head == NULL)
			break;

		if (dc_encap(sc, &m_head)) {
			if (m_head == NULL)
				break;
			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}

It sounds like drbr/buf_ring just don't handle this case correctly?  It
seems a shame to have to duplicate so much code in the various drivers to
fix this, but that seems to be par for the course when using buf_ring. :(
(buggy in edge cases and lots of duplicated code that is).

Also, doing the drbr_swap() just so that drbr_dequeue() returns what you
just swapped in seems... odd.  It seems that it would be nicer instead
to have some sort of drbr_peek() / drbr_advance() where the latter just
skips over whatever the current head is?  Then you could have something
like:

	while ((next = drbr_peek()) != NULL) {
		if (!foo_encap(&next)) {
			if (next == NULL)
				drbr_advance();
			break;
		}
		drbr_advance();
	}

I guess the sticky widget is the case of ALTQ as you need to dequeue the
packet always in the ALTQ case and put it back if the encap fails.  For
your patch it's not clear to me how that works.  It seems that if the
encap routine frees the mbuf you try to dereference a freed pointer when
you call drbr_dequeue().  I really think you will instead need some sort
of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and
use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.
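
To show the peek / advance / putback idea in isolation, here is a small
userspace sketch of a single-consumer ring. It is only an illustration under
stated assumptions: `struct ring` and the `ring_*()` functions are
hypothetical, not the buf_ring or drbr API, and it ignores locking, memory
barriers and ALTQ entirely.

```c
#include <assert.h>
#include <stddef.h>

#define	RING_SIZE	8		/* hypothetical; must be a power of two */

struct ring {
	void		*slot[RING_SIZE];
	unsigned	 head;		/* next entry to consume */
	unsigned	 tail;		/* next free entry */
};

/* Producer side: returns -1 when the ring is full. */
int
ring_enqueue(struct ring *r, void *p)
{
	if (r->tail - r->head == RING_SIZE)
		return (-1);
	r->slot[r->tail++ % RING_SIZE] = p;
	return (0);
}

/* Look at the head entry without consuming it; NULL when empty. */
void *
ring_peek(struct ring *r)
{
	if (r->head == r->tail)
		return (NULL);
	return (r->slot[r->head % RING_SIZE]);
}

/* Consume the head once it has been transmitted (or freed by encap). */
void
ring_advance(struct ring *r)
{
	assert(r->head != r->tail);
	r->head++;
}

/*
 * Encap replaced the packet (defrag etc.) but could not send it:
 * store the new pointer at the head so it is still the next to go out.
 */
void
ring_putback(struct ring *r, void *p)
{
	assert(r->head != r->tail);
	r->slot[r->head % RING_SIZE] = p;
}
```

Because nothing leaves the ring until `ring_advance()` is called, a packet
that does not fit in the transmit queue keeps its place at the head instead
of being re-enqueued behind newer packets.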

-- 
John Baldwin
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-05 Thread John Baldwin

On Wednesday, January 30, 2013 12:26:17 pm Andre Oppermann wrote:
> You can simply create your own congestion control algorithm with only the
> restart window changed.  See (pseudo) code below.  BTW, I just noticed that
> the other cc algos do not reset the idle window.

*sigh*  I am fully competent at maintaining my own local changes.  The point
was to share this so that other people with similar workloads could make use 
of it.  Also, a custom CC algo is not the right approach as we would want this
change regardless of the CC algo used for handling non-idle periods (so that
this is an orthogonal knob).  Linux also makes this an orthogonal knob rather 
than requiring a separate CC algo.

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 10:24:24 am Randall Stewart wrote:
> Here is an updated patch… sigh.. I foobar'd the ALTQ stuff.. lots of crashes ;-D

Heh, I like this better, thanks.  I think you can remove buf_ring_swap() as it 
is no longer used?

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote:
> Actually, no it is used.
> 
> If you look in if_var.h in the drbr_putback() function, it does
> a buf_ring_swap when the old mbuf pointer does not equal the
> new mbuf pointer. This *does* happen, I crashed at least once
> yesterday when the igb driver did something to free the original
> mbuf and return a new mbuf with the data (prepend or some such).
> 
> I also have found several issues that I have fixed this morning.. its been
> crash city on my test beds..
> 
> Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed)

Oh, I see now why that is needed.

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote:
> Actually, no it is used.
> 
> If you look in if_var.h in the drbr_putback() function, it does
> a buf_ring_swap when the old mbuf pointer does not equal the
> new mbuf pointer. This *does* happen, I crashed at least once
> yesterday when the igb driver did something to free the original
> mbuf and return a new mbuf with the data (prepend or some such).
> 
> I also have found several issues that I have fixed this morning.. its been
> crash city on my test beds..
> 
> Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed)

Actually, one more suggestion then (since you have to keep putback).  It
would be nice to not have to require 'snext' in all the callers.  How
about replace buf_ring_swap() with a buf_ring_putback_sc() that accepts the
mbuf and just stores it at the head unconditionally and have drbr_putback()
use that?

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 2:04:12 pm Randall Stewart wrote:
> Hmm
> 
> That would trade off a stack pointer + a compare
> vs always doing the move.

Right, the store is probably cheaper than the branch. :)  However, minimizing 
the duplicated code in drivers and having this interface be as clear/readable 
as possible is my main goal.

> Thats fine until I have to add the _mc() version, then the put
> back would be an atomic, and most of the time the return from
> this is probably not changed…
> 
> I really would prefer not to since the compare and maybe store vs
> the always store.. though the same now, would be far more expensive
> in the _mc version.. if we do a _mc version of course ;-)

I would just not bother with an _mc version until we actually need it. :)

I think doing the sort of peek/advance type logic only works well with
single consumers anyway.

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 2:08:15 pm Randall Stewart wrote:
> Hmm
> 
> wait, I could probably do the compare to whats in the ring buffer ;-D

I wouldn't bother.  The compare and branch is probably more expensive than
the store.

> R
> On Feb 5, 2013, at 2:04 PM, Randall Stewart wrote:
> 
> > Hmm
> > 
> > That would trade off a stack pointer + a compare
> > vs always doing the move.
> > 
> > Thats fine until I have to add the _mc() version, then the put
> > back would be an atomic, and most of the time the return from
> > this is probably not changed…
> > 
> > I really would prefer not to since the compare and maybe store vs
> > the always store.. though the same now, would be far more expensive
> > in the _mc version.. if we do a _mc version of course ;-)
> > 
> > But I am willing to do whatever .. since this really needs to be fixed.
> > 
> > R
> > On Feb 5, 2013, at 1:52 PM, John Baldwin wrote:
> > 
> >> On Tuesday, February 05, 2013 12:44:01 pm Randall Stewart wrote:
> >>> Actually, no it is used.
> >>> 
> >>> If you look in if_var.h in the drbr_putback() function, it does
> >>> a buf_ring_swap when the old mbuf pointer does not equal the
> >>> new mbuf pointer. This *does* happen, I crashed at least once
> >>> yesterday when the igb driver did something to free the original
> >>> mbuf and return a new mbuf with the data (prepend or some such).
> >>> 
> >>> I also have found several issues that I have fixed this morning.. its been
> >>> crash city on my test beds..
> >>> 
> >>> Here is the latest patch with all fixes and suggested changes from emaste (thanks Ed)
> >> 
> >> Actually, one more suggestion then (since you have to keep putback).  It
> >> would be nice to not have to require 'snext' in all the callers.  How
> >> about replace buf_ring_swap() with a buf_ring_putback_sc() that accepts the
> >> mbuf and just stores it at the head unconditionally and have drbr_putback()
> >> use that?
> >> 
> >> -- 
> >> John Baldwin
> >> 
> > 
> > --
> > Randall Stewart
> > 803-317-4952 (cell)
> > 
> > 
> 
> --
> Randall Stewart
> 803-317-4952 (cell)
> 
> 

-- 
John Baldwin

Re: Driver patch to look at...

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 2:30:36 pm Randall Stewart wrote:
> Ok
> 
> Here it is one last time (I hope) with the updates ;-)

One more suggestion.  I would make the check in buf_ring_putback_sc() a 
KASSERT() so that in the production case we don't pay for a branch that should 
never occur.

-- 
John Baldwin

Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-05 Thread John Baldwin

On Tuesday, February 05, 2013 12:44:27 pm Andre Oppermann wrote:
> On 05.02.2013 18:11, John Baldwin wrote:
> > On Wednesday, January 30, 2013 12:26:17 pm Andre Oppermann wrote:
> >> You can simply create your own congestion control algorithm with only the
> >> restart window changed.  See (pseudo) code below.  BTW, I just noticed that
> >> the other cc algos do not reset the idle window.
> >
> > *sigh*  I am fully competent at maintaining my own local changes.  The point
> > was to share this so that other people with similar workloads could make use
> > of it.  Also, a custom CC algo is not the right approach as we would want
> > this change regardless of the CC algo used for handling non-idle periods
> > (so that this is an orthogonal knob).  Linux also makes this an orthogonal
> > knob rather than requiring a separate CC algo.
> 
> If everything Linux does is good, then go ahead and commit it.  Discussing
> this change further then is pointless.  I don't mind too much and I have
> stated my case why I think it's the wrong thing to do.

Not everything Linux does is good, nor is everything Linux does bad.

> I would prefer to encapsulate it into its own not-so-much-congestion-management
> algorithm so you can eventually do other tweaks as well like more aggressive
> loss recovery which would fit your objective as well.  Since you have to modify
> your app anyways to do the sockopt call this seems a more complete solution to
> me.  At least better than to do a non-portable hack that violates one of the
> most fundamental TCP concepts.

This is real rich from the guy pushing the increased IW that came from Linux. :)

"Tools not policy" yadda yadda, but I digress.

-- 
John Baldwin

Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-06 Thread John Baldwin

On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote:
> John:
> 
> A burst at line rate will *often* cause drops. This is because
> router queues are at a finite size. Also such a burst (especially
> on a long delay bandwidth network) cause your RTT to increase even
> if there is no drop which is going to hurt you as well.
> 
> A SHOULD in an RFC says you really really really really need to do it
> unless there is some thing that makes you willing to override it. It is
> slight wiggle room.
> 
> In this I agree with Andre, we should not be *not* doing it. Otherwise
> folks will be turning this on and it is plain wrong. It may be fine
> for your network but I would not want to see it in FreeBSD.
> 
> In my testing here at home I have put back into our stack max-burst. This
> uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at
> no more than 4 packets larger than your flight. All of my testing
> high-bw-delay or lan has shown this to improve TCP performance. This
> is because it helps you avoid bursting out so many packets that you overflow
> a queue.
> 
> In your long-delay bw link if you do burst out too many (and you never
> know how many that is since you can not predict how full all those
> MPLS queues are or how big they are) you will really hurt yourself even worse.
> Note that generally in Cisco routers the default queue size is somewhere 
> between
> 100-300 packets depending on the router.

Due to the way our application works this never happens, but I am fine with
just keeping this patch private.  If there are other shops that need this they
can always dig the patch up from the archives.
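
For readers following the quoted discussion, the max-burst clamp Randall
describes (cwnd limited to no more than 4 packets beyond the data in flight)
might look roughly like the sketch below.  This is a stand-alone illustration,
not code from the FreeBSD stack; the function name and its parameters are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not FreeBSD stack code: clamp the congestion
 * window so that at most max_burst full-sized segments beyond the data
 * already in flight can be sent back-to-back.
 */
static uint32_t
clamp_cwnd_max_burst(uint32_t cwnd, uint32_t flight, uint32_t mss,
    uint32_t max_burst)
{
	uint32_t limit = flight + max_burst * mss;

	/* Only ever shrink the window; never grow it. */
	return ((cwnd > limit) ? limit : cwnd);
}
```

After an idle period flight is 0, so with max_burst set to 4 a large cwnd is
cut down to 4 segments, which is what bounds the size of the burst.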

-- 
John Baldwin

Re: Question: Why ain't I getting gigabit speed?

2013-02-08 Thread John Nielsen

On Feb 7, 2013, at 4:13 PM, Ronald F. Guilmette  wrote:

> I just acquired a brand new cheapie gigabit PCI ethernet card off eBay.
> The main chip on it appears to be an RTL8110S-32.
> 
> I stuck this card into a 9.1-RELEASE system that I have been putting
> together, and it seemed to be recognized ok (as re0) upon boot up, so
> I diddled my /etc/rc.conf file to get it to ifconfig as 192.168.1.3
> on reboot.  Then I rebooted.
> 
> I have the card wired via a CAT6 cable to my Linksys E2000 gigabit
> router.  Nonetheless, upon reboot, followed by "ifconfig -a", the
> output from ifconfig says the following for this card:
> 
> re0: flags=8843 metric 0 mtu 1500
>   
> options=8209b
>   ether 00:13:3b:02:03:bd
>   inet 192.168.1.3 netmask 0xff00 broadcast 192.168.1.255
>   inet6 fe80::213:3bff:fe02:3bd%re0 prefixlen 64 scopeid 0x7 
>   nd6 options=29
>   media: Ethernet autoselect (100baseTX )
>   status: active
> 
> I've tried two different CAT6 cables, two different LAN ports on my E2000,
> and I've even tried the card in two different PCI slots on my motherboard,
> but the results are always the same.
> 
> So, um, what gives?  Why does the driver appear to be setting this card to
> 100baseTX rather than the 1000baseTX that I was hoping for?
> 
> Is there some magic spell that I am unaware of that I must cast on this
> in order to get it to work right?

I would suspect the switch ("router"). FYI:
http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242

I have an re interface on my FreeBSD router and it connects at 1000baseT no 
problem.

> P.S.  dmesg has this to say about the card:
> 
> re0:  port 
> 0xbe00-0xbeff mem 0xdf9ff000-0xdf9ff0ff irq 18 at device 5.0 on pci4
> re0: Chip rev. 0x0400
> re0: MAC rev. 0x
> re0: Ethernet address: 00:13:3b:02:03:bd
> re0: link state changed to UP
> re0: link state changed to DOWN
> re0: link state changed to UP


Re: Question: Why ain't I getting gigabit speed?

2013-02-08 Thread John Nielsen

On Feb 8, 2013, at 1:48 PM, Ronald F. Guilmette  wrote:

> In message , 
> John Nielsen  wrote:
> 
>> On Feb 7, 2013, at 4:13 PM, Ronald F. Guilmette wrote:
>> 
>>> I just acquired a brand new cheapie gigabit PCI ethernet card off eBay.
>>> The main chip on it appears to be an RTL8110S-32.
>>> ...
> 
>> I would suspect the switch ("router"). FYI:
>> http://forum.qnap.com/viewtopic.php?f=11&t=47421#p213242
>> 
>> I have an re interface on my FreeBSD router and it connects at 1000baseT
>> no problem.
> 
> Could you please send or post the relevant ifconfig printout for that,
> and also the applicable/relevant dmesg lines?

% ifconfig re0
re0: flags=8843 metric 0 mtu 1500

options=8209b
ether 00:1f:e2:55:1d:bc
inet 67.182.217.170 netmask 0xfc00 broadcast 255.255.255.255 
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active

% dmesg | egrep '^re0:|^miibus0:|^rgephy0:'
re0:  port 
0xd800-0xd8ff mem 0xfe9ff000-0xfe9f irq 17 at device 0.0 on pci2
re0: Using 1 MSI message
re0: Chip rev. 0x3800
re0: MAC rev. 0x0040
miibus0:  on re0
rgephy0:  PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 
1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, 
auto-flow
re0: Ethernet address: 00:1f:e2:55:1d:bc

> This problem is very perplexing, but I don't think that the problem
> is with my Linksys E2000.
> 
> I did some more experiments.  Fortunately, I had a CAT6 crossover cable
> lying around.  So I used that and connected my machine with the RTL8110S-32
> in it directly to two other machines with gigabit interfaces.  One was
> my other server.  The other was a laptop I have here.  The results were
> very strange.
> 
> In the case of connecting to the laptop, all seemed to work correctly,
> however ifconfig showed that my re0 device in this case believed itself
> to be "master".  (I suspect that this may make a difference, and that
> the current FreeBSD re driver may perhaps behave better when it is
> acting as master.)

Agree with other followup--"master" shouldn't be applicable here; figure that 
out before you spend more time worrying about hardware. Would you mind posting 
a redacted version of /etc/rc.conf (and the contents of /etc/rc.conf.d, if any)?

> In the case of connecting (via CAT6 crossover) direct to my other server,
> things got even more strange.  In this case, after making the connection,
> autonegotiation apparently worked correctly, and I could see "1000baseT"
> in the output from "ifconfig re0", *however* a moment or two later,
> suddenly the connection was entirely dropped, and now the ifconfig
> output said "no carrier".  I reproduced this sequence multiple times.
> It is readily reproducible.  (The other server is running FreeBSD 8.3-
> RELEASE with an on-motherboard Nvidia gigabit ethernet interface, BTW.)

Any log or kernel messages on either side when this happens?

> I am inclined to wonder if perhaps the re driver has some rough edges
> still.

I wouldn't jump to that conclusion. It's not exactly a new driver and its 
author (Bill Paul) was quite experienced. It is possible you have a dodgy board 
though.

> P.S.  Since this card is really not working out for me, has anybody got
> a suggestion and/or link they could send me for an _inexpensive_ gigabit
> PCI nic that works reliably with FreeBSD?  (I am hoping for something under
> $12 USD.)

Most/all 1G NIC's in that price range will be Realtek. You may be able to find 
a Marvell/SysKonnect card for a bit more, but for not much more than that you 
can get something from Intel. You may get gigabit links from a cheap card but I 
wouldn't count on gigabit performance. (Actually any PCI card will fall short 
of gigabit performance.) If you actually care then spend the $30 on an Intel 
card.

JN


Re: Question: Why ain't I getting gigabit speed?

2013-02-12 Thread John Nielsen

On Feb 9, 2013, at 5:02 PM, Ronald F. Guilmette  wrote:

> P.S.  While I appreciate all the friendly advice people here have given
> me, i.e. to go with a card based around some non-Realtek chip, I have to
> say that up until now I have always and consistently had -zero- problems
> with the many other Realtek-based 10/100 cards that I have owned and used.

A bit OT, but I would say that this is _because_ of the FreeBSD driver (rl, 
also by Bill Paul). Some of the hardware deficiencies documented in the manpage 
and in comments in the if_rl.c are almost comical..


Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-20 Thread John Baldwin

On Tuesday, February 19, 2013 9:37:54 pm Sepherosa Ziehau wrote:
> John,
> 
> I came across this draft several days ago, you may be interested:
> http://tools.ietf.org/html/draft-ietf-tcpm-newcwv-00

Yes, that is extremely relevant.  My application does use its own
rate-limiting.  And now that I've read this in full, this does seem
to very much be what I want and is a better solution than ignoring
idle handling entirely.  Ironic that this was posted a few weeks after my 
patch. :)  Clearly this is not an isolated workflow.

-- 
John Baldwin

Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type

2013-02-21 Thread John Baldwin

The following reply was made to PR kern/172113; it has been noted by GNATS.

From: John Baldwin 
To: bug-follo...@freebsd.org,
 egrosb...@rdtc.ru
Cc:  
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in 
igb(4): m_getjcl: invalid cluster type
Date: Thu, 21 Feb 2013 17:12:55 -0500

 An update on this.  I think we should just use a workaround as this seems to 
 be specific to a certain set of motherboards.  This is the fix I'm using 
 locally:
 
 Index: if_igb.c
 ===
 --- if_igb.c(revision 243732)
 +++ if_igb.c(working copy)
 @@ -1522,6 +1522,15 @@
 u32 newitr = 0;
 boolmore_rx;
  
 +   /*
 +* The onboard adapters on certain SuperMicro X8* boards
 +* trigger a spurious interrupt during boot.  Since it
 +* occurs before the interface is fully configured it
 +* triggers a panic.  Ignore the interrupt instead.
 +*/
 +   if (!(adapter->ifp->if_drv_flags & IFF_DRV_RUNNING))
 +   return;
 +
 E1000_WRITE_REG(&adapter->hw, E1000_EIMC, que->eims);
 ++que->irqs;
 
 -- 
 John Baldwin

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-02-28 Thread John Baldwin

The following reply was made to PR kern/176446; it has been noted by GNATS.

From: John Baldwin 
To: bug-follo...@freebsd.org,
 jchar...@verisign.com
Cc:  
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving 
out-of-order packet process and spurious RST
Date: Thu, 28 Feb 2013 10:57:24 -0500

 Can you try the fixes from 
http://svnweb.freebsd.org/base?view=revision&revision=240968?
 
 -- 
 John Baldwin

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-03-14 Thread John Baldwin

The following reply was made to PR kern/176446; it has been noted by GNATS.

From: John Baldwin 
To: "Charbon, Julien" 
Cc: bug-follo...@freebsd.org,
 "De La Gueronniere, Marc" ,
 j...@freebsd.org
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving 
out-of-order packet process and spurious RST
Date: Thu, 14 Mar 2013 09:34:18 -0400

 On Thursday, March 07, 2013 5:11:25 am Charbon, Julien wrote:
 > On 2/28/13 8:10 PM, Charbon, Julien wrote:
 > > On 2/28/13 4:57 PM, John Baldwin wrote:
 > >> Can you try the fixes from 
 http://svnweb.freebsd.org/base?view=revision&revision=240968?
 > >
 > >Actually, Marc (I CC'ed him) did find the r240968 fix for concurrency
 > > between ixgbe_msix_que() and ixgbe_handle_que(), and made a backport for
 > > release-8.3.0 (see patch [1] below).  However, the issue was still
 > > reproducible, then Marc found another place for concurrency from
 > > ixgbe_local_timer() and fix it (see patch [2]).  But it was still not
 > > enough, and he found a last place for concurrency due to
 > > ixgbe_rearm_queues() call (see patch [3]).  With all these patches
 > > applied, we were not able to reproduce this issue.
 > 
 >   Just for the record:  As expected this issue is reproducible on 
 > 9.1-RELEASE:
 > 
 > # uname -a
 > FreeBSD atlas 9.1-RELEASE FreeBSD 9.1-RELEASE #1 r247851M: Wed Mar  6 
 > 11:17:43 UTC 2013 
 > jcharbon@atlas:/usr/obj/app/jcharbon/9.1.0/sys/GENERIC  amd64
 > 
 >   Enable TCP debug log:
 > 
 > # sysctl net.inet.tcp.log_debug=1
 > 
 >   Load a TCP service heavily enough and, due to ixgbe race conditions between 
 > ixgbe_msix_que() and ixgbe_handle_que(), you will get:
 > 
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080; syncache_socket: in_pcbconnect failed with error 48
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080 tcpflags 0x10; tcp_input: Listen socket: 
 > Socket allocation failed due to limits or memory shortage, sending RST
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080 tcpflags 0x4; syncache_chkrst: Spurious RST 
 > without matching syncache entry (possibly syncookie only), segment ignored
 > 
 >   We will provide our current fix patch for 9.1-RELEASE.

 The place you noticed in 2) is broken, though your fix isn't quite correct.  
 I've been hesitant to reply yet as it requires a long reply.  The short 
 version is that the task to handle rx/tx processing should never be queued by 
 anything other than an interrupt handler or itself (when it reschedules 
 itself).  Anything else that schedules it is going to result in lock 
 contention and out-of-order packet delivery.

 Your 3rd case is also correct.  We should not re-enable interrupts on every 
 timer tick since the rx/tx task might already be running.  Similarly, re-
 enabling all queues anytime one queue processes RX interrupts can trigger an 
 interrupt on another queue while its rx/tx task is already running.  Both of 
 these are pointless as each queue will rearm itself when the rx/tx task finds 
 no more pending RX packets to process.

 Now, some more details on the 2nd one which is due to watchdog handling which 
 is broken in both igb and ixgbe.  First, some background on how watchdog 
 handling works in nearly all other drivers (and specifically in single-queue 
 drivers):

 First, each device maintains a 'timer' field in the softc which is a count of 
 seconds until the transmit watchdog should expire.  Whenever a packet is 
 queued for transmit in the descriptor ring, it is set to the 'N' seconds (e.g. 
 5).  Whenever the transmit completion interrupt fully drains the descriptor 
 ring such that the ring is idle the timer is set to 0.

 Second, each device runs a periodic stats timer that fires once a second while 
 the interface is "up" (so it is started in the foo_init() routine and stopped 
 in foo_stop()).  Part of this timer's job is to check the transmit watchdog.  
 It uses logic like this to do so:

 	if (timer > 0) {
 		timer--;
 		if (timer == 0) {
 			/* watchdog expired */
 		}
 	}
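
 The watchdog scheme described above can be modelled in a few lines of
 stand-alone C.  This is only a sketch under stated assumptions: the softc
 layout, the foo_* names and the 5 second timeout are illustrative (the text
 gives 5 only as an example value for 'N'), and no real driver is being quoted.

```c
#include <stdbool.h>

#define WATCHDOG_SECS	5	/* the 'N' from the text; 5 is only the example value */

/* Hypothetical softc fragment holding just the watchdog state. */
struct foo_softc {
	int timer;		/* seconds until the TX watchdog expires; 0 = disarmed */
	int tx_pending;		/* descriptors queued but not yet completed */
};

/* Queueing a packet for transmit (re)arms the watchdog. */
static void
foo_tx_enqueue(struct foo_softc *sc)
{
	sc->tx_pending++;
	sc->timer = WATCHDOG_SECS;
}

/* TX completion disarms the watchdog only once the ring is fully idle. */
static void
foo_tx_complete(struct foo_softc *sc, int done)
{
	sc->tx_pending -= done;
	if (sc->tx_pending <= 0) {
		sc->tx_pending = 0;
		sc->timer = 0;
	}
}

/* Called once a second from the stats timer; returns true on expiry. */
static bool
foo_tick(struct foo_softc *sc)
{
	if (sc->timer > 0) {
		sc->timer--;
		if (sc->timer == 0)
			return (true);	/* watchdog expired */
	}
	return (false);
}
```

 In this model a caller that sees foo_tick() return true would perform the
 chip reset described in the next paragraph; a ring that drains in time never
 reaches that point because foo_tx_complete() has already set the timer to 0.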

 The typical implementation for the watchdog expiring is to just reset the chip 
 by doing 'foo_stop()' followed by 'foo_init_locked()'.  However, if you have a 
 NIC whose hardware is known to have a quirk where it can lose interrupts, then 
 a driver can decide to scan the TX ring to see if it makes any progress.  It 
 should do this synchronously from the timer, not by scheduling another task.  
 Also, if you do make progress, then you should reset the watchdog timer if 
 there are still any pending transmits.  In this case I would suggest only 
 setting it to '1&

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-03-15 Thread John Baldwin

On Thursday, March 14, 2013 5:59:44 pm Ryan Stone wrote:
> What's the benefit in having both an interrupt thread and a task that
> performs the same function?  It seems to me that having two threads that do
> the same job is what is making this so complicated.

Yes, yes it is.  I have a branch that has changes to interrupt threads where 
you can have an interrupt handler reschedule itself.  That prevents this class 
of problems as the handler always runs in the interrupt thread.

I really should get that patch into HEAD someday.  I've posted it to arch@ 
twice now I think. :(  It also fixes interrupt filters to really work properly 
and be on by default.

-- 
John Baldwin

Re: close(2) while accept(2) is blocked

2013-04-01 Thread John Baldwin

On Thursday, March 28, 2013 12:54:31 pm Andriy Gapon wrote:
> 
> So, this started as a simple question, but the answer was quite unexpected to 
> me.
> 
> Let's say we have an opened and listen-ed socket and let's assume that we know
> that one thread is blocked in accept(2) and another thread is calling 
> close(2).
> What is going to happen?
> 
> Turns out that practically nothing.  For kernel the close call would be 
> almost a nop.
> My understanding is this:
> - when socket is created, its reference count is 1
> - when accept(2) is called, fget in kernel increments the reference count 
> (kept in
> an associated struct file)
> - when close(2) is called, the reference count is decremented
> 
> The reference count is still greater than zero, so fdrop does not call 
> fo_close.
> That means that in the case of a socket soclose is not called.
> 
> I am sure that the reference counting in this case is absolutely correct with
> respect to managing kernel side structures.  But I am not sure that it is
> correct with respect to hiding the explicit close(2) call from other threads
> that may be waiting on the socket.
> In other words, I am not sure if fo_close is supposed to signify that there 
> are no
> uses of a file, or that userland close-d the file.  Or perhaps these should 
> be two
> different methods.
> 
> Additional note is that shutdown(2) doesn't wake up the thread in accept(2)
> either.  At least that's true for unix domain sockets.
> Not sure if this is a bug.
> 
> But the summary seems to be that currently it is not possible to break a
> thread out of accept(2) (at least without resorting to signals).

I think you need to split the 'struct file' reference count into two different
counts similar to how we have vref/vrele vs vhold/vdrop for vnodes.  The
fget for accept and probably most other system calls should probably be
equivalent to vhold, whereas things like open/dup (and storing an fd in a
cmsg) should be more like vref.  close() should then be a vrele().
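
To make the two-count idea above concrete, here is a small user-space model.
It is a sketch only: struct xfile and the xfile_* functions are hypothetical
names chosen to mirror vref/vrele and vhold/vdrop, and nothing here is the
actual 'struct file' code.

```c
#include <stdbool.h>

/* Hypothetical model of a file object with two reference counts. */
struct xfile {
	int usecount;	/* descriptor references: open/dup (vref-like) */
	int holdcount;	/* transient references: in-flight syscalls (vhold-like) */
	bool closed;	/* set when the last descriptor reference goes away */
	bool freed;	/* set when the object could be destroyed */
};

static void
xfile_init(struct xfile *f)
{
	f->usecount = 1;	/* the descriptor returned by socket()/open() */
	f->holdcount = 0;
	f->closed = false;
	f->freed = false;
}

static void
xfile_maybe_free(struct xfile *f)
{
	if (f->usecount == 0 && f->holdcount == 0)
		f->freed = true;
}

/* dup()-style reference. */
static void
xfile_ref(struct xfile *f)
{
	f->usecount++;
}

/* Taken by a syscall such as accept() while it uses the file. */
static void
xfile_hold(struct xfile *f)
{
	f->holdcount++;
}

/* close(): dropping the last use reference is where sleepers would be woken. */
static void
xfile_rele(struct xfile *f)
{
	if (--f->usecount == 0)
		f->closed = true;
	xfile_maybe_free(f);
}

/* End of a syscall that held the file. */
static void
xfile_drop(struct xfile *f)
{
	f->holdcount--;
	xfile_maybe_free(f);
}
```

In this model a close() that races with a blocked accept() marks the file
closed right away, even though the accept() still holds it, and the memory is
released only after the matching xfile_drop().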

-- 
John Baldwin

Re: KVM with freeBSD and SR-IOV vlan doesn't working!

2013-04-01 Thread John Baldwin

On Wednesday, March 27, 2013 5:31:27 am kindule wrote:
> Recently, we started using KVM and SR-IOV in our project, and we chose
> FreeBSD 10 as the guest OS. When we use an IP address directly on igb0 in the
> FreeBSD 10 guest, it works with no problem. However, when I use a vlan on igb0
> in the FreeBSD 10 guest, we can see the vlan is assigned correctly and we can
> ping the vlan address without problem, but after adding a gateway for the
> vlan subnet we cannot connect to the gateway.
> Here is some information about the systems:
> Host: Debian 7.0 and KVM 1.2
> Guest: FreeBSD10-current
> 
> And thanks for your help!

Hmm, does the same vlan setup work on a standalone machine?

-- 
John Baldwin

Re: Small patch in OFED/sdp

2013-04-03 Thread John Baldwin

On Tuesday, April 02, 2013 2:33:18 pm Vijay Singh wrote:
> Hi, this is based on the understanding that the SS_NBIO is a
> socket state, and not a state of the socket buffer.

Committed, thanks!

-- 
John Baldwin

Re: ipfilter(4) needs maintainer

2013-04-12 Thread John Hixson

On Fri, Apr 12, 2013 at 11:31:09PM -0600, Scott Long wrote:
> 
> On Apr 12, 2013, at 7:43 PM, Rui Paulo  wrote:
> 
> > On 2013/04/11, at 13:18, Gleb Smirnoff  wrote:
> > 
> >> Lack of maintainer in a near future would lead to bitrot due to changes
> >> in other areas of network stack, kernel APIs, etc. This already happens,
> >> many changes during 10.0-CURRENT cycle were only compile tested wrt
> >> ipfilter. If we fail to find maintainer, then a correct decision would be
> >> to remove ipfilter(4) from the base system before 10.0-RELEASE.
> > 
> > This has been discussed in the past. Every time someone came up and said 
> > "I'm still using ipfilter!" and the idea to remove it dies with it. 
> > I've been saying we should remove it for 4 years now. Not only is it 
> > outdated, but it also does not fit well in the FreeBSD roadmap. Then 
> > there's the question of maintainability. We gave the author a commit bit so 
> > that he could maintain it. That doesn't happen anymore and it sounds like 
> > he has since moved away from FreeBSD. I cannot find any reason to burden 
> > another FreeBSD developer with maintaining ipfilter.
> > 
> 
> One thing that FreeBSD is bad about (and this really applies to many open 
> source projects) when deprecating something is that the developer and release 
> engineering groups rarely provide adequate, if any, tools to help users 
> transition and cope with the deprecation.  The fear of deprecation can be 
> largely overcome by giving these users a clear and comprehensive path 
> forward.  Just announcing "ipfilter is going away.  EOM" is inadequate and 
> leads to completely justified complaints from users.
> 
> So with that said, would it be possible to write some tutorials on how to 
> migrate an ipfilter installation to pf?  Maybe some mechanical syntax docs 
> accompanied by a few case studies?  Is it possible for a script to automate 
> some of the common mechanical changes?  Also essential is a clear document on 
> what goes away with ipfilter and what is gained with pf.  Once those tools 
> are written, I suggest announcing that ipfilter is available but 
> deprecated/unsupported in FreeBSD 10, and will be removed from FreeBSD 11.  
> Certain people will still pitch a fit about it departing, but if the tools 
> are there to help the common users, you'll be successful in winning mindshare 
> and general support.
> 

++

Re: shm_map questions

2013-04-18 Thread John Baldwin

On Thursday, April 11, 2013 10:58:14 am Laurie Jennings wrote:
> I'm working on a simple project that shares a memory segment between a user
> process and a kernel module. I'm having some problems with shm_map and there
> doesn't seem to be much info on it.
> I'm not sure what happens to the memory when the user process that creates
> it terminates.  I have some questions.
> 1) Does the kernel mapping keep the segment from being garbage collected
> when the user process that creates it terminates? I've experienced shm_unmap()
> panic when trying to unmap a segment.
> Scenario:
> User process maps segment
> Kernel maps it with shm_map()
> User process terminates
> Kernel tries to shm_unmap() and it panics.

The kernel mapping bumps the refcount on the underlying vm object, so it will
not go away.  OTOH, you should be keeping your own reference count on the
associated fd so that you can call shm_unmap().  That is, the model should be
something like:

struct mydata *foo;

foo->fp = fget(fd);
shm_map(foo->fp, &foo->p);
/* Don't call fdrop */

and then when unmapping:

struct mydata *foo;

shm_unmap(foo->fp, foo->p);
fdrop(foo->fp);

> 2) Is there a way for the kernel process to know when the user process has
> gone away? A ref count?

You can install a process_exit EVENTHANDLER if you want to destroy this when a
process goes away.  I have used shm_map/unmap for other objects that already
had a reference count so I did my shm_unmap when that object was destroyed.

> 3) Does a SHM_ANON segment persist as long as the kernel has it mapped, or
> does it get garbage collected when the creating user process terminates?

It goes away when the backing 'struct file' goes away.  If you follow the 
model above of keeping a reference count on the associated struct file then
it won't go away until you fdrop() after the shm_unmap.

> 4) When using a named segment, can the kernel "reuse" a mapping for a new
> user process?
> Example:
> User process creates shm segment with path /foo
> Kernel maps shm segment with shm_map()
> User process terminates.
> User process runs again, opening segment /foo
> Does the kernel need to re-map, or is the original mapping valid?

The mapping is not per-process, so if you have opened a shm for /foo and
mapped it, it will stay mapped until you call shm_unmap.  Multiple processes
can shm_open /foo and mmap it and they will all share the same memory.

You could even share a SHM_ANON fd among multiple processes by passing it
across a UNIX domain socket.
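
Passing a descriptor across a UNIX domain socket is done with an SCM_RIGHTS
control message.  The following is a minimal sketch using only the standard
sendmsg(2)/recvmsg(2) interfaces; send_fd() and recv_fd() are hypothetical
helper names, error handling is reduced to a -1 return, and the descriptor
being passed could be any fd (a SHM_ANON one is just the case discussed here).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Send descriptor fd over the connected UNIX domain socket sock. */
static int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	char byte = 0;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &byte;		/* at least one byte of real data */
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return ((sendmsg(sock, &msg, 0) == 1) ? 0 : -1);
}

/* Receive a descriptor sent by send_fd(); returns the new fd or -1. */
static int
recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	char byte;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

The receiving process gets its own descriptor number referring to the same
underlying file, so both sides can then mmap(2) it and see the same memory.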

Hope this helps.

-- 
John Baldwin

Re: Network connections are lost from time to time

2013-04-19 Thread John Baldwin

On Friday, April 19, 2013 3:11:41 am C. L. Martinez wrote:
> Hi all,
> 
>  I have a strange problem with my FreeBSD 9.1 (fully patched): I lose ssh
> sessions frequently.
> 
>  This fbsd box is installed in an ESXi 5.1 server and I have another three
> fbsd 9.1 in the same ESXi host that do not have this problem, but maybe the
> problem is with my sysctl.conf and loader.conf settings:

Which NIC driver are you using?

-- 
John Baldwin

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-04-19 Thread John Baldwin

I want to make some progress on this, so let's break this up into smaller 
parts.

First, I think both calls to rearm_queues() should be removed.  In the case of 
the local timer, this can only re-enable interrupts if the interrupt handler 
is already scheduled or running or its associated task is running.  In the 
last case this means the ithread can run concurrently with the interrupt 
handler causing out-of-order processing.  The rxeof case has the same issue.  
Normally the code calling rxeof is going to re-enable the interrupt if rxeof 
runs to completion, and if not it is going to schedule the taskqueue.  The 
effect of the rxeof change was to always re-enable interrupts before 
scheduling the taskqueue which can result in those running concurrently.
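
The invariant behind this change is that, once a queue has been serviced,
exactly one of two things happens: the task is rescheduled, or the interrupt
is re-enabled.  A toy user-space model of that rule is sketched below; struct
que_state and the que_* functions are hypothetical and are not part of the
ixgbe driver.

```c
#include <stdbool.h>

/* Hypothetical per-queue state for the model. */
struct que_state {
	bool intr_enabled;	/* the queue's interrupt is unmasked */
	bool task_queued;	/* the rx/tx task is scheduled to run */
};

/*
 * Model of one pass of the handler or task: the interrupt stays masked
 * while the queue is serviced, then exactly one follow-up is chosen.
 */
static void
que_service(struct que_state *q, bool more_work)
{
	q->intr_enabled = false;
	q->task_queued = false;

	if (more_work)
		q->task_queued = true;	/* keep polling from the task */
	else
		q->intr_enabled = true;	/* ring drained: rearm the interrupt */
}

/* True while the handler and the task cannot run concurrently. */
static bool
que_is_exclusive(const struct que_state *q)
{
	return (!(q->intr_enabled && q->task_queued));
}
```

Re-enabling the interrupt while the task is still queued would break
que_is_exclusive(), which corresponds to the concurrent, out-of-order
processing described above.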

Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
===
--- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c   (revision 
249553)
+++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c   (working copy)
@@ -1386,23 +1386,6 @@
}
 }
 
-static inline void
-ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
-{
-   u32 mask;
-
-   if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
-   mask = (IXGBE_EIMS_RTX_QUEUE & queues);
-   IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask);
-   } else {
-   mask = (queues & 0x);
-   IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(0), mask);
-   mask = (queues >> 32);
-   IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(1), mask);
-   }
-}
-
-
 static void
 ixgbe_handle_que(void *context, int pending)
 {
@@ -2069,7 +2055,6 @@
 goto watchdog;
 
 out:
-   ixgbe_rearm_queues(adapter, adapter->que_mask);
callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
return;
 
@@ -4596,14 +4577,8 @@
 
/*
** We still have cleaning to do?
-   ** Schedule another interrupt if so.
*/
-   if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
-   ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
-   return (TRUE);
-   }
-
-   return (FALSE);
+   return ((staterr & IXGBE_RXD_STAT_DD) != 0);
 }
 
 
-- 
John Baldwin

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-04-19 Thread John Baldwin

uled to handle it.
-   */
+   ixgbe_txeof(txr);
 #ifdef IXGBE_LEGACY_TX
if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))
+   ixgbe_start_locked(txr, ifp);
 #else
-   if (!drbr_empty(adapter->ifp, txr->br))
+   if (!drbr_empty(ifp, txr->br))
+   ixgbe_mq_start_locked(ifp, txr, NULL);
 #endif
-   more_tx = 1;
IXGBE_TX_UNLOCK(txr);
 
/* Do AIM now? */
@@ -1575,7 +1564,7 @@
 rxr->packets = 0;
 
 no_calc:
-   if (more_tx || more_rx)
+   if (more)
taskqueue_enqueue(que->tq, &que->que_task);
else /* Reenable this interrupt */
ixgbe_enable_queue(adapter, que->msix);
@@ -3557,7 +3545,7 @@
  *  tx_buffer is put back on the free queue.
  *
  **/
-static bool
+static void
 ixgbe_txeof(struct tx_ring *txr)
 {
struct adapter  *adapter = txr->adapter;
@@ -3605,13 +3593,13 @@
IXGBE_CORE_UNLOCK(adapter);
IXGBE_TX_LOCK(txr);
}
-   return FALSE;
+   return;
}
 #endif /* DEV_NETMAP */
 
if (txr->tx_avail == txr->num_desc) {
txr->queue_status = IXGBE_QUEUE_IDLE;
-   return FALSE;
+   return;
}
 
/* Get work starting point */
@@ -3705,12 +3693,8 @@
if ((!processed) && ((ticks - txr->watchdog_time) > IXGBE_WATCHDOG))
txr->queue_status = IXGBE_QUEUE_HUNG;
 
-   if (txr->tx_avail == txr->num_desc) {
+   if (txr->tx_avail == txr->num_desc)
txr->queue_status = IXGBE_QUEUE_IDLE;
-   return (FALSE);
-   }
-
-   return TRUE;
 }
 
 /*


-- 
John Baldwin

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-04-19 Thread John Baldwin

The following reply was made to PR kern/176446; it has been noted by GNATS.

From: John Baldwin 
To: freebsd-net@freebsd.org
Cc: Jack Vogel ,
 bug-follo...@freebsd.org,
 Mike Karels 
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving 
out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:27:09 -0400

 A second patch.  This is not something I mentioned before, but I had this in 
 my checkout.  In the legacy IRQ case this could also result in out-of-order 
 processing.  It also fixes a potential OACTIVE-stuck type bug that we used to 
 have in igb.  I have no way to test this, so it would be good if some other 
 folks could test this.

 The patch changes ixgbe_txeof() to return void and changes the few places that 
 checked its return value to ignore it.  While it is true that ixgbe has a tx 
 processing limit (which I think is dubious.. TX completion processing is very 
 cheap unlike RX processing, so it seems to me like it should always run to 
 completion as in igb), in the common case I think the result will be to do 
 what igb used to do: poll the ring at 100% CPU (either in the interrupt 
 handler or in the task it keeps rescheduling) waiting for pending TX packets 
 to be completed (which is pointless: the host CPU can't make the NIC transmit 
 packets any faster by polling).

 It also changes the interrupt handlers to restart packet transmission 
 synchronously rather than always deferring that to a task (the former is what 
 (nearly) all other drivers do).  It also fixes the interrupt handlers to be 
 consistent (one looped on txeof but not the others).  In the case of the
 legacy interrupt handler it is possible it could fail to restart packet
 transmission if there were no pending RX packets after rxeof returned and
 txeof fully cleaned its ring without this change.

 It also fixes the legacy interrupt handler to not re-enable the interrupt if 
 it schedules the task but to wait until the task completes (this could result
 in concurrent, out-of-order RX processing).
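
 The hand-off rule described above can be modelled in userland as a rough
 sketch.  This is not the driver code: struct fake_que, irq_entry() and
 irq_tail() are hypothetical names made up for illustration.  The point is
 only that after each pass either the task is scheduled or the interrupt is
 re-enabled, never both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-queue driver state. */
struct fake_que {
	bool intr_enabled;	/* models the interrupt mask */
	int  tasks_queued;	/* models calls to taskqueue_enqueue() */
};

/* Entering the handler: the interrupt is masked while we work. */
static void
irq_entry(struct fake_que *q)
{
	assert(q != NULL);
	q->intr_enabled = false;
}

/*
 * Leaving the handler: if RX work remains, hand off to the task and
 * leave the interrupt masked; otherwise unmask it.
 */
static void
irq_tail(struct fake_que *q, bool more)
{
	assert(q != NULL);
	if (more)
		q->tasks_queued++;
	else
		q->intr_enabled = true;
}
```

 In this model the task would finish with the same irq_tail() decision, so
 the interrupt only comes back once nothing is left to clean.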

 Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
 ===
 --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c  (revision 249553)
 +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c  (working copy)
 @@ -149,7 +149,7 @@
  static void ixgbe_enable_intr(struct adapter *);
  static void ixgbe_disable_intr(struct adapter *);
  static void ixgbe_update_stats_counters(struct adapter *);
 -static bool   ixgbe_txeof(struct tx_ring *);
 +static void   ixgbe_txeof(struct tx_ring *);
  static bool   ixgbe_rxeof(struct ix_queue *);
  static void   ixgbe_rx_checksum(u32, struct mbuf *, u32);
  static void ixgbe_set_promisc(struct adapter *);
 @@ -1431,7 +1414,10 @@
}

/* Reenable this interrupt */
 -  ixgbe_enable_queue(adapter, que->msix);
 +  if (que->res != NULL)
 +  ixgbe_enable_queue(adapter, que->msix);
 +  else
 +  ixgbe_enable_intr(adapter);
return;
  }

 @@ -1449,8 +1435,9 @@
struct adapter  *adapter = que->adapter;
struct ixgbe_hw *hw = &adapter->hw;
struct  tx_ring *txr = adapter->tx_rings;
 -  bool    more_tx, more_rx;
 -  u32     reg_eicr, loop = MAX_LOOP;
 +  struct ifnet    *ifp = adapter->ifp;
 +  bool    more;
 +  u32     reg_eicr;

reg_eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
 @@ -1461,17 +1448,19 @@
return;
}

 -  more_rx = ixgbe_rxeof(que);
 +  more = ixgbe_rxeof(que);

IXGBE_TX_LOCK(txr);
 -  do {
 -  more_tx = ixgbe_txeof(txr);
 -  } while (loop-- && more_tx);
 +  ixgbe_txeof(txr);
 +#if __FreeBSD_version >= 800000
 +  if (!drbr_empty(ifp, txr->br))
 +  ixgbe_mq_start_locked(ifp, txr, NULL);
 +#else
 +  if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +  ixgbe_start_locked(txr, ifp);
 +#endif
IXGBE_TX_UNLOCK(txr);

 -  if (more_rx || more_tx)
 -  taskqueue_enqueue(que->tq, &que->que_task);
 -
/* Check for fan failure */
if ((hw->phy.media_type == ixgbe_media_type_copper) &&
(reg_eicr & IXGBE_EICR_GPI_SDP1)) {
 @@ -1484,7 +1473,10 @@
if (reg_eicr & IXGBE_EICR_LSC)
taskqueue_enqueue(adapter->tq, &adapter->link_task);

 -  ixgbe_enable_intr(adapter);
 +  if (more)
 +  taskqueue_enqueue(que->tq, &que->que_task);
 +  else
 +  ixgbe_enable_intr(adapter);
return;
  }

 @@ -1501,27 +1493,24 @@
struct adapter  *adapter = que->adapter;
struct tx_ring  *txr = que->txr;
struct rx_ring  *rxr = que->rxr;
 -  bool    more_tx, more_rx;
 +  struct ifnet*ifp = adapter->if

Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST

2013-04-19 Thread John Baldwin

The following reply was made to PR kern/176446; it has been noted by GNATS.

From: John Baldwin 
To: freebsd-net@freebsd.org
Cc: Jack Vogel ,
 bug-follo...@freebsd.org
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving 
out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:09:11 -0400

 I want to make some progress on this, so let's break this up into smaller 
 parts.

 First, I think both calls to rearm_queues() should be removed.  In the case of 
 the local timer, this can only re-enable interrupts if the interrupt handler 
 is already scheduled or running or its associated task is running.  In the 
 last case this means the ithread can run concurrently with the interrupt 
 handler causing out-of-order processing.  The rxeof case has the same issue.  
 Normally the code calling rxeof is going to re-enable the interrupt if rxeof 
 runs to completion, and if not it is going to schedule the taskqueue.  The 
 effect of the rxeof change was to always re-enable interrupts before 
 scheduling the taskqueue which can result in those running concurrently.

 Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
 ===
 --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c  (revision 249553)
 +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c  (working copy)
 @@ -1386,23 +1386,6 @@
}
  }

 -static inline void
 -ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
 -{
 -  u32 mask;
 -
 -  if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
 -  mask = (IXGBE_EIMS_RTX_QUEUE & queues);
 -  IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask);
 -  } else {
 -  mask = (queues & 0xFFFFFFFF);
 -  IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(0), mask);
 -  mask = (queues >> 32);
 -  IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(1), mask);
 -  }
 -}
 -
 -
  static void
  ixgbe_handle_que(void *context, int pending)
  {
 @@ -2069,7 +2055,6 @@
  goto watchdog;

  out:
 -  ixgbe_rearm_queues(adapter, adapter->que_mask);
callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
return;

 @@ -4596,14 +4577,8 @@

/*
** We still have cleaning to do?
 -  ** Schedule another interrupt if so.
*/
 -  if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
 -  ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
 -  return (TRUE);
 -  }
 -
 -  return (FALSE);
 +  return ((staterr & IXGBE_RXD_STAT_DD) != 0);
  }

 -- 
 John Baldwin

Re: Network connections are lost from time to time

2013-04-19 Thread John Baldwin

On Friday, April 19, 2013 12:32:18 pm C. L. Martinez wrote:
> On Friday, April 19, 2013, John Baldwin  wrote:
> > On Friday, April 19, 2013 3:11:41 am C. L. Martinez wrote:
> >> Hi all,
> >>
> >>  I have a strange problem with my FreeBSD 9.1 (fully patched): I
> >> frequently lose ssh sessions.
> >>
> >>  This fbsd box is installed in an ESXi 5.1 server and I have another three
> >> fbsd 9.1 in the same ESXi host that do not have this problem, but maybe the
> >> problem is with my sysctl.conf and loader.conf settings:
> >
> > Which NIC driver are you using?
> >
> > --
> > John Baldwin
> 
> 
> e1000.

igb?  There are some fixes to handle out of order packets on transmit that 
could break new connections in some cases.  That is probably worth testing.  I 
think you can just grab the sys/dev/e1000 from 9-stable and drop it into a
9.1 tree to test.

-- 
John Baldwin

Re: Network connections are lost from time to time

2013-04-19 Thread John Baldwin

On Friday, April 19, 2013 4:14:43 pm C. L. Martinez wrote:
> On Friday, April 19, 2013, John Baldwin  wrote:
> > On Friday, April 19, 2013 12:32:18 pm C. L. Martinez wrote:
> >> On Friday, April 19, 2013, John Baldwin  wrote:
> >> > On Friday, April 19, 2013 3:11:41 am C. L. Martinez wrote:
> >> >> Hi all,
> >> >>
> >> >>  I have a strange problem with my FreeBSD 9.1 (fully patched): I
> >> >> frequently lose ssh sessions.
> >> >>
> >> >>  This fbsd box is installed in an ESXi 5.1 server and I have another
> >> >> three fbsd 9.1 in the same ESXi host that do not have this problem, but
> >> >> maybe the problem is with my sysctl.conf and loader.conf settings:
> >> >
> >> > Which NIC driver are you using?
> >> >
> >> > --
> >> > John Baldwin
> >>
> >>
> >> e1000.
> >
> > igb?  There are some fixes to handle out of order packets on transmit that
> > could break new connections in some cases.  That is probably worth testing.
> > I think you can just grab the sys/dev/e1000 from 9-stable and drop it into a
> > 9.1 tree to test.
> >
> > --
> Nop, I am using em driver ...

Ok, have you thought about running a tcpdump and then examining the dump 
around the time that a drop occurs?  (Specifically, if you know the connection 
that drops, you can get wireshark to display only the packets for that 
connection.)  Also, have you compared netstat -s before and after drops?

-- 
John Baldwin

Re: shm_map questions

2013-04-22 Thread John Baldwin

On Saturday, April 20, 2013 9:18:24 pm Laurie Jennings wrote:
> That does help. Is there a way for the kernel to access the memory map
> directly by segment name?

There is not, no.  It wouldn't be hard to add, but the issue there is that
the existing shm_map/unmap API assumes you have an open file descriptor and
is tailored to having a userland process provide memory rather than having
the kernel provide a SHM to userland, so even if you added a shm_open() that
gave you a reference on the underlying shm object (struct shmfd *), you would
need a slightly different shm_map/unmap that took that object directly
rather than an fd.

> Laurie
> 
> --- On Thu, 4/18/13, John Baldwin  wrote:
> 
> From: John Baldwin 
> Subject: Re: shm_map questions
> To: freebsd-net@freebsd.org
> Cc: "Laurie Jennings" 
> Date: Thursday, April 18, 2013, 6:50 AM
> 
> On Thursday, April 11, 2013 10:58:14 am Laurie Jennings wrote:
> > I'm working on a simple project that shares a memory segment between a user
> > process and a kernel module. I'm having some problems with shm_map and there
> > doesn't seem to be much info on it.
> > I'm not sure what happens to the memory when the user process that creates
> > it terminates.  I have some questions.
> > 1) Does the kernel mapping keep the segment from being garbage collected
> > when the user process that creates it terminates? I've experienced a
> > shm_unmap() panic when trying to unmap a segment.
> > scenario:
> >   User process maps segment
> >   Kernel maps it with shm_map()
> >   User process terminates
> >   Kernel tries to shm_unmap() and it panics.
> 
> The kernel mapping bumps the refcount on the underlying vm object, so it will
> not go away.  OTOH, you should be keeping your own reference count on the
> associated fd so that you can call shm_unmap().  That is, the model should be
> something like:
> 
> struct mydata *foo;
> 
> foo->fp = fget(fd);
> shm_map(foo->fp, &foo->p);
> /* Don't call fdrop */
> 
> and then when unmapping:
> 
> struct mydata *foo;
> 
> shm_unmap(foo->fp, foo->p);
> fdrop(foo->fp);
> 
> > 2) Is there a way for the kernel process to know when the user process has
> > gone away? A ref count?
> 
> You can install a process_exit EVENTHANDLER if you want to destroy this when
> a process goes away.  I have used shm_map/unmap for other objects that already
> had a reference count so I did my shm_unmap when that object was destroyed.
> 
> > 3) Does a SHM_ANON segment persist as long as the kernel has it mapped, or
> > does it get garbage collected when the creating user process terminates?
> 
> It goes away when the backing 'struct file' goes away.  If you follow the 
> model above of keeping a reference count on the associated struct file then
> it won't go away until you fdrop() after the shm_unmap.
> 
> > 4) When using a named segment, can the kernel "reuse" a mapping for a new
> > user process?
> > Example:
> >   User process creates shm segment with path /foo
> >   Kernel maps shm segment with shm_map()
> >   User process terminates.
> >   User process runs again, opening segment /foo
> > Does the kernel need to re-map, or is the original mapping valid?
> 
> The mapping is not per-process, so if you have opened a shm for /foo and
> mapped it, it will stay mapped until you call shm_unmap.  Multiple processes
> can shm_open /foo and mmap it and they will all share the same memory.
> 
> You could even share a SHM_ANON fd among multiple processes by passing it
> across a UNIX domain socket.
> 
> Hope this helps.
> 
> -- 
> John Baldwin
> 

-- 
John Baldwin

Re: High CPU interrupt load on intel I350T4 with igb on 8.3

2013-05-20 Thread John Baldwin

On Friday, April 26, 2013 7:31:07 am Clément Hermann (nodens) wrote:
> Hi list,
> 
> We use pf+ALTQ for traffic shaping on some routers.
> 
> We are switching to new servers : Dell PowerEdge R620 with 2 8-cores 
> Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using 
> igb driver. The old hardware is using em driver, the CPU load is high 
> but mostly due to kernel and a large pf ruleset.
> 
> On the new hardware, we see high CPU Interrupt load (up to 95%), even 
> though there is not much traffic currently (peaks about 150Mbps and 
> 40Kpps). All queues are used and bound to a CPU according to top, but a 
> lot of CPU time is spent on igb queues (interrupt or wait). The load is 
> fine when we stay below 20Kpps.
> 
> We see no mbuf shortage, no dropped packet, but there is little margin 
> left on CPU time (about 25% idle at best, most of CPU time is spent on 
> interrupts), which is disturbing.
> 
> We have done some tuning, but to no avail :

If you have the processing_limit set to -1, you should never see CPU time 
spent in the igb task threads (any such time means there is a bug).  One such 
bug was fixed in 8.x here (that is after 8.3):

http://svnweb.freebsd.org/base?view=revision&revision=235553

This may not help with any issues in pf(4), but we had workloads at work (not 
involving pf) where this bug could cause boxes to spend 100% CPU in igb 
threads.

-- 
John Baldwin

Re: bpf hold buffer in-use flag

2013-05-23 Thread John Baldwin

On Thursday, May 23, 2013 5:05:39 pm Guy Helmer wrote:
> 
> On Jan 9, 2013, at 2:35 PM, John Baldwin  wrote:
> 
> > On Tuesday, November 13, 2012 4:40:57 pm Guy Helmer wrote:
> >> To try to completely resolve the race in bpfread(), I have put together
> >> these changes to add a flag to indicate when the hold buffer cannot be
> >> modified because it is in use. Since it's my first time using mtx_sleep()
> >> and wakeup(), I wanted to run these past the list to see if I can get any
> >> feedback on the approach.
> >> 
> >> 
> >> Index: bpf.c
> >> ===
> >> --- bpf.c  (revision 242997)
> >> +++ bpf.c  (working copy)
> >> @@ -819,6 +819,7 @@ bpfopen(struct cdev *dev, int flags, int fmt, stru
> >> * particular buffer method.
> >> */
> >>bpf_buffer_init(d);
> >> +  d->bd_hbuf_in_use = 0;
> >>d->bd_bufmode = BPF_BUFMODE_BUFFER;
> >>d->bd_sig = SIGIO;
> >>d->bd_direction = BPF_D_INOUT;
> >> @@ -872,6 +873,9 @@ bpfread(struct cdev *dev, struct uio *uio, int iof
> >>callout_stop(&d->bd_callout);
> >>timed_out = (d->bd_state == BPF_TIMED_OUT);
> >>d->bd_state = BPF_IDLE;
> >> +  while (d->bd_hbuf_in_use)
> >> +  mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
> >> +  PRINET|PCATCH, "bd_hbuf", 0);
> > 
> > You need to check the return value here, otherwise the PCATCH is useless
> > (you will just go back to sleep instead of failing with an error if this is
> > interrupted by a signal).
> 
> Thanks for the feedback (sorry it's taken so long to get to it). Would this
> change correctly handle interruptions?

Yes.

> Index: bpf.c
> ===
> --- bpf.c (revision 250941)
> +++ bpf.c (working copy)
> @@ -856,9 +856,14 @@
>   callout_stop(&d->bd_callout);
>   timed_out = (d->bd_state == BPF_TIMED_OUT);
>   d->bd_state = BPF_IDLE;
> - while (d->bd_hbuf_in_use)
> - mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
> + while (d->bd_hbuf_in_use) {
> + error = mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>   PRINET|PCATCH, "bd_hbuf", 0);
> + if (error == EINTR || error == ERESTART) {
> + BPFD_UNLOCK(d);
> + return (error);
> + }
> + }

Maybe simplify the check to just 'if (error != 0)'?
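
As a rough userland sketch of that simplified loop (not the bpf code itself):
the sleep primitive is replaced by a hypothetical function pointer, and
wait_while_in_use(), stub_wakeup() and stub_interrupted() are names made up
for this example.  Any non-zero return from the sleep ends the wait, which
covers both EINTR and ERESTART without naming them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sleep primitive: 0 on wakeup, otherwise an errno value. */
typedef int (*sleep_fn_t)(void *chan);

/*
 * Wait until *in_use becomes zero.  Any non-zero return from the sleep
 * primitive aborts the wait and is passed back to the caller.
 */
static int
wait_while_in_use(int *in_use, sleep_fn_t do_sleep)
{
	int error;

	assert(in_use != NULL && do_sleep != NULL);
	while (*in_use) {
		error = do_sleep(in_use);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Stub: a normal wakeup after the holder has cleared the flag. */
static int
stub_wakeup(void *chan)
{
	*(int *)chan = 0;
	return (0);
}

/* Stub: the sleep is interrupted by a signal. */
static int
stub_interrupted(void *chan)
{
	(void)chan;
	return (EINTR);
}
```

In the kernel version the descriptor lock is still held when mtx_sleep()
returns an error, so the caller has to drop it before returning, as the
quoted patch does with BPFD_UNLOCK().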

-- 
John Baldwin

Re: RFC: removing redundant checks in ether_input_internal()

2013-05-29 Thread John Baldwin

On Wednesday, May 22, 2013 10:53:29 am Andre Oppermann wrote:
> On 22.05.2013 14:58, Luigi Rizzo wrote:
> > if_ethersubr.c :: ether_input_internal() is only called as follows:
> >
> >  static void
> >  ether_nh_input(struct mbuf *m)
> >  {
> >
> >  ether_input_internal(m->m_pkthdr.rcvif, m);
> >  }
> >
> > hence the following checks in the body are unnecessary:
> >
> >  if (m->m_pkthdr.rcvif == NULL) {
> >  if_printf(ifp, "discard frame w/o interface pointer\n");
> >  ifp->if_ierrors++;
> >  m_freem(m);
> >  return;
> >  }
> >  #ifdef DIAGNOSTIC
> >  if (m->m_pkthdr.rcvif != ifp) {
> >  if_printf(ifp, "Warning, frame marked as received on %s\n",
> >  m->m_pkthdr.rcvif->if_xname);
> >  }
> >  #endif
> >
> > Any objection if i remove them ?
> 
> No, but they should remain as KASSERTs.  None of these should trigger in
> production and all of them are an indication that something is very wrong
> with the packet or the caller.

Eh, but if the only caller is ether_nh_input() then by definition you know
that m->m_pkthdr.rcvif == ifp.

-- 
John Baldwin

Re: Create pkey on FreeBSD 9.1

2013-05-29 Thread John Baldwin

On Thursday, May 23, 2013 2:36:25 pm Ryan Stone wrote:
> On Thu, May 23, 2013 at 4:32 AM, Alex Liptsin  wrote:
> 
> > Hello.
> >
> > I have FreeBSD 9.1 installed.
> > There is mellanox adapter inside.
> > OFED support is already installed.
> >
> > I try to add pkeys on ib0 port.
> >
> > Usually in  Linux I did:
> >
> > echo 0x800c >  /sys/class/net/ib0/create_child
> >
> > ifconfig -a
> > To Make sure you see a new interface: ib0.800c
> >
> > How can I do it on FreeBSD? There is no "/sys/class/net/ib0/create_child"
> > directory.
> >
> > Regards,
> > Alex Liptsin
> >
> >
> 
> From reading the source it looks like this is done by attaching a vlan
> interface to the interface.  So try:
> 
> ifconfig vlan create vlandev ib0 vlan 0xc
> 
> This will create a new vlanX interface (ifconfig will print its precise name
> with its unit number to stdout).

Simpler though is just 'ifconfig ib0.12 create' (and how most folks
expect subinterfaces to be named).

-- 
John Baldwin

Re: How to switch Datgram/Connected mtu modes?

2013-05-29 Thread John Baldwin

On Sunday, May 26, 2013 7:43:29 am Alex Liptsin wrote:
> Hello.
> 
> I work with FreeBSD 9.1 and Mellanox devices.
> 
> How can I configure MTU in connected mode on FreeBSD 9.1?
> In Linux to enable connected mode for interface ib0, I enter:
> 
>echo connected > /sys/class/net/ib0/mode
> 
> 
> 
> Switching between CM and UD mode can be done in run time:
> 
>echo datagram > /sys/class/net/ib0/mode sets the mode of ib0 to UD
> 
>echo connected > /sys/class/net/ib0/mode sets the mode ib0 to CM
> 
> There are no such directories on FreeBSD. What shall I do?

Have you tried looking for dev.ib.0 sysctls?  It looks like the OFED bits in 
FreeBSD map Linux sysfs entries to sysctl nodes, but I don't have a box with 
IB handy to see what it looks like at runtime.

-- 
John Baldwin

Re: Create pkey on FreeBSD 9.1

2013-05-30 Thread John Baldwin

On Thursday, May 30, 2013 3:29:46 am Alex Liptsin wrote:
> Hi John.
> 
> I did it, but there is no ping between the vlans.  Ping without VLANs on
> those ports passes.

Unfortunately I do not have an IB setup to test this.  I also don't know
how IB treats vlans (e.g. does it use an 802.1(q) type header?).  Can you
tcpdump on the ib0 interface and see if your pings on ib0.100 show up and if 
they have the appropriate headers?

-- 
John Baldwin

Re: misc/179033: [dc] dc ethernet driver seems to have issues with some multiport card and mother board combinations

2013-05-30 Thread John Baldwin

On Thursday, May 30, 2013 1:12:14 am YongHyeon PYUN wrote:
> On Wed, May 29, 2013 at 08:58:10PM -0700, Mr. Clif wrote:
> > Sorry for the confusion Pyun,
> > 
> > I started looking at it in the context of pfsense, but they rejected my 
> > bug report which was understandable because it's an upstream issue. They 
> > suggested I resubmit it to you guys if I could reproduce it. So I booted 
> > FreeBSD and lo and behold the same two ports failed in exactly the same 
> 
> Ok, I'd like to fix that.

Hmmm, the dc(4) driver is using the I/O port BARs rather than the
memory BARs for its registers and this bug seems to be that the dc(4)
device can't properly access its registers on dc0 and dc1 on the
Atom box.  The one thing I see is that the BIOS on the Atom box assigns
addresses in the 0x1100-0x11ff range for dc0 and dc1.  Those addresses
conflict with ISA I/O aliases for the 0x100-0x1ff range.  The Dell BIOS
is more careful to avoid these ranges.
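
To make the aliasing concrete, here is a small sketch of the arithmetic.  It
assumes the usual rule that legacy ISA cards decode only the low 10 I/O
address bits, and that a bridge with the ISA enable bit set does not forward
the upper 768 bytes of each 1KB block in the first 64KB of I/O space.  The
helper names isa_alias_of() and isa_enable_blocks() are made up for this
example and are not taken from the PCI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISA cards decode only the low 10 I/O address bits. */
#define ISA_DECODE_MASK	0x3ffu

/* The ISA port that a 16-bit I/O port address aliases to. */
static uint32_t
isa_alias_of(uint32_t port)
{
	assert(port <= 0xffffu);
	return (port & ISA_DECODE_MASK);
}

/*
 * True if the port lies in the upper 768 bytes of its 1KB block, that
 * is, it is (or aliases) an ISA port in the range 0x100-0x3ff.
 */
static bool
isa_enable_blocks(uint32_t port)
{
	assert(port <= 0xffffu);
	return ((port & 0x300u) != 0);
}
```

With these helpers, 0x1100 maps back to ISA port 0x100 and is flagged, while
0x1000-0x10ff (the first 256 bytes of that 1KB block) is not.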

I think the fix is that I need to fix the PCI-PCI bridge to reject these
resource ranges if the ISA enable bit is set in the bridge's control
register.  However, for the time being you can change dc(4) to use the
memory BAR instead of the I/O port BAR as a workaround:

Index: if_dc.c
===
--- if_dc.c (revision 251132)
+++ if_dc.c (working copy)
@@ -128,7 +128,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 

-#define DC_USEIOSPACE
+//#define  DC_USEIOSPACE

 #include 

If this fixes it then I can take this PR as a test case for handling the ISA 
enable bit in the PCI-PCI bridge code.

-- 
John Baldwin

Re: How to compile ipoib module manually?

2013-06-05 Thread John Baldwin

On Tuesday, June 04, 2013 5:18:46 am Alex Liptsin wrote:
> I commented out those lines, because I want to compile and load those modules
> manually.
> I succeeded in compiling and loading mlx4, mlx4ib and mlxen from /sys/modules:
> 
> [root@h-qa-033 mlxen]# kldstat
> Id Refs AddressSize Name
> 1   14 0x8020 13acbd8  kernel
> 21 0x81612000 21e5 if_mos.ko
> 33 0x81615000 124ebmlx4.ko
> 41 0x81628000 e225 mlx4ib.ko
> 51 0x81637000 ec60 mlxen.ko
> 
> The problem is that IPOIB module is missing in /sys/modules.
> 
> 1.  Where can I find it?
> 
> 2.  How can I compile ipoib support?

You will have to create one.  You should be able to use the existing module 
Makefiles as a guide.

-- 
John Baldwin

Re: misc/179033: [dc] dc ethernet driver seems to have issues with some multiport card and mother board combinations

2013-06-24 Thread John Baldwin

On Monday, June 10, 2013 3:13:11 pm Mr. Clif wrote:
> Hi John and Pyun,
> 
> Ok got the new kernel installed and tested. Yes it works! :-) Maybe that 
> will also fix a similar problem with the sun cards (cas[03]), except I 
> don't see a define like that in if_cas.c. Suggestions?

So I have a possible "real" fix for this.  However, I do not have any hardware 
I can find that has a PCI-PCI bridge with the ISA-enable bit set.  I know it
compiles and boots fine on other systems.  Can you please try this and capture
the dmesg output?  It would also be good to capture devinfo -u output before 
and after.

http://www.freebsd.org/~jhb/patches/pci_isa_enable.patch

-- 
John Baldwin

Re: Failed to allocate receive buffer problem

2013-06-25 Thread John Baldwin

On Wednesday, June 12, 2013 3:06:26 am Alex Liptsin wrote:
> Hi.
> 
> I have a problem: when running a ping (or any other traffic) over an IPoIB
> port, traffic fails after some time.
> In dmesg on the destination server I see these errors:
> 
> Jun 11 14:42:11 h-qa-033 kernel: ib1: failed to allocate receive buffer 253
> Jun 11 14:42:12 h-qa-033 kernel: ib1: failed to allocate receive buffer 254
> Jun 11 14:42:13 h-qa-033 kernel: ib1: failed to allocate receive buffer 255
> Jun 11 14:42:14 h-qa-033 kernel: ib1: failed to allocate receive buffer 0
> Jun 11 14:42:15 h-qa-033 kernel: ib1: failed to allocate receive buffer 1
> Jun 11 14:42:16 h-qa-033 kernel: ib1: failed to allocate receive buffer 2
> Jun 11 14:42:17 h-qa-033 kernel: ib1: failed to allocate receive buffer 3
> Jun 11 14:42:18 h-qa-033 kernel: ib1: failed to allocate receive buffer 4
> Jun 11 14:42:19 h-qa-033 kernel: ib1: failed to allocate receive buffer 5
> Jun 11 14:42:20 h-qa-033 kernel: ib1: failed to allocate receive buffer 6
> Jun 11 14:42:21 h-qa-033 kernel: ib1: failed to allocate receive buffer 7
> 
> I work with FreeBSD 9.1.
> 
> Is it a bug or some configuration issues?

Do you see memory allocation errors in netstat -m?

Specifically this line:

0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)

If so, it may be that the IPoIB layer has an mbuf leak.  The rest of
netstat -m might be useful here as you can see if any of the zones are full.
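
If you want to compare that line across several captures, a small parser
along these lines may help.  This is only a sketch: it assumes the line has
exactly the format shown above, and struct mbuf_denied and
parse_denied_line() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The three counters from the "requests for mbufs denied" line. */
struct mbuf_denied {
	unsigned long mbufs;
	unsigned long clusters;
	unsigned long both;	/* mbuf+clusters */
};

/* Returns 0 on success, -1 if the line is not in the expected format. */
static int
parse_denied_line(const char *line, struct mbuf_denied *out)
{
	int end = 0;

	assert(line != NULL && out != NULL);
	/* %n is only stored if the literal text before it matched too. */
	if (sscanf(line, "%lu/%lu/%lu requests for mbufs denied%n",
	    &out->mbufs, &out->clusters, &out->both, &end) != 3 || end == 0)
		return (-1);
	return (0);
}
```

Any non-zero counter, or a counter that grows between two captures taken
before and after the failure, would be the thing to report.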

-- 
John Baldwin
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: kern/179999: [ofed] [patch] Bug assigning HCA from IB to ETH

2013-06-27 Thread John Baldwin

The following reply was made to PR kern/179999; it has been noted by GNATS.

From: John Baldwin 
To: bug-follo...@freebsd.org,
 shah...@mellanox.com
Cc:  
Subject: Re: kern/179999: [ofed] [patch] Bug assigning HCA from IB to ETH
Date: Thu, 27 Jun 2013 14:10:42 -0400

 Thanks, I think the sysfs fix has a few issues though (it writes to buf[] even 
 if the copyin() fails, and it doesn't enforce a bounds check).  It does seem 
 that this should probably be using sysctl_handle_string() instead of doing it 
 by hand, but for now I've just adjusted your patch.  Can you please test this 
 version?
 
 Index: ofed/drivers/net/mlx4/main.c
 ===
 --- ofed/drivers/net/mlx4/main.c   (revision 252306)
 +++ ofed/drivers/net/mlx4/main.c   (working copy)
 @@ -479,11 +479,11 @@
int i;
int err = 0;
  
 -  if (!strcmp(buf, "ib\n"))
 +  if (!strcmp(buf, "ib"))
info->tmp_type = MLX4_PORT_TYPE_IB;
 -  else if (!strcmp(buf, "eth\n"))
 +  else if (!strcmp(buf, "eth"))
info->tmp_type = MLX4_PORT_TYPE_ETH;
 -  else if (!strcmp(buf, "auto\n"))
 +  else if (!strcmp(buf, "auto"))
info->tmp_type = MLX4_PORT_TYPE_AUTO;
else {
mlx4_err(mdev, "%s is not supported port type\n", buf);
 Index: ofed/include/linux/sysfs.h
 ===
 --- ofed/include/linux/sysfs.h (revision 252306)
 +++ ofed/include/linux/sysfs.h (working copy)
 @@ -104,10 +104,15 @@
error = SYSCTL_OUT(req, buf, len);
if (error || !req->newptr || ops->store == NULL)
goto out;
 -  error = SYSCTL_IN(req, buf, PAGE_SIZE);
 +  len = req->newlen - req->newidx;
 +  if (len >= PAGE_SIZE)
 +  error = EINVAL;
 +  else 
 +  error = SYSCTL_IN(req, buf, len);
if (error)
goto out;
 -  len = ops->store(kobj, attr, buf, req->newlen);
 +  ((char *)buf)[len] = '\0';
 +  len = ops->store(kobj, attr, buf, len);
if (len < 0)
error = -len;
  out:
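
 The bounds check and termination in the sysfs.h hunk follow a common
 pattern, sketched here in userland.  BUF_SIZE stands in for PAGE_SIZE,
 memcpy() stands in for the copy from the request, and store_input() is a
 hypothetical name; none of this is the OFED code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE	4096	/* stands in for PAGE_SIZE */

/*
 * Copy len bytes of caller-supplied data into buf and NUL-terminate it.
 * Input that would leave no room for the terminator is rejected before
 * anything is written to buf.
 */
static int
store_input(char buf[BUF_SIZE], const char *src, size_t len)
{
	assert(buf != NULL && src != NULL);
	if (len >= BUF_SIZE)
		return (EINVAL);
	memcpy(buf, src, len);
	buf[len] = '\0';
	return (0);
}
```

 Checking the length first is what keeps buf untouched on the error path,
 and the explicit terminator is what makes a later strcmp() on buf safe.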
 
 -- 
 John Baldwin

Re: misc/179033: [dc] dc ethernet driver seems to have issues with some multiport card and mother board combinations

2013-06-28 Thread John Baldwin

On Wednesday, June 26, 2013 12:37:13 am Mr. Clif wrote:
> Hi John,
> 
> Thanks for working on this. I'm very interested in getting this fixed 
> for everyone that uses the Affected Atom boards and other small format 
> boards that work well in small custom routers.
> 
> However right now I have a big network upgrade I'm working on and don't 
> have time to get to it until late July, I'm hoping. So please forgive me 
> for the long delay.

That is fine.  I've been able to test this on a little netbook I have that has
bridges with the ISA enable bit set and have fixed a few bugs.  The updated
patch is at the URL below.  I wasn't able to test your specific use case yet
however (of the BIOS using an invalid range).

>  Thanks for your help,
>  Clif
> 
> 
> John Baldwin wrote:
> > On Monday, June 10, 2013 3:13:11 pm Mr. Clif wrote:
> >> Hi John and Pyun,
> >>
> >> Ok got the new kernel installed and tested. Yes it works! :-) Maybe that
> >> will also fix a similar problem with the sun cards (cas[03]), except I
> >> don't see a define like that in if_cas.c. Suggestions?
> > So I have a possible "real" fix for this.  However, I do not have any
> > hardware I can find that has a PCI-PCI bridge with the ISA-enable bit set.
> > I know it compiles and boots fine on other systems.  Can you please try this
> > and capture the dmesg output?  It would also be good to capture devinfo -u
> > output before and after.
> >
> > http://www.freebsd.org/~jhb/patches/pci_isa_enable.patch
> >
> 
> 

-- 
John Baldwin

Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented packets

2013-07-11 Thread John Baldwin

On Wednesday, July 10, 2013 6:59:42 am lini...@freebsd.org wrote:
> Old Synopsis: Bad UDP checksum calc for fragmented packets
> New Synopsis: [ofed] [patch] Bad UDP checksum calc for fragmented packets
> 
> Responsible-Changed-From-To: freebsd-bugs->freebsd-net
> Responsible-Changed-By: linimon
> Responsible-Changed-When: Wed Jul 10 10:59:03 UTC 2013
> Responsible-Changed-Why: 
> Over to maintainer(s).
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=180430

Is the problem that the hardware checksum overwrites arbitrary data in the 
packet (at the location where the UDP header would be)?

Also, I've looked at other drivers, and they all look at the CSUM_*
flags to determine the right settings.  IP fragments will not have
CSUM_UDP or CSUM_TCP set, so I think the more correct fix is this:

Index: en_tx.c
===
--- en_tx.c (revision 253202)
+++ en_tx.c (working copy)
@@ -780,8 +780,12 @@ retry:
        tx_desc->ctrl.srcrb_flags = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE |
            MLX4_WQE_CTRL_SOLICITED);
        if (mb->m_pkthdr.csum_flags & (CSUM_IP|CSUM_TCP|CSUM_UDP)) {
-               tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
-                   MLX4_WQE_CTRL_TCP_UDP_CSUM);
+               if (mb->m_pkthdr.csum_flags & CSUM_IP)
+                       tx_desc->ctrl.srcrb_flags |=
+                           cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM);
+               if (mb->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_UDP))
+                       tx_desc->ctrl.srcrb_flags |=
+                           cpu_to_be32(MLX4_WQE_CTRL_TCP_UDP_CSUM);
                priv->port_stats.tx_chksum_offload++;
        }
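
The flag handling in the patch above can be modelled in userland as a small pure function. This is only a sketch: the DEMO_* names and their numeric values are stand-ins invented for illustration, not the kernel's CSUM_* or MLX4_WQE_CTRL_* definitions, and the byte-order conversion done by cpu_to_be32() is left out.

```c
#include <stdint.h>

/* Illustrative stand-ins only; the real CSUM_* and MLX4_WQE_CTRL_*
 * constants live in kernel headers and may have different values. */
#define DEMO_CSUM_IP            0x1u
#define DEMO_CSUM_TCP           0x2u
#define DEMO_CSUM_UDP           0x4u
#define DEMO_WQE_IP_CSUM        0x10u
#define DEMO_WQE_TCP_UDP_CSUM   0x20u

/*
 * Map the per-packet checksum requests onto descriptor bits.  Each
 * hardware bit is set only when the stack asked for that checksum, so
 * a packet that requests only the IP header checksum never gets the
 * TCP/UDP bit.
 */
static uint32_t
demo_wqe_csum_bits(uint32_t csum_flags)
{
	uint32_t bits = 0;

	if (csum_flags & DEMO_CSUM_IP)
		bits |= DEMO_WQE_IP_CSUM;
	if (csum_flags & (DEMO_CSUM_TCP | DEMO_CSUM_UDP))
		bits |= DEMO_WQE_TCP_UDP_CSUM;
	return (bits);
}
```

With this split, a fragment that carries only the IP checksum request maps to the IP bit alone, which is the behaviour the patch is after.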
 

-- 
John Baldwin

Re: misc/179033: [dc] dc ethernet driver seems to have issues with some multiport card and mother board combinations

2013-07-16 Thread John Baldwin

On Wednesday, June 26, 2013 12:37:13 am Mr. Clif wrote:
> Hi John,
> 
> Thanks for working on this. I'm very interested in getting this fixed 
> for everyone that uses the affected Atom boards and other small format 
> boards that work well in small custom routers.
> 
> However right now I have a big network upgrade I'm working on and don't 
> have time to get to it until late July, I'm hoping. So please forgive me 
> for the long delay.
> 
>  Thanks for your help,
>  Clif

I've tested your specific case more by hacking the PCI bus driver to assign
a bogus range to my NIC on my netbook and verifying it rejected the request
and allocated a new range.  I did have to fix a bug though, so once you get
a chance to test, please test

http://www.freebsd.org/~jhb/patches/pci_isa_enable2.patch instead.

I will go ahead and commit a slightly cleaned up version (with less debugging)
today, but the patch above will output enough debugging to verify it is working
without requiring a verbose boot.

> John Baldwin wrote:
> > On Monday, June 10, 2013 3:13:11 pm Mr. Clif wrote:
> >> Hi John and Pyun,
> >>
> >> Ok got the new kernel installed and tested. Yes it works! :-) Maybe that
> >> will also fix a similar problem with the sun cards (cas[03]), except I
> >> don't see a define like that in if_cas.c. Suggestions?
> > So I have a possible "real" fix for this.  However, I do not have any 
> > hardware
> > I can find that has a PCI-PCI bridge with the ISA-enable bit set.  I know it
> > compiles and boots fine on other systems.  Can you please try this and 
> > capture
> > the dmesg output?  It would also be good to capture devinfo -u output before
> > and after.
> >
> > http://www.freebsd.org/~jhb/patches/pci_isa_enable.patch
> >
> 
> 

-- 
John Baldwin

Re: bind error when using SO_REUSEPORT(implies SO_REUSEADDR)

2013-07-16 Thread John Baldwin

On Thursday, March 15, 2012 8:07:46 pm Sean Bruno wrote:
> On Thu, 2012-03-15 at 16:59 -0700, Sean Bruno wrote:
> > Hey, I just found a bind bug ticket in my queue about bind.  I noted
> > that on stable/6 stable/7 stable/9 & head the referenced code "fails".
> > 
> > It seems that this is a problem, but I have no idea if it's a real
> > problem or not.  Our devs think it is.  Anyway, here is a code snippet
> > to show the failure in bind.  On linux/solaris this does not fail.
> > 
> > http://people.freebsd.org/~sbruno/bind_test.c
> > 
> > simple compile with gcc -o test test.c and run as normal user.
> > 
> > Sean
> > 
> 
> this is bind() not bind ... :-)

Did the recent commit to HEAD fix this btw?

-- 
John Baldwin

Re: bind error when using SO_REUSEPORT(implies SO_REUSEADDR)

2013-07-18 Thread John Baldwin

On Wednesday, July 17, 2013 5:23:37 pm Mikolaj Golub wrote:
> On Tue, Jul 16, 2013 at 11:12:46AM -0400, John Baldwin wrote:
> > On Thursday, March 15, 2012 8:07:46 pm Sean Bruno wrote:
> > > On Thu, 2012-03-15 at 16:59 -0700, Sean Bruno wrote:
> > > > Hey, I just found a bind bug ticket in my queue about bind.  I noted
> > > > that on stable/6 stable/7 stable/9 & head the referenced code "fails".
> > > > 
> > > > It seems that this is a problem, but I have no idea if it's a real
> > > > problem or not.  Our devs think it is.  Anyway, here is a code snippet
> > > > to show the failure in bind.  On linux/solaris this does not fail.
> > > > 
> > > > http://people.freebsd.org/~sbruno/bind_test.c
> > > > 
> > > > simple compile with gcc -o test test.c and run as normal user.
> > > > 
> > > > Sean
> > > > 
> > > 
> > > this is bind() not bind ... :-)
> > 
> > Did the recent commit to HEAD fix this btw?
> 
> As for me, bind_test.c does not expose any bug in freebsd, it only
> shows different behavior for freebsd and linux.
> 
> On freebsd the test output is:
> 
> serversock addr is 127.0.0.1:27539
> dup bind: Address already in use
> This error was expected, tried to bind to used addr/port
> BUG: binding duplicate socket to server port succeeded
> dup2sock addr is 0.0.0.0:27539
> overlapping explicit bind to same port number succeeded without SO_REUSEPORT
> listen succeeded after explicitly overlapping port bind
> autosock addr is 0.0.0.0:27539
> bug triggered, port number conflict on sockets without SO_REUSEPORT
> listen succeded after implicitly overlapping port bind
> 
> So, the first socket (serversock) is bound to the loopback address,
> then it tries some combinations of binding the second socket to the
> same port but to the wildcard address. When SO_REUSEADDR socket option
> is set, binding to the wildcard address succeeds for freebsd (and
> fails for linux).
> 
> They call this a bug in freebsd, but this is well known and expected
> behavior (see e.g. Stevens' TCP/IP Illustrated Vol1). 
> 
> Or I missed the test's point?

No, that is probably true.  I wasn't sure if it was a bug or not when Sean 
originally posted it.
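
For anyone who wants to check the behaviour Mikolaj describes on their own system, the core of the experiment can be sketched as below. This is not Sean's bind_test.c; the function names are made up for illustration, and no particular result is claimed here since the outcome is exactly what differs between systems.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket with SO_REUSEADDR set, or -1. */
static int
demo_reuseaddr_socket(void)
{
	int on = 1;
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return (-1);
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}

/*
 * Bind and listen on 127.0.0.1 with a kernel-chosen port, then try to
 * bind a second SO_REUSEADDR socket to the wildcard address on the same
 * port.  Returns 1 if the second bind succeeds, 0 if it fails with
 * EADDRINUSE, and -1 on any other error.
 */
static int
demo_wildcard_overlap(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, b, result;

	if ((a = demo_reuseaddr_socket()) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;		/* let the kernel pick a port */
	if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    getsockname(a, (struct sockaddr *)&sin, &len) < 0 ||
	    listen(a, 1) < 0) {
		close(a);
		return (-1);
	}
	if ((b = demo_reuseaddr_socket()) < 0) {
		close(a);
		return (-1);
	}
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* same port, wildcard */
	if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == 0)
		result = 1;
	else
		result = (errno == EADDRINUSE) ? 0 : -1;
	close(b);
	close(a);
	return (result);
}
```

Going by the output quoted above, one would expect the second bind to succeed on FreeBSD and to be refused on Linux, but that is worth confirming on the release in question rather than assuming.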

-- 
John Baldwin

Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented packets

2013-07-19 Thread John Baldwin

The following reply was made to PR kern/180430; it has been noted by GNATS.

From: John Baldwin 
To: bug-follo...@freebsd.org,
 me...@mellanox.com
Cc:  
Subject: Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented 
packets
Date: Fri, 19 Jul 2013 11:13:44 -0400

 Oops, my previous reply didn't make it to the PR itself.

 Is the problem that the hardware checksum overwrites arbitrary data in the 
 packet (at the location where the UDP header would be)?

 Also, I've looked at other drivers, and they all look at the CSUM_*
 flags to determine the right settings.  IP fragments will not have
 CSUM_UDP or CSUM_TCP set, so I think the more correct fix is this:

 Index: en_tx.c
 ===
 --- en_tx.c (revision 253470)
 +++ en_tx.c (working copy)
 @@ -780,8 +780,12 @@ retry:
         tx_desc->ctrl.srcrb_flags = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE |
             MLX4_WQE_CTRL_SOLICITED);
         if (mb->m_pkthdr.csum_flags & (CSUM_IP|CSUM_TCP|CSUM_UDP)) {
 -               tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
 -                   MLX4_WQE_CTRL_TCP_UDP_CSUM);
 +               if (mb->m_pkthdr.csum_flags & CSUM_IP)
 +                       tx_desc->ctrl.srcrb_flags |=
 +                           cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM);
 +               if (mb->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_UDP))
 +                       tx_desc->ctrl.srcrb_flags |=
 +                           cpu_to_be32(MLX4_WQE_CTRL_TCP_UDP_CSUM);
                 priv->port_stats.tx_chksum_offload++;
         }

 -- 
 John Baldwin

Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented packets

2013-07-22 Thread John Baldwin

The following reply was made to PR kern/180430; it has been noted by GNATS.

From: John Baldwin 
To: Meny Yossefi 
Cc: "bug-follo...@freebsd.org" 
Subject: Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented 
packets
Date: Mon, 22 Jul 2013 11:40:08 -0400

 On Monday, July 22, 2013 10:11:51 am Meny Yossefi wrote:
 > Hi John,
 > 
 > 
 > 
 > The problem is that the HW will only calculate csum for parts of the 
 > payload, one fragment at a time,
 > 
 > while the receiver side, in our case the tcp/ip stack, will expect to 
 > validate the packet's payload as a whole.
 
 Ok.
 
 > I agree with the change you offered, though one thing bothers me.
 > 
 > This change will add two conditions to the send flow which will probably 
 > have an effect on performance.
 > 
 > Maybe 'likely' can be useful here ?
 
 FreeBSD tends to not use likely/unlikely unless there is a demonstrable gain
 via benchmark measurements.  However, if the OFED code regularly uses it and
 you feel this is a case where you would normally use it, it can be added.
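
For reference, a branch hint of the kind Meny mentions usually boils down to the compiler builtin shown below. This is a userland sketch: the DEMO_* macro names and the counting function are invented for illustration, and whether the hint helps at all is something only a benchmark can show.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative wrappers around the GCC/Clang builtin; on other
 * compilers they fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_LIKELY(x)   __builtin_expect(!!(x), 1)
#define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define DEMO_LIKELY(x)   (x)
#define DEMO_UNLIKELY(x) (x)
#endif

/*
 * Count the packets that request any checksum offload.  The hint only
 * tells the compiler which way the branch usually goes; the result is
 * the same with or without it.
 */
static uint32_t
demo_count_offloaded(const uint32_t *csum_flags, size_t n)
{
	uint32_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (DEMO_LIKELY(csum_flags[i] != 0))
			count++;
	}
	return (count);
}
```

Because the hint changes code layout but not semantics, it can be added or dropped later without touching the logic of the fix.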
 
 > BTW, I'm not entirely sure, but I think the CSUM_IP flag is always set, so 
 > maybe the first condition is not necessary.
 > 
 > What do you think ?
 
 If the user uses ifconfig to disable checksum offload and force software
 checksums the flag will not be set.
 
 > -Meny
 > 
 > 
 > 
 > 
 > 
 > -Original Message-
 > From: John Baldwin [mailto:j...@freebsd.org]
 > Sent: Friday, July 19, 2013 6:29 PM
 > To: bug-follo...@freebsd.org; Meny Yossefi
 > Subject: Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for 
 > fragmented packets
 > 
 > 
 > 
 > Oops, my previous reply didn't make it to the PR itself.
 > 
 > 
 > 
 > Is the problem that the hardware checksum overwrites arbitrary data in the 
 > packet (at the location where the UDP header would be)?
 > 
 > 
 > 
 > Also, I've looked at other drivers, and they all look at the CSUM_* flags to 
 > determine the right settings.  IP fragments will not have CSUM_UDP or 
 > CSUM_TCP set, so I think the more correct fix is this:
 > 
 > 
 > 
 > Index: en_tx.c
 > ===
 > --- en_tx.c (revision 253470)
 > +++ en_tx.c (working copy)
 > @@ -780,8 +780,12 @@ retry:
 >         tx_desc->ctrl.srcrb_flags = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE |
 >             MLX4_WQE_CTRL_SOLICITED);
 >         if (mb->m_pkthdr.csum_flags & (CSUM_IP|CSUM_TCP|CSUM_UDP)) {
 > -               tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
 > -                   MLX4_WQE_CTRL_TCP_UDP_CSUM);
 > +               if (mb->m_pkthdr.csum_flags & CSUM_IP)
 > +                       tx_desc->ctrl.srcrb_flags |=
 > +                           cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM);
 > +               if (mb->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_UDP))
 > +                       tx_desc->ctrl.srcrb_flags |=
 > +                           cpu_to_be32(MLX4_WQE_CTRL_TCP_UDP_CSUM);
 >                 priv->port_stats.tx_chksum_offload++;
 >         }
 > 
 > 
 > 
 > --
 > 
 > John Baldwin
 > 
 
 -- 
 John Baldwin

Re: kern/180791: [ofed] [patch] Kernel crash on ifdown and kldunload mlxen

2013-07-25 Thread John Baldwin

The following reply was made to PR kern/180791; it has been noted by GNATS.

From: John Baldwin 
To: bug-follo...@freebsd.org,
 shah...@mellanox.com
Cc:  
Subject: Re: kern/180791: [ofed] [patch] Kernel crash on ifdown and kldunload 
mlxen
Date: Thu, 25 Jul 2013 15:18:06 -0400

 Thanks.  One note is that it seems like the state_lock should be held when
 stop is called.  Also, the callout used for stats should be drained during
 detach.  (If you use callout_init_mtx() instead of callout_init() then
 callout_stop() under the lock is race-free, though I'd still do the
 callout_drain() during detach to be safe.)
 
 Also, I had some other changes I made to this file to make the locking more
 consistent with other NIC drivers in the tree.  Can you look at this and
 test this patch please?
 
 Index: en_netdev.c
 ===
 --- en_netdev.c (revision 253547)
 +++ en_netdev.c (working copy)
 @@ -495,11 +495,6 @@ static void mlx4_en_do_get_stats(struct work_struc
  
                 queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
}
 -  if (mdev->mac_removed[MLX4_MAX_PORTS + 1 - priv->port]) {
 -  panic("mlx4_en_do_get_stats: Unexpected mac removed for %d\n",
 -  priv->port);
 -  mdev->mac_removed[MLX4_MAX_PORTS + 1 - priv->port] = 0;
 -  }
mutex_unlock(&mdev->state_lock);
  }
  
 @@ -688,8 +683,8 @@ int mlx4_en_start_port(struct net_device *dev)
mlx4_en_set_multicast(dev);
  
/* Enable the queues. */
 -  atomic_clear_int(&dev->if_drv_flags, IFF_DRV_OACTIVE);
 -  atomic_set_int(&dev->if_drv_flags, IFF_DRV_RUNNING);
 +  dev->if_drv_flags &= ~IFF_DRV_OACTIVE;
 +  dev->if_drv_flags |= IFF_DRV_RUNNING;
  
callout_reset(&priv->watchdog_timer, MLX4_EN_WATCHDOG_TIMEOUT,
mlx4_en_watchdog_timeout, priv);
 @@ -761,7 +756,7 @@ void mlx4_en_stop_port(struct net_device *dev)
  
callout_stop(&priv->watchdog_timer);
  
 -  atomic_clear_int(&dev->if_drv_flags, IFF_DRV_RUNNING);
 +  dev->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
  }
  
  static void mlx4_en_restart(struct work_struct *work)
 @@ -802,19 +797,30 @@ mlx4_en_init(void *arg)
  {
struct mlx4_en_priv *priv;
struct mlx4_en_dev *mdev;
 +
 +  priv = arg;
 +  mdev = priv->mdev;
 +  mutex_lock(&mdev->state_lock);
 +  mlx4_en_init_locked(priv);
 +  mutex_unlock(&mdev->state_lock);
 +}
 +
 +static void
 +mlx4_en_init_locked(struct mlx4_en_priv *priv)
 +{
 +
 +  struct mlx4_en_dev *mdev;
struct ifnet *dev;
int i;
  
 -  priv = arg;
dev = priv->dev;
mdev = priv->mdev;
 -  mutex_lock(&mdev->state_lock);
if (dev->if_drv_flags & IFF_DRV_RUNNING)
mlx4_en_stop_port(dev);
  
if (!mdev->device_up) {
en_err(priv, "Cannot open - device down/disabled\n");
 -  goto out;
 +  return;
}
  
/* Reset HW statistics and performance counters */
 @@ -835,9 +841,6 @@ mlx4_en_init(void *arg)
mlx4_en_set_default_moderation(priv);
if (mlx4_en_start_port(dev))
en_err(priv, "Failed starting port:%d\n", priv->port);
 -
 -out:
 -  mutex_unlock(&mdev->state_lock);
  }
  
  void mlx4_en_free_resources(struct mlx4_en_priv *priv)
 @@ -927,9 +930,14 @@ void mlx4_en_destroy_netdev(struct net_device *dev
if (priv->sysctl)
sysctl_ctx_free(&priv->conf_ctx);
  
 +  mutex_lock(&mdev->state_lock);
 +  mlx4_en_stop_port(dev);
 +  mutex_unlock(&mdev->state_lock);
 +
cancel_delayed_work(&priv->stats_task);
/* flush any pending task for this netdev */
flush_workqueue(mdev->workqueue);
 +  callout_drain(&priv->watchdog_timer);
  
/* Detach the netdev so tasks would not attempt to access it */
mutex_lock(&mdev->state_lock);
 @@ -1091,25 +1099,25 @@ static int mlx4_en_ioctl(struct ifnet *dev, u_long
error = -mlx4_en_change_mtu(dev, ifr->ifr_mtu);
break;
case SIOCSIFFLAGS:
 +  mutex_lock(&mdev->state_lock);
if (dev->if_flags & IFF_UP) {
 -  if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 -  mutex_lock(&mdev->state_lock);
 +  if ((dev->if_drv_flags & IFF_DRV_RUNNING) == 0)
mlx4_en_start_port(dev);
 -  mutex_unlock(&mdev->state_lock);
 -  } else
 +  else
mlx4_en_set_multicast(dev);

Re: kern/180430: [ofed] [patch] Bad UDP checksum calc for fragmented packets

2013-08-07 Thread John Baldwin

On Monday, August 05, 2013 6:49:01 am Meny Yossefi wrote:
> John, 
> 
> Will this be committed to 9.2 as well ?

Yes, I committed it yesterday.

-- 
John Baldwin

Re: Options to monitor/sniff network traffic under a vm

2013-08-27 Thread John Nielsen

On Aug 25, 2013, at 5:38 AM, carlopmart  wrote:

> I need to monitor/sniff network traffic for three subnets (1 GiB nets) and I 
> need to do this using a virtual guest under an ESXi 5 host (yes, it is a 
> "handicap").

Not sure about your questions below, but doesn't ESXi 5 support port mirroring 
in the virtual switch? That seems like a better place to do most of the heavy 
lifting. You could still attach your FreeBSD instance to the monitor port(s) 
for analysis. That would hopefully help at least with a) by reducing the number 
of virtual NICs needed.

> I would like to use FreeBSD 8.4 + netmap, but I see some problems:
> 
> a) How can I avoid sharing interrupts for nics interfaces?? This vm needs to 
> use 6 nic interfaces.
> 
> b) Which is best: em or ixgb emulated drivers??
> 
> c) Is it a good idea to enable polling in these nics??


Re: [rfc] migrate lagg to an rmlock

2013-08-29 Thread John Baldwin

On Saturday, August 24, 2013 10:16:33 am Robert Watson wrote:
> There are a number of other places in the kernel where migration to an rmlock 
> makes sense -- however, some care must be taken for four reasons: (1) while 
> read locks don't experience line contention, write locking becomes observably 
> more expensive so is not suitable for all rwlock line contention spots -- 
> e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, unlike rwlocks, 
> implement reader priority propagation, so you must reason about; and (3) 
> historically, rmlocks have not fully implemented WITNESS so you may get less 
> good debugging output.  if_lagg is a nice place to use rmlocks, as 
> reconfigurations are very rare, and it's really all about long-term data 
> stability.

3) should no longer be an issue.  rmlocks now have full WITNESS and assertion
support (including an rm_assert).

However, one thing to consider is that rmlocks pin readers to CPUs while the
read lock is held (which rwlocks do not do).

-- 
John Baldwin

Re: [rfc] migrate lagg to an rmlock

2013-08-29 Thread John Baldwin

On Thursday, August 29, 2013 11:37:08 am Scott Long wrote:
> 
> On Aug 29, 2013, at 7:42 AM, John Baldwin  wrote:
> 
> > On Saturday, August 24, 2013 10:16:33 am Robert Watson wrote:
> >> There are a number of other places in the kernel where migration to an 
> >> rmlock 
> >> makes sense -- however, some care must be taken for four reasons: (1) 
> >> while 
> >> read locks don't experience line contention, write locking becomes 
> >> observably 
> >> more expensive so is not suitable for all rwlock line contention spots -- 
> >> e.g., rmlocks might not be suitable for tcbinfo; (2) rmlocks, unlike 
> >> rwlocks, 
> >> implement reader priority propagation, so you must reason about; and (3) 
> >> historically, rmlocks have not fully implemented WITNESS so you may get 
> >> less 
> >> good debugging output.  if_lagg is a nice place to use rmlocks, as 
> >> reconfigurations are very rare, and it's really all about long-term data 
> >> stability.
> > 
> > 3) should no longer be an issue.  rmlocks now have full WITNESS and 
> > assertion
> > support (including an rm_assert).
> > 
> > However, one thing to consider is that rmlocks pin readers to CPUs while the
> > read lock is held (which rwlocks do not do).
> 
> And this is not a problem for the application that we're giving it in the
> lagg driver.

That is likely true.  I was merely tweaking Robert's general guidelines re: 
rmlock.

-- 
John Baldwin

Re: Does pthread_set_name_np() work?

2013-09-03 Thread John Baldwin

On Wednesday, August 21, 2013 12:36:09 pm Laurie Jennings wrote:
> I'm trying to set the names of threads so I can distinguish them in top -H, 
> but it doesn't seem to 
> take the thread id as valid.
> 
> err=pthread_set_name_np(pthread_self(),"FOO");

This function returns void, not an error, so you can't trust the return value.
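
A minimal sketch of calling it without looking at a return value follows. The helper name is made up for illustration; the FreeBSD branch relies on pthread_set_name_np() being declared in <pthread_np.h>, and the fallback branch is a deliberate no-op because other systems use different interfaces that are not covered here.

```c
#include <pthread.h>
#if defined(__FreeBSD__)
#include <pthread_np.h>		/* declares pthread_set_name_np() */
#endif

/*
 * Hypothetical helper: name the calling thread.  It returns nothing
 * because the underlying FreeBSD call returns nothing, so there is no
 * error code to assign or test.
 */
static void
demo_name_current_thread(const char *name)
{
#if defined(__FreeBSD__)
	pthread_set_name_np(pthread_self(), name);
#else
	(void)name;	/* no-op elsewhere; this sketch targets FreeBSD only */
#endif
}
```

Whether the name took effect has to be checked by other means, for example by looking at the thread list in top -H as in the original question.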

-- 
John Baldwin

Re: QLE3142-CU-CK driver (NetXen NX3031 chipset)

2013-09-20 Thread John Baldwin

On Tuesday, September 17, 2013 12:59:44 pm Ryan McIntosh wrote:
> This particular chipset used in this card has been brought up in the past
> under threads about HP's NC375 controller. The outcome was that it needed
> information/assistance from Qlogic to develop a driver. I have 2 of the
> cards mentioned in the title (details below) and I've finally gotten a
> useful response from Qlogic about opening channels to assist FreeBSD with
> Qlogic hardware; however they've requested a FreeBSD developer's contact
> information to get in touch with if anyone is up for the challenge?

QLogic employs one FreeBSD developer already who maintains the qlxgb(4),
qlxge(4), and qlxgbe(4) drivers: David C Somayajulu .

-- 
John Baldwin

Re: LAN network performance issues

2014-03-07 Thread John Baldwin

On Friday, March 07, 2014 12:17:05 am jcv wrote:
> Hi - I am seeing some strange IPERF results.. Everything goes through my 
> WIFI/GIGABIT router.
> 
> For these tests everything is plugged directly into the router via 
> Ethernet cable.
> 
> My issue is the transfer rate from Windows to FreeBSD.
> 
> There are 3 different computers in this lab running 3 different OS.
> 
> Here are the results:
> 
> 
> 
> FreeBSD as server:
> 
> [vic@yeaguy ~] iperf -s
> 
> Server listening on TCP port 5001
> TCP window size: 64.0 KByte (default)
> 
> 
> 
> [  4] local 192.168.1.3 port 5001 connected with 192.168.1.8 port 52505
> [ ID] Interval   Transfer Bandwidth
> [  4]  0.0-10.1 sec   157 MBytes  131 Mbits/sec <- WINDOWS 8.1 as 
> client on same LAN/ROUTER
> 
> 
> 
> 
> [  5] local 192.168.1.3 port 5001 connected with 192.168.1.12 port 60926
> [  5]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec <-- MACBOOK PRO as 
> client on same LAN/ROUTER
> 
> 
> Windows as the server:
> 
> 
> Server listening on TCP port 5001
> TCP window size: 64.0 KByte (default)
> 
> [  4] local 192.168.1.8 port 5001 connected with 192.168.1.3 port 60529
> [ ID] Interval   Transfer Bandwidth
> [  4]  0.0-10.0 sec  1014 MBytes   850 Mbits/sec <- Freebsd 10 as 
> client on same LAN/ROUTER
> 
> 
> 
> [  4] local 192.168.1.8 port 5001 connected with 192.168.1.12 port 60933
> [  4]  0.0-10.0 sec  1.08 GBytes   931 Mbits/sec <-- MACBOOK PRO as 
> client on same LAN/ROUTER
> 
> 
> 
> Macbook Pro as the server:
> 
> [  3] local 192.168.1.8 port 52509 connected with 192.168.1.12 port 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0-10.0 sec   823 MBytes   690 Mbits/sec <-- WINDOWS 8.1 as 
> client on same LAN/ROUTER
> 
> [  3] local 192.168.1.3 port 23190 connected with 192.168.1.12 port 5001
> [ ID] Interval   Transfer Bandwidth
> [  3]  0.0-10.0 sec  1016 MBytes   852 Mbits/sec <-- Freebsd 10 as 
> client on same LAN/ROUTER
> 
> 
> With FreeBSD being the server, Windows transfer to FreeBSD is slow, 
> compared to Macbook to FreeBSD transfer..
> With Windows as the server, FreeBSD and Macbook to Windows transfer is 
> great.
> With Macbook as server, Windows and FreeBSD transfer is good.
> 
> The only bad transfer is Windows to FreeBSD. Windows transfer to Mac is 
> good. Can't really blame Windows for the poor transfer to FreeBSD then. 
> Macbook to FreeBSD is outstanding, can't really blame FreeBSD for poor 
> receive performance.

Can you tell us more about the FreeBSD box such as the NIC being used?

-- 
John Baldwin

Re: Network troubles after 8.3 -> 8.4 upgrade

2014-04-17 Thread John Nielsen

On Apr 17, 2014, at 2:38 PM, Andrea Venturoli  wrote:

> Three days ago I upgraded an amd64 8.3 box to the latest 8.4.
> Since then the outside network is misbehaving: large mails are not sent 
> (although small ones are), svn operations will work for a while, then come to 
> a sudden stop, etc...
> Perhaps the most evident test is "wget"ting a big file: it will download some 
> chunk, halt; restart after a while and download another chunk; lose the 
> connection once again, then restart and so on.
> 
> I remember a couple of similar experiences in the past, from which I got out 
> by disabling TSO; however those boxes had fxp cards, while this has an em.
> In any case disabling TSO did not help.

My first thought was TSO as well, since I've seen the symptoms you describe a 
few times on systems running 10.0. Do you use IPFW or any kind of NAT on this 
system? When an application encounters a network problem, does it report or log 
anything at all? Anything in the kernel log/dmesg?

A bit of a shot in the dark, but could you try applying r264517 (fixes a 
problem with VLAN and TSO interaction)?
http://svnweb.freebsd.org/base/head/sys/net/if_vlan.c?r1=257241&r2=264517

Otherwise my only other thought would be the driver. Can you try reverting only 
the em(4) driver back to 8.3? If that helps it would give you both a workaround 
and a clue for where to look for a solution. Build modules and a kernel without 
em(4) from unmodified 8.4 src, load em(4) as a module, confirm that the problem 
persists. Replace the contents of src/sys/dev/e1000, src/sys/modules/em and 
src/sys/conf/files with those from an 8.3 src tree (or otherwise revert 
revision 247430), rebuild em module, unload/reload or reboot, see if problem 
goes away. (Could be somewhat complicated by the fact that you also have igb 
interfaces which also use code from the e1000 directory, but rather than 
speculate I'll leave solving that as an exercise for someone else.)

JN

> This is the relevant part of rc.conf:
>> cloned_interfaces="lagg0 vlan1 vlan2 vlan3 carp0 carp1 carp3 carp4 carp6 
>> carp7 carp9 carp10"
>> ifconfig_igb0="up"
>> ifconfig_igb1="up"
>> ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 192.168.101.4 
>> netmask 255.255.255.0"
>> ifconfig_lagg0_alias0="inet 192.168.101.101 netmask 0x"
>> ifconfig_carp0="vhid 1 advskew 100 pass xxx 192.168.101.10"
>> ifconfig_carp1="vhid 2 pass  192.168.101.10"
>> ifconfig_em0="up"
>> ifconfig_vlan1="inet 81.174.30.11 netmask 255.255.255.248 vlan 4 vlandev em0"
>> ifconfig_vlan2="inet 83.211.188.186 netmask 255.255.255.248 vlan 2 vlandev 
>> em0"
>> ifconfig_vlan3="inet 192.168.2.202 netmask 255.255.255.0 vlan 3 vlandev em0"
>> ifconfig_carp3="vhid 4 advskew 100 pass  81.174.30.12"
>> ifconfig_carp4="vhid 5 pass xxx 81.174.30.12"
>> ifconfig_carp6="vhid 7 advskew 100 pass xx 83.211.188.187"
>> ifconfig_carp7="vhid 8 pass xxx 83.211.188.187"
>> ifconfig_carp9="vhid 10 advskew 100 pass  192.168.2.203"
>> ifconfig_carp10="vhid 11 pass  192.168.2.203"
>> ifconfig_lo0_alias0="inet 127.0.0.2 netmask 0x"
>> ifconfig_lo0_alias1="inet 127.0.0.3 netmask 0x"
>> ifconfig_lo0_alias2="inet 127.0.0.4 netmask 0x"
> 
> As you can see the setup is quite complicated, but worked like a charm until 
> the upgrade; actually the internal net (igb+lagg+carp) still does, so this is 
> what points me toward em0, where I cannot seem to get any kind of stability.
> 
> The card is
>> em0@pci0:6:0:0: class=0x02 card=0x10828086 chip=0x107d8086 rev=0x06 
>> hdr=0x00
>>vendor = 'Intel Corporation'
>>device = 'PRO/1000 PT'
>>class  = network
>>subclass   = ethernet
> 
> I tried disabling TSO, RXCSUM, TXCSUM, VLANHWTAG, VLANHWCSUM, VLANHWTSO...
> I tried putting the card into 10baseT/UTP  mode...
> I tried sysctl net.inet.tcp.tso=0...
> 
> None helped.
> 
> Maybe I'm barking up the wrong tree, but nothing is in the logs to help...
> 
> Nor did Google or wading through bug reports.
> 
> 
> 
> Now I could restore the dumps I made before upgrading to 8.4 (but I'd really 
> like to avoid this), try to upgrade even further to 9.2 (although this will 
> be a lot of work and I'm not looking forward to it as a shot in the dark), 
> drop in another NIC...
> What I'd really like, however, is some insight.
> 
> Is this a known problem of some sort? Is this card or this driver known to be 
> broken?
> Is there any way I could get some debugging info?
> 
> Any hint is appreciated (and I need it badly :( !!!).
> 
> bye & Thanks
>   av.

Re: kern/183970: [ofed] [vlan] [panic] mellanox drivers and vlan usage causes kernel panic and reboot

2014-04-20 Thread John Jasen

I've not checked 9.2, but the 181931 patch has been applied to
10-0-release, and it also fixes my problem.



On 04/19/2014 10:14 PM, lini...@freebsd.org wrote:
> Old Synopsis: mellenox drivers and vlan usage causes kernel panic and reboot
> New Synopsis: [ofed] [vlan] [panic] mellanox drivers and vlan usage causes 
> kernel panic and reboot
> 
> State-Changed-From-To: open->open
> State-Changed-By: linimon
> State-Changed-When: Sun Apr 20 01:48:45 UTC 2014
> State-Changed-Why: 
> reclassify and assign.
> 
> 
> Responsible-Changed-From-To: freebsd-bugs->freebsd-net
> Responsible-Changed-By: linimon
> Responsible-Changed-When: Sun Apr 20 01:48:45 UTC 2014
> Responsible-Changed-Why: 
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=183970
> 


Patches for BOOTP/DHCP code to support Windows Server DHCP

2014-05-31 Thread John Howie

Hi all,

I apologize for the cross posting of this email, but I believe it will be
of interest to people across all three groups. Please feel free to forward
to additional groups if you feel they would benefit.

I have seen a few posts on and off over the years about Windows Server
DHCP not working with FreeBSD. Specifically, the Windows Server DHCP would
not return the boot host name/IP address and the root path. The usual
response to "why won't it work" has been that there is a flaw in
Windows Server DHCP code. I did a little digging and found that the
problem actually lies in code in FreeBSD.

Section 3.5 of RFC 2131 (the DHCP RFC) states that "...Second, in its
initial DHCPDISCOVER or DHCPREQUEST message, a client may provide the
server with a list of specific parameters the client is interested in..."
and "...The client can inform the server which configuration parameters
the client is interested in by including the 'parameter request list'
option.  The data portion of this option explicitly lists the options
requested by tag number...". A DHCP Server is not required to return any
parameter that a client does not ask for. It appears that the ISC-DHCP
server, which is recommended by most, will return configured options
regardless of whether or not the client asks for them.

There are two places in the FreeBSD codebase that DHCP is used to boot the
system over a network. The first is in the boot loader, which uses code in
lib/libstand. The second is in the NFS code, in sys/nfs. The code in
sys/nfs is not always used: it is skipped if the boot loader sets up the
environment for the NFS code, either by passing parameters to the kernel
(as PXEBOOT appears to do) or through information configured in the boot
loader configuration files, e.g. /boot/loader.rc.

I have attached two unified diff files which I ask people to test, before
I submit them for inclusion into the codebase as fixes. The first,
bootp.c.diff fixes the code in lib/libstand/bootp.c to request the boot
host (option 12, aka TAG_HOSTNAME) and the NFS root path (option 17, aka
TAG_ROOTPATH). This fix has been tested with PXEBOOT on an amd64 box and
ubldr on an ARM/RaspberryPI system. The second, bootp_subr.c.diff, fixes
code in sys/nfs/bootp_subr.c to request the same options and also to fix
bugs that erroneously reported the IP address of the BOOTP/DHCP server.
The code assumed that the BOOTP/DHCP server was also the boot host. Please
send me all feedback directly.

The diff files work with 10.0-RELEASE through 10.0-RELEASE-p3, but will
likely work with 9.0 and also CURRENT and STABLE, including 11.0, as the
code is old code that does not appear to have changed in a while. If you
want to try it on those systems please, please make sure you have backup
copies just in case.

If you do not have experience configuring Windows Server DHCP just drop me
an email, and I will send you a cheat sheet to get you up and running.

I am going to grab the latest ubldr code to see if I can get it to work
more like PXEBOOT, which appears to pass parameters to the kernel to avoid
the need for the NFS BOOTP/DHCP process. If you test on an ARM system with
ubldr in RELEASE you will see a lot of unnecessary network activity going
on, which we should be able to fix.

Regards,

John

j...@thehowies.com (personal) | jho...@email.arizona.edu (academic) |
j.ho...@napier.ac.uk (academic) | jho...@cloudsecurityalliance.org (work)





bootp_subr.c.diff
Description: bootp_subr.c.diff


bootp.c.diff
Description: bootp.c.diff
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: Patches for BOOTP/DHCP code to support Windows Server DHCP

2014-06-01 Thread John Howie

Hi Steinar,

In short, no, I have no packet traces. Given that the DHCP code in the
FreeBSD boot loader and NFS subsystem does not request those options, but
that ISC-DHCP does provide them, I will go out on a limb and say that it
must be serving them without being asked if they are configured.

Regards,

John


On 6/1/14, 1:24 PM, "sth...@nethelp.no"  wrote:

>> Section 3.5 of RFC 2131 (the DHCP RFC) states that "...Second, in its
>> initial DHCPDISCOVER or DHCPREQUEST message, a client may provide the
>> server with a list of specific parameters the client is interested in"
>> and "...The client can inform the server which configuration parameters
>> the client is interested in by including the 'parameter request list'
>> option."  The data portion of this option explicitly lists the options
>> requested by tag number. A DHCP Server is not required to return any
>> parameter that a client does not ask for. It appears that the ISC-DHCP
>> server, which is recommended by most, will return configured options
>> regardless of whether or not the client asks for them.
>
>As far as I know this is wrong. ISC DHCP does *not* behave this way.
>Do you have packet sniffer traces to indicate otherwise?
>
>In any case - yes, the client should absolutely request all the
>parameters it wants.
>
>Steinar Haug, Nethelp consulting, sth...@nethelp.no


Re: Patches for BOOTP/DHCP code to support Windows Server DHCP

2014-06-01 Thread John Howie

Hi Steinar,

I could ask you to 'prove it', too, but I can easily check when I get back from 
my current travels :-)

It is important to note that even if it does (as I think it does) it is NOT in 
violation of the RFC. The RFC simply says that if a client wants something it 
should ask for it, and not that a server cannot send the options unsolicited.
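
The two server policies being debated can be modelled in a few lines of
Python. This is a sketch of the distinction only; it does not claim to
describe what ISC DHCP or Windows Server DHCP actually do, and the option
values are made up.

```python
def reply_options(configured, requested, send_unsolicited):
    """Return the options a hypothetical server would put in its reply.

    configured:       dict of option code -> value set up on the server
    requested:        option codes from the client's parameter request list
    send_unsolicited: if True, also return configured options not asked for
    """
    wanted = set(requested)
    return {
        code: value
        for code, value in configured.items()
        if send_unsolicited or code in wanted
    }


# Hypothetical server configuration and a client that never asks for 12 or 17.
configured = {1: "255.255.255.0", 12: "bootclient", 17: "/export/root"}
client_asks = [1]

strict = reply_options(configured, client_asks, send_unsolicited=False)
lenient = reply_options(configured, client_asks, send_unsolicited=True)
print(sorted(strict))   # [1]
print(sorted(lenient))  # [1, 12, 17]
```

Either policy is consistent with the RFC text as I read it, which is why
the client should request everything it needs.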

Best regards,

John

Sent from my iPhone

On Jun 1, 2014, at 19:30, "sth...@nethelp.no"  wrote:

>> In short, no, I have no packet traces. Given that the DHCP code in the
>> FreeBSD boot loader and NFS subsystem does not request those options, but
>> that ISC-DHCP does provide them, I will go out on a limb and say that it
>> must be serving them without being asked if they are configured.
> 
> In that case I'm afraid I must stand by my claim that you're wrong
> and ISC DHCP does *not* provide configured options unless the client
> asks for them.
> 
> (And I have copious amounts of packet sniffer traces to prove this.)
> 
> Not that this is particularly relevant to FreeBSD any more...
> 
> Steinar Haug, Nethelp consulting, sth...@nethelp.no

Re: Patches for BOOTP/DHCP code to support Windows Server DHCP

2014-06-01 Thread John Howie

Hi Rick,

That is an excellent point and a good debate to have.

I have not looked in detail at how PXEBOOT does it, but I think a clean up of 
the code to somehow pass arguments to the kernel is preferable to having a 
diskless client send a slew of needless requests to the DHCP server to request 
information already requested and obtained in previous stages of the boot 
process. The number of DHCP requests made by a client when using U-Boot and 
ubldr is dizzying. The NFS code will look to see if the boot loader 
configuration file has specified the root filesystem through use of 
vfs.root.mountfrom, so something should be possible.

Another area for possible attention is handling the scenario when there are a 
number of DHCP servers able to reply to requests. This can happen when you have 
multiple NICs on segments with DHCP Servers or where you have multiple servers 
on the same segment. The current code just grabs the first server to reply at 
each stage (going through each NIC in turn) and has affinity to it for the 
remainder of that stage but not through the entire boot process.

Regards,

John

Sent from my iPhone

> On Jun 1, 2014, at 19:01, "Rick Macklem"  wrote:
> 
> John Howie wrote:
>> Hi all,
>> 
>> I apologize for the cross posting of this email, but I believe it
>> will be
>> of interest to people across all three groups. Please feel free to
>> forward
>> to additional groups if you feel they would benefit.
>> 
>> I have seen a few posts on and off over the years about Windows
>> Server
>> DHCP not working with FreeBSD. Specifically, the Windows Server DHCP
>> would
>> not return the boot host name/IP address and the root path. The
>> typical
>> response to "why won't it work" has typically been that there is a
>> flaw in
>> Windows Server DHCP code. I did a little digging and found that the
>> problem actually lies in code in FreeBSD.
>> 
>> Section 3.5 of RFC 2131 (the DHCP RFC) states that "...Second, in its
>> initial DHCPDISCOVER or DHCPREQUEST message, a client may provide the
>> server with a list of specific parameters the client is interested
>> in..."
>> and "...The client can inform the server which configuration
>> parameters
>> the client is interested in by including the 'parameter request list'
>> option.  The data portion of this option explicitly lists the options
>> requested by tag number...". A DHCP Server is not required to return
>> any
>> parameter that a client does not ask for. It appears that the
>> ISC-DHCP
>> server, which is recommended by most, will return configured options
>> regardless of whether or not the client asks for them.
>> 
>> There are two places in the FreeBSD codebase that DHCP is used to
>> boot the
>> system over a network. The first is in the boot loader, which uses
>> code in
>> lib/libstand. The second is in the NFS code, in sys/nfs. The code is
>> sys/nfs is not always used if the boot loader sets up the environment
>> for
>> the NFS code, either by passing parameters to the kernel (as PXEBOOT
>> appears to do), or information is configured in the boot loader
>> configuration files, e.g. /boot/loader.rc.
>> 
>> I have attached two unified diff files which I ask people to test,
>> before
>> I submit them for inclusion into the codebase as fixes. The first,
>> bootp.c.diff fixes the code in lib/libstand/bootp.c to request the
>> boot
>> host (option 12, aka TAG_HOSTNAME) and the NFS root path (option 17,
>> aka
>> TAG_ROOTPATH). This fix has been tested with PXEBOOT on an amd64 box
>> and
>> ubldr on an ARM/RaspberryPI system. The second, bootp_subr.c.diff,
>> fixes
>> code in sys/nfs/bootp_subr.c to request the same options and also to
>> fix
>> bugs that erroneously reported the IP address of the BOOTP/DHCP
>> server.
>> The code assumed that the BOOTP/DHCP server was also the boot host.
>> Please
>> send me all feedback directly.
>> 
>> The diff files work with 10.0-RELEASE through 10.0-RELEASE-p3, but
>> will
>> likely work with 9.0 and also CURRENT and STABLE, including 11.0, as
>> the
>> code is old code that does not appear to have changed in a  while. If
>> you
>> want to try it on those systems please, please make sure you have
>> backup
>> copies just in case.
>> 
>> If you do not have experience configuring Windows Server DHCP just
>> drop me
>> an email, and I will send you a cheat sheet to get you up and
>> running.
>> 
>> I am going to grab the latest ubldr code to see if I can get it to
>> work
>> more like PXE

recommendations on supported 40GbE adapters?

2014-06-10 Thread John Jasen

Hello -- frequent listener, first time caller. If this should be
redirected to another freebsd-* mailing list, please let me know.

I've been experimenting with multiple dual port Mellanox adapters in a
system, to see if the entire solution would be suitable for a router
replacement project I'm doing.

In general, I've been impressed with the Mellanox drivers and their
responsiveness, however, I have encountered a few issues where the cards
seem to 'fall asleep' or, when I'm playing with the sysctl settings for
the mlxen drivers, I panic the kernel.

Of course, before I finish spec'ing out a solution to be purchased, I
should solicit real-world feedback on the available 40GbE adapters and
drivers.

From what I've gleaned, Mellanox, Emulex, and Chelsio all support the
development of FreeBSD drivers for their 40GbE cards. Are there any
other vendors I should be considering?

If anyone else has tried 40GbE cards, I am most interested in your
experiences -- especially in stability, performance and performance tuning.

Thanks in advance!

-- John Jasen (jja...@gmail.com)




em driver: netif hangs the system if interface is cabled and configured but there is no link

2014-06-12 Thread John Jasen

I'm configuring a system that's destined to be a multi-homed server,
using Intel dual port 1GbE cards that rely on the em driver.

em0 has link, and only needed configuration.

In an attempt to be ahead of the game, I pre-configured em2, plugged in
my side of the cable to be ready when the other side plugged theirs in,
and rebooted the box.

In this state, it appears the box will hang as netif works through
interfaces defined in rc.conf. I'm not sure if permanently, but I'm
willing to call 20 minutes 'permanent' for the purposes of this exercise.

I eventually was able to narrow it down to the configurations for em2,
and em2 not having link WHILE a cable was plugged in. I was able to
replicate the expected condition by unplugging the cable, and was able
to replicate the failure condition on em1 and em3 by moving the cable
and configurations.

Any thoughts? Am I missing something?

-- John Jasen (jja...@gmail.com)


Re: em driver: netif hangs the system if interface is cabled and configured but there is no link

2014-06-12 Thread John Jasen


On 06/12/2014 01:02 PM, Andreas Nilsson wrote:



> If it is a dual port card, shouldn't it be em0 and em1 ?

Yes. I do have two dual port cards however.

-- John Jasen (jja...@gmail.com)

network.subr vlan handling broken

2014-06-19 Thread John Hay

Hi Guys,

freebsd-rc did not react, so I'm just checking on -net too.

I found after upgrading that vlan handling broke. I tried the following:

vlans_bce1="6"
ipv4_addrs_bce1_6="inet 10.239.100.2/24"
ifconfig_bce1_6_aliases="inet 10.239.100.2/24"
ifconfig_bce1_6_alias0="inet 10.239.100.2/24"

I traced it down to ifalias_af_common_handler being called with the
mangled interface name _if and it then calls ifconfig with it. Here
is my fix. Any reason not to commit it? My diff is against 10-stable,
but head looks the same.

#
--- /etc/network.subr.orig  2014-06-01 17:30:38.0 +
+++ /etc/network.subr   2014-06-01 18:03:08.030175024 +
@@ -1151,7 +1151,7 @@
inet|inet6|ipx|link|ether)
case $_tmpargs in
${_af}\ *)
-   eval ifalias_af_common_handler $_if $_af $_action $_tmpargs && _ret=0
+   eval ifalias_af_common_handler $1 $_af $_action $_tmpargs && _ret=0
;;
esac
_tmpargs=$_c
@@ -1163,7 +1163,7 @@
# Process the last component
case $_tmpargs in
${_af}\ *)
-   ifalias_af_common_handler $_if $_af $_action $_tmpargs && _ret=0
+   ifalias_af_common_handler $1 $_af $_action $_tmpargs && _ret=0
;;
esac
 
#
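
For anyone not familiar with ltr, the effect behind the bug can be
illustrated with a short Python stand-in. This only models the character
translation; the real ltr is a shell function in rc.subr that stores its
result in a named variable, and the character set below is the _punct
value used in network.subr.

```python
PUNCT = ".-/+"  # same character set as _punct in network.subr


def ltr(name, chars=PUNCT, repl="_"):
    """Replace every character of `chars` found in `name` with `repl`."""
    return name.translate({ord(c): repl for c in chars})


ifname = "bce1.6"      # what ifconfig(8) must be given
varname = ltr(ifname)  # what rc.conf variable names are built from

print(varname)                         # bce1_6
print("ifconfig_%s_alias0" % varname)  # ifconfig_bce1_6_alias0
```

The mangled form is right for looking up rc.conf variables, but passing
it to ifconfig fails because no interface called bce1_6 exists.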

While looking through the code I saw that ltr is called with different
styling. Is there a reason for it? Which is the preferred style?

    ltr ${_if} "${_punct}" '_' _if
    ltr "$_if" "$_punct" "_" _if

My own preference would be the first.

Regards

John
-- 
John Hay -- j...@meraka.csir.co.za / j...@meraka.org.za

Re: network.subr vlan handling broken

2014-06-20 Thread John Hay

Hi Hiroki,

On Fri, Jun 20, 2014 at 12:48:03PM +0900, Hiroki Sato wrote:
> John Hay  wrote
>   in <20140619103513.ga92...@zibbi.meraka.csir.co.za>:
> 
> jh> Hi Guys,
> jh>
> jh> freebsd-rc did not react, so I'm just checking on -net too.
> jh>
> jh> I found after upgrading that vlan handling broke. I tried the following:
> jh>
> jh> vlans_bce1="6"
> jh> ipv4_addrs_bce1_6="inet 10.239.100.2/24"
> jh> ifconfig_bce1_6_aliases="inet 10.239.100.2/24"
> jh> ifconfig_bce1_6_alias0="inet 10.239.100.2/24"
> jh>
> jh> I traced it down to ifalias_af_common_handler being called with the
> jh> mangled interface name _if and it then calls ifconfig with it. Here
> jh> is my fix. Any reason not to commit it? My diff is against 10-stable,
> jh> but head looks the same.
> 
>  Can you try the attached patch?  It seemed broken after list_vars()
>  was introduced.  Replacing $_if with $1 also fixes it, but $_if
>  should be used for ifname as the other parts do.

I have tested ipv4 and ipv6 cases and it seems ok:

##
vlans_re1="6 7"
ipv4_addrs_re1_6="inet 10.254.254.253/24"
ifconfig_re1_6_aliases="inet 10.254.254.254/24"
ifconfig_re1_6_ipv6="inet6 accept_rtadv"
ifconfig_re1_7_ipv6="inet6 fd99:6829:597c:2::1 prefixlen 64"

root@angel:/etc # ifconfig re1.6
re1.6: flags=8843 metric 0 mtu 1500
options=3
ether 90:2b:34:df:ae:c4
inet6 fe80::922b:34ff:fedf:aec4%re1.6 prefixlen 64 scopeid 0x4 
inet 10.254.254.254 netmask 0xff00 broadcast 10.254.254.255 
inet 10.254.254.253 netmask 0xff00 broadcast 10.254.254.255 
nd6 options=23
media: Ethernet autoselect (none)
status: no carrier
vlan: 6 parent interface: re1
root@angel:/etc # ifconfig re1.7
re1.7: flags=8843 metric 0 mtu 1500
options=3
ether 90:2b:34:df:ae:c4
inet6 fd99:6829:597c:2::1 prefixlen 64 
inet6 fe80::922b:34ff:fedf:aec4%re1.7 prefixlen 64 scopeid 0x5 
nd6 options=21
media: Ethernet autoselect (none)
status: no carrier
vlan: 7 parent interface: re1
root@angel:/etc #
##

Thanks

John

> 
> jh> While looking through the code I saw that ltr is called with different
> jh> styling. Is there a reason for it? Which is the preferred style?
> jh>
> jh>   ltr ${_if} "${_punct}" '_' _if
> jh>   ltr "$_if" "$_punct" "_" _if
> 
>  I do not think there is a reason.
> 
>  I think there is no consensus about the style but I am using {} only
>  when boundary between the variable name and the subsequent characters
>  can be ambiguous.
> 
> -- Hiroki

> Index: network.subr
> ===
> --- network.subr  (revision 267636)
> +++ network.subr  (working copy)
> @@ -1077,7 +1077,7 @@
>  ifalias_af_common()
>  {
>   local _ret _if _af _action alias ifconfig_args _aliasn _c _tmpargs _iaf
> - local _punct=".-/+"
> + local _vif _punct=".-/+"
> 
>   _ret=1
>   _aliasn=
> @@ -1086,11 +1086,11 @@
>   _action=$3
> 
>   # Normalize $_if before using it in a pattern to list_vars()
> - ltr "$_if" "$_punct" "_" _if
> + ltr "$_if" "$_punct" "_" _vif
> 
>   # ifconfig_IF_aliasN which starts with $_af
> - for alias in `list_vars ifconfig_${_if}_alias[0-9]\* |
> - sort_lite -nk1.$((9+${#_if}+7))`
> + for alias in `list_vars ifconfig_${_vif}_alias[0-9]\* |
> + sort_lite -nk1.$((9+${#_vif}+7))`
>   do
>   eval ifconfig_args=\"\$$alias\"
>   _iaf=
> @@ -1118,8 +1118,8 @@
>   # backward compatibility: ipv6_ifconfig_IF_aliasN.
>   case $_af in
>   inet6)
> - for alias in `list_vars ipv6_ifconfig_${_if}_alias[0-9]\* |
> - sort_lite -nk1.$((14+${#_if}+7))`
> + for alias in `list_vars ipv6_ifconfig_${_vif}_alias[0-9]\* |
> + sort_lite -nk1.$((14+${#_vif}+7))`
>   do
>   eval ifconfig_args=\"\$$alias\"
>   case ${_action}:"${ifconfig_args}" in
> @@ -1129,7 +1129,7 @@
>   alias:*)
>   _aliasn="${_aliasn} inet6 ${ifconfig_args}"
>   warn "\$${alias} is obsolete. " \
> - "Use ifconfig_$1_aliasN instead."
> + "Use ifconfig_${_vif}_aliasN instead."
>   ;;
>   esac
>   done
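
As a side note on the sort key in the patch above: the offsets 9 and 14
are the lengths of "ifconfig_" and "ipv6_ifconfig_", and the 7 is the
length of "_alias" plus one, giving the 1-based position of the first
digit of N. The Python sketch below mirrors that arithmetic; it assumes
that sort_lite -nk1.N sorts numerically starting at character N, as
sort(1) would, and the helper name is made up for the illustration.

```python
def alias_number(var, vif, prefix="ifconfig_"):
    """Return the integer N of <prefix><vif>_aliasN using a fixed offset."""
    start = len(prefix) + len(vif) + 7  # 1-based position of the first digit
    return int(var[start - 1:])


vif = "re1_6"  # the mangled form of re1.6
names = [
    "ifconfig_re1_6_alias10",
    "ifconfig_re1_6_alias2",
    "ifconfig_re1_6_alias0",
]
ordered = sorted(names, key=lambda v: alias_number(v, vif))
print(ordered)
# ['ifconfig_re1_6_alias0', 'ifconfig_re1_6_alias2', 'ifconfig_re1_6_alias10']
```

A plain lexical sort would put alias10 before alias2, which is why a
numeric key at that offset is needed.
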

Re: Ordering problem in if_detach_internal regarding if_bridge

2014-06-23 Thread John Baldwin

On Friday, June 20, 2014 11:25:51 am Roger Pau Monné wrote:
> Hello,
> 
> I've stumbled across the following panic when testing Xen netback with 
> if_bridge:
> 
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex if_bridge (if_bridge) r = 0 (0xf80006306c18) locked @ /usr/src/sys/m
> KDB: stack backtrace:
> X_db_symbol_values() at X_db_symbol_values+0x10b/frame 0xfe213490
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe213540
> witness_warn() at witness_warn+0x4a8/frame 0xfe213600
> trap() at trap+0xc9d/frame 0xfe2136a0
> trap() at trap+0x669/frame 0xfe2138b0
> calltrap() at calltrap+0x8/frame 0xfe2138b0
> --- trap 0xc, rip = 0x8221a0ef, rsp = 0xfe213970, rbp = 0xfe2139e0 ---
> bridge_input() at bridge_input+0x5ff/frame 0xfe2139e0
> ether_vlanencap() at ether_vlanencap+0x4a3/frame 0xfe213a10
> netisr_dispatch_src() at netisr_dispatch_src+0x90/frame 0xfe213a80
> ether_ifattach() at ether_ifattach+0x19f/frame 0xfe213ab0
> ath_dfs_get_thresholds() at ath_dfs_get_thresholds+0x81ce/frame 0xfe213b30
> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfe213b70
> db_dump_intr_event() at db_dump_intr_event+0x796/frame 0xfe213bb0
> fork_exit() at fork_exit+0x84/frame 0xfe213bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe213bf0
> --- trap 0, rip = 0, rsp = 0xfe213cb0, rbp = 0 ---
> 
> I've tracked this down to if_detach_internal setting ifp->if_addr to 
> NULL before calling EVENTHANDLER_INVOKE(ifnet_departure_event..., which 
> causes a panic in GRAB_OUR_PACKETS in the if_bridge code when it tries 
> to perform IF_LLADDR on an interface that's in the process of being 
> destroyed (ifp->if_addr set to NULL, but the ifnet_departure_event event 
> has not fired yet).
> 
> I have the following naive patch that moves the firing of the event 
> before if_addr is set to NULL, but I'm not familiar with the ordering 
> in if_detach_internal, so I'm not sure if this might cause problems in 
> other parts of the code, could someone familiar with the net stuff 
> comment on the best way to deal with it?

Hmmm, I have no idea if this is ok or not.  I do think the route message 
should go out at the same time as the devctl_notify() call however.  My guess 
is it is actually better to do this earlier so that we allow outside consumers
to detach from an interface before it is destroyed.  I'm not sure if it would
break things, but I would be tempted to move this even earlier right after it
is removed from the global ifnet list but before the taskqueue_drain, etc.
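
To illustrate why the ordering matters, here is a toy Python model of the
race described above. It is not kernel code: the classes, the interface
names and the MAC addresses are invented, and the "crash" is simply
Python's error on dereferencing None, standing in for the page fault.

```python
class Ifnet:
    def __init__(self, name, lladdr):
        self.name = name
        self.if_addr = lladdr  # set to None part-way through detach


class Bridge:
    def __init__(self):
        self.members = []

    def on_departure(self, ifp):
        # Stand-in for the ifnet_departure_event handler.
        self.members.remove(ifp)

    def input(self, dst):
        # Stand-in for the per-member address comparison in bridge_input().
        for ifp in self.members:
            if ifp.if_addr.lower() == dst.lower():  # fails if if_addr is None
                return ifp.name
        return None


def detach(ifp, bridge, notify_first, traffic_dst):
    """Detach ifp; one packet arrives in the window between the two steps."""
    steps = [
        lambda: bridge.on_departure(ifp),
        lambda: setattr(ifp, "if_addr", None),
    ]
    if not notify_first:
        steps.reverse()
    steps[0]()
    result = bridge.input(traffic_dst)  # traffic from another bridge member
    steps[1]()
    return result


def demo(notify_first):
    dying = Ifnet("xnb0", "00:16:3e:00:00:01")
    phys = Ifnet("bce0", "00:11:22:33:44:55")
    br = Bridge()
    br.members += [dying, phys]
    try:
        return detach(dying, br, notify_first, "00:11:22:33:44:55")
    except AttributeError:
        return "crash"


print(demo(notify_first=False))  # crash
print(demo(notify_first=True))   # bce0
```

With the notification sent first, the bridge has already dropped the
dying member by the time its address goes away.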

-- 
John Baldwin

Re: MTU not regrowing?

2014-06-24 Thread John Hay

On Tue, Jun 24, 2014 at 08:43:28PM +0200, Andrea Venturoli wrote:
> Hello.
> 
> Today I experienced something weird (at least for me) on a 8.4 system:
> 
> _ the system had vlan3 interface, with default MTU (1500 bytes);
> _ "ping -D -s 1400 somehost" would work, but "ping -D -s 1500 somehost" 
> would yield "frag needed and DF set" (forgive me if the message is not 
> exact, I don't have it anymore);
> 
> _ to make some tests I reduced MTU size with "ifconfig vlan3 mtu 500";
> _ now, of course, "ping -D -s 400 somehost" would work, but "ping -D -s 
> 500 somehost" would yield "frag needed and DF set";
> 
> _ then I raised MTU again with "ifconfig vlan3 mtu 1500" (notice 
> ifconfig would actually report this as "mtu 1500" was shown);
> _ however the results were as before, i.e. "ping -D -s 400 somehost" 
> would work, but "ping -D -s 500 somehost" would yield "frag needed and 
> DF set";
> 
> _ no way I could ping with a packet bigger than 500 bytes until I rebooted.
> 
> Is this expected behaviour? Any way to get around this?

Do a "route get somehost" and see what mtu is returned. You might be
able to delete or tweak that route.
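
As a worked example of the sizes involved: "ping -s" sets the ICMP
payload, so the packet on the wire is 28 bytes larger (20 bytes of IPv4
header without options plus 8 bytes of ICMP header). The Python sketch
below does that arithmetic and assumes, as a guess about your situation,
that the route still carries the old MTU of 500 after the interface was
set back to 1500.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits(payload, if_mtu, route_mtu=None):
    """True if the payload can be sent with DF set, given both MTU limits."""
    effective = if_mtu if route_mtu is None else min(if_mtu, route_mtu)
    return payload <= max_ping_payload(effective)


print(max_ping_payload(1500))              # 1472
print(fits(1400, 1500), fits(1500, 1500))  # True False
print(fits(500, 1500, route_mtu=500))      # False
```

So "-s 1500" is expected to fail on a 1500-byte MTU, and a stale route
MTU would explain the 500-byte limit surviving the ifconfig change.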

John
-- 
John Hay -- j...@meraka.csir.co.za / j...@meraka.org.za

Re: Ordering problem in if_detach_internal regarding if_bridge

2014-06-24 Thread John Baldwin

On Monday, June 23, 2014 1:12:54 pm Roger Pau Monné wrote:
> On 23/06/14 18:49, Alexander V. Chernikov wrote:
> > On 23.06.2014 20:39, Alexander V. Chernikov wrote:
> >> On 23.06.2014 19:32, John Baldwin wrote:
> >>> On Friday, June 20, 2014 11:25:51 am Roger Pau Monné wrote:
> >>>> Hello,
> >>>>
> >>>> I've stumbled across the following panic when testing Xen netback with 
> >>>> if_bridge:
> >>>>
> >>>> Kernel page fault with the following non-sleepable locks held:
> >>>> exclusive sleep mutex if_bridge (if_bridge) r = 0 (0xf80006306c18) locked @ /usr/src/sys/m
> >>>> KDB: stack backtrace:
> >>>> X_db_symbol_values() at X_db_symbol_values+0x10b/frame 0xfe213490
> >>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe213540
> >>>> witness_warn() at witness_warn+0x4a8/frame 0xfe213600
> >>>> trap() at trap+0xc9d/frame 0xfe2136a0
> >>>> trap() at trap+0x669/frame 0xfe2138b0
> >>>> calltrap() at calltrap+0x8/frame 0xfe2138b0
> >>>> --- trap 0xc, rip = 0x8221a0ef, rsp = 0xfe213970, rbp = 0xfe2139e0 ---
> >>>> bridge_input() at bridge_input+0x5ff/frame 0xfe2139e0
> >>>> ether_vlanencap() at ether_vlanencap+0x4a3/frame 0xfe213a10
> >>>> netisr_dispatch_src() at netisr_dispatch_src+0x90/frame 0xfe213a80
> >>>> ether_ifattach() at ether_ifattach+0x19f/frame 0xfe213ab0
> >>>> ath_dfs_get_thresholds() at ath_dfs_get_thresholds+0x81ce/frame 0xfe213b30
> >>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfe213b70
> >>>> db_dump_intr_event() at db_dump_intr_event+0x796/frame 0xfe213bb0
> >>>> fork_exit() at fork_exit+0x84/frame 0xfe213bf0
> >>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe213bf0
> >>>> --- trap 0, rip = 0, rsp = 0xfe213cb0, rbp = 0 ---
> >>>>
> >>>> I've tracked this down to if_detach_internal setting ifp->if_addr to 
> >>>> NULL before calling EVENTHANDLER_INVOKE(ifnet_departure_event..., which 
> >>>> causes a panic in GRAB_OUR_PACKETS in the if_bridge code when it tries 
> >>>> to perform IF_LLADDR on an interface that's in the process of being 
> >>>> destroyed (ifp->if_addr set to NULL, but the ifnet_departure_event event 
> >>>> has not fired yet).
> >>>>
> >>>> I have the following naive patch that moves the firing of the event 
> >>>> before if_addr is set to NULL, but I'm not familiar with the ordering 
> >>>> in if_detach_internal, so I'm not sure if this might cause problems in 
> >>>> other parts of the code, could someone familiar with the net stuff 
> >>>> comment on the best way to deal with it?
> >>
> >> We should notify kernel customers only when we are really taking this
> >> interface down and every other subsystem cannot add any new state to the
> >> interface.
> >>
> >> In this patch you're sending notification before taking ifnet down,
> >> removing its L3 addresses, routes, and so on.
> >>
> >> This can easily lead to panic in, for example, BPF subsystem (since BPF
> >> state is freed in bpf_ifdetach() handler).
> >>
> >> Additionally, this will introduce ifaddr / iface messages reversal for
> >> rtsock.
> > Whoops. I misread the patch.
> > It should be OK.
> > 
> >>
> >> It looks like we'd better fix if_bridge (and it is still using mutexes,
> >> what a shame!).
> >>
> >> Can you send me trace with line numbers?
> > However, these two still stand.
> > (And I'm wondering how you're getting any traffic on down/dying interface).
> 
> I'm not getting the traffic from the dying interface, I'm getting the
> traffic from another interface on the bridge (a physical bce interface),
> which injects traffic into the bridge, that calls bridge_input, which
> tries to read ifp->if_addr->ifa_addr from the dying interface, and that
> leads to the panic.
> 
> Line numbers:
> 
> /usr/src/sys/modules/if_bridge/../../net/if_bridge.c:2410 (bridge_input)
> /usr/src/sys/net/if_ethersubr.c:543 (ether_input_internal)
> /usr/src/sys/net/netisr.c:972 (netisr_dispatch_src)
> /usr/src/sys/net/if_ethersubr.c:674 (ether_input)
> /usr/src/sys/dev/bce/if_bce.c:6861 (bce_rx_intr)

I think this certainly suggests moving at least the eventhandler up so that
things like vlans and bridges can detach from an interface while it is still
constructed.  I do think it would be ideal to move all three notifications
to the same place though.  (So your original patch plus moving the
routing socket message.)

-- 
John Baldwin

Re: NFS client READ performance on -current

2014-07-10 Thread John Baldwin

On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> Russell L. Carter wrote:
> > 
> > 
> > On 07/02/14 19:09, Rick Macklem wrote:
> > 
> > > Could you please post the dmesg stuff for the network interface,
> > > so I can tell what driver is being used? I'll take a look at it,
> > > in case it needs to be changed to use m_defrag().
> > 
> > em0:  port 0xd020-0xd03f mem 0xfe4a-0xfe4b,0xfe48-0xfe49 irq 44 at device 0.0 on pci2
> > em0: Using an MSI interrupt
> > em0: Ethernet address: 00:15:17:bc:29:ba
> > 001.07 [2323] netmap_attach success for em0 tx 1/1024 rx 1/1024 queues/slots
> > 
> > This is one of those dual nic cards, so there is em1 as well...
> > 
> Well, I took a quick look at the driver and it does use m_defrag(), but
> I think that the "retry:" label it does a goto after doing so might be in
> the wrong place.
> 
> The attached untested patch might fix this.
> 
> Is it convenient to build a kernel with this patch applied and then try
> it with TSO enabled?
> 
> rick
> ps: It does have the transmit segment limit set to 32. I have no idea if
> this is a hardware limitation.

I think the retry is not in the wrong place, but the overhead of all those
pullups is apparently quite severe.  It would be interesting to test the
following in addition to your change to see if it improves performance
further:

Index: if_em.c
===
--- if_em.c (revision 268495)
+++ if_em.c (working copy)
@@ -1959,7 +1959,9 @@ retry:
if (error == EFBIG && remap) {
struct mbuf *m;
 
-   m = m_defrag(*m_headp, M_NOWAIT);
+   m = m_collapse(*m_headp, M_NOWAIT, EM_MAX_SCATTER);
+   if (m == NULL)
+   m = m_defrag(*m_headp, M_NOWAIT);
if (m == NULL) {
adapter->mbuf_alloc_failed++;
m_freem(*m_headp);
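
The intent of the change above is "try the cheaper operation first and
fall back to the full one". The Python toy below models only that control
flow on a list of segment lengths. It is not the kernel algorithm: the
real m_collapse() and m_defrag() operate on mbuf chains, and the helper
names here are invented. The limit of 32 is the transmit segment limit
mentioned earlier in this thread.

```python
MCLBYTES = 2048      # size of a standard mbuf cluster
EM_MAX_SCATTER = 32  # segment limit mentioned earlier in this thread


def toy_collapse(chain, maxfrags):
    """Merge neighbours only until the chain is short enough, else None."""
    segs = list(chain)
    while len(segs) > maxfrags:
        for i in range(len(segs) - 1):
            if segs[i] + segs[i + 1] <= MCLBYTES:
                segs[i:i + 2] = [segs[i] + segs[i + 1]]
                break
        else:
            return None  # no pair fits in one cluster: give up
    return segs


def toy_defrag(chain):
    """Repack all bytes into as few cluster-sized segments as possible."""
    full, rest = divmod(sum(chain), MCLBYTES)
    return [MCLBYTES] * full + ([rest] if rest else [])


def remap(chain, maxfrags=EM_MAX_SCATTER):
    m = toy_collapse(chain, maxfrags)
    if m is None:
        m = toy_defrag(chain)
    return m


small = remap([100] * 40)   # collapse succeeds after a few merges
big = remap([1500] * 40)    # collapse fails, defrag repacks everything
print(len(small), sum(small))  # 32 4000
print(len(big), sum(big))      # 30 60000
```

In the first case only eight merges are needed to get under the limit,
which is the kind of saving I would hope to see from m_collapse().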


-- 
John Baldwin

Re: r256920 missing in stable/9 and releng/9.3

2014-07-10 Thread John Baldwin

On Monday, July 07, 2014 12:58:53 pm hiren panchasara wrote:
> + freebsd-net@,
> 
> On Mon, Jul 7, 2014 at 4:37 AM, Harald Schmalzbauer
>  wrote:
> > Regarding Jan Mikkelsen's message of 24.06.2014 04:49 (localtime):
> >> Hi,
> >>
> >> I’m bringing 9.3-RC1 into our local Perforce depot and moving our local patches to 9.2 forward.
> >>
> >> I noticed that r256920 (changing sys/netinet/tcp_input.c) has not been MFC’d. It was listed as “MFC after 3 days” back in October 2013.
> >>
> >> Is this patch missing for a reason?
> >
> > I'm wondering too if there's any good reason not to MFC?
> 
> I also don't see any obvious reason.
> 
> If nobody objects on -net@, I can do it.

I think this looks fine to merge.

-- 
John Baldwin

Re: System Booting Kernel from Secondary Drive

2014-07-10 Thread John Baldwin

On Thursday, July 03, 2014 10:06:21 am Laurie Jennings via freebsd-net wrote:
> I'm having a problem with a Supermicro system running FreeBSD 9.1. Sometimes when I upgrade the kernel in my main drive (ada0),
> the system boots the kernel from the 2nd drive. It only happens sometimes. ada0 is mounted, but the system is running the old kernel.
> Pulling the 2nd fixed the problem.
> 
> What can cause this to happen? Is it a supermicro problem (it's a 5017R-MTF superserver) or is it something with FreeBSD.

Are you using a software RAID between the two disks?

-- 
John Baldwin

Re: Add netbw option to systat

2014-07-10 Thread John Baldwin

On Wednesday, July 02, 2014 8:54:41 pm hiren panchasara wrote:
> On Wed, Jul 2, 2014 at 4:50 PM, Bryan Venteicher
>  wrote:
> > Awhile back, DragonFlyBSD added a netbw option to systat that I've ported
> > to FreeBSD and found handy at various times:
> >
> >    netbw  Display aggregate and per-connection TCP receive and transmit
> >           rates.  Only active TCP connections are shown.
> >
> > Leading to output such as:
> >
> > tcp accepts connects rcv 1.192G snd 15.77K rexmit
> >
> >   192.168.10.80:22    192.168.10.20:23103 rcv        snd 415.7  [  NTSX ]
> >   192.168.10.80:22    192.168.10.20:46560 rcv 19.80M snd 14.47K [  NTSX ]
> >   192.168.10.80:22    192.168.10.20:60699 rcv        snd 886.3  [  NTSX ]
> >   192.168.10.81:5201  192.168.10.51:60844 rcv 293.2M snd        [R  TSX ]
> >   192.168.10.81:5201  192.168.10.51:60845 rcv 293.5M snd        [R  TSX ]
> >   192.168.10.81:5201  192.168.10.51:60846 rcv 293.2M snd        [R  TSX ]
> >   192.168.10.81:5201  192.168.10.51:60847 rcv 292.9M snd        [R  TSX ]
]
> >   192.168.10.81:5201192.168.10.51:60847 rcv 292.9M snd[R  TSX 
]
> >
> > It uses the sequence numbers from the 'struct tcpcb' to derive the rates,
> > which is usually good but certainly not perfect (i.e., don't set the
> > interval too long).
> >
> > I'd like to commit this if anybody else thinks they'd find it useful.
> >
> > http://people.freebsd.org/~bryanv/patches/systat-netbw.patch
> 
> I like the idea.

I also like the idea.

> A few things about the patch:
> 1) You may want to remove the code hidden behind "#if 0" at 2 places.
> 2) I am not entirely clear on why/if we need the last column with
> flags, but if we keep it (for compatibility or any other reason), it
> would be nice to have those flags explained in the manpage:
> 
> + mvwprintw(wnd, LINES-2, 0,
> +  "Rate/sec, "
> +  "R=rxpend T=txpend N=nodelay T=tstmp "
> +  "S=sack X=winscale F=fastrec");
> 3) I feel that the header line for o/p (especially the 'tcp accepts and
> connects' terminology) can be improved but I do not have a better
> suggestion :-)

4) Should numtok() just be humanize_number?  Or rather, would it simplify
   the code to use humanize_number?  (It might not, but if it does, I
   think that would be preferable.)

-- 
John Baldwin

Re: NFS client READ performance on -current

2014-07-11 Thread John Baldwin

On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote:
> John Baldwin wrote:
> > On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
> > > Russell L. Carter wrote:
> > > > 
> > > > 
> > > > On 07/02/14 19:09, Rick Macklem wrote:
> > > > 
> > > > > Could you please post the dmesg stuff for the network interface,
> > > > > so I can tell what driver is being used? I'll take a look at it,
> > > > > in case it needs to be changed to use m_defrag().
> > > > 
> > > > em0:  port 0xd020-0xd03f mem
> > > > 0xfe4a-0xfe4b,0xfe48-0xfe49 irq 44 at device 0.0 on pci2
> > > > em0: Using an MSI interrupt
> > > > em0: Ethernet address: 00:15:17:bc:29:ba
> > > > 001.07 [2323] netmap_attach success for em0 tx 1/1024 rx
> > > > 1/1024 queues/slots
> > > > 
> > > > This is one of those dual nic cards, so there is em1 as well...
> > > > 
> > > Well, I took a quick look at the driver and it does use m_defrag(), but
> > > I think that the "retry:" label it does a goto after doing so might be in
> > > the wrong place.
> > > 
> > > The attached untested patch might fix this.
> > > 
> > > Is it convenient to build a kernel with this patch applied and then try
> > > it with TSO enabled?
> > > 
> > > rick
> > > ps: It does have the transmit segment limit set to 32. I have no idea if
> > > this is a hardware limitation.
> > 
> > I think the retry is not in the wrong place, but the overhead of all those
> > pullups is apparently quite severe.
> The m_defrag() call after the first failure will just barely squeeze
> the just under 64K TSO segment into 32 mbuf clusters. Then I think any
> m_pullup() done during the retry will allocate an mbuf
> (at a glance it seems to always do this when the old mbuf is a cluster)
> and prepend that to the list.
> --> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
> will fail again on the retry, this time fatally, I think?
> 
> I can't see any reason to re-do all the stuff using m_pullup() and Russell
> reported that moving the "retry:" fixed his problem, from what I understood.

Ah, I had assumed (incorrectly) that the m_pullup()s would all be nops in this
case.  It seems the NIC would really like to have all those things in a single
segment, but it is not required, so I agree that your patch is fine.
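
The segment arithmetic in Rick's explanation can be sketched as follows. This is only an illustration under stated assumptions: SKETCH_MCLBYTES assumes the standard 2048-byte mbuf cluster, SKETCH_MAX_SCATTER is the 32-segment limit mentioned in the thread, and both helper functions are hypothetical names, not driver code.

```c
#include <stddef.h>

#define SKETCH_MCLBYTES		2048	/* assumed mbuf cluster size */
#define SKETCH_MAX_SCATTER	32	/* transmit segment limit cited above */

/* Clusters needed to hold len bytes when every cluster is packed full. */
static size_t
clusters_needed(size_t len)
{
	return ((len + SKETCH_MCLBYTES - 1) / SKETCH_MCLBYTES);
}

/* Nonzero if a chain of nsegs mbufs fits within the segment limit. */
static int
fits_scatter_limit(size_t nsegs)
{
	return (nsegs <= SKETCH_MAX_SCATTER);
}
```

A TSO segment of 65535 bytes needs clusters_needed(65535) = 32 clusters, which just fits. One extra mbuf prepended by m_pullup() makes 33, and fits_scatter_limit(33) is false, so the load on the retry fails.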

> > It would be interesting to test the following in addition to your change
> > to see if it improves performance further:
> > 
> > Index: if_em.c
> > ===
> > --- if_em.c (revision 268495)
> > +++ if_em.c (working copy)
> > @@ -1959,7 +1959,9 @@ retry:
> > if (error == EFBIG && remap) {
> > struct mbuf *m;
> >  
> > -   m = m_defrag(*m_headp, M_NOWAIT);
> > +   m = m_collapse(*m_headp, M_NOWAIT, EM_MAX_SCATTER);
> > +   if (m == NULL)
> > +   m = m_defrag(*m_headp, M_NOWAIT);
> Since a just under 64K TSO segment barely fits in 32 mbuf clusters,
> I'm at least 99% sure the m_collapse() will fail, but it can't hurt to
> try it. (If it supported 33 or 34, I think m_collapse() would have a
> reasonable chance of success.)
> 
> Right now the NFS and krpc code creates 2 small mbufs in front of the
> read/write data clusters and I think the TCP layer adds another one.
> Even if this was modified to put it all in one cluster, I don't think
> m_collapse() would succeed, since it only copies the data up and deletes
> an mbuf from the chain if it will all fit in the preceding one. Since
> the read/write data clusters are full (except the last one), they can't
> fit in the M_TRAILINGSPACE() of the preceding one unless it is empty
> from my reading of m_collapse().

Correct, ok.
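
To illustrate the point about M_TRAILINGSPACE(), below is a simplified userland model of the merge step Rick describes. It is an assumption-laden sketch, not the kernel's m_collapse(): struct sketch_mbuf and sketch_collapse() are hypothetical, and the real function has further conditions (such as writability checks) that are not modelled here.

```c
#include <stddef.h>

struct sketch_mbuf {
	size_t len;	/* bytes of data held */
	size_t cap;	/* total buffer capacity */
};

/*
 * Merge chain[i + 1] into chain[i] whenever its data fits in the
 * trailing space of chain[i], stopping once the chain has at most
 * maxfrags entries.  The array is compacted in place and the new
 * chain length is returned.
 */
static size_t
sketch_collapse(struct sketch_mbuf *chain, size_t n, size_t maxfrags)
{
	size_t i = 0;

	while (n > maxfrags && i + 1 < n) {
		size_t room = chain[i].cap - chain[i].len;

		if (chain[i + 1].len <= room) {
			chain[i].len += chain[i + 1].len;
			for (size_t j = i + 1; j + 1 < n; j++)
				chain[j] = chain[j + 1];
			n--;
		} else {
			i++;
		}
	}
	return (n);
}
```

In this model, a small header mbuf followed by full 2048-byte clusters cannot be shortened, because no full cluster fits in the space left in the mbuf before it. Two small header mbufs in a row, on the other hand, do merge.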

-- 
John Baldwin

tuning routing using cxgbe and T580-CR cards?

2014-07-11 Thread John Jasem

In testing two Chelsio T580-CR dual port cards with FreeBSD 10-STABLE,
I've been able to use a collection of clients to generate approximately
1.5-1.6 million TCP packets per second sustained, and routinely hit
10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the
quick read, accepting the loss of granularity).

While performance has so far been stellar, and I'm honestly speculating
I will need more CPU depth and horsepower to get much faster, I'm
curious if there is any gain to tweaking performance settings. I'm
seeing, under multiple streams, with N targets connecting to N servers,
interrupts on all CPUs peg at 99-100%, and I'm curious if tweaking
configs will help, or if it's a free clue to get more horsepower.

So far, except for temporarily turning off pflogd, and setting the
following sysctl variables, I've not done any performance tuning on the
system yet.

/etc/sysctl.conf
net.inet.ip.fastforwarding=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

a) One of the first things I did in prior testing was to turn
hyperthreading off. I presume this is still prudent, as HT doesn't help
with interrupt handling?

b) I briefly experimented with using cpuset(1) to stick interrupts to
physical CPUs, but it offered no performance enhancements, and indeed,
appeared to decrease performance by 10-20%. Has anyone else tried this?
What were your results?

c) the defaults for the cxgbe driver appear to be 8 rx queues, and N tx
queues, with N being the number of CPUs detected. For a system running
multiple cards, routing or firewalling, does this make sense, or would
balancing tx and rx be more ideal? And would reducing queues per card
based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?

d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024.
These appear to not be writeable when if_cxgbe is loaded, so I speculate
they are not to be messed with, or are loader.conf variables? Is there
any benefit to messing with them?

e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but messing
with values did not yield an immediate benefit. Am I barking up the
wrong tree, trying?

f) based on prior experiments with other vendors, I tried tweaks to
net.isr.* settings, but did not see any benefits worth discussing. Am I
correct in this speculation, based on others' experience?

g) Are there other settings I should be looking at, that may squeeze out
a few more packets?

Thanks in advance!

-- John Jasen (jja...@gmail.com)


re: Network Intel X520-SR2 stopping

2014-07-11 Thread John Jasem

Marcelo;

I recently had a case where an Intel card was flapping, but using LR
transceivers. Turns out, the cable ends needed to be re-polished, as not
enough light was making it through to register transmit power.

You and the networking people may want to spend a few moments exploring
that path.

-- John Jasen (jja...@gmail.com)
