Package: nftables
Version: 0.9.8-3.1+deb11u1

Possible further package: linux-image-amd64
Version: 5.10.162-1

OS: Debian bullseye, amd64, vanilla installation, up to date at the time of 
writing.

NICs: Only one NIC, device enp3s0, working correctly and configured with a 
static IP address.

IPv6 is disabled at the kernel command line, IPv4 is fully enabled.


Dear nftables / netfilter maintainer,

This message is closely related to the two messages I sent you a few minutes 
ago. As explained there, a ruleset like

table arp t_ARP
delete table arp t_ARP
table arp t_ARP {

  chain output-filter {
    type filter hook output priority -800; policy drop;

    oifname enp3s0 arp ptype 0x0800 log prefix "Foo:" accept;
    log prefix "arp-output-filter:" drop;
  }
}

makes the machine stop answering ARP requests, because the accept rule never 
matches, probably due to a bug in the nft userspace program or in the kernel.

This time, let's examine what the drop rule writes to /var/log/messages. As soon 
as I tried to connect to the PC in question from another box that did not have 
its MAC address in its ARP cache, the log filled with multiple instances of the 
following line:

[...] arp-output-filter:IN= OUT=enp0s3 ARP HTYPE=37 PTYPE=0x90bd OPCODE=21

Well, OK. I spent the next 8 hours of that Sunday researching what hardware type 
that should be (HFI hardware?), what opcode (MARS-Grouplist-Reply) and what 
protocol type. Of course, that led nowhere and was a waste of time.

Then I noticed something interesting. The ARP reply that *should have been sent* 
(but was dropped) would have begun with the following bytes:

00 25 90 bd b0 db 00 15 17 75 b2 04 08 06 00 01

The first six bytes are the destination MAC address the reply should be sent 
to, and the next six bytes are the source MAC address of the NIC in question. 
Now I observed that 0x25 = 37 (the HTYPE in the log entry) is the second byte of 
the destination MAC, 0x90bd (the PTYPE) is its third and fourth bytes, and 
0x15 = 21 (the second byte of the source MAC) is the OPCODE.
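The correspondence can be checked mechanically. The following Python sketch 
decodes the fixed ARP header fields from the quoted bytes twice: once at offset 
0 of the frame, as the log entry apparently does, and once where the ARP header 
really begins, right after the 14-byte Ethernet header. The helper name 
decode_arp_fixed_header is made up for illustration. Only the first 16 bytes of 
the frame are known, so only HTYPE can be read at the correct offset.

```python
import struct

# First 16 bytes of the ARP reply that should have been sent (quoted above).
FRAME = bytes.fromhex("00 25 90 bd b0 db 00 15 17 75 b2 04 08 06 00 01")

ETH_HLEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def decode_arp_fixed_header(buf: bytes, offset: int) -> dict:
    """Hypothetical helper: unpack HTYPE, PTYPE, HLEN, PLEN, OPCODE (network order)."""
    htype, ptype, hlen, plen, opcode = struct.unpack_from("!HHBBH", buf, offset)
    return {"HTYPE": htype, "PTYPE": ptype, "HLEN": hlen, "PLEN": plen, "OPCODE": opcode}


# Wrong: treat the start of the Ethernet frame (the MAC header) as the ARP header.
wrong = decode_arp_fixed_header(FRAME, 0)
print(f"HTYPE={wrong['HTYPE']} PTYPE={wrong['PTYPE']:#06x} OPCODE={wrong['OPCODE']}")

# Right: the EtherType at offset 12 says ARP, and the ARP header starts at offset 14.
(ethertype,) = struct.unpack_from("!H", FRAME, 12)
(real_htype,) = struct.unpack_from("!H", FRAME, ETH_HLEN)
print(f"EtherType={ethertype:#06x} real HTYPE={real_htype}")
```

Reading at offset 0 reproduces HTYPE=37, PTYPE=0x90bd and OPCODE=21 from the 
log entry exactly, while the correct offset yields EtherType 0x0806 (ARP) and 
HTYPE=1 (Ethernet), as expected for this NIC.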

This clearly shows that something catastrophic is going on. I am now absolutely 
sure that nftables or the kernel uses completely wrong data to create the log 
entry. The entry shown above makes no sense, and it cannot possibly be correct 
for the system in question. Instead, the fields line up exactly with an ARP 
header read from offset 0 of the Ethernet frame rather than from offset 14, 
where the ARP header actually begins after the two MAC addresses and the 
EtherType (0x0806). In other words, the kernel outputs *bytes of the packet's 
MAC addresses as HTYPE, PTYPE and OPCODE.*

Wow, that's really an epic fail that can hardly be excused. It took me the 
whole day to find out, and it has completely destroyed my trust in nftables. It 
might even impact security: if the kernel also uses the wrong bytes to check 
packets against a firewall rule (which would explain why the "arp ptype 0x0800" 
accept rule above never matches), packets that should be dropped will be 
accepted, and vice versa.
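To illustrate the concern, here is a minimal Python sketch of a hypothetical 
matcher for "arp ptype 0x0800" that reads PTYPE two bytes into whatever it 
believes is the ARP header. It assumes that the bytes after the 16 quoted above 
are those of a standard IPv4-over-Ethernet ARP reply (PTYPE 0x0800, HLEN 6, 
PLEN 4, OPCODE 2); that continuation is an assumption, not captured data. The 
oifname check from the ruleset is ignored here.

```python
import struct

# First 16 bytes as quoted in the report, followed by an ASSUMED continuation
# (08 00 06 04 00 02) that a standard IPv4-over-Ethernet ARP reply would carry.
FRAME = bytes.fromhex("00 25 90 bd b0 db 00 15 17 75 b2 04 08 06 00 01 "
                      "08 00 06 04 00 02")

ETH_HLEN = 14  # length of the Ethernet header preceding the ARP header


def ptype_matches(frame: bytes, arp_offset: int, wanted: int = 0x0800) -> bool:
    """Hypothetical matcher: PTYPE sits 2 bytes into the ARP header."""
    (ptype,) = struct.unpack_from("!H", frame, arp_offset + 2)
    return ptype == wanted


def verdict(frame: bytes, arp_offset: int) -> str:
    # Mirrors the chain in the ruleset: accept on a PTYPE match, else the final drop rule.
    return "accept" if ptype_matches(frame, arp_offset) else "drop"


print(verdict(FRAME, 0))         # reads 0x90bd from the destination MAC -> drop
print(verdict(FRAME, ETH_HLEN))  # reads the real PTYPE 0x0800 -> accept
```

Under these assumptions, reading from offset 0 turns a legitimate ARP reply into 
a drop, which is consistent with the machine no longer answering ARP requests.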


It would be great if you could let me know what you think about the situation 
as soon as your time allows. We are currently under pressure with a project 
that aims to replace all our iptables-based firewall scripts with nftables-based 
ones, which is not possible now. We really don't know how to proceed. Dump 
Debian? Dump Linux? Stick with iptables, dump nftables and revisit it in five 
years? None of these are good options... Any advice would be greatly appreciated.


Thank you very much again, and have a nice Sunday (it's already evening here),

Binarus
