Dear Network Core developers!

I've been debugging an issue with multicast replies from the underlying
interface of a MACVLAN towards the MACVLAN itself. These SKBs never contain
a MAC header and therefore cannot be properly processed by MACVLAN.

The use case is the following:
eth1 <-- eth1.212 <-- macvlan@eth1.212 (in bridge mode)

As far as I understand, the intermediate VLAN interface plays no role in the
problem. When macvlan@eth1.212 sends a Router Solicitation, these SKBs are
received on eth1.212, but the corresponding multicast Router Advertisements
are not received on macvlan@eth1.212.

I've tracked the problem down to the following incompatibility between the
MACVLAN code and the IP code...

On the one hand, MACVLAN always expects an Ethernet header:

static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
{
        struct macvlan_port *port;
        struct sk_buff *skb = *pskb;
        const struct ethhdr *eth = eth_hdr(skb);
        ...

        port = macvlan_port_get_rcu(skb->dev);
        if (is_multicast_ether_addr(eth->h_dest)) {

On the other hand, IP doesn't populate the Ethernet header for multicast
loopback transmission:

int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
        skb_reset_mac_header(skb);
        __skb_pull(skb, skb_network_offset(skb));
        skb->pkt_type = PACKET_LOOPBACK;
        skb->ip_summed = CHECKSUM_UNNECESSARY;
        WARN_ON(!skb_dst(skb));
        skb_dst_force(skb);
        netif_rx_ni(skb);

Unicast, however, works fine because of:

int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb)
{
        struct net_device *dev = neigh->dev;
        unsigned int seq;
        int err;

        do {
                __skb_pull(skb, skb_network_offset(skb));
                seq = read_seqbegin(&neigh->ha_lock);
                err = dev_hard_header(skb, dev, ntohs(skb->protocol),
                                      neigh->ha, NULL, skb->len);
        } while (read_seqretry(&neigh->ha_lock, seq));

        if (err >= 0)
                err = dev_queue_xmit(skb);

I've also collected some stack traces and SKB dumps to illustrate the problem
(I've instrumented macvlan_handle_frame() and eth_header() to see when the
Ethernet header is generated):

macvlan_handle_frame() receives the Router Advertisement, but cannot forward
it without an Ethernet header:

skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb headroom: 00000020: 08 0f 00 00 00 00 00 00
skb linear:   00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear:   00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear:   00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear:   00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear:   00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear:   00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 <IRQ>
 dump_stack+0x69/0x9b
 macvlan_handle_frame+0x321/0x425 [macvlan]
 ? macvlan_forward_source+0x110/0x110 [macvlan]
 __netif_receive_skb_core+0x545/0xda0
 ? ip6_mc_input+0x103/0x250 [ipv6]
 ? ipv6_rcv+0xe1/0xf0 [ipv6]
 ? __netif_receive_skb_one_core+0x36/0x70
 __netif_receive_skb_one_core+0x36/0x70
 process_backlog+0x97/0x140
 net_rx_action+0x1eb/0x350
 __do_softirq+0xe3/0x383
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.4+0x4e/0x50
 netif_rx_ni+0x60/0xd0
 dev_loopback_xmit+0x83/0xf0
 ip6_finish_output2+0x575/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Later, when the same RA is transmitted via neigh_connected_output(), the
Ethernet header is generated for this packet for the first time, but this is
towards the "world", not the internal MACVLAN bridge:

skb len=110 headroom=26 headlen=110 tailroom=56
mac=(-1,-1) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=0 iif=0
dev name=etha01.212 feat=0x0x0000000040005000
sk family=10 type=3 proto=58
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00
skb linear:   00000000: 33 33 00 00 00 01 02 40 43 80 00 00 86 dd 60 09
skb linear:   00000010: 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 00 40
skb linear:   00000020: 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 00 00
skb linear:   00000030: 00 00 00 00 00 01 86 00 61 00 40 00 00 2d 00 00
skb linear:   00000040: 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c 00 00
skb linear:   00000050: 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 00 00
skb linear:   00000060: 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 dump_stack+0x69/0x9b
 debug_hdr+0x4c/0x60
 eth_header+0x71/0xe0
 vlan_dev_hard_header+0x58/0x140 [8021q]
 neigh_connected_output+0xa9/0x100
 ip6_finish_output2+0x24a/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

I would appreciate any hints on how to approach this problem! I can try to
come up with a patch, but as this is such a central part of the IP code, I'd
like to hear some opinions first...

-- 
Best regards,
Alexander Sverdlin.
