Fri, Jan 22, 2016 at 05:21:28AM CET, wen.gang.w...@oracle.com wrote:
>
>
>On 2016-01-21 16:35, Jiri Pirko wrote:
>>Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.w...@oracle.com wrote:
>>>In a bonding setting, we determines fragment size according to MTU and
>>>PMTU associated to the bonding master. If the slave finds the fragment
>>>size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>>>passing _skb_ and _pmtu_, trying to update the path MTU.
>>>Problem is that the target device that function ip_rt_update_pmtu actually
>>>tries to update is the slave (skb->dev), not the master. Thus since no
>>>PMTU change happens on master, the fragment size for later packets doesn't
>>>change so all later fragments/packets are dropped too.
>>>
>>>The fix is letting build_skb_flow_key() take care of the transition of
>>>device index from bonding slave to the master. That makes the master become
>>>the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>
>>>Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
>>>---
>>>net/ipv4/route.c | 9 +++++++++
>>>1 file changed, 9 insertions(+)
>>>
>>>diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>>index 85f184e..7e766b5 100644
>>>--- a/net/ipv4/route.c
>>>+++ b/net/ipv4/route.c
>>>@@ -524,10 +524,19 @@ static void build_skb_flow_key(struct flowi4 *fl4,
>>>const struct sk_buff *skb,
>>>{
>>> 	const struct iphdr *iph = ip_hdr(skb);
>>> 	int oif = skb->dev->ifindex;
>>>+	struct net_device *master;
>>> 	u8 tos = RT_TOS(iph->tos);
>>> 	u8 prot = iph->protocol;
>>> 	u32 mark = skb->mark;
>>>
>>>+	if (netif_is_bond_slave(skb->dev)) {
>>>+		rcu_read_lock();
>>>+		master = netdev_master_upper_dev_get_rcu(skb->dev);
>>>+		if (master)
>>>+			oif = master->ifindex;
>>>+		rcu_read_unlock();
>>>+	}
>>This is certainly not correct as it should not be bond-specific but
>>rather generic.
>
>Then what you would suggest to fix it?
>>Note that you may have bond over bond or bridge over
>>bond or other scenarios, which this patch ignores.
>I don't think bond over bond is a good configuration. Do you have a real use
>case for that configuration?

Stacking of multiple master devices is absolutely common. You have to walk
the upper tree all the way up, for all master device types.

>
>thanks,
>wengang
>
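
To illustrate what walking all the way up means, here is a minimal userspace
sketch. It is only a model: struct model_dev and model_topmost_ifindex() are
hypothetical names made up for this example, not kernel APIs. It assumes each
device has at most one direct master and that the chain has no loops. In the
kernel the lookup would presumably repeat something like the
netdev_master_upper_dev_get_rcu() call from the patch under rcu_read_lock(),
and whether the topmost device is the right target for every master type is
not settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a device; NOT the kernel's struct net_device. */
struct model_dev {
	int ifindex;                    /* interface index */
	const struct model_dev *master; /* direct master (bond, bridge, ...) or NULL */
};

/*
 * Follow the master links until a device without a master is reached and
 * return its ifindex. A device with no master is its own topmost device.
 * Assumes the chain is finite (no loops).
 */
int model_topmost_ifindex(const struct model_dev *dev)
{
	assert(dev != NULL);
	while (dev->master != NULL)
		dev = dev->master;
	return dev->ifindex;
}
```

With a slave enslaved to a bond that is itself a bridge port, the walk yields
the bridge's ifindex, where a single bond-only step would stop at the bond.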