On Mon, Apr 30, 2018 at 2:28 PM, Yi-Hung Wei <yihung....@gmail.com> wrote:
> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max. In this
> case, if one of the VM/container abuses the usage of the conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table. Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
>
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone. For example, we can designate different zone to different VM,
> and set conntrack limit to each zone. By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs. Moreover, the users can
> set various conntrack limit to different zone based on their preference.
>
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone. If the number of
> connection is above a configured limitation, ovs will return ENOMEM to the
> userspace. If userspace does not configure the zone limit, the limit
> defaults to zero that is no limitation, which is backward compatible to
> the behavior without this patch.
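For illustration only, the per-zone check described above reduces to: look up the zone's limit (falling back to the default), treat 0 as "no limit", and reject with ENOMEM once the count exceeds the limit. This is a minimal userspace sketch, not the patch's code. It assumes a flat table indexed by the 16-bit zone id in place of the patch's hash table, and a caller-supplied count in place of nf_conncount. The names `zone_limits`, `zone_limit_get()` and `zone_limit_check()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the patch's code. */
#define ZONE_LIMIT_UNLIMITED 0u   /* a limit of 0 means "no limit" */
#define NUM_ZONES 65536           /* conntrack zone ids are 16 bits wide */

struct zone_limits {
	uint32_t default_limit;       /* used by zones without an override */
	uint32_t limit[NUM_ZONES];    /* per-zone override ... */
	uint8_t has_limit[NUM_ZONES]; /* ... valid only when this is set */
};

/* Per-zone override if configured, else the namespace-wide default. */
static uint32_t zone_limit_get(const struct zone_limits *zl, uint16_t zone)
{
	return zl->has_limit[zone] ? zl->limit[zone] : zl->default_limit;
}

/*
 * 'connections' stands in for the value nf_conncount_count() would report,
 * assumed here to be the zone's count including the connection being
 * committed. Returns 0 to accept, -ENOMEM to reject (the patch's error code).
 */
static int zone_limit_check(const struct zone_limits *zl, uint16_t zone,
			    uint32_t connections)
{
	uint32_t limit = zone_limit_get(zl, zone);

	if (limit == ZONE_LIMIT_UNLIMITED)
		return 0;
	return connections > limit ? -ENOMEM : 0;
}
```

With a default of 10, a zone without an override accepts a count of 10 and rejects 11, while a zone whose override is 0 is never limited.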
>
> The following high level APIs are provided to the userspace:
>   - OVS_CT_LIMIT_CMD_SET:
>     * set default connection limit for all zones
>     * set the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_DEL:
>     * remove the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_GET:
>     * get the default connection limit for all zones
>     * get the connection limit for a particular zone
>
> Signed-off-by: Yi-Hung Wei <yihung....@gmail.com>
> ---
>  net/openvswitch/Kconfig     |   3 +-
>  net/openvswitch/conntrack.c | 508 +++++++++++++++++++++++++++++++++++++++++++-
>  net/openvswitch/conntrack.h |   9 +-
>  net/openvswitch/datapath.c  |   7 +-
>  net/openvswitch/datapath.h  |   1 +
>  5 files changed, 522 insertions(+), 6 deletions(-)
>
..
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index c5904f629091..8234964889d9 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
...
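As a rough illustration of the SET/GET semantics listed above (a namespace-wide default plus optional per-zone entries kept in a hash table keyed by zone), here is a userspace sketch. The table layout, bucket count and function names are all hypothetical; DEL would simply unlink the zone's entry so that GET falls back to the default.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define LIMIT_HASH_BUCKETS 512   /* hypothetical bucket count */

/* Hypothetical userspace stand-in for a per-zone limit entry. */
struct zone_limit {
	uint16_t zone;
	uint32_t limit;
	struct zone_limit *next;     /* bucket chain */
};

struct limit_table {
	uint32_t default_limit;      /* "default connection limit for all zones" */
	struct zone_limit *buckets[LIMIT_HASH_BUCKETS];
};

static struct zone_limit **limit_bucket(struct limit_table *t, uint16_t zone)
{
	return &t->buckets[zone % LIMIT_HASH_BUCKETS];
}

static struct zone_limit *limit_find(struct limit_table *t, uint16_t zone)
{
	struct zone_limit *e;

	for (e = *limit_bucket(t, zone); e; e = e->next)
		if (e->zone == zone)
			return e;
	return NULL;
}

/* SET for one zone: overwrite an existing entry or insert a new one. */
static int limit_set_zone(struct limit_table *t, uint16_t zone, uint32_t limit)
{
	struct zone_limit *e = limit_find(t, zone);

	if (!e) {
		e = malloc(sizeof(*e));
		if (!e)
			return -ENOMEM;
		e->zone = zone;
		e->next = *limit_bucket(t, zone);
		*limit_bucket(t, zone) = e;
	}
	e->limit = limit;
	return 0;
}

/* SET of the default: affects every zone without an entry of its own. */
static void limit_set_default(struct limit_table *t, uint32_t limit)
{
	t->default_limit = limit;
}

/* GET: the per-zone limit if one was set, otherwise the default. */
static uint32_t limit_get(struct limit_table *t, uint16_t zone)
{
	struct zone_limit *e = limit_find(t, zone);

	return e ? e->limit : t->default_limit;
}
```

Zones 7 and 519 share a bucket here, so they exercise the chain walk in `limit_find()`.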
> +/* Call with ovs_mutex */
> +static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
> +{
> +	struct ovs_ct_limit *ct_limit;
> +	struct hlist_head *head;
> +
> +	head = ct_limit_hash_bucket(info, zone);
> +	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {

Better to use hlist_for_each_entry_safe() here, since this runs under
ovs_mutex and the entry is deleted while walking the bucket.

> +		if (ct_limit->zone == zone) {
> +			hlist_del_rcu(&ct_limit->hlist_node);
> +			kfree_rcu(ct_limit, rcu);
> +			return;
> +		}
> +	}
> +}
> +
....
> +static int ovs_ct_check_limit(struct net *net,
> +			      const struct ovs_conntrack_info *info,
> +			      const struct nf_conntrack_tuple *tuple)
> +{
> +	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> +	const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +	u32 per_zone_limit, connections;
> +	u32 conncount_key[5];
> +
> +	conncount_key[0] = info->zone.id;
> +
> +	rcu_read_lock();

This function is called with rcu_read_lock() held in the datapath, so
there is no need to take it again.

> +	per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
> +	if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) {
> +		rcu_read_unlock();
> +		return 0;
> +	}
> +
> +	connections = nf_conncount_count(net, ct_limit_info->data,
> +					 conncount_key, tuple, &info->zone);
> +	if (connections > per_zone_limit) {
> +		rcu_read_unlock();
> +		return -ENOMEM;
> +	}
> +
> +	rcu_read_unlock();
> +	return 0;
> +}
> +#endif
> +
....
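To make the hlist_for_each_entry_safe() suggestion concrete outside the kernel, here is a minimal userspace re-creation of the pattern. The _safe form reads the successor into a temporary before the loop body runs, so the body may unlink and free the current entry. Assumptions: the list helpers are simplified copies without RCU, free() stands in for kfree_rcu(), and ct_limit_add() / ct_limit_del_all() are hypothetical helpers (the latter removes every match, whereas the patch's ct_limit_del() returns after the first one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copies of the kernel's hlist helpers (no RCU). */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Hypothetical stand-in for struct ovs_ct_limit. */
struct ct_limit {
	uint16_t zone;
	struct hlist_node hlist_node;
};

/* Hypothetical helper: allocate an entry for 'zone' and link it in. */
static struct ct_limit *ct_limit_add(struct hlist_head *head, uint16_t zone)
{
	struct ct_limit *e = malloc(sizeof(*e));

	if (e) {
		e->zone = zone;
		hlist_add_head(&e->hlist_node, head);
	}
	return e;
}

/*
 * Remove every entry for 'zone' from one bucket, in the style of
 * hlist_for_each_entry_safe(): 'tmp' holds the successor before the body
 * runs, so freeing 'pos' does not break the walk. Returns the count removed.
 */
static int ct_limit_del_all(struct hlist_head *head, uint16_t zone)
{
	struct hlist_node *pos, *tmp = NULL;
	int removed = 0;

	for (pos = head->first; pos && (tmp = pos->next, 1); pos = tmp) {
		struct ct_limit *e = container_of(pos, struct ct_limit, hlist_node);

		if (e->zone == zone) {
			hlist_del(pos);
			free(e);	/* kfree_rcu() in the kernel version */
			removed++;
		}
	}
	return removed;
}
```

In this userspace version, a plain walk that advanced with pos = pos->next after free() would read freed memory; caching the successor first avoids that by construction.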
>
>  static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
> @@ -2469,3 +2471,4 @@ MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
> +MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 523d65526766..51bd4dcb6c8b 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -144,6 +144,7 @@ struct dp_upcall_info {
>  struct ovs_net {
>  	struct list_head dps;
>  	struct work_struct dp_notify_work;
> +	struct ovs_ct_limit_info *ct_limit_info;
>

Let's embed this struct and its hash table directly in struct ovs_net to
avoid an extra indirection when accessing the hash table. This member
also needs to be guarded by IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT).
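A rough userspace sketch of the suggested layout, assuming a hypothetical HAVE_CONNCOUNT macro in place of IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) and hypothetical type names. With the limit info embedded, the bucket array sits at a fixed offset inside the per-net struct, so there is no separately allocated object to dereference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT). */
#ifndef HAVE_CONNCOUNT
#define HAVE_CONNCOUNT 1
#endif

#define CT_LIMIT_HASH_BUCKETS 512   /* hypothetical bucket count */

struct zone_entry {                 /* hypothetical per-zone entry */
	uint16_t zone;
	uint32_t limit;
	struct zone_entry *next;
};

struct ct_limit_info_sketch {
	uint32_t default_limit;
	struct zone_entry *limits[CT_LIMIT_HASH_BUCKETS];
};

struct ovs_net_sketch {
	int existing_members;       /* placeholder for dps, dp_notify_work, ... */
#if HAVE_CONNCOUNT
	/* Embedded rather than a pointer: no separate allocation to chase. */
	struct ct_limit_info_sketch ct_limit_info;
#endif
};

#if HAVE_CONNCOUNT
static struct zone_entry **ct_limit_bucket(struct ovs_net_sketch *on,
					   uint16_t zone)
{
	/* on->ct_limit_info.limits lives inside *on itself. */
	return &on->ct_limit_info.limits[zone % CT_LIMIT_HASH_BUCKETS];
}
#endif
```

When the guard is off, both the member and its accessor disappear, so code that touches them must sit under the same guard.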