On Fri, Jan 6, 2012 at 6:27 PM, Jesse Gross <je...@nicira.com> wrote: > On Thu, Jan 5, 2012 at 7:59 PM, Pravin B Shelar <pshe...@nicira.com> wrote: >> diff --git a/datapath/datapath.c b/datapath/datapath.c >> index 17871e4..76bf8f5 100644 >> --- a/datapath/datapath.c >> +++ b/datapath/datapath.c >> +static void __dp_destroy(struct datapath *dp) >> +{ > > Can you document the locking expectations for the function (i.e. genl mutex)? ok.
> >> + struct vport *vport, *next_vport; >> >> + list_for_each_entry_safe(vport, next_vport, &dp->port_list, node) >> + if (vport->port_no != OVSP_LOCAL) >> + ovs_dp_detach_port(vport); > > I think that the above block is the only part of destruction that > needs to hold RTNL. Can we just take the lock there and drop it from > callers? Then we can also move call_rcu in here as well, so it really > does the full destruction process. > RTNL is also required while detaching dp local vport. >> @@ -2047,15 +2059,20 @@ error: >> static int __rehash_flow_table(void *dummy) >> { >> struct datapath *dp; >> + struct net *net; >> >> - list_for_each_entry(dp, &dps, list_node) { >> - struct flow_table *old_table = genl_dereference(dp->table); >> - struct flow_table *new_table; >> + for_each_net(net) { >> + struct ovs_net *ovs_net = net_generic(net, ovs_net_id); > > I think you need to take RTNL here, otherwise a namespace could > disappear while you are iterating over them. > ok. >> +static void dp_destroy_all(struct ovs_net *ovs_net) >> +{ >> + struct datapath *dp, *dp_next; >> + LIST_HEAD(dp_kill_list); >> + >> + rtnl_lock(); >> + list_for_each_entry_safe(dp, dp_next, &ovs_net->dps, list_node) { > > This list is really protected by genl mutex, not RTNL. right, I was thinking of using only RTLN lock from net_exit(). as genl commands can not race with it. Only rehash can race. In rehash we can have both genl and rtnl locks. What do you think? > >> + __dp_destroy(dp); >> + list_add_tail(&dp->list_node, &dp_kill_list); > > I'm not sure that there's much benefit to batching the calls to > call_rcu outside of RTNL. However, if you push them both down into > __dp_destroy() then you can get the same effect for free. > >> + list_for_each_entry(dp, &dp_kill_list, list_node) { >> + call_rcu(&dp->rcu, destroy_dp_rcu); >> + module_put(THIS_MODULE); > > I believe that we can now just drop module refcounting because this > actually achieves the goal of deleting datapaths when the module is > unloaded. When you call unregister_pernet_device() on module unload, > it calls ovs_exit_net() on all namespaces that are currently active so > remaining dps automatically get deleted. > I have separate patch for this. > Can you also annotate ovs_init_net() ovs_exit_net() with __init_net > and __exit_net respectively? > ok. >> @@ -2085,21 +2154,21 @@ static int __init dp_init(void) > [...] >> + err = register_pernet_device(&ovs_net_ops); >> + if (err < 0) > > Does this ever return a positive value? Otherwise, we can just check > for if (err) like in other places. > ok. >> @@ -2131,9 +2200,9 @@ static void dp_cleanup(void) >> rcu_barrier(); >> dp_unregister_genl(ARRAY_SIZE(dp_genl_families)); >> unregister_netdevice_notifier(&ovs_dp_device_notifier); >> + unregister_pernet_device(&ovs_net_ops); > > I think we need to move rcu_barrier() down because > unregister_pernet_device() can generate rcu callbacks (assuming that > we drop the refcounting) and those should be complete before the > module exists. In general, I think the safest thing to do is just > make it last, that way we don't have to worry about whether any of the > exit functions might have callbacks. As we still have module ref for each dp, this can not happen. I have separate patch to drop reference counting. I will fold it in this patch. > >> diff --git a/datapath/datapath.h b/datapath/datapath.h >> index 27151b9..d5a8e41 100644 >> --- a/datapath/datapath.h >> +++ b/datapath/datapath.h >> @@ -29,10 +29,11 @@ >> >> #include "checksum.h" >> #include "compat.h" >> -#include "flow.h" >> #include "dp_sysfs.h" >> +#include "flow.h" >> +#include "tunnel.h" >> #include "vlan.h" >> - >> +#include "vport.h" >> struct vport; > > If you include vport.h then you can drop the forward declaration of > struct vport. > ok. >> +struct ovs_net { >> + /* Per Network namespace list of datapaths to enable dumping them >> + * all out. Protected by genl_mutex. >> + */ >> + struct list_head dps; >> + /* Per network namespace data for tnl. */ >> + struct tnl_net tnl_net; >> + /* Per network namespace data for vport. */ >> + struct vport_net vport_net; >> +}; > > You should use the kernel comment style to document the struct > members, as is used other places in this file. > ok. >> +extern int ovs_net_id; > > Can you put this together with the other global static variables? ok > >> diff --git a/datapath/linux/compat/include/net/genetlink.h >> b/datapath/linux/compat/include/net/genetlink.h >> index a1ff7c1..4e7f1b6 100644 >> --- a/datapath/linux/compat/include/net/genetlink.h >> +++ b/datapath/linux/compat/include/net/genetlink.h >> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,32) >> +#define SET_NETNSOK >> +#else >> +#define SET_NETNSOK .netnsok = true, >> +#endif > > Can you put this in compat.h since it's not going upstream? > ok. >> diff --git a/datapath/linux/compat/include/net/net_namespace.h >> b/datapath/linux/compat/include/net/net_namespace.h >> index 9ce9fcd..1c25cc6 100644 >> --- a/datapath/linux/compat/include/net/net_namespace.h >> +++ b/datapath/linux/compat/include/net/net_namespace.h >> #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) >> -#define INIT_NET_GENL_SOCK init_net.genl_sock >> +#define GENL_SOCK(net) ((net)->genl_sock) >> #else >> -#define INIT_NET_GENL_SOCK genl_sock >> +#define GENL_SOCK(net) (genl_sock) >> +#endif > > While you're at it, can you move this to compat.h as well? > ok. >> diff --git a/datapath/linux/compat/net_namespace.c >> b/datapath/linux/compat/net_namespace.c >> new file mode 100644 >> index 0000000..31c4a6e >> --- /dev/null >> +++ b/datapath/linux/compat/net_namespace.c >> +#if LINUX_VERSION_CODE == KERNEL_VERSION(2,6,32) >> + >> +#undef pernet_operations >> +struct pernet_operations pnet_compat; >> + >> +struct rpl_pernet_operations *pnet; > > These should be static. > ok. >> +static int __net_init ovs_init_net(struct net *net) > > Can you name these something different from the ones in datapath.c? > ok >> +{ >> + int err; >> + void *ovs_net = kzalloc(pnet->size, GFP_KERNEL); >> + > > Extra blank line here. > ok >> + if (!ovs_net) >> + return -ENOMEM; >> + >> + err = net_assign_generic(net, *pnet->id, ovs_net); >> + if (err) { >> + kfree(ovs_net); >> + return err; >> + } >> + >> + err = pnet->init(net); >> + if (err) { >> + kfree(ovs_net); >> + return err; > > Can you make a common exit path? > ok >> +static void __net_init ovs_exit_net(struct net *net) > > This should be __net_exit, not __net_init. > ok >> +int rpl_register_pernet_device(struct rpl_pernet_operations *rpl_pnet) >> +{ >> + pnet = rpl_pnet; >> + pnet_compat.init = ovs_init_net; >> + pnet_compat.exit = ovs_exit_net; > > Can you just initialize these statically, like you did in the main > pernet code in datapath.c? > ok. >> + return register_pernet_gen_subsys(pnet->id, &pnet_compat); > > Shouldn't this be register_pernet_gen_device(), since that's what > you're emulating. > right. >> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,32) > > These could just be combined into an #elsif. > ok >> diff --git a/datapath/tunnel.c b/datapath/tunnel.c >> index 33d2fe9..1a66db3 100644 >> --- a/datapath/tunnel.c >> +++ b/datapath/tunnel.c >> @@ -82,23 +84,12 @@ >> #define CACHE_DATA_ALIGN 16 >> #define PORT_TABLE_SIZE 1024 >> >> -static struct hlist_head *port_table __read_mostly; >> -static int port_table_count; >> > > Extra blank line. > ok >> static void cache_cleaner(struct work_struct *work) > [...] >> + for_each_net(net) { > > I think this needs to be for_each_net_rcu(). > right. >> + struct tnl_net *tnl_net = get_tnl_net(net); >> + >> + for (i = 0; i < PORT_TABLE_SIZE; i++) { >> + > > Extra blank line. > ok. >> @@ -1417,7 +1438,8 @@ static int tnl_set_config(struct nlattr *options, >> const struct tnl_ops *tnl_ops, >> >> mutable->tunnel_hlen += sizeof(struct iphdr); >> >> - old_vport = port_table_lookup(&mutable->key, &old_mutable); >> + tnl_net = get_tnl_net(net); > > Can you initialize tnl_net at the beginning of the function so we > don't have wonder whether it is set? > ok. >> +int ovs_tnl_init_net(struct net *net) > [...] >> + tnl_net->port_table = kmalloc(PORT_TABLE_SIZE * sizeof(struct >> hlist_head *), >> + GFP_KERNEL); > > I think that namespaces might affect the tradeoff of whether it makes > sense to preallocate a tunnel table at startup. There could be a lot > of namespaces, none of which use tunnels, so the wasted memory could > actually be pretty significant. > we can have global shared tnl port table like vport table. >> diff --git a/datapath/tunnel.h b/datapath/tunnel.h >> index 6865ae6..f470702 100644 >> --- a/datapath/tunnel.h >> +++ b/datapath/tunnel.h >> @@ -20,6 +20,9 @@ >> #define TUNNEL_H 1 >> >> #include <linux/version.h> >> +#include <net/ip_vs.h> > > We shouldn't reference anything in ip_vs.h > ok. >> +struct tnl_net { >> + struct hlist_head *port_table; >> + >> + /* >> + * These are just used as an optimization: they don't require any >> kind >> + * of synchronization because we could have just as easily read the >> + * value before the port change happened. >> + */ >> + >> + unsigned int key_local_remote_ports; >> + unsigned int key_remote_ports; >> + unsigned int key_multicast_ports; >> + unsigned int local_remote_ports; >> + unsigned int remote_ports; >> + unsigned int multicast_ports; >> +}; > > Does anything outside of tunnel.c actually use struct tnl_net? > it is required in datapath.h for ovs_net definition. >> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c >> index 6c1b0da..493405b 100644 >> --- a/datapath/vport-capwap.c >> +++ b/datapath/vport-capwap.c >> -static int capwap_init(void) >> +static int capwap_init_net(struct net *net) > > Can you annotate this with __net_init as well (actually we should > annotate all the net functions). > ok. > However, given that we already wanted to start opening capwap sockets > on demand I wonder whether it makes sense to have these vport init_net > and exit_net functions at all and just drive everything off port > creation (and ports will get automatically cleaned up with the > namespace goes away). > ok. I will create socket on first port addition. >> diff --git a/datapath/vport-patch.c b/datapath/vport-patch.c >> index 53b24b0..a7be489 100644 >> --- a/datapath/vport-patch.c >> +++ b/datapath/vport-patch.c >> -static void update_peers(const char *name, struct vport *vport) >> +static void update_peers(const char *name, struct net *net, struct vport >> *vport) >> { >> struct hlist_head *bucket = hash_bucket(name); > > It would be nice if we also included the net in the hash. > ok. >> diff --git a/datapath/vport.c b/datapath/vport.c >> index e9ccdbd..83043ec 100644 >> --- a/datapath/vport.c >> +++ b/datapath/vport.c >> +static void vport_exit_net(struct net *net, int max) >> +{ >> + int i; >> + >> + for (i = 0; i < max; i++) { >> + const struct vport_ops *ops = vport_ops_list[i]; >> + >> + if (ops->exit_net) >> + ops->exit_net(net); >> + } >> +} >> + >> +int ovs_vport_init_net(struct net *net) >> +{ >> + int i; >> + >> + for (i = 0; i < n_vport_types; i++) { >> + const struct vport_ops *ops = vport_ops_list[i]; >> + int err; >> + >> + err = 0; >> + >> + if (ops->init_net) >> + err = ops->init_net(net); >> + >> + if (err) { >> + vport_exit_net(net, i); >> + return err; >> + } >> + } >> + >> + return 0; >> +} >> + >> +void ovs_vport_exit_net(struct net *net) >> +{ >> + vport_exit_net(net, n_vport_types); >> +} >> + > > We now have a funny model where things that need to be initialized at > module load and fail will continue without the vport support but > things initialized at namespace init fail the net. Since there's no > clear distinction (from the outside) why something is initialized in > one or the other, it's somewhat confusing. This is probably an even > stronger reason to do things on demand and just get rid of the > initialization functions altogether. > ok. >> -struct vport *ovs_vport_locate(const char *name) >> +struct vport *ovs_vport_locate(struct net *net, const char *name) >> { >> struct hlist_head *bucket = hash_bucket(name); >> struct vport *vport; >> struct hlist_node *node; >> >> hlist_for_each_entry_rcu(vport, node, bucket, hash_node) >> - if (!strcmp(name, vport->ops->get_name(vport))) >> + if (!strcmp(name, vport->ops->get_name(vport)) && >> + vport->net == net) > > It would be nice if we could use the net in the hash here too. > ok. >> @@ -187,6 +229,7 @@ struct vport *ovs_vport_alloc(int priv_size, const >> struct vport_ops *ops, >> >> vport->dp = parms->dp; >> vport->port_no = parms->port_no; >> + vport->net = parms->dp->net; > > vport->net is always exactly the same as vport->dp->net, right? > > Assuming that's the case, I don't think it is a very good idea to make > a copy of it. It's confusing if an internal device is moved to a > different namespace and generally redundant. You could create a > function get this if you want but that probably won't save any typing > anyways. > ok. >> diff --git a/datapath/vport.h b/datapath/vport.h >> index 44cf603..44f3bf0 100644 >> --- a/datapath/vport.h >> +++ b/datapath/vport.h >> @@ -20,25 +20,32 @@ >> #define VPORT_H 1 >> >> #include <linux/list.h> >> +#include <linux/netlink.h> > > Is netlink.h actually used? ok. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev