On Tue, Nov 14, 2017 at 10:00:59AM -0800, Eric Dumazet wrote:
> On Tue, 2017-11-14 at 09:44 -0800, Andrei Vagin wrote:
> > On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote:
> > > Currently a mutex is used to protect the pernet operations list. It
> > > ensures that cleanup_net() executes the ->exit methods of the same set
> > > of operations that was used at ->init time, even after the net
> > > namespace is unlinked from net_namespace_list.
> > >
> > > The problem is that synchronize_rcu() must be called after net is
> > > removed from net_namespace_list:
> > >
> > > Destroy net_ns:
> > > cleanup_net()
> > >   mutex_lock(&net_mutex)
> > >   list_del_rcu(&net->list)
> > >   synchronize_rcu()                       <--- Sleep there for ages
> > >   list_for_each_entry_reverse(ops, &pernet_list, list)
> > >     ops_exit_list(ops, &net_exit_list)
> > >   list_for_each_entry_reverse(ops, &pernet_list, list)
> > >     ops_free_list(ops, &net_exit_list)
> > >   mutex_unlock(&net_mutex)
> > >
> > > This primitive is not fast, especially on systems with many processors
> > > and/or when preemptible RCU is enabled in the config. So, for as long as
> > > cleanup_net() is waiting for an RCU grace period, new net namespaces
> > > cannot be created; the tasks trying to create them sleep on the same
> > > mutex:
> > >
> > > Create net_ns:
> > > copy_net_ns()
> > >   mutex_lock_killable(&net_mutex)         <--- Sleep there for ages
> > >
> > > The solution is to convert net_mutex to an rw_semaphore. Then the
> > > pernet_operations::init/::exit methods, which modify net-related data,
> > > will require only down_read() locking, while down_write() will be used
> > > for changing pernet_list.
> > >
> > > This gives a significant performance increase, as you may see below.
> > > Below is a measurement of sequential net namespace creation in a loop,
> > > in a single thread, with no other tasks running (single-user mode):
> > >
> > > 1)
> > > #define _GNU_SOURCE /* needed for unshare() and CLONE_NEWNET */
> > > #include <sched.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > 	unsigned nr;
> > > 	if (argc < 2) {
> > > 		fprintf(stderr, "Provide nr iterations arg\n");
> > > 		return 1;
> > > 	}
> > > 	nr = atoi(argv[1]);
> > > 	/* each call creates a new net ns; the previous one is freed asynchronously */
> > > 	while (nr-- > 0) {
> > > 		if (unshare(CLONE_NEWNET)) {
> > > 			perror("Can't unshare");
> > > 			return 1;
> > > 		}
> > > 	}
> > > 	return 0;
> > > }
> > >
> > > Original, 100000 unshare():
> > > 0.03user 23.14system 1:39.85elapsed 23%CPU
> > >
> > > Patched, 100000 unshare():
> > > 0.03user 67.49system 1:08.34elapsed 98%CPU
> > >
> > > 2) for i in {1..10000}; do unshare -n bash -c exit; done
> >
> > Hi Kirill,
> >
> > This mutex has another role. You know that net namespaces are destroyed
> > asynchronously, and the net mutex guarantees that the backlog will not be
> > big. If we have something in the backlog, we know that it will be handled
> > before a new net ns is created.
> >
> > As far as I remember, net namespaces are created much faster than
> > they are destroyed, so with these changes we can create a really big
> > backlog, can't we?
>
> Please take a look at the recent patches I did:
>
> 8ca712c373a462cfa1b62272870b6c2c74aa83f9 Merge branch 'net-speedup-netns-create-delete-time'
> 64bc17811b72758753e2b64cd8f2a63812c61fe1 ipv4: speedup ipv6 tunnels dismantle
> bb401caefe9d2c65e0c0fa23b21deecfbfa473fe ipv6: speedup ipv6 tunnels dismantle
> 789e6ddb0b2fb5d5024b760b178a47876e4de7a6 tcp: batch tcp_net_metrics_exit
> a90c9347e90ed1e9323d71402ed18023bc910cd8 ipv6: addrlabel: per netns list
> d464e84eed02993d40ad55fdc19f4523e4deee5b kobject: factorize skb setup in kobject_uevent_net_broadcast()
> 4a336a23d619e96aef37d4d054cfadcdd1b581ba kobject: copy env blob in one go
> 16dff336b33d87c15d9cbe933cfd275aae2a8251 kobject: add kobject_uevent_net_broadcast()
>
Good job! Now it really works much faster. I tested these patches together
with Kirill's one and everything works well. I could not reproduce a
situation where the backlog starts growing.

Thanks Kirill and Eric.
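For reference, the quoted unshare() numbers can be turned into per-call costs. Reading the elapsed field of the time(1) output as minutes:seconds (1:39.85 = 99.85 s original, 1:08.34 = 68.34 s patched) over 100000 iterations gives:

$$
\frac{99.85\,\mathrm{s}}{100000} = 0.9985\,\mathrm{ms},\qquad
\frac{68.34\,\mathrm{s}}{100000} = 0.6834\,\mathrm{ms},\qquad
\frac{99.85}{68.34} \approx 1.46
$$

$$
\text{original CPU share} = \frac{0.03 + 23.14}{99.85} \approx 0.23
$$

So the original loop spent roughly three quarters of its wall-clock time off-CPU, which is consistent with Kirill's description of creators sleeping on net_mutex while cleanup_net() waits in synchronize_rcu().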