On 14.11.2017 20:44, Andrei Vagin wrote: > On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote: >> Curently mutex is used to protect pernet operations list. It makes >> cleanup_net() to execute ->exit methods of the same operations set, >> which was used on the time of ->init, even after net namespace is >> unlinked from net_namespace_list. >> >> But the problem is it's need to synchronize_rcu() after net is removed >> from net_namespace_list(): >> >> Destroy net_ns: >> cleanup_net() >> mutex_lock(&net_mutex) >> list_del_rcu(&net->list) >> synchronize_rcu() <--- Sleep there for >> ages >> list_for_each_entry_reverse(ops, &pernet_list, list) >> ops_exit_list(ops, &net_exit_list) >> list_for_each_entry_reverse(ops, &pernet_list, list) >> ops_free_list(ops, &net_exit_list) >> mutex_unlock(&net_mutex) >> >> This primitive is not fast, especially on the systems with many processors >> and/or when preemptible RCU is enabled in config. So, all the time, while >> cleanup_net() is waiting for RCU grace period, creation of new net namespaces >> is not possible, the tasks, who makes it, are sleeping on the same mutex: >> >> Create net_ns: >> copy_net_ns() >> mutex_lock_killable(&net_mutex) <--- Sleep there for >> ages >> >> The solution is to convert net_mutex to the rw_semaphore. Then, >> pernet_operations::init/::exit methods, modifying the net-related data, >> will require down_read() locking only, while down_write() will be used >> for changing pernet_list. >> >> This gives signify performance increase, like you may see below. There >> is measured sequential net namespace creation in a cycle, in single >> thread, without other tasks (single user mode): >> >> 1)int main(int argc, char *argv[]) >> { >> unsigned nr; >> if (argc < 2) { >> fprintf(stderr, "Provide nr iterations arg\n"); >> return 1; >> } >> nr = atoi(argv[1]); >> while (nr-- > 0) { >> if (unshare(CLONE_NEWNET)) { >> perror("Can't unshare"); >> return 1; >> } >> } >> return 0; >> } >> >> Origin, 100000 unshare(): >> 0.03user 23.14system 1:39.85elapsed 23%CPU >> >> Patched, 100000 unshare(): >> 0.03user 67.49system 1:08.34elapsed 98%CPU >> >> 2)for i in {1..10000}; do unshare -n bash -c exit; done > > Hi Kirill, > > This mutex has another role. You know that net namespaces are destroyed > asynchronously, and the net mutex gurantees that a backlog will be not > big. If we have something in backlog, we know that it will be handled > before creating a new net ns. > > As far as I remember net namespaces are created much faster than > they are destroyed, so with this changes we can create a really big > backlog, can't we?
I don't think limitation is a good goal or a gool for the mutex, because it's very easy to create many net namespaces in case of the mutex exists. You may open /proc/[pid]/ns/net like a file, and net_ns counter will increment. Then, do unshare(), and the mutex has no a way to protect against that. Anyway, mutex can't limit a number of something in general, I've never seen a (good) example in kernel. As I see, the real limitation happen in inc_net_namespaces(), which is decremented after RCU grace period in cleanup_net(), and it has not changed. > There was a discussion a few month ago: > https://lists.onap.org/pipermail/containers/2016-October/037509.html > > >> >> Origin: >> real 1m24,190s >> user 0m6,225s >> sys 0m15,132s > > Here you measure time of creating and destroying net namespaces. > >> >> Patched: >> real 0m18,235s (4.6 times faster) >> user 0m4,544s >> sys 0m13,796s > > But here you measure time of crearing namespaces and you know nothing > when they will be destroyed. You're right, and I predict, the sum time, spent on cpu, will remain the same, but the think is that now creation and destroying may be executed in parallel.