Eric W. Biederman <ebied...@xmission.com> wrote:
> Florian Westphal <f...@strlen.de> writes:
>
> > Quoting Joe Stringer:
> > If a user loads nf_conntrack_ftp, sends FTP traffic through a network
> > namespace, destroys that namespace then unloads the FTP helper module,
> > then the kernel will crash.
> >
> > Events that lead to the crash:
> > 1. conntrack is created with ftp helper in netns x
> > 2. This netns is destroyed
> > 3. netns destruction is scheduled
> > 4. netns destruction wq starts, removes netns from global list
> > 5. ftp helper is unloaded, which resets all helpers of the conntracks
> > via for_each_net()
> >
> > but because netns is already gone from list the for_each_net() loop
> > doesn't include it, therefore all of these conntracks are unaffected.
> >
> > 6. helper module unload finishes
> > 7. netns wq invokes destructor for rmmod'ed helper
> >
> > CC: "Eric W. Biederman" <ebied...@xmission.com>
> > Reported-by: Joe Stringer <j...@ovn.org>
> > Signed-off-by: Florian Westphal <f...@strlen.de>
> > ---
> > Eric, I'd like an explicit (n)ack from you for this one.
>
> This doesn't look too scary but I have the impression we have addressed
> this elsewhere with a different solution.
>
> Looking...
>
> Ok. unregister_pernet_operations takes the net_mutex and thus
> gives you this barrier automatically.
>
> Hmm. Why isn't this working for conntrack, looking...
>
> nf_conntrack_ftp doesn't use unregister_pernet_operations...
> nf_conntract_ftp does use nf_conntrack_helpers_unregister
>
> I think I almost see the problem.
>
> What is the per net code that stops dealing with the nf_conntract_ftp?
>
> I am trying to figure out if your netns_barrier is reasonable or if
> it treating the symptom. I am having trouble seeing enough of what
> conntrack is doing to judge.
>
> Am I correct in understanding that the root problem is there is
> something pointing to ftp_exp_policy at the time of module unload?

Joe described it nicely: the problem is that after unload we may have
conntracks that still have an nf_conn_help extension attached, and that
extension holds a pointer to a structure that resided in the (now
unloaded) module.

Normally these references should have been NULL'd out by
nf_ct_iterate_destroy(). However, there is a small chance that its
for_each_net() misses namespaces that were already removed from the
list by the concurrent netns workqueue cleanup.

I guess another solution would be to add dummy pernet ops to all
conntrack helpers so they block on unregister_pernet_subsys(). But
that's rather ugly IMO, since they don't have any notion of a net
namespace in the first place.
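
To make the window concrete, here is a small single-threaded userspace model of the ordering described above. It is only a sketch, not kernel code: every name in it (ns_table, reset_helpers, simulate_unload and so on) is made up for illustration, the "barrier" is modelled simply as letting the pending netns cleanup finish before the helper reset runs, and it does not claim to reflect how the actual patch is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NS 4

/* Hypothetical stand-ins: 'helper' plays the structure living in module
 * memory, 'conntrack' plays a conntrack with a helper extension. */
struct helper { const char *name; };
struct conntrack { const struct helper *helper; };
struct netns { struct conntrack ct; bool on_list; };

static struct netns ns_table[MAX_NS];

/* Models the for_each_net() walk: only namespaces still on the global
 * list are visited, so a namespace already unlinked is skipped. */
static void reset_helpers(const struct helper *h)
{
    for (size_t i = 0; i < MAX_NS; i++) {
        if (!ns_table[i].on_list)
            continue;
        if (ns_table[i].ct.helper == h)
            ns_table[i].ct.helper = NULL;
    }
}

/* Counts conntracks that still point at the helper, on the list or not. */
static int stale_refs(const struct helper *h)
{
    int n = 0;
    for (size_t i = 0; i < MAX_NS; i++)
        if (ns_table[i].ct.helper == h)
            n++;
    return n;
}

/* First half of netns cleanup: unlink from the global list (step 4). */
static void netns_cleanup_begin(struct netns *ns)
{
    ns->on_list = false;
}

/* Second half of netns cleanup: tear down the remaining conntracks. */
static void netns_cleanup_finish(struct netns *ns)
{
    assert(!ns->on_list);
    ns->ct.helper = NULL;
}

/* Returns how many conntracks still reference the helper at the point
 * where the module memory would be freed (step 6). */
int simulate_unload(bool with_barrier)
{
    static const struct helper ftp = { "ftp" };

    for (size_t i = 0; i < MAX_NS; i++)
        ns_table[i] = (struct netns){ .ct = { &ftp }, .on_list = true };

    netns_cleanup_begin(&ns_table[0]);      /* netns wq has started */
    if (with_barrier)
        netns_cleanup_finish(&ns_table[0]); /* unload waits for the wq */
    reset_helpers(&ftp);                    /* helper unload (step 5) */
    return stale_refs(&ftp);
}
```

In this model simulate_unload(false) leaves one conntrack pointing at the helper, which corresponds to the dangling pointer that the netns destructor later trips over, while simulate_unload(true) leaves none.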