Hi,

I'm currently working on a route nexthop caching feature for tunneling
interfaces such as if_gif, if_gre, if_vxlan, and potentially if_wg. I
have encountered a nasty bug related to the VNET life cycle. More
precisely, I'd like to call `rib_unsubscribe()` to unsubscribe from
route events when the tunnel interface is deleted (gif_delete_tunnel).

During VNET shutdown, the VNET SYSUNINITs are called and the routing
VNET subsystem is destroyed before the interface goes down, which
causes a page fault. I do not want to check the `vnet.vnet_shutdown`
state, as it looks messed up.

I have recently been reviewing the life cycle of prisons and got some
inspiration.

When a jail / prison is submitted for destruction (by the jail_remove
syscall), SIGKILL is sent to the prison's processes. I think that is
the correct order in which to destroy a jail / prison. To summarize,
the life cycle of a jail / prison is:

on jail create: PRISON_STATE_INVALID -> create VNET ->
PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
addresses, routing, etc.) -> create / attach (network) processes
on jail destroy: user kills processes via jexec (1) -> mark the prison
as PRISON_STATE_DYING -> kernel sends SIGKILL to processes (2) ->
destroy VNET (when the last prison pr_ref is released) -> dead

Step (2) is a cleanup by the kernel, since step (1) may not have been
done by the user.


This leads to an idea about the life cycle of VNET.

On jail destroy, the network resources are cleaned up by vnet_destroy
(SYSUNINIT). The ordering of the SYSUNINITs of the network components
is therefore hacky, as circular network resource dependencies are
possible. For example, the routing table entries (nhop) hold references
to the ifnet, and the ifnet holds references to route nhops (the
cache), which is the case I encountered.
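
To illustrate the cycle, here is a minimal user-space sketch. It is only
a toy model: `toy_ifnet`, `toy_nhop` and all functions below are
hypothetical stand-ins and not the kernel's real structures or APIs. It
shows that a routing teardown cannot release a nexthop while the
interface still caches it, unless the interface drops its cache first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct ifnet / nhop_object. */
struct toy_ifnet;

struct toy_nhop {
	int refs;			/* reference count */
	struct toy_ifnet *ifp;		/* nexthop references its interface */
};

struct toy_ifnet {
	int refs;			/* reference count */
	struct toy_nhop *cached;	/* tunnel caches a nexthop */
};

/* Build the cycle: nhop -> ifnet and ifnet -> nhop (cache). */
static void
toy_link(struct toy_ifnet *ifp, struct toy_nhop *nh)
{
	ifp->refs = 1;			/* owner's reference */
	nh->refs = 1;			/* routing table's reference */
	nh->ifp = ifp;
	ifp->refs++;			/* held by the nexthop */
	ifp->cached = nh;
	nh->refs++;			/* held by the interface cache */
}

/* Explicit shutdown step: the interface drops its cached nexthop. */
static void
toy_ifnet_shutdown(struct toy_ifnet *ifp)
{
	if (ifp->cached != NULL) {
		ifp->cached->refs--;
		ifp->cached = NULL;
	}
}

/*
 * Routing teardown: drop the table's reference. Returns true only if
 * that was the last reference, in which case the nexthop also releases
 * its interface reference.
 */
static bool
toy_rib_destroy(struct toy_nhop *nh)
{
	assert(nh->refs > 0);
	if (--nh->refs > 0)
		return (false);		/* still referenced by the cache */
	nh->ifp->refs--;
	nh->ifp = NULL;
	return (true);
}
```

In this model, calling `toy_rib_destroy()` right after `toy_link()`
returns false because the interface cache still pins the nexthop.
Calling `toy_ifnet_shutdown()` first breaks the cycle, and the routing
teardown can then complete.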

Just like the cleanup of processes by the kernel, we can introduce a
new stage, `vnet_shutdown`, that cleans up network resources. When a
jail / prison is going to die, after the kernel has cleaned up the
processes it calls `vnet_shutdown` to clean up the network resources.
vnet_destroy will then go smoothly, as no circular network resource
dependencies remain at that point.

The life cycle of a prison then becomes:

on jail create: PRISON_STATE_INVALID -> create VNET ->
PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
addresses, routing, etc.) -> create / attach (network) processes
on jail destroy: user kills processes via jexec (1) -> mark the prison
as PRISON_STATE_DYING -> kernel sends SIGKILL to processes (2) ->
vnet_shutdown cleans up network resources -> destroy VNET (when the
last prison pr_ref is released) -> dead
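
The difference between the two destroy sequences can be sketched in
user space as follows. This is a toy model under stated assumptions:
`toy_vnet` and every function here are hypothetical, the `faulted` flag
merely marks the spot where the real kernel would page-fault, and the
teardown order inside `toy_vnet_destroy()` mirrors the problem described
above (routing destroyed before the interface goes down).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-VNET state relevant to this bug. */
struct toy_vnet {
	bool rib_alive;		/* routing subsystem still initialized */
	bool subscribed;	/* tunnel holds a route event subscription */
	bool faulted;		/* set where the kernel would page-fault */
};

static void
toy_vnet_create(struct toy_vnet *v)
{
	v->rib_alive = true;
	v->subscribed = true;	/* tunnel subscribed to route events */
	v->faulted = false;
}

/* Tunnel deletion: unsubscribe from route events if still subscribed. */
static void
toy_tunnel_delete(struct toy_vnet *v)
{
	if (!v->subscribed)
		return;
	if (!v->rib_alive)
		v->faulted = true;	/* unsubscribing from a destroyed rib */
	v->subscribed = false;
}

/* Proposed stage: clean up network resources while routing is alive. */
static void
toy_vnet_shutdown(struct toy_vnet *v)
{
	toy_tunnel_delete(v);
}

/* Models the problematic order: routing goes away before interfaces. */
static void
toy_vnet_destroy(struct toy_vnet *v)
{
	v->rib_alive = false;
	toy_tunnel_delete(v);
}

/* Returns true if the jail teardown completed without a "fault". */
static bool
toy_jail_destroy(bool with_shutdown_stage)
{
	struct toy_vnet v;

	toy_vnet_create(&v);
	/* (1) and (2): processes are assumed to be gone at this point. */
	if (with_shutdown_stage)
		toy_vnet_shutdown(&v);
	toy_vnet_destroy(&v);
	assert(!v.subscribed);
	return (!v.faulted);
}
```

With `toy_jail_destroy(false)` the unsubscribe happens after the routing
state is gone, which is the case that faults today. With
`toy_jail_destroy(true)` the subscription is dropped in the shutdown
stage, so the later destroy has nothing left to trip over.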

This idea is still immature and I hope to hear more opinions about it.

Thanks!

Best regards,
Zhenlei

