On Wed, Jun 06, 2018 at 05:25:51PM -0700, Maciej Żenczykowski wrote:
> Yes, it does, we found this internally last night and been debating
> what to do about it.
> 
> Fundamentally what it points out is that prior to this patch CRIU
> could get the host into an inconsistent state.

Yes, I understand the problem. It would be good to
find a way to fix this without breaking CRIU...

> It inserts all the sockets into the hashtables with SO_REUSEADDR set,
> and then (potentially) clears it on some of them...
> But the tb cache still thinks it's set on all of them.
> So later attempts to bind() a socket with SO_REUSEADDR set can then
> succeed even though they should fail (or something like that).
> 
> I wonder if we instead need a socket option to basically say 'ignore
> all conflicts' that CRIU could set, and then clear post
> bind/listen/connect hash table insertion...
> 
> Or maybe the transition from 1->0 is valid, but from 0->1 isn't??

I wanted to say that CRIU needs only the transition from 1->0, but then
I found that TCP_REPAIR changes sk->sk_reuse too. When we switch a
socket into repair mode, sk_reuse is set to SK_FORCE_REUSE. But when
we disable repair mode for a socket, sk_reuse is always set to
SK_NO_REUSE, so we need to be able to restore the original value for
it somehow...
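
To make the TCP_REPAIR part concrete, here is a toy userspace model of
the sk_reuse transitions described above. It is only a sketch, not
kernel code: struct model_sock and the model_* helpers are hypothetical
names I made up for illustration, and the "restore" step assumes that
setting SO_REUSEADDR after leaving repair mode is still allowed, which
is exactly what is in question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the SK_*_REUSE constants in include/net/sock.h. */
enum { SK_NO_REUSE = 0, SK_CAN_REUSE = 1, SK_FORCE_REUSE = 2 };

/* Hypothetical stand-in for the two socket fields involved. */
struct model_sock {
    int  sk_reuse;   /* models sk->sk_reuse */
    bool repair;     /* models the TCP repair-mode flag */
};

/* Models setsockopt(SO_REUSEADDR): any non-zero value means "can reuse". */
void model_set_reuseaddr(struct model_sock *sk, int val)
{
    sk->sk_reuse = val ? SK_CAN_REUSE : SK_NO_REUSE;
}

/*
 * Models setsockopt(TCP_REPAIR) as described above: entering repair mode
 * forces reuse, leaving it unconditionally resets sk_reuse to SK_NO_REUSE.
 */
void model_set_repair(struct model_sock *sk, bool on)
{
    sk->repair = on;
    sk->sk_reuse = on ? SK_FORCE_REUSE : SK_NO_REUSE;
}

/*
 * Enter and leave repair mode. With restore == false the original
 * SO_REUSEADDR setting is lost; with restore == true the caller puts it
 * back afterwards (assumption: a 0->1 transition is still permitted).
 * Returns the final sk_reuse value.
 */
int model_repair_roundtrip(struct model_sock *sk, bool restore)
{
    int saved = sk->sk_reuse;

    model_set_repair(sk, true);
    assert(sk->sk_reuse == SK_FORCE_REUSE);

    model_set_repair(sk, false);
    if (restore)
        model_set_reuseaddr(sk, saved != SK_NO_REUSE);

    return sk->sk_reuse;
}
```

For a socket that had SO_REUSEADDR set, the round trip without restore
ends at SK_NO_REUSE, while the round trip with restore ends at
SK_CAN_REUSE. That last step is a 0->1 transition, so forbidding it
would break this path.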

> 
> Or we need special per-protocol code in the SO_REUSE{ADDR,PORT}
> setsockopt handler to recalculate the tb cache?
> 
> Anyone have any smart ideas?
