On Thu, Jan 18, 2018 at 7:40 AM, James Chapman <jchap...@katalix.com> wrote:
> On 18 January 2018 at 15:18, Guillaume Nault <g.na...@alphalink.fr> wrote:
>> On Wed, Jan 17, 2018 at 02:25:38PM -0500, David Miller wrote:
>>> From: James Chapman <jchap...@katalix.com>
>>> Date: Wed, 17 Jan 2018 11:13:33 +0000
>>>
>>> > On 16 January 2018 at 19:00, David Miller <da...@davemloft.net> wrote:
>>> >> From: Tom Herbert <t...@herbertland.com>
>>> >> Date: Tue, 16 Jan 2018 09:36:41 -0800
>>> >>
>>> >>> sk_user_data is set with the sk_callback lock held in code below.
>>> >>> Should be able to take the lock earlier can do this check under the
>>> >>> lock.
>>> >>
>>> >> csock, and this csk, is obtained from an arbitrary one of the
>>> >> process's FDs. It can be any socket type or family, and that socket's
>>> >> family might set sk_user_data without the callback lock.
>>> >>
>>> >> The only socket type check is making sure it is not another PF_KCM
>>> >> socket. So that doesn't help with this problem.
>>> >
>>> > Is it the intention to update all socket code over time to write
>>> > sk_user_data within the sk_callback lock? If so, I'm happy to address
>>> > that in the l2tp code (and update the kcm patch to check sk_user_data
>>> > within the sk_callback lock). Or is the preferred solution to restrict
>>> > KCM to specific socket families, as suggested by Guillaume earlier in
>>> > the thread?
>>>
>>> I think we have a more fundamental issue here.
>>>
>>> sk->sk_user_data is a place where RPC layer specific data is hung off
>>> of. By this definition SunRPC, RXRPC, RDS, TIPC, and KCM are all
>>> using it correctly.
>>>
>>> Phonet has a similar issue to the one seen here, it tests and changes
>>> sk_user_data under lock_sock(). The only requirement it makes is
>>> that the socket type is not SOCK_STREAM. However, this one might be OK
>>> since only pep_sock sockets can be passed down into gprs_attach().
>>>
>> But, if I read it correctly, that doesn't prevent it from being passed
>> to kcm_attach() later on, which will overwrite sk_user_data (unless we
>> update the locking scheme and refuse to overwrite sk_user_data in a
>> race-free way).
>>
>> BTW couldn't the gprs_dev pointer be embedded in struct pep_sock?
>> This way pep_sk(sk)->gp could be used instead of sk->sk_user_data.
>> That'd probably be a violation of the phonet's layering, as that'd
>> tie gprs_dev to pep sockets. OTOH, only pep sockets can currently be
>> attached to gprs_dev, so in practice that might be a reasonable
>> compromise.
>>
>>> Most of these cases like SunRPC, RXRPC, etc. are fine because they
>>> only graft on top of TCP and UDP sockets.
>>>
>>> The weird situation here is that L2TP does tunneling and stores its
>>> private state in sk->sk_user_data like an RPC layer would. And KCM
>>> allows basically any socket type to be attached.
>>>
>>> The RPC layers create their sockets internally, so I cannot see a way
>>> that those can be sent to a KCM attach operation. And I think that
>>> is why this RPC invariant is important for sk_user_data usage.
>>>
>> SunRPC seems to possibly set sk_user_data on user sockets: svc_addsock()
>> gets a socket using sockfd_lookup() then passes it to svc_setup_socket()
>> which in turn sets sk_user_data. I don't know anything about SunRPC, so
>> I might very well have missed important details, but I believe such a
>> socket could be passed to KCM which could lead to the same kind of
>> issues as for L2TP. Other RPCs look safe to me.
>>
>>> If all else was equal, even though it doesn't make much sense to KCM
>>> attach L2TP sockets to KCM, I would suggest to change L2TP to store
>>> its private stuff elsewhere.
>>>
>>> But that is not the case. Anything using the generic UDP
>>> encapsulation layer is going to make use of sk->sk_user_data like this
>>> (see setup_udp_tunnel_sock).
>>>
>> Most UDP encapsulations only use kernel sockets though. It seems that
>> only L2TP and GTP use setup_udp_tunnel_sock() with userspace sockets.
>> So it might be feasible to restrict usage of sk_user_data to kernel
>> sockets only.
>>
>> For L2TP, we probably can adapt l2tp_sock_to_tunnel() so that it does
>> a lookup in a hashtable indexed by the socket pointer, rather than
>> dereferencing sk_user_data. That doesn't look very satisfying to me,
>> but that's the only way I found so far.
>
> L2TP needs a way to get at its local data from the socket in the data path.
>
>> We also have another user of sk_user_data in l2tp_ppp, but since it
>> uses its own socket type, I guess we could simply embed the pointer in
>> its parent structure.
>>
>>> It looks like over time we've accumulated this new class of uses
>>> of sk->sk_user_data, ho hum...
>>>
>>> And it's not like we can add a test to KCM to avoid these socket
>>> types, because they will look like normal UDP datagram sockets.
>>>
>>> What a mess...
>>>
>>> Furthermore, even if you add a test to KCM, you will now need to
>>> add the same test to L2TP and anything else which uses sk_user_data
>>> for tunneling and for which userspace has access to the socket fd.
>>>
>>> And it will be racy, indeed, until all such users align to the same
>>> precise locking scheme for tests and updates to sk_user_data.
>>>
>>> Again, what a mess...
>>>
>> So, if I understand correctly, we can either restrict sk_user_data to
>> kernel sockets so that KCM couldn't act on them (but then why would we
>> make an exception for KCM and allow it to set sk_user_data on
>> non-kernel sockets?).
>> Or we could agree on a locking scheme for sk_user_data and update all
>> users so that they'd fail instead of overwriting it when it's not NULL.
>>
>> Assuming my understanding is correct, do you have any preference for
>> fixing this issue? Or any other ideas?
>
> Could we add a new pointer, say, encap_user_data to struct udp_sock
> and use it instead of sk_user_data for UDP-encap sockets?

That would increase the size of struct udp_sock for a narrow use case,
which will get pushback. I think it's better to stick with the one
pointer in struct sock. We could maybe redefine sk_user_data as a
pointer to an allocated structure or array so that it can hold multiple
user_data pointers (in lieu of chaining).

Tom
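
To make the "allocated structure or array" idea a bit more concrete,
here is a rough userspace sketch, not kernel code. Everything in it is
hypothetical and only for illustration: struct fake_sock stands in for
struct sock, and struct sk_user_data_tbl, the UD_OWNER_* IDs and the
ud_*() helpers are made-up names, not existing kernel API. It also does
no locking; a real version would presumably have to claim and release
slots under something like the sk_callback lock to be race-free.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical owner IDs: one slot per kind of sk_user_data user. */
enum ud_owner { UD_OWNER_ENCAP, UD_OWNER_KCM, UD_NR_OWNERS };

/* Hypothetical table that sk_user_data would point to. */
struct sk_user_data_tbl {
	void *slot[UD_NR_OWNERS];
};

/* Stand-in for struct sock; only the field under discussion is modelled. */
struct fake_sock {
	void *sk_user_data;
};

/*
 * Claim the slot for @owner. Fails with -EBUSY instead of overwriting a
 * slot that is already in use. No locking here (assumption: the caller
 * serialises access).
 */
static int ud_claim(struct fake_sock *sk, enum ud_owner owner, void *data)
{
	struct sk_user_data_tbl *tbl = sk->sk_user_data;

	assert(owner < UD_NR_OWNERS);
	if (!data)
		return -EINVAL;

	if (!tbl) {
		/* First user on this socket: allocate the table lazily. */
		tbl = calloc(1, sizeof(*tbl));
		if (!tbl)
			return -ENOMEM;
		sk->sk_user_data = tbl;
	}

	if (tbl->slot[owner])
		return -EBUSY;

	tbl->slot[owner] = data;
	return 0;
}

/* Return the data stored for @owner, or NULL if the slot is unused. */
static void *ud_get(const struct fake_sock *sk, enum ud_owner owner)
{
	const struct sk_user_data_tbl *tbl = sk->sk_user_data;

	assert(owner < UD_NR_OWNERS);
	return tbl ? tbl->slot[owner] : NULL;
}

/* Drop @owner's slot; free the table once no slot is in use. */
static void ud_release(struct fake_sock *sk, enum ud_owner owner)
{
	struct sk_user_data_tbl *tbl = sk->sk_user_data;
	size_t i;

	assert(owner < UD_NR_OWNERS);
	if (!tbl)
		return;

	tbl->slot[owner] = NULL;
	for (i = 0; i < UD_NR_OWNERS; i++)
		if (tbl->slot[i])
			return;

	free(tbl);
	sk->sk_user_data = NULL;
}
```

With something along these lines, a UDP encap user and KCM could each
call ud_claim() on the same socket with their own owner ID and neither
would clobber the other, while a second claim for the same owner would
get -EBUSY. The cost is an extra dereference in the data path and an
allocation on first use.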