On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote: > From: Al Viro <v...@zeniv.linux.org.uk> > Date: Wed, 5 Nov 2014 21:07:45 +0000 > > > Ping me when you put it there, OK? I'll rebase the rest of old stuff on > > top of it (similar helpers, mostly). > > I just pushed it into net-next, thanks Al.
OK, I've taken the beginning of the old queue on top of net-next; it's in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net. >From the quick look at the remaining ->msg_iov users: * I'll need to add several iov_iter primitives - counterparts of checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more). Not a big deal, I'll do that tomorrow. That will give us a clean iov_iter-based counterpart of skb_copy_and_csum_datagram_iovec(). * a new helper: zerocopy_sg_from_iter(). I have it, actually, but I'd rather not step on Herbert's toes - it's too close to the areas his series will touch, so that's probably for when his series goes in. It will be needed for complete macvtap conversion... * why doesn't verify_iovec() use rw_copy_check_uvector()? The only real differences I see is that (a) you do allocation in callers (same as rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case of too long vector, while rw_copy_check_uvector() returns EINVAL in that case and (c) you don't do access_ok(). The last one is described as optimization, but for iov_iter primitives it's a serious PITA - for iovec-backed instances they are using __copy_from_user()/__copy_to_user(), etc. It certainly would be nice to have the same code doing all copying of iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc. Am I missing something subtle semantical difference in there? EMSGSIZE vs EINVAL is trivial (we can lift that check into the callers, if nothing else), but I could miss something more interesting... * various getfrag will need to grow iov_iter-based counterparts, but ip_append_output() needs no changes, AFAICS. * crypto stuff will be easy to convert - iov_iter_get_pages() would suffice for a primitive * there's some really weird stuff in there. Just what is this static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg) { struct iovec *iov; u8 __user *type = NULL; u8 __user *code = NULL; int probed = 0; unsigned int i; if (!msg->msg_iov) return 0; for (i = 0; i < msg->msg_iovlen; i++) { iov = &msg->msg_iov[i]; if (!iov) continue; trying to do? "If non-NULL pointer + i somehow happened to be NULL, skip it and try to use the same pointer + i + 1"? Huh? Had been that way since the function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code when sending packets via raw socket.", according to historical tree)... * rds, bluetooth and vsock are doing something odd; need to RTFS some more. * not sure I understand what TIPC is doing - does it prohibit too short first segment of ->msg_iov? net/tipc/socket.c:dest_name_check() looks odd _and_ potentially racy - we read the same data twice and hope our checks still apply. I asked TIPC folks about that race back in April, but it looks like that fell through the cracks... Overall, so far it looks more or less feasible - other than the missing csum primitives, current mm/iov_iter.c should suffice. I have _not_ seriously looked into sendpage yet; that might very well require some more. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/