From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Sat, 18 Feb 2006 19:11:40 +0300
> Hello developers.
>
> I'm pleased to announce a combined patch of the kevent and network AIO
> subsystems, which are implemented using three system calls:
>
> * kevent_ctl(), which fully controls the kevent subsystem.
>   It allows adding/modifying/removing kevents and waiting until either
>   a given number of kevents, or at least one kevent, is ready.
> * aio_send().
>   This system call allows asynchronously transferring a userspace
>   buffer to the remote host using a zero-copy mechanism.
> * aio_recv().
>   Asynchronous receiving support.
>
> The next step is to implement aio_sendfile() support.
>
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

I did a minor review today. The scheme is interesting, but here are some
initial code review comments:

1) Please format to 80 columns. I use a very generous 86-character-wide
   emacs window and much of your code wrapped lines :-)

2) I do not see a real need for an enumeration of all the KEVENT_*
   constants; just use CPP defines. You are setting them explicitly
   anyway.

3) Adding sk_kevent_flags to struct sock looks like real bloat. You
   have just 1 or 2 flags, so allocate them in sk_flags. I also do not
   see why you need to obtain the socket lock around the set/clear bit:
   you are using atomic bit operations, and you are not testing anything
   other than local variable state ('valbool').

4) I have 2 major problems with tcp_async_{send,recv}():

   a) Much duplication of the existing tcp sendmsg/recvmsg code.
      Consolidation and code sharing are needed.

   b) A large chunk of it is protocol agnostic; we could share this
      logic so that support for protocols such as DCCP can be added
      very simply.

5) Why is the ordering between sk_stream_set_owner_r() and
   __skb_queue_tail() important in this hunk?
@@ -3079,8 +3080,8 @@ queue_and_out:
 			    !sk_stream_rmem_schedule(sk, skb))
 				goto drop;
 		}
-		sk_stream_set_owner_r(skb, sk);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
+		sk_stream_set_owner_r(skb, sk);
 	}
 	tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
 	if(skb->len)

6) I disagree with turning off the prequeue just because the socket is
   in async mode:

@@ -3819,7 +3820,7 @@ int tcp_rcv_established(struct sock *sk,
 		if (tp->ucopy.task == current &&
 		    tp->copied_seq == tp->rcv_nxt &&
 		    len - tcp_header_len <= tp->ucopy.len &&
-		    sock_owned_by_user(sk)) {
+		    sock_owned_by_user(sk) && !sock_async(sk)) {

   If the user blocked on a socket operation and we can do prequeue
   processing, directly copying to userspace, we should. Or are we
   zero-copy'ing here?

7) More prequeue bypassing when the socket is in async mode:

+	if (sock_async(sk)) {
+		spin_lock_bh(&sk->sk_lock.slock);
+		ret = tcp_v4_do_rcv(sk, skb);
+		spin_unlock_bh(&sk->sk_lock.slock);
+	} else {
+		bh_lock_sock(sk);
+		if (!sock_owned_by_user(sk)) {
+			if (!tcp_prequeue(sk, skb))
+				ret = tcp_v4_do_rcv(sk, skb);
+		} else
+			sk_add_backlog(sk, skb);
+		bh_unlock_sock(sk);
+	}

I really think #6 and #7 need justification, because if we do end up
with net channels, they will go via the TCP prequeue mechanism.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html