On 04/08/2016 03:47 PM, Wei Xu wrote: > > > On 2016年04月05日 10:47, Jason Wang wrote: >> >> On 04/04/2016 03:25 AM, w...@redhat.com wrote: >>> From: Wei Xu <w...@redhat.com> >>> >>> All the data packets in a tcp connection will be cached to a big buffer >>> in every receive interval, and will be sent out via a timer, the >>> 'virtio_net_rsc_timeout' controls the interval, the value will >>> influent the >>> performance and response of tcp connection extremely, 50000(50us) is a >>> experience value to gain a performance improvement, since the whql test >>> sends packets every 100us, so '300000(300us)' can pass the test case, >>> this is also the default value, it's gonna to be tunable. >>> >>> The timer will only be triggered if the packets pool is not empty, >>> and it'll drain off all the cached packets >>> >>> 'NetRscChain' is used to save the segments of different protocols in a >>> VirtIONet device. >>> >>> The main handler of TCP includes TCP window update, duplicated ACK >>> check >>> and the real data coalescing if the new segment passed sanity check >>> and is identified as an 'wanted' one. >>> >>> An 'wanted' segment means: >>> 1. Segment is within current window and the sequence is the expected >>> one. >>> 2. ACK of the segment is in the valid window. >>> 3. If the ACK in the segment is a duplicated one, then it must less >>> than 2, >>> this is to notify upper layer TCP starting retransmission due to >>> the spec. >>> >>> Sanity check includes: >>> 1. Incorrect version in IP header >>> 2. IP options & IP fragment >>> 3. Not a TCP packets >>> 4. Sanity size check to prevent buffer overflow attack. >>> >>> There maybe more cases should be considered such as ip >>> identification other >>> flags, while it broke the test because windows set it to the same >>> even it's >>> not a fragment. >>> >>> Normally it includes 2 typical ways to handle a TCP control flag, >>> 'bypass' >>> and 'finalize', 'bypass' means should be sent out directly, and >>> 'finalize' >>> means the packets should also be bypassed, and this should be done >>> after searching for the same connection packets in the pool and sending >>> all of them out, this is to avoid out of data. >>> >>> All the 'SYN' packets will be bypassed since this always begin a new' >>> connection, other flags such 'FIN/RST' will trigger a finalization, >>> because >>> this normally happens upon a connection is going to be closed, an >>> 'URG' packet >>> also finalize current coalescing unit. >>> >>> Statistics can be used to monitor the basic coalescing status, the >>> 'out of order' >>> and 'out of window' means how many retransmitting packets, thus >>> describe the >>> performance intuitively. >>> >>> Signed-off-by: Wei Xu <w...@redhat.com> >>> --- >>> hw/net/virtio-net.c | 480 >>> ++++++++++++++++++++++++++++++++++++++++- >>> include/hw/virtio/virtio-net.h | 1 + >>> include/hw/virtio/virtio.h | 72 +++++++ >>> 3 files changed, 552 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index bd91a4b..81e8e71 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -15,10 +15,12 @@ >>> #include "qemu/iov.h" >>> #include "hw/virtio/virtio.h" >>> #include "net/net.h" >>> +#include "net/eth.h" >>> #include "net/checksum.h" >>> #include "net/tap.h" >>> #include "qemu/error-report.h" >>> #include "qemu/timer.h" >>> +#include "qemu/sockets.h" >>> #include "hw/virtio/virtio-net.h" >>> #include "net/vhost_net.h" >>> #include "hw/virtio/virtio-bus.h" >>> @@ -38,6 +40,24 @@ >>> #define endof(container, field) \ >>> (offsetof(container, field) + sizeof(((container *)0)->field)) >>> +#define VIRTIO_NET_IP4_ADDR_SIZE 8 /* ipv4 saddr + daddr */ >>> +#define VIRTIO_NET_TCP_PORT_SIZE 4 /* sport + dport */ >>> + >>> +/* IPv4 max payload, 16 bits in the header */ >>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header)) >>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535 >>> + >>> +/* header lenght value in ip header without option */ >> typo here. > Thanks. >> >>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5 >>> + >>> +/* Purge coalesced packets timer interval */ >>> +#define VIRTIO_NET_RSC_INTERVAL 300000 >>> + >>> +/* This value affects the performance a lot, and should be tuned >>> carefully, >>> + '300000'(300us) is the recommended value to pass the WHQL test, >>> '50000' can >>> + gain 2x netperf throughput with tso/gso/gro 'off'. */ >>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL; >> Like we've discussed in previous versions, need we add another property >> for this? > Do you know how to make this a tunable parameter to guest? can this > parameter be set via control queue?
It's possible I think. [...] >>> + >>> +static void virtio_net_rsc_purge(void *opq) >>> +{ >>> + NetRscChain *chain = (NetRscChain *)opq; >>> + NetRscSeg *seg, *rn; >>> + >>> + QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) { >>> + if (virtio_net_rsc_drain_seg(chain, seg) == 0) { >>> + chain->stat.purge_failed++; >>> + continue; >> Is it better to break here, consider we fail to do the receive? > Actually this fails only when receive fails according to the test, but > should drain all the segments here, > if break here, then will have to free all the buffers, is it possible > a successful receiving followed by a failed one? I'd say it's possible, e.g guest just finish rx buffer refills. > if that does happens, i preferred give it a shot other than free it > directly. Ok. [...] >>> + >>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, >>> NetClientState *nc, >>> + const uint8_t *buf, size_t size) >>> +{ >>> + uint16_t hdr_len; >>> + NetRscSeg *seg; >>> + >>> + hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len; >>> + seg = g_malloc(sizeof(NetRscSeg)); >>> + seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\ >>> + + sizeof(struct ip6_header) + >>> VIRTIO_NET_MAX_TCP_PAYLOAD); >> Why ip6_header here? > ipv6 packets can carry 65535(2^16) bytes pure data payload, that means > the header is not included. > but ipv4 is included, i choose the maximum here. Let's add a comment here, since the patch's title is to implement ipv4 coalescing. [...] >>> + >>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, >>> NetRscSeg *seg, >>> + const uint8_t *buf, size_t size, NetRscUnit >>> *unit) >> indentation is wrong here. > Should it be aligned from parameter begin? Looks like we'd better do this. > i was going to save an extra line parameter. [...]