This set of patches implements VMWare VMXNET3 paravirtual NIC device. The device supports of all the device features including offload capabilties, VLANs and etc. The device is tested on different OSes: Fedora 15 Ubuntu 10.4 Centos 6.2 Windows 2008R2 Windows 2008 64bit Windows 2008 32bit Windows 2003 64bit Windows 2003 32bit
Changes in V9: Reported-by: Stefan Hajnoczi <stefa...@gmail.com> Issues reported by Stefan Hajnoczi fixed. Changes in V8: Reported-by: Stefan Hajnoczi <stefa...@gmail.com> Issues reported by Stefan Hajnoczi reviewed and mostly fixed: > + } > + curr_src_off += src[i].iov_len; > + } > + return j; > +} The existing iov_copy() function provides equivalent functionality. I don't think iov_rebuild() is needed. [DF] Done. Thanks, missed it. > + size -= len; > + } > + iovec_off += iov[i].iov_len; > + } > + return res; > +} Rename this net_checksum_add_iov() and place it in net/checksum.c, then the new dependency on net from block can be dropped. [DF] Done. > +vmw_shmem_read(hwaddr addr, void *buf, int len) > { > VMW_SHPRN("SHMEM r: %" PRIx64 ", len: %d to %p", addr, len, buf); > cpu_physical_memory_read(addr, buf, len); > } All changes to this file should be squashed with the previous patch. [DF] Done > +#ifdef VMXNET_DEBUG_SHMEM_ACCESS > +#define VMW_SHPRN(fmt, ...) > \ > + do { > \ > + printf("[%s][SH][%s]: " fmt "\n", VMXNET_DEVICE_NAME, __func__, > \ > + ## __VA_ARGS__); > \ > + } while (0) > +#else > +#define VMW_SHPRN(fmt, ...) do {} while (0) > +#endif Please use QEMU tracing. It eliminates all this boilerplate and conditional compilation. Tracing can be enabled/disabled at runtime and works with SystemTap/DTrace. See docs/tracing.txt. [DF] We'd like to stick with compile time logic in this case becase of 2 reasons: [DF] 1. These printouts are intended for reverse engineering/development only and there is [DF] no need to enable them at run time [DF] 2. There is a big number of printouts, all driver-device communication is traced, [DF] they hit performance even on strongest x86 in case of run-time logic > +struct eth_header { > + uint8_t h_dest[ETH_ALEN]; /* destination eth addr */ > + uint8_t h_source[ETH_ALEN]; /* source ether addr */ > + uint16_t h_proto; /* packet type ID field */ > +}; Looks like it's copy-pasted stuff from /usr/include/linux/if_*.h, /usr/include/netinet/*.h, and friends. If the system-wide headers are included names will collide for some of the macros at least. Did you check if the slirp/ definitions can be reused? [DF] Yes, you are right. This is copy-pasted from different places. [DF] Slips definishing do not fully cover our needs. I'd rather we import network header definitions once in a generic place into the source tree. That way vmxnet and other components don't need to redefine these structs. [DF] Exaclty! Our intention is to create generic header with network definitions and make everyone use it. [DF] We can move our header to some shared place if you want, however I'd do it in parallel with cleanup [DF] of similar definitions in existing code and this is a big change that os out of scope of these patches. > + > *===========================================================================*/ Is this huge comment box a sign that the code should be split into a foo_tx.c and an foo_rx.c file? [DF] As for me this file is not that big to be splitted (<800 lines), however I'll do this if you insist :) > +size_t vmxnet_tx_pkt_send(VmxnetTxPktH pkt, NetClientState *vc) 'vc' is an old name that was used for VLANClientState. The struct has since been renamed to NetClientState and the rest of QEMU uses 'nc' instead of 'vc'. [DF] Fixed. Thanks. > +/* tx module context handle */ > +typedef void *VmxnetTxPktH; Forward-declaring the struct is nicer: typedef struct VmxnetTxPkt VmxnetTxPkt; The definition of VmxnetTxPkt is still hidden from the caller but you avoid the void* and casting. In vmxnet_pkt.c define using: struct VmxnetTxPkt { ... }; [DF] Agree, fixed. Thanks. > diff --git a/hw/vmxnet_utils.h b/hw/vmxnet_utils.h > index 7fd9a01..fac3b7b 100644 > --- a/hw/vmxnet_utils.h > +++ b/hw/vmxnet_utils.h Please squash these fixes into the previous patch. [DF] Oops, my bad. Fixed. > + uint8_t msix_used; > + /* Whether MSI support was installed successfully */ > + uint8_t msi_used; These two fields should be bool. > + /* Whether automatic interrupts masking enabled */ > + uint8_t auto_int_masking; bool [DF] Done. > +static inline void vmxnet3_flush_shmem_changes(void) > +{ > + /* > + * Flush shared memory changes > + * Needed before sending interrupt to guest to ensure > + * it gets consistent memory state > + */ > + smp_wmb(); > +} It's useful to document why a memory barrier is being used in each instance. Therefore hiding smp_wmb() inside a wrapper function isn't great. [DF] Fixed. Also, it's suspicious that smb_wmb() is used but no other barriers are used. What about a read memory barrier when accessing shared memory written by the guest? [DF] VMWARE interface build a in safe way - you always read out data buffer (packet descriptor) with memcopy, [DF] and then check its last byte to see whether it was fully filled by driver. [DF] In this case device doesn't need read barriers. Drivers need write barriers of course to secure [DF] proper order of writes. Changes in V7: Reported-by: Michael S. Tsirkin <m...@redhat.com> Issues reported by Michael S. Tsirkin reviewed and mostly fixed: File vmware_utils.h: ... > +static inline void > +vmw_shmem_st64(target_phys_addr_t addr, uint64_t value) > +{ > + VMW_SHPRN("SHMEM store64: %" PRIx64 " (value %" PRIx64 ")", addr, value); > + stq_le_phys(addr, value); > +} > + Pls remove these wrappers. These are just memory stores. Our codebase is too large as it is without every driver wrapping all standard calls. [DF] Idea behind this macros is to have printout in each of them [DF] Printouts are really needed when reverse-engineering windows guest drivers [DF] Since there is no windows drivers code this is the only way to understand [DF] sequences they use > +/* MACROS for simplification of operations on array-style registers */ UPPERCASE ABUSE [DF] Fixed > +#define IS_MULTIREG_ADDR(addr, base, cnt, regsize) \ > + (((addr + 1) > (base)) && ((addr) < (base) + (cnt) * (regsize))) Same as range_covers_byte(base, cnt * regsize, addr)? [DF] Fixed, thanks > + > +#define MULTIREG_IDX_BY_ADDR(addr, base, regsize) \ > + (((addr) - (base)) / (regsize)) > + Above two macros is all that's left. No objection but it does not say what they do - want to add minimal documentation? And please prefix with VMWARE_ or something. [DF] Prefix added, macros documented File vmxnet_utils.h (and related with the same problems): ... > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#ifndef _VMXNET_UTILS_H_ > +#define _VMXNET_UTILS_H_ Please do not start identifiers with _ followed by an uppercase lattters. [DF] Fixed ... > + > +struct eth_header { > + uint8_t h_dest[ETH_ALEN]; /* destination eth addr */ > + uint8_t h_source[ETH_ALEN]; /* source ether addr */ > + uint16_t h_proto; /* packet type ID field */ > +}; > + And fix struct definitions to 1. follow qemu coding style [DF] It's not clear what's wrong with the coding style. Please, elaborate. 2. start with vmxnet [DF] Since this is generic network definitions with no device specifics [DF] I'm not sure it makes sense to add vmxnet prefix. [DF] I'd say it makes sense to put them into some generic header files under "net" directory instead [DF] and clean up other devices that have their local definitions of the same structures. [DF] Please, advise. I also don't really understand why are these functions split out - vmxnet is the only user, no? [DF] Currently vmxnet3 is the only user, hovewer we have vmxnet2 implementation as well (not submitted yet) [DF] and it uses the same header File vmxnet_pkt.h: ... > + > +/* tx module context handle */ > +typedef void *VmxnetTxPktH; This gets you zero type safety and makes debugging impossible. Just use pointers like everyone does. [DF] Handle type is added to hide VmxnetTxPkt structure into .c header [DF] for better encapsulation. [DF] Should we drop it anyway? Files vmxnet3.*: ... > + > + if (zero_region) { > + vmw_shmem_set(pa, 0, size*cell_size); spaces around * [DF] Fixed ... > +#define vmxnet3_ring_dump(macro, ring_name, ridx, r) > \ > + macro("%s#%d: base %" PRIx64 " size %lu cell_size %lu gen %d next %lu", > \ > + (ring_name), (ridx), > \ > + (r)->pa, (r)->size, (r)->cell_size, (r)->gen, (r)->next) make macros upper case [DF] Fixed ... > +static inline void vmxnet3_flush_shmem_changes(void) > +{ > + /* > + * Flush shared memory changes > + * Needed before transferring control to guest what does 'transferring control to guest' mean? [DF] Changed to [DF] /* [DF] * Flush shared memory changes [DF] * Needed before sending interrupt to guest to ensure [DF] * it gets consistent memory state [DF] */ ... > + */ > + smp_wmb(); > +} Don't use wrappers like this. They just hide bugs. For example it's not helpful before an interrupt in the function below. [DF] I guess you are talking about vmxnet3_complete_packet() [DF] Strictly speaking barrier is a must because we change shared memory in [DF] vmxnet3_complete_packet() [DF] And the wrapper is a good thing because its name explains its effect [DF] in a formal way as opposed to comments ... > + switch (status) { > + case VMXNET3_PKT_STATUS_OK: { don't put {} around cases: they align incorrectly if it's too big move to a function. [DF] Fixed ... > +static bool > +vmxnet3_send_packet(VMXNET3State *s, uint32_t qidx) > +{ > + size_t bytes_sent = 0; > + bool res = true; why = true? don't initialize just because. [DF] Fixed ... > +/* > + * VMWARE headers we got from Linux kernel do not fully comply QEMU coding > + * standards in sense of types and defines used. > + * Since we didn't want to change VMWARE code, following set of typedefs > + * and defines needed to compile these headers with QEMU introduced. > + */ No need for this now. You can export headers and put them under linux-headers. [DF] Not sure it is possible because the header as-is is not stand-alone and won't compile [DF] without changes. We extracted definitions we use from their header and dropped unused [DF] and kernel-specific stuff. [DF} Please, advise. ... > + if (VMXNET3_OM_TSO == s->offload_mode) { Don't do Yoda style like this [DF] "Yoda" style removed everywhere Changes in V6: Fixed most of problems pointed out by Michael S. Tsirkin The only issue still open is creation of shared place with generic network structures and functions. Currently all generic network code introduced by VMXNET3 resides in vmxnet_utils.c/h files. It could be moved to some shared location however we believe it is a matter of separate refactoring as there are a lot of copy-pasted definitions in almost every device and code cleanup efforts requred in order to create truly shared codebase. Reported-by: Michael S. Tsirkin <m...@redhat.com> Implemented suggestions by Anthony Liguori Reported-by: Anthony Liguori <aligu...@us.ibm.com> Fixed incorrect checksum caclulation for some packets in SW offloads mode Reported-by: Gerhard Wiesinger <li...@wiesinger.com> Changes in V5: MSI-X save/load implemented in the device instead of pci bus as suggested by Michael S. Tsirkin Reported-by: Michael S. Tsirkin <m...@redhat.com> Patches regrouped as suggested by Paolo Bonzini Reported-by: Paolo Bonzini <pbonz...@redhat.com> Changes in V4: Fixed a few problems uncovered by NETIO test suit Assertion on failure to initialize MSI/MSI-X replaced with warning message and fallback to Legacy/MSI respectively Reported-by: Gerhard Wiesinger <li...@wiesinger.com> Various coding style adjustments and patch split-up as suggested by Anthony Liguori Reported-by: Anthony Liguori <aligu...@us.ibm.com> Live migration support added Changes in V3: Fixed crash when net device that is used as network fronted has no virtio HDR support. Task offloads emulation for cases when net device that is used as network fronted has no virtio HDR support. Reported-by: Gerhard Wiesinger <li...@wiesinger.com> Changes in V2: License text changed accoring to community suggestions Standard license header from GPLv2+ - licensed QEMU files used Dmitry Fleytman (5): Adding utility function net_checksum_add_cont() that allows checksum calculation of scattered data with odd chunk sizes Adding utility function net_checksum_add_iov() for iovec checksum calculation Adding common definitions for VMWARE devices Adding packet abstraction for VMWARE network devices Adding VMXNET3 device implementation default-configs/pci.mak | 1 + hw/Makefile.objs | 1 + hw/pci.h | 1 + hw/vmware_utils.h | 143 +++ hw/vmxnet3.c | 2437 +++++++++++++++++++++++++++++++++++++++++++++++ hw/vmxnet3.h | 762 +++++++++++++++ hw/vmxnet_debug.h | 121 +++ hw/vmxnet_pkt.c | 758 +++++++++++++++ hw/vmxnet_pkt.h | 309 ++++++ hw/vmxnet_utils.c | 219 +++++ hw/vmxnet_utils.h | 340 +++++++ iov.h | 5 + net/checksum.c | 41 +- net/checksum.h | 22 +- 14 files changed, 5153 insertions(+), 7 deletions(-) create mode 100644 hw/vmware_utils.h create mode 100644 hw/vmxnet3.c create mode 100644 hw/vmxnet3.h create mode 100644 hw/vmxnet_debug.h create mode 100644 hw/vmxnet_pkt.c create mode 100644 hw/vmxnet_pkt.h create mode 100644 hw/vmxnet_utils.c create mode 100644 hw/vmxnet_utils.h -- 1.7.11.7