[PATCH 1/2] Documentation: hyperv: Update spelling and fix typo
From: Michael Kelley

Update spelling from "VMbus" to "VMBus" to match Hyper-V product
documentation. Also correct typo: "SNP-SEV" should be "SEV-SNP".

Signed-off-by: Michael Kelley
---
 Documentation/virt/hyperv/overview.rst | 22 +++
 Documentation/virt/hyperv/vmbus.rst    | 82 +-
 2 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/Documentation/virt/hyperv/overview.rst b/Documentation/virt/hyperv/overview.rst
index cd493332c88a..77408a89d1a4 100644
--- a/Documentation/virt/hyperv/overview.rst
+++ b/Documentation/virt/hyperv/overview.rst
@@ -40,7 +40,7 @@ Linux guests communicate with Hyper-V in four different ways:
   arm64, these synthetic registers must be accessed using explicit
   hypercalls.
 
-* VMbus: VMbus is a higher-level software construct that is built on
+* VMBus: VMBus is a higher-level software construct that is built on
   the other 3 mechanisms. It is a message passing interface between
   the Hyper-V host and the Linux guest. It uses memory that is shared
   between Hyper-V and the guest, along with various signaling
@@ -54,8 +54,8 @@ x86/x64 architecture only.
 
 .. _Hyper-V Top Level Functional Spec (TLFS): https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
 
-VMbus is not documented. This documentation provides a high-level
-overview of VMbus and how it works, but the details can be discerned
+VMBus is not documented. This documentation provides a high-level
+overview of VMBus and how it works, but the details can be discerned
 only from the code.
 
 Sharing Memory
@@ -74,7 +74,7 @@ follows:
   physical address space. How Hyper-V is told about the GPA or list
   of GPAs varies. In some cases, a single GPA is written to a
   synthetic register. In other cases, a GPA or list of GPAs is sent
-  in a VMbus message.
+  in a VMBus message.
 
 * Hyper-V translates the GPAs into "real" physical memory addresses,
   and creates a virtual mapping that it can use to access the memory.
@@ -133,9 +133,9 @@ only the CPUs actually present in the VM, so Linux does not report
 any hot-add CPUs.
 
 A Linux guest CPU may be taken offline using the normal Linux
-mechanisms, provided no VMbus channel interrupts are assigned to
-the CPU. See the section on VMbus Interrupts for more details
-on how VMbus channel interrupts can be re-assigned to permit
+mechanisms, provided no VMBus channel interrupts are assigned to
+the CPU. See the section on VMBus Interrupts for more details
+on how VMBus channel interrupts can be re-assigned to permit
 taking a CPU offline.
 
 32-bit and 64-bit
@@ -169,14 +169,14 @@ and functionality. Hyper-V indicates feature/function availability
 via flags in synthetic MSRs that Hyper-V provides to the guest,
 and the guest code tests these flags.
 
-VMbus has its own protocol version that is negotiated during the
-initial VMbus connection from the guest to Hyper-V. This version
+VMBus has its own protocol version that is negotiated during the
+initial VMBus connection from the guest to Hyper-V. This version
 number is also output to dmesg during boot. This version number
 is checked in a few places in the code to determine if specific
 functionality is present.
 
-Furthermore, each synthetic device on VMbus also has a protocol
-version that is separate from the VMbus protocol version. Device
+Furthermore, each synthetic device on VMBus also has a protocol
+version that is separate from the VMBus protocol version. Device
 drivers for these synthetic devices typically negotiate the device
 protocol version, and may test that protocol version to determine
 if specific device functionality is present.
diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
index d2012d9022c5..f0d83ebda626 100644
--- a/Documentation/virt/hyperv/vmbus.rst
+++ b/Documentation/virt/hyperv/vmbus.rst
@@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-VMbus
+VMBus
 =====
-VMbus is a software construct provided by Hyper-V to guest VMs. It
+VMBus is a software construct provided by Hyper-V to guest VMs. It
 consists of a control path and common facilities used by synthetic
 devices that Hyper-V presents to guest VMs. The control path is
 used to offer synthetic devices to the guest VM and, in some cases,
@@ -12,9 +12,9 @@ and the synthetic device implementation that is part of Hyper-V, and
 signaling primitives to allow Hyper-V and the guest to interrupt
 each other.
 
-VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
-entry in a running Linux guest. The VMbus driver (drivers/hv/vmbus_drv.c)
-establishes the VMbus control path with the Hyper-V host, then
+VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
+entry in a running Linux guest. The VMBus driver (drivers/hv/vmbus_drv.c)
+establishes the VMBus control path with the Hyper-V host, then
 registers itself as a Linux bus driver. It implements the standard
 bus functions for adding and removing
[PATCH 2/2] Documentation: hyperv: Improve synic and interrupt handling description
From: Michael Kelley

Current documentation does not describe how Linux handles the synthetic
interrupt controller (synic) that Hyper-V provides to guest VMs, nor how
VMBus or timer interrupts are handled. Add text describing the synic and
reorganize existing text to make this more clear.

Signed-off-by: Michael Kelley
---
 Documentation/virt/hyperv/clocks.rst | 21 +---
 Documentation/virt/hyperv/vmbus.rst  | 79 ++--
 2 files changed, 66 insertions(+), 34 deletions(-)

diff --git a/Documentation/virt/hyperv/clocks.rst b/Documentation/virt/hyperv/clocks.rst
index a56f4837d443..919bb92d6d9d 100644
--- a/Documentation/virt/hyperv/clocks.rst
+++ b/Documentation/virt/hyperv/clocks.rst
@@ -62,12 +62,21 @@ shared page with scale and offset values into user space. User
 space code performs the same algorithm of reading the TSC and
 applying the scale and offset to get the constant 10 MHz clock.
 
-Linux clockevents are based on Hyper-V synthetic timer 0. While
-Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
-timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
-/proc/interrupts. Clockevents based on the virtualized PIT and
-local APIC timer also work, but the Hyper-V synthetic timer is
-preferred.
+Linux clockevents are based on Hyper-V synthetic timer 0 (stimer0).
+While Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
+timer 0. In older versions of Hyper-V, an interrupt from stimer0
+results in a VMBus control message that is demultiplexed by
+vmbus_isr() as described in the VMBus documentation. In newer versions
+of Hyper-V, stimer0 interrupts can be mapped to an architectural
+interrupt, which is referred to as "Direct Mode". Linux prefers
+to use Direct Mode when available. Since x86/x64 doesn't support
+per-CPU interrupts, Direct Mode statically allocates an x86 interrupt
+vector (HYPERV_STIMER0_VECTOR) across all CPUs and explicitly codes it
+to call the stimer0 interrupt handler. Hence interrupts from stimer0
+are recorded on the "HVS" line in /proc/interrupts rather than being
+associated with a Linux IRQ. Clockevents based on the virtualized
+PIT and local APIC timer also work, but Hyper-V stimer0
+is preferred.
 
 The driver for the Hyper-V synthetic system clock and timers is
 drivers/clocksource/hyperv_timer.c.
diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
index f0d83ebda626..1dcef6a7fda3 100644
--- a/Documentation/virt/hyperv/vmbus.rst
+++ b/Documentation/virt/hyperv/vmbus.rst
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
 approximately 1280 Mbytes. For versions prior to Windows Server
 2019, the limit is approximately 384 Mbytes.
 
-VMBus messages
---------------
-All VMBus messages have a standard header that includes the message
-length, the offset of the message payload, some flags, and a
+VMBus channel messages
+----------------------
+All messages sent in a VMBus channel have a standard header that includes
+the message length, the offset of the message payload, some flags, and a
 transactionID. The portion of the message after the header is
 unique to each VSP/VSC pair.
 
@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
 buffer. For example, the storvsc driver uses this approach to
 specify the data buffers to/from which disk I/O is done.
 
-Three functions exist to send VMBus messages:
+Three functions exist to send VMBus channel messages:
 
 1. vmbus_sendpacket(): Control-only messages and messages with
    embedded data -- no GPAs
@@ -165,6 +165,37 @@ performed in this temporary buffer without the risk of Hyper-V
 maliciously modifying the message after it is validated but before
 it is used.
 
+Synthetic Interrupt Controller (synic)
+--------------------------------------
+Hyper-V provides each guest CPU with a synthetic interrupt controller
+that is used by VMBus for host-guest communication. While each synic
+defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
+(VMBUS_MESSAGE_SINT). All interrupts related to communication between
+the Hyper-V host and a guest CPU use that SINT.
+
+The SINT is mapped to a single per-CPU architectural interrupt (i.e,
+an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
+each CPU in the guest has a synic and may receive VMBus interrupts,
+they are best modeled in Linux as per-CPU interrupts. This model works
+well on arm64 where a single per-CPU Linux IRQ is allocated for
+VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
+"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
+interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
+across all CPUs and explicitly coded to call vmbus_isr(). In this case,
+there's no Linux IRQ, and the interrupts are visible in aggregate in
+/proc/interrupts on the "HYP" line.
+
+The synic provides the means to demultiplex the architectural i
Re: [PATCH] Documentation: tracing: Fix spelling mistakes
Saurav Shah writes:

> Fix spelling mistakes in the documentation.
>
> Signed-off-by: Saurav Shah
> ---
>  Documentation/trace/fprobetrace.rst | 4 ++--
>  Documentation/trace/ftrace.rst      | 2 +-
>  Documentation/trace/kprobetrace.rst | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

Applied, thanks.

jon
Re: [PATCH net-next v12 0/4] ethtool: provide the dim profile fine-tuning channel
On Sat, 4 May 2024 14:44:43 +0800, Heng Qi wrote:
> The NetDIM library provides excellent acceleration for many modern
> network cards. However, the default profiles of DIM limits its maximum
> capabilities for different NICs, so providing a way which the NIC can
> be custom configured is necessary.
>
> Currently, the way is based on the commonly used "ethtool -C".
>
> Please review, thank you very much!

Hi,

I would like to confirm if there are still comments on the current version,
since the current series and the just merged "Remove RTNL lock protection of
CVQ" conflict with a line of code with the fourth patch, if I can collect
other comments or ack/review tags, then release the new version seems better.

Thank you very much!

>
> Changelog
> =========
> v11->v12:
>   - Remove the use of IS_ENABLED(DIMLIB).
>   - Update Simon's htmldoc hint.
>
> v10->v11:
>   - Fix and clean up some issues from Kuba, thanks.
>   - Rebase net-next/main
>
> v9->v10:
>   - Collect dim related flags/mode/work into one place.
>   - Use rx_profile + tx_profile instead of four profiles.
>   - Add several helps.
>   - Update commit logs.
>
> v8->v9:
>   - Fix the compilation error of conflicting names of rx_profile in
>     dim.h and ice driver: in dim.h, rx_profile is replaced with
>     dim_rx_profile. So does tx_profile.
>
> v7->v8:
>   - Use kmemdup() instead of kzalloc()/memcpy() in dev_dim_profile_init().
>
> v6->v7:
>   - A new wrapper struct pointer is used in struct net_device.
>   - Add IS_ENABLED(CONFIG_DIMLIB) to avoid compiler warnings.
>   - Profile fields changed from u16 to u32.
>
> v5->v6:
>   - Place the profile in netdevice to bypass the driver.
>     The interaction code of ethtool <-> kernel has not changed at all,
>     only the interaction part of kernel <-> driver has changed.
>
> v4->v5:
>   - Update some snippets from Kuba.
>
> v3->v4:
>   - Some tiny updates and patch 1 only add a new comment.
>
> v2->v3:
>   - Break up the attributes to avoid the use of raw c structs.
>   - Use per-device profile instead of global profile in the driver.
>
> v1->v2:
>   - Use ethtool tool instead of net-sysfs.
>
> Heng Qi (4):
>   linux/dim: move useful macros to .h file
>   ethtool: provide customized dim profile management
>   dim: add new interfaces for initialization and getting results
>   virtio-net: support dim profile fine-tuning
>
>  Documentation/netlink/specs/ethtool.yaml     |  31 +++
>  Documentation/networking/ethtool-netlink.rst |   4 +
>  drivers/net/virtio_net.c                     |  47 +++-
>  include/linux/dim.h                          | 114
>  include/linux/ethtool.h                      |   4 +-
>  include/linux/netdevice.h                    |   3 +
>  include/uapi/linux/ethtool_netlink.h         |  22 ++
>  lib/dim/net_dim.c                            | 145 ++-
>  net/ethtool/coalesce.c                       | 259 ++-
>  9 files changed, 613 insertions(+), 16 deletions(-)
>
> --
> 2.32.0.3.g01195cf9f
>
>
Re: [PATCH net-next v12 0/4] ethtool: provide the dim profile fine-tuning channel
On Wed, 8 May 2024 10:12:35 +0800 Heng Qi wrote:
> I would like to confirm if there are still comments on the current version,
> since the current series and the just merged "Remove RTNL lock protection of
> CVQ" conflict with a line of code with the fourth patch, if I can collect
> other comments or ack/review tags, then release the new version seems better.

Looking now!

Please note that I merged a patch today which makes DIMLIB a tri-state
config, meaning it can be a module now. So please double check that
didn't break things, especially referring to dim symbols from the core
code.
Re: [PATCH net-next v12 2/4] ethtool: provide customized dim profile management
On Sat, 4 May 2024 14:44:45 +0800 Heng Qi wrote:
> @@ -1325,6 +1354,8 @@ operations:
>              - tx-aggr-max-bytes
>              - tx-aggr-max-frames
>              - tx-aggr-time-usecs
> +            - rx-profile
> +            - tx-profile
>        dump: *coalesce-get-op
>      -
>        name: coalesce-set

set probably needs to get the new attributes, too?

>  Request is rejected if it attributes declared as unsupported by driver (i.e.
> diff --git a/include/linux/dim.h b/include/linux/dim.h
> index 43398f5eade2..d848b790ca50 100644
> --- a/include/linux/dim.h
> +++ b/include/linux/dim.h
> @@ -9,6 +9,7 @@
>  #include
>  #include
>  #include
> +#include

looks unnecessary, you just need a forward declaration of struct
net_device, no?

> diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
> index 67d5beb34dc3..b3e01619f929 100644
> --- a/lib/dim/net_dim.c
> +++ b/lib/dim/net_dim.c
> @@ -4,6 +4,7 @@
>   */
>
>  #include
> +#include
>
>  /*
>   * Net DIM profiles:
> @@ -95,6 +96,76 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
>  }
>  EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
>
> +int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
> +			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
> +			   void (*rx_dim_work)(struct work_struct *work),
> +			   void (*tx_dim_work)(struct work_struct *work))
> +{
> +	struct dim_cq_moder *rxp = NULL, *txp;
> +	struct dim_irq_moder *moder;
> +	int len;
> +
> +	dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
> +	if (!dev->irq_moder)
> +		goto err_moder;

return the error directly here, no need to goto

> +	moder = dev->irq_moder;
> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);
> +
> +	moder->coal_flags = coal_flags;
> +	moder->profile_flags = profile_flags;
> +
> +	if (profile_flags & DIM_PROFILE_RX) {
> +		moder->rx_dim_work = rx_dim_work;
> +		WRITE_ONCE(moder->dim_rx_mode, rx_mode);

why WRITE_ONCE()? The structure can't be used, yet

> +		rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
> +		if (!rxp)
> +			goto err_rx_profile;

name the labels after the target, please, not the source

> +		rcu_assign_pointer(moder->rx_profile, rxp);
> +	}

> +static int ethnl_update_profile(struct net_device *dev,
> +				struct dim_cq_moder __rcu **dst,
> +				const struct nlattr *nests,
> +				struct netlink_ext_ack *extack)

> +	rcu_assign_pointer(*dst, new_profile);
> +	kfree_rcu(old_profile, rcu);
> +
> +	return 0;

Don't we need to inform DIM somehow that profile has switched
and it should restart itself?
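
For illustration only, the two error-handling comments in this review
(return directly when nothing has been allocated yet; name each label
after what it undoes rather than after the step that failed) might look
roughly like the userspace sketch below. It is not the patch's code:
moder_init(), dup_mem(), the struct layouts, the PROFILE_* flags and the
profile values are all hypothetical stand-ins, and calloc()/malloc()
replace kzalloc()/kmemdup().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel definitions quoted in the review. */
#define NUM_PROFILES 5
#define PROFILE_RX   0x1
#define PROFILE_TX   0x2

struct cq_moder { unsigned int usec, pkts, comps; };

struct irq_moder {
	unsigned char profile_flags;
	struct cq_moder *rx_profile;
	struct cq_moder *tx_profile;
};

/* Arbitrary illustrative values, not the real DIM defaults. */
static const struct cq_moder default_profile[NUM_PROFILES] = {
	{1, 64, 0}, {8, 64, 0}, {64, 64, 0}, {128, 64, 0}, {256, 64, 0},
};

/* Userspace stand-in for kmemdup(). */
static void *dup_mem(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

int moder_init(struct irq_moder **out, unsigned char profile_flags)
{
	size_t len = sizeof(default_profile);
	struct irq_moder *moder;

	assert(out != NULL);

	moder = calloc(1, sizeof(*moder));
	if (!moder)
		return -ENOMEM;	/* nothing to undo yet: return directly */

	moder->profile_flags = profile_flags;

	if (profile_flags & PROFILE_RX) {
		moder->rx_profile = dup_mem(default_profile, len);
		if (!moder->rx_profile)
			goto err_free_moder;	/* label names the cleanup target */
	}

	if (profile_flags & PROFILE_TX) {
		moder->tx_profile = dup_mem(default_profile, len);
		if (!moder->tx_profile)
			goto err_free_rx;
	}

	*out = moder;
	return 0;

err_free_rx:
	free(moder->rx_profile);	/* free(NULL) is a no-op if RX was not requested */
err_free_moder:
	free(moder);
	return -ENOMEM;
}
```

With labels named this way, each goto reads as "undo everything from
here down", so adding a new allocation step only requires adding one
label above the existing ones.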
Re: [PATCH net-next v12 0/4] ethtool: provide the dim profile fine-tuning channel
On Sat, 4 May 2024 14:44:43 +0800 Heng Qi wrote:
> The NetDIM library provides excellent acceleration for many modern
> network cards. However, the default profiles of DIM limits its maximum
> capabilities for different NICs, so providing a way which the NIC can
> be custom configured is necessary.
>
> Currently, the way is based on the commonly used "ethtool -C".
>
> Please review, thank you very much!

Good progress!

Please make sure to also update Documentation/networking/net_dim.rst
in the next version.