On 2/26/2025 3:08 PM, Nuno Das Neves wrote:
> Provide a set of IOCTLs for creating and managing child partitions when
> running as root partition on Hyper-V. The new driver is enabled via
> CONFIG_MSHV_ROOT.
>
> A brief overview of the interface:
>
> MSHV_CREATE_PARTITION is the entry point, retu
On Wed, Feb 26 2025 at 18:18, Sean Christopherson wrote:
> When resuming timekeeping after suspend, restore clocksources prior to
> reading the persistent clock. Paravirt clocks, e.g. kvmclock, tie the
> validity of a PV persistent clock to a clocksource, i.e. reading the PV
> persistent clock wi
On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
> > On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> > > On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
> > > > On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > > > > On Tue, Feb 25, 2025 a
On 2/26/2025 3:07 PM, Nuno Das Neves wrote:
> These non-nested msr and fast hypercall functions are present in x86,
> but they must be available in both architetures for the root partition
nit: *architectures*
> driver code.
>
> Signed-off-by: Nuno Das Neves
> ---
> arch/arm64/hyperv/hv_core.
On 2/26/2025 3:07 PM, Nuno Das Neves wrote:
> Introduce hv_result_to_string() for this purpose. This allows
> hypercall failures to be debugged more easily with dmesg.
>
Let the commit message stand on its own, i.e. state that hv_result_to_string()
is introduced to convert hyper-v status codes to
When running as a KVM guest with kvmclock support enabled, stuff the APIC
timer period/frequency with the core crystal frequency from CPUID.0x15 (if
CPUID.0x15 is provided). KVM's ABI adheres to Intel's SDM, which states
that the APIC timer runs at the core crystal frequency when said frequency
is
When kvmclock and CPUID.0x15 are both present, use the TSC frequency from
CPUID.0x15 instead of kvmclock's frequency. Barring a misconfigured
setup, both sources should provide the same frequency, CPUID.0x15 is
arguably a better source when using the TSC over kvmclock, and most
importantly, using
Rework the seemingly generic x86_cpuinit_ops.early_percpu_clock_init hook
into a dedicated PV sched_clock hook, as the only reason the hook exists
is to allow kvmclock to enable its PV clock on secondary CPUs before the
kernel tries to reference sched_clock, e.g. when grabbing a timestamp for
print
Prefer the TSC over kvmclock for sched_clock if the TSC is constant,
nonstop, and not marked unstable via command line. I.e. use the same
criteria as tweaking the clocksource rating so that TSC is preferred over
kvmclock. Per the below comment from native_sched_clock(), sched_clock
is more tolera
Mark the TSC as reliable if the hypervisor (KVM) has enumerated the TSC
as constant and nonstop, and the admin hasn't explicitly marked the TSC
as unstable. Like most (all?) virtualization setups, any secondary
clocksource that's used as a watchdog is guaranteed to be less reliable
than a constant
When registering a TSC frequency calibration routine, sanity check that
the incoming routine is as robust as the outgoing routine, and reject the
incoming routine if the sanity check fails.
Because native calibration routines only mark the TSC frequency as known
and reliable when they actually run
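
The registration check described above could look roughly like the following user-space model. It is only a sketch and not the patch's code: the flag names, struct sketch_tsc_calibrator, and sketch_register_tsc_calibration() are hypothetical, and "as robust as" is assumed to mean that the incoming routine's properties are a superset of those of the routine already installed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical property flags; the names are assumptions for illustration. */
enum sketch_tsc_properties {
	SKETCH_TSC_FREQ_KNOWN    = 1u << 0,
	SKETCH_TSC_FREQ_RELIABLE = 1u << 1,
};

struct sketch_tsc_calibrator {
	unsigned long (*calibrate_khz)(void);
	unsigned int properties;
};

/* The currently installed routine; starts out empty with no guarantees. */
static struct sketch_tsc_calibrator sketch_current;

/* Example routine standing in for a hypervisor-provided frequency. */
static unsigned long sketch_fixed_khz(void)
{
	return 2112000;
}

/*
 * Install 'incoming' only if it offers every property that the current
 * routine already guarantees. Returns 0 on success, -EINVAL if no routine
 * was supplied, or -EPERM if the new routine would weaken the guarantees.
 */
static int sketch_register_tsc_calibration(const struct sketch_tsc_calibrator *incoming)
{
	if (!incoming || !incoming->calibrate_khz)
		return -EINVAL;

	/* Any bit set in the current routine but clear in the new one is a regression. */
	if ((sketch_current.properties & ~incoming->properties) != 0)
		return -EPERM;

	sketch_current = *incoming;
	return 0;
}
```

Under these assumptions, a routine that only marks the frequency as known cannot replace one that marks it as both known and reliable.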
Silently ignore attempts to switch to a paravirt sched_clock when running
as a CoCo guest with trusted TSC. In hand-wavy theory, a misbehaving
hypervisor could attack the guest by manipulating the PV clock to affect
guest scheduling in some weird and/or predictable way. More importantly,
reading
If CPUID.0x16 is present and valid, use the CPU frequency provided by
CPUID instead of assuming that the virtual CPU runs at the same
frequency as TSC and/or kvmclock. Back before constant TSCs were a
thing, treating the TSC and CPU frequencies as one and the same was
somewhat reasonable, but now
Add a return code to __paravirt_set_sched_clock() so that the kernel can
reject attempts to use a PV sched_clock without breaking the caller. E.g.
when running as a CoCo VM with a secure TSC, using a PV clock is generally
undesirable.
Note, kvmclock is the only PV clock that does anything "extra"
Annotate __paravirt_set_sched_clock() as __init, and make its wrapper
__always_inline to ensure sanitizers don't result in a non-inline version
hanging around. All callers run during __init, and changing sched_clock
after boot would be all kinds of crazy.
No functional change intended.
Signed-of
Add a "tsc_properties" set of flags and use it to annotate whether the
TSC operates at a known and/or reliable frequency when registering a
paravirtual TSC calibration routine. Currently, each PV flow manually
sets the associated feature flags, but often in haphazard fashion that
makes it difficul
In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
invoked if and only if kvmclock is set as sched_clock, ensure APs enable
their kvmclock during CPU online. While a redundant write to the MSR is
technically
Move the code to mark the TSC as reliable from sme_early_init() to
snp_secure_tsc_init(). The only reader of TSC_RELIABLE is the aptly
named check_system_tsc_reliable(), which runs in tsc_init(), i.e.
after snp_secure_tsc_init().
This will allow consolidating the handling of TSC_KNOWN_FREQ and
TS
WARN if kvmclock is still suspended when its wallclock is read, i.e. when
the kernel reads its persistent clock. The wallclock subtly depends on
the BSP's kvmclock being enabled, and returns garbage if kvmclock is
disabled.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/kvmclock.c | 7 +
Save/restore kvmclock across suspend/resume via clocksource hooks when
kvmclock isn't being used for sched_clock. This will allow using kvmclock
as a clocksource (or for wallclock!) without also using it for sched_clock.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/kvmclock.c | 23 +++
When resuming timekeeping after suspend, restore clocksources prior to
reading the persistent clock. Paravirt clocks, e.g. kvmclock, tie the
validity of a PV persistent clock to a clocksource, i.e. reading the PV
persistent clock will return garbage if the underlying PV clocksource
hasn't been ena
Annotate xen_setup_vsyscall_time_info() as being used only during kernel
initialization; it's called only by xen_time_init(), which is already
tagged __init.
Signed-off-by: Sean Christopherson
---
arch/x86/xen/time.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/xe
WARN if the common PV clock valid_flags are overwritten; all PV clocks
expect that they are the one and only PV clock, i.e. don't guard against
another PV clock having modified the flags.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/pvclock.c | 1 +
1 file changed, 1 insertion(+)
diff
Clean up the setting of PVCLOCK_TSC_STABLE_BIT during kvmclock init to
make it somewhat obvious that pvclock_read_flags() must be called *after*
pvclock_set_flags().
Note, in theory, a different PV clock could have set PVCLOCK_TSC_STABLE_BIT
in the supported flags, i.e. reading flags only if
KVM_F
Now that Xen PV clock and kvmclock explicitly do setup only during init,
tag the common PV clock flags/vsyscall variables and their mutators with
__init.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/pvclock.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/a
Move kvm_sched_clock_init() "down" so that it can reference the global
kvm_clock structure without needing a forward declaration.
Opportunistically mark the helper as "__init" instead of "inline" to make
its usage more obvious; modern compilers don't need a hint to inline a
single-use function, an
Pass in a PV clock's save/restore helpers when configuring sched_clock
instead of relying on each PV clock to manually set the save/restore hooks.
In addition to bringing sanity to the code, this will allow gracefully
"rejecting" a PV sched_clock, e.g. when running as a CoCo guest that has
access t
Now that all PV clocksources override the sched_clock save/restore hooks
when overriding sched_clock, WARN if the "default" TSC hooks are invoked
when using a PV sched_clock, e.g. to guard against regressions.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/tsc.c | 4 ++--
1 file changed,
Nullify the sched_clock save/restore hooks when using VMware's version of
sched_clock. This will allow extending paravirt_set_sched_clock() to set
the save/restore hooks, without having to simultaneously change the
behavior of VMware guests.
Note, it's not at all obvious that it's safe/correct fo
Nullify the x86_platform sched_clock save/restore hooks when setting up
Xen's PV clock to make it somewhat obvious the hooks aren't used when
running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
Signed-off-by: Sean Christopherson
---
arch/x86/xen/time.c | 6 ++
1 file cha
Move kvmclock's sched_clock save/restore helper "up" so that they can
(eventually) be referenced by kvm_sched_clock_init().
No functional change intended.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/kvmclock.c | 108 ++---
1 file changed, 54 insertions
Don't disable kvmclock on the BSP during syscore_suspend(), as the BSP's
clock is NOT restored during syscore_resume(), but is instead restored
earlier via the sched_clock restore callback. If suspend is aborted, e.g.
due to a late wakeup, the BSP will run without its clock enabled, which
"works"
Now that Hyper-V overrides the sched_clock save/restore hooks if and only if
sched_clock itself is set to the Hyper-V timer, drop the invocation of the
"old" save/restore callbacks. When the registration of the PV sched_clock
was done separate from overriding the save/restore hooks, it was possible
f
Gate kvmclock's secondary CPU code on CONFIG_SMP, not CONFIG_X86_LOCAL_APIC.
Originally, kvmclock piggybacked PV APIC ops to setup secondary CPUs.
When that wart was fixed by commit df156f90a0f9 ("x86: Introduce
x86_cpuinit.early_percpu_clock_init hook"), the dependency on a local APIC
got carried
Move the handling of unstable PV clocks, of which kvmclock is the only
example, into paravirt_set_sched_clock(). This will allow modifying
paravirt_set_sched_clock() to keep using the TSC for sched_clock in
certain scenarios without unintentionally marking the TSC-based clock as
unstable.
No func
Now that all of the Hyper-V timer sched_clock code is located in a single
file, drop the superfluous wrappers for the save/restore flows.
No functional change intended.
Signed-off-by: Sean Christopherson
---
drivers/clocksource/hyperv_timer.c | 34 +-
include/clockso
Register the Hyper-V timer callbacks for saving/restoring its PV sched_clock
if and only if the timer is actually being used for sched_clock.
Currently, Hyper-V overrides the save/restore hooks if the reference TSC is
available, whereas the Hyper-V timer code only overrides sched_clock if
the reference
Mark the TSC frequency as known when using ACRN's PV CPUID information.
Per commit 81a71f51b89e ("x86/acrn: Set up timekeeping") and common sense,
the TSC freq is explicitly provided by the hypervisor.
Signed-off-by: Sean Christopherson
---
arch/x86/kernel/cpu/acrn.c | 1 +
1 file changed, 1 ins
When running as a TDX guest, explicitly override the TSC frequency
calibration routine with CPUID-based calibration instead of potentially
relying on a hypervisor-controlled PV routine. For TDX guests, CPUID.0x15
is always emulated by the TDX-Module, i.e. the information from CPUID is
more trustwo
Move the check on having a Secure TSC to the common tsc_early_init() so
that it's obvious that having a Secure TSC is conditional, and to prepare
for adding TDX to the mix (blindly initializing *both* SNP and TDX TSC
logic looks especially weird).
No functional change intended.
Cc: Tom Lendacky
Add a helper to register non-native, i.e. PV and CoCo, CPU and TSC
frequency calibration routines. This will allow consolidating handling
of common TSC properties that are forced by hypervisor (PV routines),
and will also allow adding sanity checks to guard against overriding a
TSC calibration rou
Extract the guts of cpu_khz_from_cpuid() to a standalone helper that
doesn't restrict the usage to Intel CPUs. This will allow sharing the
core logic with kvmclock, as (a) CPUID.0x16 may be enumerated alongside
kvmclock, and (b) KVM generally doesn't restrict CPUID based on vendor.
No functional
Extract retrieval of TSC frequency information from CPUID into standalone
helpers so that TDX guest support and kvmclock can reuse the logic. Provide
a version that includes the multiplier math as TDX in particular does NOT
want to use native_calibrate_tsc()'s fallback logic that derives the TSC
fr
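
For reference, the multiplier math mentioned above can be sketched as a pure function. Per Intel's SDM, CPUID.0x15 reports the TSC/crystal ratio denominator in EAX, the numerator in EBX, and the nominal core crystal frequency in Hz in ECX. The function name is hypothetical and this is not the helper from the patch. It deliberately returns 0 rather than attempting any fallback derivation.

```c
#include <stdint.h>

/*
 * Illustrative only: compute the TSC frequency in kHz from the three
 * CPUID.0x15 outputs. Returns 0 when the leaf does not enumerate enough
 * information; no fallback derivation is attempted.
 */
static uint64_t sketch_tsc_khz_from_cpuid15(uint32_t eax_denominator,
					    uint32_t ebx_numerator,
					    uint32_t ecx_crystal_hz)
{
	uint64_t crystal_khz;

	if (eax_denominator == 0 || ebx_numerator == 0 || ecx_crystal_hz == 0)
		return 0;

	/* TSC kHz = crystal kHz * numerator / denominator */
	crystal_khz = ecx_crystal_hz / 1000;
	return crystal_khz * ebx_numerator / eax_denominator;
}
```

With a 24 MHz crystal and a ratio of 176/2, this yields 2112000 kHz.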
This... snowballed a bit.
The bulk of the changes are in kvmclock and TSC, but pretty much every
hypervisor's guest-side code gets touched at some point. I am reasonably
confident in the correctness of the KVM changes. For all other hypervisors,
assume it's completely broken until proven otherwi
From: Nuno Das Neves Sent: Wednesday, February 26, 2025 4:15 PM
>
> On 2/26/2025 12:06 PM, mhkelle...@gmail.com wrote:
> > From: Michael Kelley
> >
> > Current code allocates the "hyperv_pcpu_input_arg", and in
> > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> > page of m
From: Long Li
Hyper-V may offer a non-latency-sensitive device with subchannels without
the monitor bit enabled. The decision rests entirely with the Hyper-V host
and is not configurable within the guest.
When a device has subchannels, also signal events for the subchannel
if its monitor bit is disabled.
Signed-of
On 2/26/2025 12:06 PM, mhkelle...@gmail.com wrote:
> From: Michael Kelley
>
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses
On Wed, Feb 26, 2025 at 03:08:03PM -0800, Nuno Das Neves wrote:
> A few additional definitions are required for the mshv driver code
> (to follow). Introduce those here and clean up a little bit while
> at it.
>
> Signed-off-by: Nuno Das Neves
> ---
> include/hyperv/hvgdk_mini.h | 64 ++
On Wed, Feb 26, 2025 at 03:08:02PM -0800, Nuno Das Neves wrote:
> This will handle SYNIC interrupts such as intercepts, doorbells, and
> scheduling messages intended for the mshv driver.
>
> Signed-off-by: Nuno Das Neves
> Reviewed-by: Wei Liu
> Reviewed-by: Tianyu Lan
> ---
> arch/x86/kernel/
On Wed, Feb 26, 2025 at 03:08:01PM -0800, Nuno Das Neves wrote:
> Add a pointer hv_synic_eventring_tail to track the tail pointer for the
> SynIC event ring buffer for each SINT.
>
> This will be used by the mshv driver, but must be tracked independently
> since the driver module could be removed
On Wed, Feb 26, 2025 at 03:08:00PM -0800, Nuno Das Neves wrote:
> get_hypervisor_version, hv_call_deposit_pages, hv_call_create_vp,
> hv_call_deposit_pages, and hv_call_create_vp are all needed in module
> with CONFIG_MSHV_ROOT=m.
>
Reviewed-by: Stanislav Kinsburskii
On Wed, Feb 26, 2025 at 03:07:59PM -0800, Nuno Das Neves wrote:
> node_to_pxm() is used by hv_numa_node_to_pxm_info().
> That helper will be used by Hyper-V root partition module code
> when CONFIG_MSHV_ROOT=m.
>
Reviewed-by: Stanislav Kinsburskii
On Wed, Feb 26, 2025 at 03:07:58PM -0800, Nuno Das Neves wrote:
> Factor out the check for enabling auto eoi, to be reused in root
> partition code.
>
Reviewed-by: Stanislav Kinsburskii
On Wed, Feb 26, 2025 at 03:07:57PM -0800, Nuno Das Neves wrote:
> These non-nested msr and fast hypercall functions are present in x86,
> but they must be available in both architectures for the root partition
> driver code.
>
Reviewed-by: Stanislav Kinsburskii
On Wed, Feb 26, 2025 at 03:07:56PM -0800, Nuno Das Neves wrote:
> From: Stanislav Kinsburskii
>
> Extend the "ms_hyperv_info" structure to include a new field,
> "ext_features", for capturing extended Hyper-V features.
> Update the "ms_hyperv_init_platform" function to retrieve these features
> u
On Wed, Feb 26, 2025 at 03:07:55PM -0800, Nuno Das Neves wrote:
> Introduce hv_result_to_string() for this purpose. This allows
> hypercall failures to be debugged more easily with dmesg.
>
> Signed-off-by: Nuno Das Neves
> ---
> drivers/hv/hv_common.c | 65 ++
A few additional definitions are required for the mshv driver code
(to follow). Introduce those here and clean up a little bit while
at it.
Signed-off-by: Nuno Das Neves
---
include/hyperv/hvgdk_mini.h | 64 -
include/hyperv/hvhdk.h | 132 ++-
get_hypervisor_version, hv_call_deposit_pages, hv_call_create_vp,
hv_call_deposit_pages, and hv_call_create_vp are all needed in module
with CONFIG_MSHV_ROOT=m.
Signed-off-by: Nuno Das Neves
---
arch/arm64/hyperv/mshyperv.c | 1 +
arch/x86/kernel/cpu/mshyperv.c | 1 +
drivers/hv/hv_common.c
Add a pointer hv_synic_eventring_tail to track the tail pointer for the
SynIC event ring buffer for each SINT.
This will be used by the mshv driver, but must be tracked independently
since the driver module could be removed and re-inserted.
Signed-off-by: Nuno Das Neves
Reviewed-by: Wei Liu
---
Introduce hv_result_to_string() for this purpose. This allows
hypercall failures to be debugged more easily with dmesg.
Signed-off-by: Nuno Das Neves
---
drivers/hv/hv_common.c | 65 ++
drivers/hv/hv_proc.c | 13 ---
include/asm-generic/mshyp
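
A status-to-string helper of the kind described above might be sketched as follows. This is a hedged user-space illustration, not the function from the patch: the sketch_ names are hypothetical, only a handful of status codes from the Hyper-V TLFS are listed, and the status is assumed to occupy the low 16 bits of the 64-bit hypercall result.

```c
#include <stdint.h>

/* A few status codes as listed in the Hyper-V TLFS; the list is not complete. */
#define SKETCH_HV_STATUS_SUCCESS                 0
#define SKETCH_HV_STATUS_INVALID_HYPERCALL_CODE  2
#define SKETCH_HV_STATUS_INVALID_HYPERCALL_INPUT 3
#define SKETCH_HV_STATUS_INVALID_ALIGNMENT       4
#define SKETCH_HV_STATUS_INVALID_PARAMETER       5
#define SKETCH_HV_STATUS_ACCESS_DENIED           6
#define SKETCH_HV_STATUS_INSUFFICIENT_MEMORY     11

/* Assumption: the status lives in the low 16 bits of the hypercall result. */
#define SKETCH_HV_RESULT_MASK 0xffffULL

/* Map a raw hypercall result to a printable name for log messages. */
static const char *sketch_hv_result_to_string(uint64_t hv_result)
{
	switch (hv_result & SKETCH_HV_RESULT_MASK) {
	case SKETCH_HV_STATUS_SUCCESS:
		return "HV_STATUS_SUCCESS";
	case SKETCH_HV_STATUS_INVALID_HYPERCALL_CODE:
		return "HV_STATUS_INVALID_HYPERCALL_CODE";
	case SKETCH_HV_STATUS_INVALID_HYPERCALL_INPUT:
		return "HV_STATUS_INVALID_HYPERCALL_INPUT";
	case SKETCH_HV_STATUS_INVALID_ALIGNMENT:
		return "HV_STATUS_INVALID_ALIGNMENT";
	case SKETCH_HV_STATUS_INVALID_PARAMETER:
		return "HV_STATUS_INVALID_PARAMETER";
	case SKETCH_HV_STATUS_ACCESS_DENIED:
		return "HV_STATUS_ACCESS_DENIED";
	case SKETCH_HV_STATUS_INSUFFICIENT_MEMORY:
		return "HV_STATUS_INSUFFICIENT_MEMORY";
	default:
		return "Unknown";
	}
}
```

A caller would typically print the returned name next to the raw value so that unknown codes remain diagnosable.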
node_to_pxm() is used by hv_numa_node_to_pxm_info().
That helper will be used by Hyper-V root partition module code
when CONFIG_MSHV_ROOT=m.
Signed-off-by: Nuno Das Neves
---
drivers/acpi/numa/srat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa
This will handle SYNIC interrupts such as intercepts, doorbells, and
scheduling messages intended for the mshv driver.
Signed-off-by: Nuno Das Neves
Reviewed-by: Wei Liu
Reviewed-by: Tianyu Lan
---
arch/x86/kernel/cpu/mshyperv.c | 9 +
drivers/hv/hv_common.c | 5 +
include/
This series introduces support for creating and running guest virtual
machines while running on the Microsoft Hypervisor[0] as root partition.
This is done via an IOCTL interface accessed through /dev/mshv, similar to
/dev/kvm. Another series introducing this support was previously posted in
2021[1
Factor out the check for enabling auto eoi, to be reused in root
partition code.
Signed-off-by: Nuno Das Neves
---
drivers/hv/hv.c | 12 +---
include/asm-generic/mshyperv.h | 13 +
2 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/hv/hv.c
These non-nested msr and fast hypercall functions are present in x86,
but they must be available in both architectures for the root partition
driver code.
Signed-off-by: Nuno Das Neves
---
arch/arm64/hyperv/hv_core.c | 17 +
arch/arm64/include/asm/mshyperv.h | 12 +++
From: Stanislav Kinsburskii
Extend the "ms_hyperv_info" structure to include a new field,
"ext_features", for capturing extended Hyper-V features.
Update the "ms_hyperv_init_platform" function to retrieve these features
using the cpuid instruction and include them in the informational output.
Si
On 2/26/2025 12:06 PM, mhkelle...@gmail.com wrote:
> From: Michael Kelley
>
> The hypercall in hv_mark_gpa_visibility() is invoked with an input
> argument and an output argument. The output argument ostensibly returns
> the number of pages that were processed. But in fact, the hypercall does
> n
From: Michael Kelley
Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, then uses this memory to set up the input parameters for
the hypercall,
From: Michael Kelley
Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset().
Signed-off-by: Michael Kelley
---
drivers/pci/controller/pci-hyperv.c | 1
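
To illustrate why the memset() calls become redundant, here is a rough user-space model of an argument-setup helper that zeroes the fixed-size input itself. Everything in it is an assumption made for illustration: the sketch_ names, the static buffer standing in for the per-vCPU 4 KiB page, the 8-byte alignment, and the placement of the output area directly after the input. It is not the hv_hvcall_*() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HV_PAGE_SIZE 4096u

/* Stand-in for the per-vCPU argument page; a real kernel uses per-cpu memory. */
static _Alignas(SKETCH_HV_PAGE_SIZE) uint8_t sketch_arg_page[SKETCH_HV_PAGE_SIZE];

/* Round up to 8 bytes; 8-byte alignment of arguments is an assumption here. */
static size_t sketch_align8(size_t n)
{
	return (n + 7u) & ~(size_t)7u;
}

/*
 * Carve an input area and an output area out of the single page.
 * The fixed-size input is zeroed so callers need no memset() of their own.
 * Returns 0 on success, or -1 if the two areas do not fit in one page.
 */
static int sketch_hvcall_inout(void **in, size_t in_size,
			       void **out, size_t out_size)
{
	size_t out_offset = sketch_align8(in_size);

	if (in_size > SKETCH_HV_PAGE_SIZE ||
	    out_size > SKETCH_HV_PAGE_SIZE - out_offset)
		return -1;

	memset(sketch_arg_page, 0, in_size);
	*in = sketch_arg_page;
	*out = sketch_arg_page + out_offset;
	return 0;
}
```

In this model a call site only fills in the fields it cares about; every other byte of the fixed input is already zero.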
From: Michael Kelley
Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant zero'ing of
input fields.
hv_post_message() requires additional updates. The payload area is
t
From: Michael Kelley
Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.
For hv_mark_gpa_visibility(), use the
From: Michael Kelley
Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.
Signed-off-by: Michael Kelley
---
a
From: Michael Kelley
All open coded uses of hyperv_pcpu_input_arg and hyperv_pcpu_output_arg
have been replaced by hv_hvcall_*() functions. So combine
hyperv_pcpu_input_arg and hyperv_pcpu_output_arg in a single
hyperv_pcpu_arg. Remove logic for managing a separate output arg. Fixup
comment refere
From: Michael Kelley
This patch set introduces a new way to manage the use of the per-cpu
memory that is usually the input and output arguments to Hyper-V
hypercalls. Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page
From: Michael Kelley
The hypercall in hv_mark_gpa_visibility() is invoked with an input
argument and an output argument. The output argument ostensibly returns
the number of pages that were processed. But in fact, the hypercall does
not provide any output, so the output argument is spurious.
The
On Sat 2025-02-22 14:44:05, Ryo Takakura wrote:
> On Fri, 21 Feb 2025 16:23:07 -0500, Hamza Mahfooz wrote:
> >On Fri, Feb 21, 2025 at 11:23:28AM +0900, Ryo Takakura wrote:
> >> On Thu, 20 Feb 2025 17:53:00 -0500, Hamza Mahfooz wrote:
> >> >Since, the panic handlers may require certain cpus to be on
On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>
>
> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> > On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
> > >
> > >
> > > On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > > > On Tue, Feb 25, 2025 at 02:04:43PM +0530, N
On Tue, Feb 25, 2025, at 23:25, Roman Kisel wrote:
> On 2/24/2025 11:24 PM, Arnd Bergmann wrote:
>> On Tue, Feb 25, 2025, at 00:22, Roman Kisel wrote:
>>> Hi Arnd,
>>
>> If you want to declare a uuid here, I think you should remove the
>> ARM_SMCCC_VENDOR_HYP_UID_HYPERV_REG_{0,1,2,3} macros and jus
On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
> > On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > > On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
> > > > On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> > > > > On Tue, Feb 25, 2025
On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>
>
> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
> > >
> > >
> > > On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> > > > On Tue, Feb 25, 2025 at 10:50:01AM +0530,