Hi, I'm not sure about the other places. In hvm_load_cpu_ctxt
(xen/arch/x86/hvm/hvm.c ), it was easy to catch because
process_pending_softirqs is frequently called there, which in turn
processes softirqs from the timer (where the timestamp is updated).
After I fixed sync_tsc in hvm_load_cpu_ctxt, the problem stopped
reproducing under no load. However, when the number of vCPUs is 4 times
greater than the number of CPUs (under heavy load), the problem rarely
reoccurs (mostly during snapshot restores during
process_pending_softirqs calls), and this is no longer a simple case. If
get_s_time_fixed can indeed be interrupted during execution after
rdtsc_ordered, then the current fix is insufficient. It's necessary to
atomically copy "t->stamp" to the stack using local_irq_disable and
local_irq_enable (as in local_time_calibration), and then work with the
copy, confident in its lifetime and immutability. But until
get_s_time_fixed is proven to be interruptible, this is premature, so
your fix is sufficient. I think I need more information and testing to
say more.
Regarding the other scale_delta calls, if they include values
calculated from externally saved tsc values that could have become
stale during the process_pending_softirqs call, this definitely needs to
be fixed.
Here's the problematic code from dwm.exe, if that helps:
void __fastcall
CPartitionVerticalBlankScheduler::UpdateCurrentTime(CPartitionVerticalBlankScheduler
*this) {
uint64_t *xor_time_ptr = (uint64_t *)((char *)this + 0x3E40);
CPartitionVerticalBlankScheduler *scheduler = this;
LARGE_INTEGER *current_time_ptr = (LARGE_INTEGER *)((char *)this +
0x3E30);
uint64_t previous_time = *((_QWORD *)this + 0x7C6); // 0x3e30
uint64_t address_high = ((_QWORD)this + 0x3E40) << 32;
uint64_t xor_key = ((uint64_t)this + 0x3E40) | address_high;
// validate internal state
if ((previous_time ^ xor_key) != *((_QWORD *)this + 0x7C8)) { // 0x3e40
char exception_record[0x98];
memset(exception_record, 0, sizeof(exception_record));
*(int *)exception_record = 0x88980080; // MILERR_INTERNALERROR
uint64_t computed_xor = *xor_time_ptr ^ ((uint64_t)xor_time_ptr
| address_high);
int param_count = 4;
int previous_time_high = (int)(previous_time >> 32);
uint32_t previous_time_low = (uint32_t)previous_time;
int computed_xor_high = (int)(computed_xor >> 32);
uint32_t computed_xor_low = (uint32_t)computed_xor;
RaiseFailFastException((PEXCEPTION_RECORD)exception_record, nullptr, 0);
previous_time = *((_QWORD *)scheduler + 1990);
}
// get current timestamp
*((_QWORD *)scheduler + 0x7C7) = previous_time; // 0x3e38
QueryPerformanceCounter(current_time_ptr);
LARGE_INTEGER new_time = *current_time_ptr;
uint64_t last_previous_time = *((_QWORD *)scheduler + 0x7C7); // 0x3e38
// check if time go backward
if (new_time.QuadPart < last_previous_time) {
char exception_record[0x98];
memset(exception_record, 0, sizeof(exception_record));
*(int *)exception_record = 0x8898009B; //
MILERR_QPC_TIME_WENT_BACKWARD
int new_time_high = new_time.HighPart;
uint32_t new_time_low = new_time.LowPart;
int last_previous_high = (int)(last_previous_time >> 32);
uint32_t last_previous_low = (uint32_t)last_previous_time;
int frequency_high = g_qpcFrequency.HighPart;
uint32_t frequency_low = g_qpcFrequency.LowPart;
int64_t time_delta = last_previous_time - new_time.QuadPart;
int64_t millis_delta = (1000 * time_delta) /
g_qpcFrequency.QuadPart;
int millis_high = (int)(millis_delta >> 32);
uint32_t millis_low = (uint32_t)millis_delta;
int param_count = 8;
RaiseFailFastException((PEXCEPTION_RECORD)exception_record, nullptr, 0);
new_time = *(LARGE_INTEGER *)((char *)scheduler + 15920);
}
*xor_time_ptr = new_time.QuadPart ^ xor_key;
}
Thanks.
06.01.2026 16:58, Jan Beulich пишет:
Callers may pass in TSC values from before the local TSC stamp was last
updated (this would in particular be the case when the TSC was latched, a
time rendezvous occurs, and the latched value is used only afterwards).
scale_delta(), otoh, deals with unsigned values, and hence would treat
negative incoming deltas as huge positive values, yielding entirely bogus
return values.
Fixes: 88e64cb785c1 ("x86/HVM: use fixed TSC value when saving or restoring
domain")
Reported-by: Антон Марков<[email protected]>
Signed-off-by: Jan Beulich<[email protected]>
---
An alternative might be to have scale_delta() deal with signed deltas, yet
that seemed more risky to me.
There could likely be more Fixes: tags; the one used is the oldest
applicable one, from what I can tell.
A similar issue looks to exist in read_xen_timer() and its read_cycle()
helper, if we're scheduled out (and beck in) between reading of the TSC
and calculation of the delta (involving ->tsc_timestamp). Am I
overlooking anything there?
stime2tsc() guards against negative deltas by using 0 instead; I'm not
quite sure that's correct either.
amd_check_erratum_1474() (next to its call to tsc_ticks2ns()) has a
comment towards the TSC being "sane", but is that correct? Due to
TSC_ADJUST, rdtsc() may well return a huge value (and the TSC would then
wrap through 0 at some point). Shouldn't we subtract boot_tsc_stamp before
calling tsc_ticks2ns()?
A similar issue looks to exist in tsc_get_info(), again when rdtsc()
possibly returns a huge value due to TSC_ADJUST. Once again I wonder
whether we shouldn't subtract boot_tsc_stamp.
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1649,8 +1649,13 @@ s_time_t get_s_time_fixed(u64 at_tsc)
tsc = at_tsc;
else
tsc = rdtsc_ordered();
- delta = tsc - t->stamp.local_tsc;
- return t->stamp.local_stime + scale_delta(delta, &t->tsc_scale);
+
+ if ( tsc >= t->stamp.local_tsc )
+ delta = scale_delta(tsc - t->stamp.local_tsc, &t->tsc_scale);
+ else
+ delta = -scale_delta(t->stamp.local_tsc - tsc, &t->tsc_scale);
+
+ return t->stamp.local_stime + delta;
}
s_time_t get_s_time(void)