> Date: Tue, 20 May 2025 14:34:51 +
> From: Emmanuel Dreyfus
>
> On Tue, May 20, 2025 at 01:53:45PM +, Taylor R Campbell wrote:
> > Can you please try the attached patch and share any output from the
> > same dtrace script?
>
> I was not able to reach single-user, the console is swamped
On Tue, May 20, 2025 at 01:53:45PM +, Taylor R Campbell wrote:
> Can you please try the attached patch and share any output from the
> same dtrace script?
I was not able to reach single-user, the console is swamped with messages
like this:
[ 20.6228413] WARNING: hardclock skipped 2063201598ns
> Date: Mon, 19 May 2025 13:23:18 +
> From: Emmanuel Dreyfus
>
> On Mon, May 19, 2025 at 01:02:19PM +, Taylor R Campbell wrote:
> > Pass `-x nolibs' to dtrace -- this works around the bug where the Xen
> > bootloader doesn't pass the kernel's ctf sections to NetBSD (needs to
> > be fixed
> Date: Mon, 19 May 2025 12:34:01 +
> From: Emmanuel Dreyfus
>
> On Mon, May 19, 2025 at 12:11:37PM +, Taylor R Campbell wrote:
> > Please build with KDTRACE_HOOKS _and_ the patch I already sent you for
> > this, which I haven't committed yet because I was waiting for your
> > feedback on
On Mon, May 19, 2025 at 01:02:19PM +, Taylor R Campbell wrote:
> Pass `-x nolibs' to dtrace -- this works around the bug where the Xen
> bootloader doesn't pass the kernel's ctf sections to NetBSD (needs to
> be fixed but I don't know how, would have to investigate how the Xen
> loader works).
On Mon, May 19, 2025 at 12:11:37PM +, Taylor R Campbell wrote:
> Please build with KDTRACE_HOOKS _and_ the patch I already sent you for
> this, which I haven't committed yet because I was waiting for your
> feedback on verifying it:
>
> https://mail-index.NetBSD.org/tech-kern/2025/04/06/msg030
> Date: Sun, 18 May 2025 18:49:51 +
> From: Emmanuel Dreyfus
>
> On Sun, May 18, 2025 at 12:29:51PM +, Taylor R Campbell wrote:
> > And, if you built a kernel with options KDTRACE_HOOKS, can you share
> > any output from the following dtrace script? (Will only print output
> > when somet
On Sun, May 18, 2025 at 12:29:51PM +, Taylor R Campbell wrote:
> Remind me -- are you running with or without the xenclock.patch I
> posted on 2025-04-05? I recall there was some trouble with dtrace at
> the time I posted it.
With and without xenclock.patch, same issue.
>
> And, if you built
> Date: Sun, 18 May 2025 00:25:32 +
> From: Emmanuel Dreyfus
>
> On Sun, Apr 06, 2025 at 11:50:37PM +, Emmanuel Dreyfus wrote:
> > I restarted with stock 10.0 XEN3_DOMU kernel to get dtrace working
> > (it does not), but now the problem seems to have vanished.
>
> I have a domU carrying
On Sun, Apr 06, 2025 at 11:50:37PM +, Emmanuel Dreyfus wrote:
> I restarted with stock 10.0 XEN3_DOMU kernel to get dtrace working
> (it does not), but now the problem seems to have vanished.
I have a domU carrying a busy web server. I upgraded vcpus from
1 to 4, and the time keeping bug pops
The problem strikes after 22 days (two vcpu, xen_system_time timecounter):
ntpd is unable to sync. If I run it after an rdate, ntpq -c peers shows
offset quickly climbing to 6000 in a few seconds, then 49000 after a minute.
This is a mess.
I have a test program that calls gettimeofday 1000 times i
On Sun, Apr 13, 2025 at 05:01:46PM +, Emmanuel Dreyfus wrote:
> One of two sane domU was hit by the bug after 6.5 days of uptime.
I run a simple .c program that calls gettimeofday() 1000 times
and counts how many times it gets the same value. Once a machine
hits the bug, here is what I get:
5 174
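For readers who want to repeat this kind of check, here is a minimal sketch of such a test, not the program used in the thread. The function name count_repeated_readings is hypothetical, and it only counts a call as a repeat when it returns exactly the same (tv_sec, tv_usec) pair as the call before it; the original program may count differently and its output format is not known.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() n times and return how many of those calls
 * returned exactly the same (tv_sec, tv_usec) pair as the call
 * immediately before them.  Hypothetical helper for illustration.
 */
size_t
count_repeated_readings(size_t n)
{
	struct timeval prev, cur;
	size_t repeats = 0;

	if (n == 0)
		return 0;

	(void)gettimeofday(&prev, NULL);
	for (size_t i = 1; i < n; i++) {
		(void)gettimeofday(&cur, NULL);
		if (cur.tv_sec == prev.tv_sec && cur.tv_usec == prev.tv_usec)
			repeats++;
		prev = cur;
	}
	return repeats;
}
```

Calling count_repeated_readings(1000) and printing the result gives a single number between 0 and 999. A value close to 999 would mean the clock barely moved during the whole run.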
On Fri, Apr 11, 2025 at 10:40:23PM +, Emmanuel Dreyfus wrote:
> After one week of testing, one domU was never able to get ntpq -c kerninfo
> reporting sync, but the local clock did not drift from the NTP server.
> Two other domU were able to quickly sync and to keep it as is. The three
> domU
On Fri, Apr 11, 2025 at 12:35:24PM -0700, Greg A. Woods wrote:
> As far as I know this is only shown in the Xen console output, so:
> xl dmesg | fgrep 'Platform timer'
I had to reboot to get that one:
(XEN) Platform timer is 24.000MHz HPET
> Also for fun what does this show:
>
> xl d
I forgot to ask one other question about critical timing and clock
information: Which platform timer is the Xen kernel selecting?
As far as I know this is only shown in the Xen console output, so:
xl dmesg | fgrep 'Platform timer'
Also for fun what does this show:
xl dmesg | fg
> On Wed, 9 Apr 2025 06:00:08 +, Emmanuel Dreyfus
> said:
[...]
> What is surprising is that it happens on a domU but not on another on
> the same dom0.
Is the dom0 MP ?
--
Math/(~cherry)
On Wed, Apr 09, 2025 at 06:07:13AM +, Mathew, Cherry G. wrote:
> > What is surprising is that it happens on a domU but not on another on
> > the same dom0.
>
> Is the dom0 MP ?
Yes it is:
# cpuctl list
Num HwId Unbound LWPs Interrupts Last change #Intr
--
On Tue, Apr 08, 2025 at 02:09:44PM -0700, Greg A. Woods wrote:
> So if that's right then a changing maxerror means ntpd is constantly
> providing a new maxerror value to store.
It does, but on a sane value the value is bound to lower values, and
ntpd does sync:
associd=0 status=0615 leap_none, syn
At Sun, 6 Apr 2025 00:15:18 +, Emmanuel Dreyfus wrote:
Subject: Re: PHP performance on Xen domU with multiple vcpu
>
> On Sat, Apr 05, 2025 at 04:02:28PM -0700, Greg A. Woods wrote:
> > > Indeed, this is 4.3.1
> > I'm not sure I understand that number.
>
> So
> Date: Sun, 6 Apr 2025 23:50:37 +
> From: Emmanuel Dreyfus
>
> On Sun, Apr 06, 2025 at 03:50:44AM +, Taylor R Campbell wrote:
> > Try the attached patch?
>
> I restarted with stock 10.0 XEN3_DOMU kernel to get dtrace working
> (it does not), but now the problem seems to have vanished. I
On Sun, Apr 06, 2025 at 03:50:44AM +, Taylor R Campbell wrote:
> Try the attached patch?
I restarted with stock 10.0 XEN3_DOMU kernel to get dtrace working
(it does not), but now the problem seems to have vanished. I run
with two vcpu and timecounter xen_system_time, ntpd syncs without
a hitch
> Date: Sun, 6 Apr 2025 00:03:25 +
> From: Emmanuel Dreyfus
>
> I was not able to load dtrace:
> kobj_checksyms, 1013: [dtrace]: linker error: symbol `dtrace_smap_enable' not
> found
Try the attached patch?
(It would be good to have dtrace working, though it might not help in
this case --
On Sun, Apr 06, 2025 at 12:03:25AM +, Emmanuel Dreyfus wrote:
> ntpq -c kerninfo shows maximum_error raising from 16000 by 0.5s each
> second.
The unit is ms; it rises by 0.5 ms each second.
--
Emmanuel Dreyfus
m...@netbsd.org
On Sat, Apr 05, 2025 at 04:02:28PM -0700, Greg A. Woods wrote:
> > Indeed, this is 4.3.1
> I'm not sure I understand that number.
Sorry, I am not sure how I managed to write that. I meant 4.18.3
xen_version: 4.18.3_20240909nb1
> That's what I meant -- that with clockinterrupt then ntp
On Sat, Apr 05, 2025 at 01:49:32AM +, Taylor R Campbell wrote:
> And, can you send output of the following before and after you've
> observed trouble with ntpd (with or without the patch, or both -- just
> tell me which)?
running a -current kernel as of yesterday, without the patch
ntpq -c ker
At Sat, 5 Apr 2025 07:09:09 +, Emmanuel Dreyfus wrote:
Subject: Re: PHP performance on Xen domU with multiple vcpu
>
> On Fri, Apr 04, 2025 at 05:27:49PM -0700, Greg A. Woods wrote:
> > So which Xen kernel version is this with?
> > My bet is that it's newer than 4.
On Fri, Apr 04, 2025 at 05:27:49PM -0700, Greg A. Woods wrote:
> So which Xen kernel version is this with?
> My bet is that it's newer than 4.11.
Indeed, this is 4.3.1
> Also which exact CPU model is your system using?
Quad intel Xeon:
cpu0: highest basic info 000d
cpu0: highest hypervisor
> Date: Thu, 3 Apr 2025 15:41:13 +
> From: Emmanuel Dreyfus
>
> Oh, yes, good pick! I use kern.timecounter.hardware=clockinterrupt
> because with xen_system_time the domU's ntpd is unable to keep in sync.
I have been hearing about weird issues with xen_system_time but I
don't have the eviden
At Thu, 3 Apr 2025 15:41:13 +, Emmanuel Dreyfus wrote:
Subject: Re: PHP performance on Xen domU with multiple vcpu
>
> On Thu, Apr 03, 2025 at 11:00:50AM -0400, Greg Troxel wrote:
> > Have you written a test program to log and examined the return values?
>
> I ran gettim
Mouse writes:
> Not a bug in the sense that gettimeofday is violating its interface
> contract, just a bug in the sense that "something is wrong". I'd say
> gettimeofday() taking as long as a second, measured by ktrace records,
> indicates a bug; everything involved is entirely in-kernel, so tha
m...@netbsd.org (Emmanuel Dreyfus) writes:
>This php function loops around gettimeofday() until the microsecond
>changes:
> do {
>     (void)gettimeofday((struct timeval *) &tv, (struct timezone *) NULL);
> } while (tv.tv_sec == prev_tv.tv_sec && tv.tv_usec == prev_tv.tv_usec);
Yes, that's the
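To see how expensive the quoted polling loop is on a given machine, one can count how many gettimeofday() calls it takes before the returned value changes. This is only a sketch: spins_until_change and its max_spins safety cap are hypothetical additions and are not part of the PHP source.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Spin on gettimeofday() until the returned value differs from the
 * first reading, as in the quoted loop, and return the number of
 * calls made inside the loop.  max_spins is a safety cap so that a
 * stuck clock cannot hang the caller; at least one call is always
 * made.
 */
unsigned long
spins_until_change(unsigned long max_spins)
{
	struct timeval prev, tv;
	unsigned long spins = 0;

	(void)gettimeofday(&prev, NULL);
	do {
		(void)gettimeofday(&tv, NULL);
		spins++;
	} while (spins < max_spins &&
	    tv.tv_sec == prev.tv_sec && tv.tv_usec == prev.tv_usec);

	return spins;
}
```

If the clock advances every microsecond, this returns after only a few calls. If the clock only moves in large steps, it keeps spinning until the next step or until it reaches the cap.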
g...@lexort.com (Greg Troxel) writes:
>> count value
>> 239726 1743694322.907498
>> 119400 1743694324.187487
>> 327599 1743694325.467476
>> 174425 1743694326.747465
>> 138850 1743694328.027453
>Wow, that is messed up!
That looks like a common behaviour of virtual machines, when they
cannot kee
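Subtracting each value in the quoted table from the next gives 1.279989 s three times and 1.279988 s once, so in this sample the returned time advances in steps of roughly 1.28 s. The sketch below does that arithmetic in integer microseconds. The array and function names are hypothetical; the numbers are copied from the table.

```c
#include <stdint.h>

/* The five values from the quoted table, as (seconds, microseconds). */
static const int64_t samples[5][2] = {
	{ 1743694322, 907498 },
	{ 1743694324, 187487 },
	{ 1743694325, 467476 },
	{ 1743694326, 747465 },
	{ 1743694328,  27453 },
};

/*
 * Microseconds between sample i and sample i + 1.
 * Valid for i in 0..3 only; no bounds checking is done.
 */
int64_t
step_usec(int i)
{
	return (samples[i + 1][0] - samples[i][0]) * 1000000 +
	    (samples[i + 1][1] - samples[i][1]);
}
```

Integer microseconds are used instead of doubles so that the last digit of each difference is exact.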
Mouse writes:
>> and using a polling loop for uniqid is what I'd expect from php.
>
> Whether or not it's good, bad, or just ugly, it looks to me as though
> it's exposed a bug.
Is it documented that gettimeofday has any particular resolution, or
that successive values are unique? I don't thin
Emmanuel Dreyfus writes:
> On NetBSD-10.0/amd64 Xen domU with two vcpu PHP's uniqid() performance
> is terrible. These two tests are on the same dom0, with same NetBSD
> versions in the domU:
>
> With a single vcpu:
> $ time php -r 'for ($i = 0; $i < 100; $i++) uniqid();'
> 2.94s real
Emmanuel Dreyfus writes:
> On Thu, Apr 03, 2025 at 11:00:50AM -0400, Greg Troxel wrote:
>> Have you written a test program to log and examined the return values?
>
> I ran gettimeofday 1M times; it took 5.12, which is much faster than
> what happens in PHP. Here is the distribution of results:
>
>
On Thu, Apr 03, 2025 at 11:00:50AM -0400, Greg Troxel wrote:
> Have you written a test program to log and examined the return values?
I ran gettimeofday 1M times; it took 5.12, which is much faster than
what happens in PHP. Here is the distribution of results:
count value
239726 1743694322.907498
>>> and using a polling loop for uniqid is what I'd expect from php.
>> Whether or not it's good, bad, or just ugly, it looks to me as
>> though it's exposed a bug.
> Is it documented that gettimeofday has any particular resolution, or
> that successive values are unique?
Not a bug in the sense t
>> do {
>>     (void)gettimeofday((struct timeval *) &tv, (struct timezone *) NULL);
>> } while (tv.tv_sec == prev_tv.tv_sec && tv.tv_usec == prev_tv.tv_usec);
> Yes, that's the difference between precision and resolution.
Yes, but that doesn't strike me as very relevant.
> While gettimeofday y
38 matches