I think it would be great to get this patch committed. Beyond the reasons already mentioned, the significant overhead also tends to skew the reported runtimes in ways that make it difficult to compare them. For example, if two nodes are executed equally often but one needs twice the time to process its rows, EXPLAIN ANALYZE should report timings that are 2x apart. Currently, however, the high overhead of clock_gettime() adds a roughly constant cost to every measurement and thereby compresses the reported ratio.
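With made-up but illustrative numbers: if the true per-row costs are 100 ns and 200 ns, and every timed row pays roughly 50 ns of clock overhead, the reported timings come out at about 150 ns vs. 250 ns, i.e. a ratio of 1.67x instead of the true 2x.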

On 10/12/22 10:33, Michael Paquier wrote:
> No rebased version has been sent since this update, so this patch has
> been marked as RwF.

I've rebased the patch set on latest master and fixed a few compiler warnings. Beyond that, some findings and thoughts:

You're only using RDTSC if the clock source is 'tsc'. Great idea not to bother with all of the hairy TSC details. Looking at the kernel code, the kernel only selects 'tsc' as clock source if the TSC is frequency invariant. That does not imply, though, that Linux is not running under a hypervisor; which is good, because I assume PostgreSQL is used a lot in VMs. However, when running under a hypervisor (at least with VMware), CPUID leaf 0x16 is not available: in my tests __get_cpuid() indicated success, but the returned values were garbage. Instead of using leaf 0x16, we should then use the hypervisor interface to obtain the TSC frequency. Checking whether a hypervisor is active can be done via:

#include <cpuid.h>

bool
IsHypervisorActive(void)
{
    uint32 cpuinfo[4] = {0};

    /* CPUID leaf 0x1: the hypervisor-present bit is ECX bit 31 */
    int res = __get_cpuid(0x1, &cpuinfo[0], &cpuinfo[1], &cpuinfo[2], &cpuinfo[3]);

    return res > 0 && (cpuinfo[2] & (1U << 31)) != 0;
}
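As an aside: once the hypervisor bit is set, CPUID leaf 0x40000000 returns the hypervisor's vendor signature in EBX/ECX/EDX (e.g. "VMwareVMware" or "KVMKVMKVM") and the maximum supported hypervisor leaf in EAX, in case we ever want a stricter check.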

Obtaining the TSC frequency via the hypervisor interface can be done with the following code. See https://lwn.net/Articles/301888/ for more details.

// Under hypervisors (tested with VMware) leaf 0x16 is not available, even though __get_cpuid() succeeds.
// Hence, if running under a hypervisor, use the hypervisor timing leaf 0x40000010 instead, which reports the TSC frequency in kHz in EAX.
uint32 cpuinfo[4] = {0};
if (IsHypervisorActive() && __get_cpuid(0x40000010, &cpuinfo[0], &cpuinfo[1], &cpuinfo[2], &cpuinfo[3]) > 0)
    cycles_to_sec = 1.0 / ((double) cpuinfo[0] * 1000);
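One caveat: glibc's __get_cpuid() validates the requested leaf against the maximum basic leaf from leaf 0x0, so it can fail for hypervisor leaves in the 0x4000xxxx range even when they exist. A more defensive variant (just a sketch, using the raw __cpuid() macro from <cpuid.h>) could first check the maximum hypervisor leaf reported by leaf 0x40000000:

uint32 a, b, c, d;

/* Leaf 0x40000000: EAX holds the maximum supported hypervisor leaf */
__cpuid(0x40000000, a, b, c, d);
if (a >= 0x40000010)
{
    /* Timing leaf 0x40000010: EAX holds the TSC frequency in kHz */
    __cpuid(0x40000010, a, b, c, d);
    if (a > 0)
        cycles_to_sec = 1.0 / ((double) a * 1000);
}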

Given that we switch between RDTSC and clock_gettime() via a global variable anyway, what about exposing the clock source as a GUC? That way the user can switch back to a working clock source in case we miss a detail around activating or reading the TSC.
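Roughly along these lines, modeled on the existing enum GUC machinery (just a sketch; the enum, variable, and GUC names are invented):

/* Sketch only: CLOCK_SOURCE_* values and the clock_source variable are invented */
static const struct config_enum_entry clock_source_options[] = {
    {"clock_gettime", CLOCK_SOURCE_CLOCK_GETTIME, false},
    {"tsc", CLOCK_SOURCE_TSC, false},
    {NULL, 0, false}
};

/* ... plus an entry in ConfigureNamesEnum[]: */
{
    {"clock_source", PGC_SUSET, STATS_MONITORING,
        gettext_noop("Clock source used for instrumentation timings."),
        NULL
    },
    &clock_source,
    CLOCK_SOURCE_CLOCK_GETTIME, clock_source_options,
    NULL, NULL, NULL
},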

I'm happy to update the patches accordingly.

--
David Geier
(ServiceNow)


