On Wed, Dec 7, 2011 at 9:40 AM, Greg Smith <g...@2ndquadrant.com> wrote:
> He estimated 22ns per gettimeofday on the system with fast timing
> calls--presumably using TSC, and possibly faster than I saw because his
> system had less cores than mine to worry about. He got 990 ns on his
> slower system, and a worst case there of 3% overhead.
Robert's comment about sequential scans performing lots of reads in a tight loop made me think of a worse worst case: a count(*) with wide rows and/or lots of bloat. I created a test table with one tuple per page like this:

    CREATE TABLE io_test WITH (fillfactor=10) AS
        SELECT repeat('x', 1000) FROM generate_series(1,30000);

I then timed SELECT COUNT(*) FROM io_test; with track_iotiming on and off. Averages of 1000 executions, differences significant according to a t-test:

    timer | iotiming=off | iotiming=on |   diff
    hpet  |     86.10 ms |   147.80 ms | 71.67%
    tsc   |     85.86 ms |    87.66 ms |  2.10%

The attached test program (test_gettimeofday_monotonic) shows that one test loop iteration takes 34ns with tsc and 1270ns with hpet.

I also managed to run the test program on a couple of two-socket Solaris 10 machines. The one with a Xeon X5570 had an iteration time of 220ns, the one with a Xeon E5620 270ns. I'm not sure yet whether this is due to Solaris gettimeofday just being slower or some hardware issue.

I also tested a more reasonable case: count(*) on pgbench_accounts with scale factor 50 (61 tuples per page). With tsc, timing was actually 1% faster, though not statistically significantly so; with hpet the overhead was 5.6%.

Scaling the overhead for the Solaris machines, it seems that the worst case for timing all buffer reads would be about 20% slower there. Count(*) on medium-length tuples for a dataset that is out of shared buffers but in OS cache would have overhead between 1 and 2%.

>> One random thought: I wonder if there's a way for us to just time
>> every N'th event or something like that, to keep the overhead low.

This would work only for cases where there's a reasonably uniform distribution of times or really long sampling periods; otherwise the variability will make the result pretty much useless. For example, in the I/O case, a pretty typical load can have 1% of timings be 3 orders of magnitude longer than the median.

--
Ants Aasma
#include <stdio.h>
#include <sys/time.h>

typedef long long int64;
typedef unsigned long long uint64;
typedef long int32;
typedef unsigned long uint32;

#define TIMEDIFF(a, b) \
	(((a).tv_sec - (b).tv_sec)*1000000 + ((a).tv_usec - (b).tv_usec))

int main(int argc, char **argv, char **arge)
{
	struct timeval start, end, prev, cur;
	int64 i;
	int64 time_elapsed = 0;
	int32 duration;
	uint64 loop_count = 0;
	static int64 timings[32];

	if (argc != 2 || !sscanf(argv[1], "%ld", &duration) || duration < 0)
	{
		printf("Usage: test_gettimeofday_monotonic seconds\n");
		return 1;
	}

	uint64 end_time = 1000000ll*duration;

	gettimeofday(&start, 0);
	cur = start;
	while (time_elapsed < end_time)
	{
		prev = cur;
		gettimeofday(&cur, 0);
		long diff = TIMEDIFF(cur, prev);

		/* gettimeofday must never go backwards */
		if (diff < 0)
		{
			printf("Time warp: %ld < 0 => %ld.%ld < %ld.%ld\n",
				   diff,
				   cur.tv_sec, cur.tv_usec,
				   prev.tv_sec, prev.tv_usec);
			return -1;
		}

		/* bucket the iteration time into a power-of-two histogram */
		int bits = 0;
		while (diff)
		{
			diff >>= 1;
			bits++;
		}
		timings[bits]++;
		loop_count++;
		time_elapsed = TIMEDIFF(cur, start);
	}
	gettimeofday(&end, 0);

	printf("Per loop: %0.2f ns\n",
		   ((double) TIMEDIFF(end, start))*1000/loop_count);
	printf("%9s: %10s %9s\n", "usec", "count", "percent");
	int found = 0;
	for (i = 31; i >= 0; i--)
	{
		if (found || timings[i])
		{
			found = 1;
			printf("%9ld: %10lld %8.5f%%\n",
				   1l<<i>>1, timings[i],
				   ((double)timings[i])*100/loop_count);
		}
	}
	return 0;
}
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers