Ingo Molnar a écrit :
* Jie Chen <[EMAIL PROTECTED]> wrote:

I just ran the same test on two 2.6.24-rc4 kernels: one with CONFIG_FAIR_GROUP_SCHED on and the other with CONFIG_FAIR_GROUP_SCHED off. The odd behavior I described in my previous e-mails were still there for both kernels. Let me know If I can be any more help. Thank you.

ok, i had a look at your data, and i think this is the result of the scheduler balancing out to idle CPUs more agressively than before. Doing that is almost always a good idea though - but indeed it can result in "bad" numbers if all you do is to measure the ping-pong "performance" between two threads. (with no real work done by any of them).

the moment you saturate the system a bit more, the numbers should improve even with such a ping-pong test.

do you have testcode (or a modification of your testcase sourcecode) that simulates a real-life situation where 2.6.24-rc4 performs not as well as you'd like it to see? (or if qmt.tar.gz already contains that then please point me towards that portion of the test and how i should run it - thanks!)

        Ingo

I cooked a program shorter than Jie one, to try to understand what was going on. Its a pure cpu burner program, with no thread synchronisation (but the pthread_join at the very end)

As each thread is bound to a given cpu, I am not sure the scheduler is allowed to balance to an idle cpu.

Unfortunatly I dont have a 4 way SMP idle machine available to
test it.

$ gcc -O2 -o burner burner.c
$ ./burner
Time to perform the unit of work on one thread is 0.040328 s
Time to perform the unit of work on 2 threads is 0.040221 s

I tried it on a 64 way machine (Thanks David :) ) and noticed some strange
results that may be related to the Niagara hardware (time for 64 threads was nearly the double for one thread)

#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

int blockthemall=1;
static void inline cpupause()
{
#if defined(i386)
 asm volatile("rep;nop":::"memory");
#else
 asm volatile("":::"memory");
#endif
}
/*
 * Determines number of cpus
 * Can be overiden by the NR_CPUS environment variable
 */
int number_of_cpus()
{
        char line[1024], *p;
        int cnt = 0;
        FILE *F;

        p = getenv("NR_CPUS");
        if (p)
                return atoi(p);
        F = fopen("/proc/cpuinfo", "r");
        if (F == NULL) {
                perror("/proc/cpuinfo");
                return 1;
        }
        while (fgets(line, sizeof(line), F) != NULL) {
                if (memcmp(line, "processor", 9) == 0)
                        cnt++;
        }
        fclose(F);
        return cnt;
}

void compute_elapsed(struct timeval *delta, const struct timeval *t0)
{
        struct timeval t1;

        gettimeofday(&t1, NULL);
        delta->tv_sec = t1.tv_sec - t0->tv_sec;
        delta->tv_usec = t1.tv_usec - t0->tv_usec;
        if (delta->tv_usec < 0) {
                delta->tv_usec += 1000000;
                delta->tv_sec--;
        }
}

int nr_loops = 20*1000000;
double incr=0.3456;
void perform_work()
{
        int i;
        double t = 0.0;
        for (i = 0; i < nr_loops; i++) {
                t += incr;
                }
        if (t < 0.0) printf("well... should not happen\n");
}

void set_affinity(int cpu)
{
        long cpu_mask;
        int res;

        cpu_mask = 1L << cpu;
        res = sched_setaffinity(0, sizeof(cpu_mask), &cpu_mask);
        if (res)
                perror("sched_setaffinity");
}

void *thread_work(void *arg)
{
        int cpu = (int)arg;
        set_affinity(cpu);
        while (blockthemall)
                cpupause();
        perform_work();
        return (void *)0;
}

main(int argc, char *argv[])
{
        struct timeval t0, delta;
        int nr_cpus, i;
        pthread_t *tids;

        gettimeofday(&t0, NULL);
        perform_work();
        compute_elapsed(&delta, &t0);
        printf("Time to perform the unit of work on one thread is %d.%06d s\n", 
delta.tv_sec, delta.tv_usec);

        nr_cpus = number_of_cpus();
        if (nr_cpus <= 1)
                return 0;
        tids = malloc(nr_cpus * sizeof(pthread_t));
        for (i = 1; i < nr_cpus; i++) {
                pthread_create(tids + i, NULL, thread_work, (void *)i);
        }

        set_affinity(0);
        gettimeofday(&t0, NULL);
        blockthemall=0;
        perform_work();
        for (i = 1; i < nr_cpus; i++)
                pthread_join(tids[i], NULL);
        compute_elapsed(&delta, &t0);
        printf("Time to perform the unit of work on %d threads is %d.%06d s\n", 
nr_cpus, delta.tv_sec, delta.tv_usec);
        
}

Reply via email to