Re: [computer-go] OpenMP / Quad Core experiments

Weimin Xiao Tue, 01 Jan 2008 08:47:00 -0800

To use a multi-core CPU efficiently, you had better give each core a 
bigger task, which reduces communication overhead.
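
A minimal sketch of what "a bigger task per core" could look like for the pi series in the quoted program: every thread sums one large contiguous block of terms, and the partial sums are combined only once at the end. The function name leibniz_pi and its parameter are illustrative and do not come from the thread.

```c
#include <assert.h>

/* Hypothetical helper (name not from the thread): approximate pi with
 * 4 * (1/1 - 1/3 + 1/5 - 1/7 ...), using n_pairs pairs of terms. */
double leibniz_pi(long n_pairs)
{
    double sum = 0.0;
    long i;

    assert(n_pairs > 0);

    /* schedule(static) gives each thread one large contiguous chunk of
     * iterations. reduction(+:sum) gives each thread a private partial
     * sum and combines the partial sums once after the loop, so threads
     * do not communicate inside the loop body. Without -fopenmp the
     * pragma is ignored and the loop simply runs serially. */
#pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < n_pairs; i++) {
        sum += 1.0 / (4.0 * i + 1.0) - 1.0 / (4.0 * i + 3.0);
    }

    return 4.0 * sum;
}
```

Calling leibniz_pi(200000000) from main and compiling with gcc -fopenmp covers the same number of terms as the pi loop in the quoted program, but spreads them over all available cores instead of a single section.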

Weimin

----- Original Message ----- 
From: "terry mcintyre" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, January 01, 2008 10:01 AM
Subject: [computer-go] OpenMP / Quad Core experiments

I have been tinkering with OpenMP and my new HP Quad
Intel 6600. Wrote a small program to compute the
Taylor series of e and pi, just for exploration, and
I've found some interesting data points.

I am using gcc 4.2 and 4.3.1 - the latter being the
head of the SVN repository. Kubuntu 7.10, both 32 and
64 bit versions. One of my test programs is attached.

Oddly, the OpenMP version is no faster than the
single-threaded version - but it does keep the cores
busier. It is possible that I am doing something
wrong, as I am new to OpenMP.
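
One possible contributor, stated as an assumption rather than a diagnosis: clock() reports processor time for the whole process, and on Linux that is summed over all threads, so a run that finishes sooner on several cores can still show about the same clock() total. A sketch that records both wall-clock and processor time around the same work; wall_seconds, cpu_seconds, busy_sum and time_work are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <time.h>

/* Elapsed (wall-clock) seconds read from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor seconds consumed by the whole process so far. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical workload: a partial harmonic sum, split across threads
 * when compiled with -fopenmp and run serially otherwise. */
double busy_sum(long n)
{
    double sum = 0.0;
    long i;

    assert(n > 0);
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);
    return sum;
}

/* Run the workload once and report both kinds of elapsed time. */
double time_work(long n, double *wall, double *cpu)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    double result = busy_sum(n);

    *cpu = cpu_seconds() - c0;
    *wall = wall_seconds() - w0;
    return result;
}
```

If the wall-clock figure drops with more threads while the processor-time figure stays roughly flat, the parallel version is in fact finishing sooner. OpenMP's own omp_get_wtime() returns wall-clock seconds as well; clock_gettime is used here only so the sketch also builds without -fopenmp.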

I was so puzzled by the results that I tried the same
program on my AMD Athlon X2. The older AMD Athlon duo,
with a 1 GHz clock, 64-bit Fedora Core 7, is 20%
faster than the 1.6GHz quad 6600. I've also run the
--monte-carlo version of GnuGo 4.7.11 on both
machines, with similar results.

The compilation line is:
gcc -Wall -fopenmp -O3 -march=native -lgomp taylor3.c -o taylor3

( The code is adapted from the OpenMP tutorial at
http://kallipolis.com/openmp/ - which leads to another
interesting discovery. The original code yields
incorrect results for pi: the two parallel branches
use the same index variable i, and one stomps on the
other. Is this a feature of the gcc version of OpenMP?
I'll be testing Intel's icc soon. )
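
This matches the OpenMP data-sharing rules rather than anything specific to gcc: a variable declared before a parallel region is shared by default, so two sections looping on the same i update one variable. A sketch of the usual fix, which is also what the attached program does, with each section declaring its own index; e_and_pi is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: compute e and pi in two OpenMP sections, each
 * with its own loop index so that neither can disturb the other. */
void e_and_pi(int n, double *e_out, double *pi_out)
{
    double e = 1.0, pi = 0.0; /* shared, but each is written by one section only */

    assert(n > 0);

#pragma omp parallel sections
    {
#pragma omp section
        {
            int i; /* declared inside the section, so private to its thread */
            double factorial = 1.0, acc = 1.0;

            for (i = 1; i < n; i++) {
                factorial *= i;
                acc += 1.0 / factorial;
            }
            e = acc;
        }

#pragma omp section
        {
            int i; /* a separate variable from the i in the other section */
            double acc = 0.0;

            for (i = 0; i < n; i++)
                acc += 1.0 / (4.0 * i + 1.0) - 1.0 / (4.0 * i + 3.0);
            pi = 4.0 * acc;
        }
    } /* implicit barrier: both results are complete past this point */

    *e_out = e;
    *pi_out = pi;
}
```

If the index has to stay declared outside the region, adding private(i) to the parallel sections pragma gives each thread its own copy.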

I'll be doing more testing this weekend, but I'd like
to know if anyone has compared the Intel 6600 to other
processors. So far, it sure looks like a tired old nag
on her last ride to the glue factory; I'm wishing that
I had waited for the Penryn version.

One more puzzle: this processor is rated at 2.4GHz,
but cpuinfo tells a different story:

[EMAIL PROTECTED]:/proc$ cat cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping        : 11
cpu MHz         : 1596.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 4804.08
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
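
One possible explanation, offered as an assumption since nothing above confirms it: the flags line lists est (Enhanced SpeedStep), and with frequency scaling active the "cpu MHz" field shows the current clock, which can sit below the rated 2.40GHz when the machine is idle or the governor has not ramped up. A small sketch for watching that field while a benchmark runs; parse_cpu_mhz and print_core_clocks are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if line is a "cpu MHz : <value>" line from
 * /proc/cpuinfo, store the value in *mhz and return 1; otherwise 0. */
int parse_cpu_mhz(const char *line, double *mhz)
{
    const char *colon;

    if (strncmp(line, "cpu MHz", 7) != 0)
        return 0;
    colon = strchr(line, ':');
    if (colon == NULL)
        return 0;
    return sscanf(colon + 1, "%lf", mhz) == 1;
}

/* Print the current clock of every core listed in /proc/cpuinfo
 * (Linux only). Returns the number of cores reported, or -1 if the
 * file cannot be opened. */
int print_core_clocks(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[4096];
    double mhz;
    int cores = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_cpu_mhz(line, &mhz)) {
            printf("core %d: %.0f MHz\n", cores, mhz);
            cores++;
        }
    }
    fclose(f);
    return cores;
}
```

Calling print_core_clocks() once before a benchmark and again while it is running would show whether the clock rises under load.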

Terry McIntyre <[EMAIL PROTECTED]>

"Wherever is found what is called a paternal government, there is found 
state education. It has been discovered that the best way to insure implicit 
obedience is to commence tyranny in the nursery."

Benjamin Disraeli, Speech in the House of Commons [June 15, 1874]


--------------------------------------------------------------------------------

/*
 * taylor.c
 *
 * calculate e and pi by their taylor expansions and multiply them
 * together.
 *
 * moved local variables inside parallel blocks ( performance tweak? )
 */

#include <omp.h>
#include <stdio.h>
#include <time.h>

#define num_steps 20000000

int main(int argc, char *argv[])
{
  double start, stop; /* times of beginning and end of procedure */
  double efinal, pifinal, product;

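  /* start the timer; note that clock() reports processor time for the
     whole process (summed over all threads), not elapsed wall-clock time */
  start = clock();

  /* calculate e and pi in parallel (two sections, so at most two
     threads do work) */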
#pragma omp parallel sections shared(efinal,pifinal)
  {
#pragma omp section
    { /* calculate e using Taylor approximation */
      register double e, factorial;
      register int j;

      e = 1;
      factorial = 1;
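      /* factorial overflows to +inf after about 170 iterations; from
         then on 1.0/factorial is 0 and e no longer changes */
      for (j = 1; j<num_steps; j++) {
        factorial *= j;
        e += 1.0/factorial;
      }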
      efinal=e;
    } /* e section */

#pragma omp section
    { /* calculate pi expansion */
      register int i;
      register double pi;

      pi = 0;
      for (i = 0; i < num_steps*10; i++) {
        /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
           therefore the offset i*4 counts by fours (0, 4, 8, 12...)
           and we take
               1/(0+1) =  1/1
             - 1/(0+3) = -1/3
               1/(4+1) =  1/5
             - 1/(4+3) = -1/7 and so on */
        pi += 1.0/(i*4.0 + 1.0);
        pi -= 1.0/(i*4.0 + 3.0);
      }
      pi = pi * 4.0;
      pifinal=pi;
    } /* pi section */

  } /* omp sections */
  /* threads rejoin here */

  product = efinal * pifinal;

  stop = clock();

  printf("e %f pi %f product = %f reached in %.3f seconds\n",
         efinal, pifinal, product, (double)(stop-start)/CLOCKS_PER_SEC);

  return 0;
}

--------------------------------------------------------------------------------

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/ 
