Hello everyone,

I know, here I come with yet another question about NativeBoost... Sorry!

Anyway, I am currently trying to improve the performance of a Pharo
application by rewriting some parts in C and calling them through
NativeBoost.

The application needs to compute a lot of 2D projections of points, so
basically I need to compute a lot of cosines and sines on different
points. Ultimately I'd like to do it with OpenMP to use all the processor
cores, but for now I just wanted to run some tests to find the best way to
combine Pharo and C.

The first test consisted of writing a very simple C function and calling it
from inside Pharo:

#include <math.h>  /* cos, sin */

typedef struct {
  double x;
  double y;
} Point;

/* Overwrites the point in place with (cos x, sin y). */
void project (Point *p) {
  p->x = cos(p->x);
  p->y = sin(p->y);
}

In Pharo I created an NBExternalStructure subclass (MyPoint) to match Point,
and everything worked as expected.

I then ran some performance tests, and there are some results I don't
really understand :(

1) The first test was:
test
| t1 t2 tC tPharo |
tC := 0. tPharo := 0.

1 to: 1000000 do: [ :i |
  | p1 |
  p1 := Point x: i + (1.0 / i ) y: i - (1.0 / i).
  t1 := Time microsecondsToRun: [
    p1 setX: p1 x cos setY: p1 y sin.
  ].
  tPharo := tPharo + t1.
].

1 to: 1000000 do: [ :i |
  | p2 |
  p2 := MyPoint x: i + (1.0 / i ) y: i - (1.0 / i).
  t2 := Time microsecondsToRun: [
    self primProject: p2.
  ].
  tC := tC + t2.
].

Results here are around 185 ms for tC and 210 ms for tPharo.
Nothing really bothered me with this test, as the results seemed consistent
(are they?).

2) The second test was:
test2
| tC tPharo array1 array2 size |
size := 1000000.
array1 := Array new: size.
array2 := Array new: size.
1 to: size do: [ :i |
  array1 at: i put: (Point x: i + (1.0 / i ) y: i - (1.0 / i)).
  array2 at: i put: (MyPoint x: i + (1.0 / i ) y: i - (1.0 / i)).
].

tPharo := Time millisecondsToRun: [
  array1 do: [ :each |
    each setX: each x cos setY: each y sin.
  ].
].

tC := Time millisecondsToRun: [
  array2 do: [ :each |
    self primProject: each.
  ].
].

Results here are around 3500 ms for tPharo and 150 ms for tC.
I don't really understand why there is such a difference. I tried running
the tC measurement first, but it changed nothing.
I checked the results, and the contents of array2 are all updated
correctly.
Is there a problem with tC or tPharo? What am I doing wrong?

On another note, I also noticed that it is faster to allocate the array of
Pharo Points than the array of MyPoint C-ish structs. I did some more
testing, and it seems that, in general, it is faster to manipulate Pharo
objects than C-ish structs or pointers. For example, it is faster to read
from a Pharo array than from a NativeBoost_to_C array.

Given that, what kind of C types should I use to avoid losing too much time
manipulating them with NativeBoost? Is it preferable, for example, to use
two separate doubles instead of a struct with two doubles inside?


Thank you very much,

Matthieu
