Hello Maxime,

with your idea it now runs as fast as it can: as you said, scm_init_guile() is only needed once per thread.
On Fri, Jan 13, 2023 at 1:23 PM Maxime Devos <maximede...@telenet.be> wrote:
> > for (i=start; i<=stop; i++) { /* i is private by default */
> >
> >   scm_init_guile();
> >   scm_call_1( func , scm_from_int(i) );
>
> IIUC, you are calling scm_init_guile once per index, whereas calling it
>

Yes. OpenMP slices a 1-to-N for loop into number_of_cpus segments of about N/number_of_cpus iterations each; every segment is a normal C for loop, and one such loop runs per CPU. So if you run 'top' on a C OpenMP program you will see a load of number_of_cpus*100%. For example, with 12 CPUs, top will display a load of 1200% for your program. Furthermore, if you hit the 1 key in top you will see the load of each CPU (100% each). The same option does not exist in the 'top' of BSD-like systems such as Mac OS.

OpenMP partitions N and runs each part on exactly one thread, each thread on a different CPU or core. I think it is the only library that can do that; OpenMP is implemented very close to the compiler and LLVM.

In general there is a master thread and slave threads, and you can run special code only on the first thread to fork (the master one, or the first to launch). On Friday I tried the single pragma:

https://www.openmp.org/spec-html/5.0/openmpsu38.html

but unfortunately that cannot help, because it executes the block on only one thread (the first one to reach it), not once on every thread.

A solution to the problem could be this one:

Executing Code Once Per Thread in an OpenMP Loop
<https://ofekshilon.com/2014/06/10/executing-code-once-per-thread-in-an-openmp-loop/>

but it is written for Visual C++, and even with g++ it would not be compatible.

So I use a basic C solution: a static array that remembers whether scm_init_guile() has already been called for the thread the code is currently running on. I also store omp_get_max_threads() in a static variable, since my openmp() function is called many times in my code and the number of available hardware CPUs never changes.
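A minimal sketch of the once-per-thread idea described above. This is not the code from the linked file: per_thread_init() is a hypothetical stand-in for scm_init_guile(), work() is a stand-in for the scm_call_1() call, and MAX_THREADS is an assumed upper bound. It also assumes that the OpenMP runtime reuses the same worker thread for a given thread number across calls, which is common but not guaranteed.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch still builds without -fopenmp (one thread). */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_THREADS 256   /* assumed upper bound, hypothetical */

int init_count = 0;       /* how many times the per-thread init ran */

/* Hypothetical stand-in for scm_init_guile(). */
void per_thread_init(void)
{
    #pragma omp atomic
    init_count++;
}

/* Hypothetical stand-in for scm_call_1(func, scm_from_int(i)). */
long work(int i)
{
    return (long)i * i;
}

long run_loop(int start, int stop)
{
    /* One flag per OpenMP thread number; static, so it survives calls. */
    static int initialized[MAX_THREADS];
    /* Thread count queried once and cached, as in the text above. */
    static int nthreads = 0;
    long sum = 0;

    if (nthreads == 0) {
        nthreads = omp_get_max_threads();
        if (nthreads > MAX_THREADS)
            nthreads = MAX_THREADS;
    }

    #pragma omp parallel for num_threads(nthreads) reduction(+:sum)
    for (int i = start; i <= stop; i++) {
        int tid = omp_get_thread_num();
        /* Each thread only touches its own slot, so no lock is needed. */
        if (!initialized[tid]) {
            per_thread_init();
            initialized[tid] = 1;
        }
        sum += work(i);
    }
    return sum;
}
```

Calling run_loop(1, 100) twice should run per_thread_init() at most once per thread number, however many iterations each thread gets. A thread-local flag (for example an OpenMP threadprivate variable) would be an alternative that does not depend on thread numbers.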
My code is here:

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c

Unfortunately I find no real speed-up from OpenMP. I understood that the only reason for the speed-up was that the C 'for' loop is much faster than the Scheme 'for' ones. To reach this conclusion I compared Scheme, C with OpenMP, and C without OpenMP, and in C I got almost exactly the same timings.

Scheme:

... [output cut]
Chrono START number: 165 minterms-vector-length = 10944.
chrono STOP : elapsedTime = 36.219 ms.totalComputationTime =485311.94
Chrono START number: 166 minterms-vector-length = 12008.
chrono STOP : elapsedTime = 39.82 ms.totalComputationTime =485351.76
Chrono START number: 167 minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.215 ms.totalComputationTime =485352.97500000003

Scheme with OpenMP call:

...[output cut]
Chrono START number: 165 minterms-vector-length = 10944.
chrono STOP : elapsedTime = 35.039 ms.Open MP totalComputationTime =385444.1410000001
Chrono START number: 166 minterms-vector-length = 12008.
chrono STOP : elapsedTime = 37.792 ms.Open MP totalComputationTime =385481.93300000014
Chrono START number: 167 minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.163 ms.Open MP totalComputationTime =385483.09600000014

Scheme with C 'for' loop call:

...[output cut]
Chrono START number: 165 minterms-vector-length = 10944.
chrono STOP : elapsedTime = 33.104 ms.For Funct totalComputationTime =385543.4700000001
Chrono START number: 166 minterms-vector-length = 12008.
chrono STOP : elapsedTime = 35.938 ms.For Funct totalComputationTime =385579.4080000001
Chrono START number: 167 minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.165 ms.For Funct totalComputationTime =385580.5730000001

For the two C versions (parallel OpenMP and sequential for) the result is almost the same:

totalComputationTime =385580.5730000001 ms (sequential C for)
totalComputationTime =385483.09600000014 ms (OpenMP)

i.e. about 385 s in both cases.

I suppose OpenMP does its job of slicing the loop over many processors, but the calls to scm_call_1( func , scm_from_int(i) ); all end up running on the same thread, the one that hosts the Guile interpreter. A solution would be to have many Guile interpreters running, but I do not know how to do that from the C code with OpenMP.

Damien

Note: I measured the time in both C and Scheme with gettimeofday, so that the 100% Scheme code and the mixed one can be compared:

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3500
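
For the gettimeofday timing mentioned in the note, a minimal sketch of how an elapsed time in milliseconds can be computed on the C side. The helper names elapsed_ms and time_call_ms are hypothetical and are not taken from the linked sources.

```c
#include <stdio.h>
#include <sys/time.h>

/* Milliseconds between two gettimeofday() samples. */
double elapsed_ms(struct timeval t0, struct timeval t1)
{
    return (t1.tv_sec - t0.tv_sec) * 1000.0
         + (t1.tv_usec - t0.tv_usec) / 1000.0;
}

/* Time one call of fn and return the wall-clock duration in ms. */
double time_call_ms(void (*fn)(void))
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    fn();
    gettimeofday(&t1, NULL);
    return elapsed_ms(t0, t1);
}
```

gettimeofday measures wall-clock time, so with this kind of measurement a loop that really ran in parallel would show a shorter elapsed time than the sequential one.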