You mean this bug? Unfortunately not, I've been very busy with the upcoming gem5 release and haven't had time to investigate this further.
--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net


On Mon, Oct 31, 2022 at 1:45 AM Νικόλαος Ταμπουρατζής via gem5-users <
gem5-users@gem5.org> wrote:

> Dear Bobby, Jason, all,
>
> Is there any update about the accuracy of RISC-V FS?
>
> Best regards,
> Nikos
>
>
> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>
> > Jason and I had a theory that this may be due to the "Rounding Mode" for
> > floating point being set incorrectly in FS mode. That's set via a macro
> > here:
> > https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36
> >
> > I manually expanded the macro here:
> > https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495,
> > inside the "fsqrt_d" definition, then compiled "build/ALL/gem5.debug". Then I
> > used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
> > generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).
> >
> > ```
> > gdb build/ALL/gem5.opt
> > break Fsqrt_d::execute
> > run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`
> > ```
> >
> > Stepping through with gdb, I see the rounding mode is `0` for SE mode and `0`
> > for FS mode as well. So, no luck with that theory.
> >
> > My new theory is that this bug has something to do with thread context
> > switching being implemented incorrectly in RISC-V somehow. I find it
> > strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
> > suddenly starts returning zero after a certain point in the execution. In
> > addition, it's odd that the loop is not returning the same value each time
> > despite executing the same code. It'd make sense to me that the thread is
> > being stored and then resumed with some corruption of the floating point
> > data. This would also explain why this bug only occurs in FS mode.
> >
> > I'll try to find time to figure out a good test for this.
> > If anyone has any other theories or ideas then let me know.
> >
> > --
> > Dr. Bobby R. Bruce
> > Room 3050,
> > Kemper Hall, UC Davis
> > Davis,
> > CA, 95616
> >
> > web: https://www.bobbybruce.net
> >
> >
> > On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
> > ntampourat...@ece.auth.gr> wrote:
> >>
> >> Dear Jason & Bobby,
> >>
> >> Unfortunately, I have tried my simple example without the sqrt
> >> function and the problem remains. Specifically, I have the following
> >> simple code:
> >>
> >>
> >> #include <cmath>
> >> #include <stdio.h>
> >>
> >> int main(){
> >>
> >>     int dim = 1024;
> >>
> >>     double result;
> >>
> >>     for (int iter = 0; iter < 2; iter++){
> >>         result = 0;
> >>         for (int i = 0; i < dim; i++){
> >>             for (int j = 0; j < dim; j++){
> >>                 result += i * j;
> >>             }
> >>         }
> >>         printf("Final Result: %lf\n", result);
> >>     }
> >> }
> >>
> >>
> >> In the above code, the correct result is 274341298176.000000 (from
> >> RISCV-SE mode and x86), while in FS mode I sometimes get the correct
> >> result and other times a different number.
> >>
> >> Best regards,
> >> Nikos
> >>
> >>
> >> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >>
> >> > I have an idea...
> >> >
> >> > Have you put a breakpoint in the implementation of the fsqrt_d function? I
> >> > would like to know whether we are using the same rounding mode when running
> >> > in SE mode and in FS mode. My hypothesis is that in FS mode the rounding
> >> > mode is set differently.
> >> >
> >> > Cheers,
> >> > Jason
> >> >
> >> > On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
> >> > ntampourat...@ece.auth.gr> wrote:
> >> >
> >> >> Dear Bobby,
> >> >>
> >> >> Thanks a lot for the effort! I looked in detail and I observe that the
> >> >> problem occurs only with float and double variables (in the case
> >> >> of int it is working properly in FS mode).
> >> >> Specifically, in the case
> >> >> of float the variables are set to "nan", while in the case of double
> >> >> the variables are set to 0.000000 (at a random time, probably by some
> >> >> instruction of the simulated OS?). You may use a simple C/C++ example in
> >> >> order to get some traces before going to HPCG...
> >> >>
> >> >> Thank you in advance!!
> >> >> Best regards,
> >> >> Nikos
> >> >>
> >> >>
> >> >> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
> >> >>
> >> >> > Hey Niko,
> >> >> >
> >> >> > Thanks for this analysis. I jumped a little into this today but didn't
> >> >> > get as far as you did. I wanted to find a quick way to recreate the
> >> >> > following:
> >> >> > https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
> >> >> > free to use this, if it helps any.
> >> >> >
> >> >> > It's very strange to me that this bug hasn't manifested itself before but
> >> >> > it's undeniably there. I'll try to spend more time looking at this
> >> >> > tomorrow with some traces and debug flags and see if I can narrow down the
> >> >> > problem.
> >> >> >
> >> >> > --
> >> >> > Dr. Bobby R. Bruce
> >> >> > Room 3050,
> >> >> > Kemper Hall, UC Davis
> >> >> > Davis,
> >> >> > CA, 95616
> >> >> >
> >> >> > web: https://www.bobbybruce.net
> >> >> >
> >> >> >
> >> >> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
> >> >> > ntampourat...@ece.auth.gr> wrote:
> >> >> >
> >> >> >> In my previous results, I had used double (not float) for the
> >> >> >> following variables: result, sq_i and sq_j. In the case of float
> >> >> >> instead of double I get "nan" and not 0.000000.
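One possible reading of the float/double difference reported above, offered only as a hypothesis: RISC-V keeps single-precision values NaN-boxed in the 64-bit FP registers (upper 32 bits all ones), and a value that is not properly boxed is treated as the canonical NaN. If a register were overwritten with all-zero bits (an assumption; nothing in this thread confirms that this is what happens), the same bit pattern would read as 0.000000 for a double and as nan for a float. A minimal sketch of that rule, using the hypothetical helper names `read_as_double` and `read_as_float`:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Interpret a 64-bit FP register image as a double (D extension).
double read_as_double(std::uint64_t reg) {
    double d;
    std::memcpy(&d, &reg, sizeof d);
    return d;
}

// Interpret a 64-bit FP register image as a float. Under the NaN-boxing
// rule, the low 32 bits are only a valid float if the upper 32 bits are all
// ones; otherwise the operand is treated as the canonical NaN (0x7fc00000).
float read_as_float(std::uint64_t reg) {
    const std::uint32_t canonical_nan = 0x7fc00000u;
    const std::uint32_t bits = (reg >> 32) == 0xffffffffu
                                   ? static_cast<std::uint32_t>(reg)
                                   : canonical_nan;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Self-check for the hypothetical all-zero register image.
void check_zeroed_register() {
    assert(read_as_double(0) == 0.0);      // double view: 0.000000
    assert(std::isnan(read_as_float(0)));  // float view: nan
}
```

This only illustrates why one corrupted bit pattern could produce both symptoms; it says nothing about where in gem5 or the guest kernel the corruption would come from.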
> >> >> >>
> >> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
> >> >> >>
> >> >> >> > Dear Jason, all,
> >> >> >> >
> >> >> >> > I am trying to find the accuracy problem with RISCV-FS and I observe
> >> >> >> > that the problem arises (at least in my dummy example) because
> >> >> >> > the variables (double) are set to zero at a random simulated time (for
> >> >> >> > this reason I get different results across executions of the same
> >> >> >> > code). Specifically for the following dummy code:
> >> >> >> >
> >> >> >> >
> >> >> >> > #include <cmath>
> >> >> >> > #include <stdio.h>
> >> >> >> >
> >> >> >> > int main(){
> >> >> >> >
> >> >> >> >     int dim = 10;
> >> >> >> >
> >> >> >> >     float result;
> >> >> >> >
> >> >> >> >     for (int iter = 0; iter < 2; iter++){
> >> >> >> >         result = 0;
> >> >> >> >         for (int i = 0; i < dim; i++){
> >> >> >> >             for (int j = 0; j < dim; j++){
> >> >> >> >                 float sq_i = sqrt(i);
> >> >> >> >                 float sq_j = sqrt(j);
> >> >> >> >                 result += sq_i * sq_j;
> >> >> >> >                 printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
> >> >> >> >             }
> >> >> >> >         }
> >> >> >> >         printf("Final Result: %lf\n", result);
> >> >> >> >     }
> >> >> >> > }
> >> >> >> >
> >> >> >> >
> >> >> >> > The correct Final Result in both iterations is 372.721656.
> > However, > >> >> >> > I get the following results in FS: > >> >> >> > > >> >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> >> > 1.000000): 1.000000 > >> >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> >> > 1.414214): 2.414214 > >> >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> >> > 1.732051): 4.146264 > >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 
0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> >> > 1.414214): 1.414214 > >> >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> >> > 2.000000): 3.414214 > >> >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> >> > 2.449490): 5.863703 > >> >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> >> > 2.828427): 8.692130 > >> >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> >> > 3.162278): 11.854408 > >> >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> >> > 3.464102): 15.318510 > >> >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> >> > 3.741657): 19.060167 > >> >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> >> > 4.000000): 23.060167 > >> >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> >> > 4.242641): 27.302808 > >> >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 27.302808 > >> >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> >> > 1.732051): 29.034859 > >> >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> >> > 2.449490): 31.484348 > >> >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> >> > 3.000000): 34.484348 > >> >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> >> > 3.464102): 37.948450 > >> >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> >> > 3.872983): 41.821433 > >> >> >> > ITER: 0 | 
i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> >> > 4.242641): 46.064074 > >> >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> >> > 4.582576): 50.646650 > >> >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> >> > 4.898979): 55.545629 > >> >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> >> > 5.196152): 60.741782 > >> >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 60.741782 > >> >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> >> > 2.000000): 62.741782 > >> >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> >> > 2.828427): 65.570209 > >> >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> >> > 3.464102): 69.034310 > >> >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> >> > 4.000000): 73.034310 > >> >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > >> >> >> > 4.472136): 77.506446 > >> >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > >> >> >> > 4.898979): 82.405426 > >> >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> >> > 5.291503): 87.696928 > >> >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> >> > 5.656854): 93.353783 > >> >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> >> > 6.000000): 99.353783 > >> >> >> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 99.353783 > >> >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> >> > 2.236068): 101.589851 > >> >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> >> > 3.162278): 104.752128 > >> >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> >> > 3.872983): 108.625112 > >> >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | 
j: 2.000000 | i*j: > >> >> >> > 4.472136): 113.097248 > >> >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> >> > 5.000000): 118.097248 > >> >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> >> > 5.477226): 123.574473 > >> >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> >> > 5.916080): 129.490553 > >> >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> >> > 6.324555): 135.815108 > >> >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> >> > 6.708204): 142.523312 > >> >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 142.523312 > >> >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> >> > 2.449490): 144.972802 > >> >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> >> > 3.464102): 148.436904 > >> >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> >> >> > 4.242641): 152.679544 > >> >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> >> > 4.898979): 157.578524 > >> >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > >> >> >> > 5.477226): 163.055749 > >> >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> >> > 6.000000): 169.055749 > >> >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> >> > 6.480741): 175.536490 > >> >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> >> > 6.928203): 182.464693 > >> >> >> > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> >> > 7.348469): 189.813162 > >> >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 189.813162 > >> >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> >> > 2.645751): 192.458914 > >> >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 
> >> >> >> > 3.741657): 196.200571 > >> >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> >> > 4.582576): 200.783147 > >> >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> >> > 5.291503): 206.074649 > >> >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> >> > 5.916080): 211.990729 > >> >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> >> > 6.480741): 218.471470 > >> >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> >> > 7.000000): 225.471470 > >> >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> >> > 7.483315): 232.954785 > >> >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> >> > 7.937254): 240.892039 > >> >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 240.892039 > >> >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> >> > 2.828427): 243.720466 > >> >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> >> > 4.000000): 247.720466 > >> >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> >> > 4.898979): 252.619445 > >> >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> >> > 5.656854): 258.276300 > >> >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> >> > 6.324555): 264.600855 > >> >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> >> > 6.928203): 271.529058 > >> >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> >> > 7.483315): 279.012373 > >> >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > >> >> >> > 8.000000): 287.012373 > >> >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> >> > 8.485281): 295.497654 > >> >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> >> > 
0.000000): 295.497654 > >> >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> >> > 3.000000): 298.497654 > >> >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> >> > 4.242641): 302.740295 > >> >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> >> > 5.196152): 307.936447 > >> >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> >> > 6.000000): 313.936447 > >> >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> >> > 6.708204): 320.644651 > >> >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> >> > 7.348469): 327.993120 > >> >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> >> > 7.937254): 335.930374 > >> >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> >> > 8.485281): 344.415656 > >> >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> >> > 9.000000): 353.415656 > >> >> >> > Final Result: 353.415656 > >> >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> 
> 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> >> > 1.000000): 1.000000 > >> >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> >> > 1.414214): 2.414214 > >> >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> >> > 1.732051): 4.146264 > >> >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: > >> >> >> > 2.000000): 6.146264 > >> >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: > >> >> >> > 2.236068): 8.382332 > >> >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: > >> >> >> > 2.449490): 10.831822 > >> >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: > >> >> >> > 2.645751): 13.477573 > >> >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: > >> >> >> > 2.828427): 16.306001 > >> >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: > >> >> >> > 3.000000): 19.306001 > >> >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 19.306001 > >> >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> >> > 1.414214): 20.720214 > >> >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> >> > 2.000000): 22.720214 > >> >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> >> > 2.449490): 25.169704 > >> >> >> > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> >> > 2.828427): 27.998131 > >> >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> >> > 3.162278): 31.160409 > >> >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> >> > 3.464102): 34.624510 > >> >> >> > ITER: 1 | 
i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> >> > 3.741657): 38.366168 > >> >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> >> > 4.000000): 42.366168 > >> >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> >> > 4.242641): 46.608808 > >> >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 46.608808 > >> >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> >> > 1.732051): 48.340859 > >> >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> >> > 2.449490): 50.790349 > >> >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> >> > 3.000000): 53.790349 > >> >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> >> > 3.464102): 57.254450 > >> >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> >> > 3.872983): 61.127434 > >> >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> >> > 4.242641): 65.370075 > >> >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> >> > 4.582576): 69.952650 > >> >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> >> > 4.898979): 74.851630 > >> >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> >> > 5.196152): 80.047782 > >> >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 80.047782 > >> >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> >> > 2.000000): 82.047782 > >> >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> >> > 2.828427): 84.876209 > >> >> >> > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> >> > 3.464102): 88.340311 > >> >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> >> > 4.000000): 92.340311 > >> >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 
2.236068 | i*j: > >> >> >> > 4.472136): 96.812447 > >> >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > >> >> >> > 4.898979): 101.711426 > >> >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> >> > 5.291503): 107.002929 > >> >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> >> > 5.656854): 112.659783 > >> >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> >> > 6.000000): 118.659783 > >> >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 118.659783 > >> >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> >> > 2.236068): 120.895851 > >> >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> >> > 3.162278): 124.058129 > >> >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> >> > 3.872983): 127.931112 > >> >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > >> >> >> > 4.472136): 132.403248 > >> >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> >> > 5.000000): 137.403248 > >> >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> >> > 5.477226): 142.880474 > >> >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> >> > 5.916080): 148.796553 > >> >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> >> > 6.324555): 155.121109 > >> >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> >> > 6.708204): 161.829313 > >> >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 161.829313 > >> >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> >> > 2.449490): 164.278802 > >> >> >> > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> >> > 3.464102): 167.742904 > >> >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> 
>> >> > 4.242641): 171.985545 > >> >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> >> > 4.898979): 176.884524 > >> >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > >> >> >> > 5.477226): 182.361750 > >> >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> >> > 6.000000): 188.361750 > >> >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> >> > 6.480741): 194.842491 > >> >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> >> > 6.928203): 201.770694 > >> >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> >> > 7.348469): 209.119163 > >> >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 209.119163 > >> >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> >> > 2.645751): 211.764914 > >> >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > >> >> >> > 3.741657): 215.506572 > >> >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> >> > 4.582576): 220.089147 > >> >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> >> > 5.291503): 225.380650 > >> >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> >> > 5.916080): 231.296730 > >> >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> >> > 6.480741): 237.777470 > >> >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> >> > 7.000000): 244.777470 > >> >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> >> > 7.483315): 252.260785 > >> >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> >> > 7.937254): 260.198039 > >> >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 260.198039 > >> >> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> >> > 2.828427): 
263.026466 > >> >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> >> > 4.000000): 267.026466 > >> >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> >> > 4.898979): 271.925446 > >> >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> >> > 5.656854): 277.582300 > >> >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> >> > 6.324555): 283.906855 > >> >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> >> > 6.928203): 290.835059 > >> >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> >> > 7.483315): 298.318373 > >> >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > >> >> >> > 8.000000): 306.318373 > >> >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> >> > 8.485281): 314.803655 > >> >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 314.803655 > >> >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> >> > 3.000000): 317.803655 > >> >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> >> > 4.242641): 322.046295 > >> >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> >> > 5.196152): 327.242448 > >> >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> >> > 6.000000): 333.242448 > >> >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> >> > 6.708204): 339.950652 > >> >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> >> > 7.348469): 347.299121 > >> >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> >> > 7.937254): 355.236375 > >> >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> >> > 8.485281): 363.721656 > >> >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> >> > 9.000000): 372.721656 > >> >> 
> >> >> >> > Final Result: 372.721656
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > As we can see in the following iterations, sqrt(1) as well as the
> >> >> >> > result is set to zero for some reason.
> >> >> >> >
> >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
> >> >> >> >
> >> >> >> > Please help me to resolve the accuracy issue! I think that it will
> >> >> >> > be very useful for the gem5 community.
> >> >> >> >
> >> >> >> > Note that I found the simulated tick at which the application
> >> >> >> > starts in FS (using m5 dumpstats) and passed it to
> >> >> >> > --debug-start, but the generated trace file is 10x larger
> >> >> >> > than in SE mode for the same application. How can I compare them?
> >> >> >> >
> >> >> >> > Thank you in advance!
> >> >> >> > Best regards,
> >> >> >> > Nikos
> >> >> >> >
> >> >> >> > Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
> >> >> >> >
> >> >> >> >> Dear Jason,
> >> >> >> >>
> >> >> >> >> I am trying to use --debug-start but in FS mode it is very
> >> >> >> >> difficult to find the tick at which the application starts!
> >> >> >> >>
> >> >> >> >> However, I wrote the following very simple C++ program:
> >> >> >> >>
> >> >> >> >> #include <cmath>
> >> >> >> >> #include <stdio.h>
> >> >> >> >>
> >> >> >> >> int main(){
> >> >> >> >>
> >> >> >> >>     int dim = 4096;
> >> >> >> >>
> >> >> >> >>     double result;
> >> >> >> >>
> >> >> >> >>     for (int iter = 0; iter < 2; iter++){
> >> >> >> >>         result = 0;
> >> >> >> >>         for (int i = 0; i < dim; i++){
> >> >> >> >>             for (int j = 0; j < dim; j++){
> >> >> >> >>                 result += sqrt(i) * sqrt(j);
> >> >> >> >>             }
> >> >> >> >>         }
> >> >> >> >>         printf("Result: %lf\n", result); //Result: 30530733453.127449
> >> >> >> >>     }
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
> >> >> >> >> test_riscv test_riscv.cpp
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> While in X86 (without cross-compilation of course), QEMU-RISCV, and
> >> >> >> >> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
> >> >> >> >> result is different! In addition, the result is also different
> >> >> >> >> between the 2 iterations.
> >> >> >> >>
> >> >> >> >> Please reproduce the error if you want in order to verify my result.
> >> >> >> >> How can the issue be resolved?
> >> >> >> >>
> >> >> >> >> Thank you in advance!
> >> >> >> >>
> >> >> >> >> Best regards,
> >> >> >> >> Nikos
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>
> >> >> >> >>> Hi Nikos,
> >> >> >> >>>
> >> >> >> >>> You can use --debug-start to start the debugging after some number of
> >> >> >> >>> ticks. Also, I would expect that the difference should come up quickly, so
> >> >> >> >>> no need to run the program to the end.
> >> >> >> >>>
> >> >> >> >>> For the FS mode one, you will want to just start the trace as the
> >> >> >> >>> application starts. This could be a bit of a pain.
> >> >> >> >>>
> >> >> >> >>> I'm not really sure what fundamentally could be different. FS and SE mode
> >> >> >> >>> use the exact same code for executing instructions, so I don't think that's
> >> >> >> >>> the problem. Have you tried running for smaller inputs or just one
> >> >> >> >>> iteration?
> >> >> >> >>>
> >> >> >> >>> Jason
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
> >> >> >> >>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>
> >> >> >> >>>> Dear Bobby,
> >> >> >> >>>>
> >> >> >> >>>> I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
> >> >> >> >>>> rather than the gem5.fast which I had) but the debug traces exceed 20GB
> >> >> >> >>>> (and it is not finished yet) for less than 1 simulated second. How can
> >> >> >> >>>> I reduce the size of the debug output (or set something more specific)?
> >> >> >> >>>>
> >> >> >> >>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. If you
> >> >> >> >>>> want, you can compare these two output files
> >> >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
> >> >> >> >>>> see, something goes wrong with the accuracy of calculations in FS mode
> >> >> >> >>>> (benchmark uses double precision). You can find the files here:
> >> >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
> >> >> >> >>>>
> >> >> >> >>>> Best regards,
> >> >> >> >>>> Nikos
> >> >> >> >>>>
> >> >> >> >>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>>>
> >> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode!
> >> >> >> >>>>>
> >> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and then perform a
> >> >> >> >>>>> diff to see how they differ.
> >> >> >> >>>>>
> >> >> >> >>>>> Cheers,
> >> >> >> >>>>> Jason
> >> >> >> >>>>>
> >> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>>> Dear Bobby,
> >> >> >> >>>>>>
> >> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode
> >> >> >> >>>>>> simulation. I get invalid results in FS simulation (in both
> >> >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
> >> >> >> >>>>>> hardware at this moment, however, if you want you may execute my xhpcg
> >> >> >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
> >> >> >> >>>>>> following configuration:
> >> >> >> >>>>>>
> >> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
> >> >> >> >>>>>>
> >> >> >> >>>>>> Please let me know if you have any updates!
> >> >> >> >>>>>>
> >> >> >> >>>>>> Best regards,
> >> >> >> >>>>>> Nikos
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> >> >> >> >>>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>>>>>
> >> >> >> >>>>>>> Hi Nikos,
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> I notice you said the following in your original email:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> In addition, I used the RISCV Ubuntu image
> >> >> >> >>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> >> >> >> >>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get
> >> >> >> >>>>>>>> wrong results too.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
> >> >> >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to make
> >> >> >> >>>>>>> sure the binary works there.
> >> >> >> >>>>>>> Another way you could test to see if the
> >> >> >> >>>>>>> problem is your binary or gem5 would be to run it on real hardware. We have
> >> >> >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have access
> >> >> >> >>>>>>> to it.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Cheers,
> >> >> >> >>>>>>> Jason
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>>>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>>> Dear Bobby,
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest gem5
> >> >> >> >>>>>>>> release.
> >> >> >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
> >> >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download the
> >> >> >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
> >> >> >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop
> >> >> >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the following
> >> >> >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with executable:
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> image = CustomDiskImageResource(
> >> >> >> >>>>>>>>     local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
> >> >> >> >>>>>>>> )
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> # Set the Full System workload.
> >> >> >> >>>>>>>> board.set_kernel_disk_workload(
> >> >> >> >>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
> >> >> >> >>>>>>>>     disk_image=image,
> >> >> >> >>>>>>>> )
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Finally, in the gem5/src/python/gem5/components/boards/riscv_board.py
> >> >> >> >>>>>>>> I change the last line to "return ["console=ttyS0",
> >> >> >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write permissions in
> >> >> >> >>>>>>>> the image.
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the results
> >> >> >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid results. As
> >> >> >> >>>>>>>> I see from the results, one (at least) problem is that it produces
> >> >> >> >>>>>>>> different results in each HPCG execution (with the same configuration).
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py
> >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
> >> >> >> >>>>>>>> results in the video if you use the xhpcg executable
> >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Please help me in order to solve it!
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode too.
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Best regards,
> >> >> >> >>>>>>>> Nikos
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> > I'm going to need a bit more information to help:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > 1. In what way have you modified
> >> >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py?
> >> >> >> >>>>>>>> > Can you attach the script here?
> >> >> >> >>>>>>>> > 2. What error are you getting or in what way are the results invalid?
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > -
> >> >> >> >>>>>>>> > Dr. Bobby R. Bruce
> >> >> >> >>>>>>>> > Room 3050,
> >> >> >> >>>>>>>> > Kemper Hall, UC Davis
> >> >> >> >>>>>>>> > Davis,
> >> >> >> >>>>>>>> > CA, 95616
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > web: https://www.bobbybruce.net
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>>>>> > ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Dear gem5 community,
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
> >> >> >> >>>>>>>> >> version, without MPI and OpenMP). While it works properly in gem5 SE
> >> >> >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
> >> >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
> >> >> >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
> >> >> >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
> >> >> >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
> >> >> >> >>>>>>>> >> and put it).
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Can you help me please?
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image
> >> >> >> >>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> >> >> >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and I get
> >> >> >> >>>>>>>> >> wrong results too.
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
> >> >> >> >>>>>>>> >> (xhpcg), and a video that shows the results
> >> >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> P.S. I use the latest gem5 version.
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Thank you in advance! :)
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Best regards,
> >> >> >> >>>>>>>> >> Nikos
> >> >> >> >>>>>>>> >> _______________________________________________
> >> >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
> >> >> >> >>>>>>>> >> To unsubscribe send an email to gem5-users-le...@gem5.org
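On the question raised earlier in the thread about how to compare the SE and FS `--debug-flags=Exec` traces when they are tens of gigabytes and the FS one is about 10x larger: one possible approach is to stream both files, drop the fields that are expected to differ between the two runs, optionally keep only the lines of interest, and stop at the first mismatch. The Python below is only a rough sketch of that idea and is not part of gem5. The function names `normalize` and `first_divergence` are hypothetical, the line layout `<tick>: <object name>: <rest>` is an assumption about the trace format, and filtering on the substring `fsqrt` is a guess at how the instruction appears in the trace.

```python
import itertools


def normalize(line):
    """Drop the first two ': '-separated fields of a trace line.

    ASSUMPTION: each line looks like '<tick>: <object name>: <rest>'.
    Lines that do not match are returned with whitespace stripped.
    """
    parts = line.rstrip("\n").split(": ", 2)
    return parts[2] if len(parts) == 3 else line.strip()


def first_divergence(se_lines, fs_lines, keep=lambda line: True):
    """Return (index, se_line, fs_line) at the first mismatch, or None.

    Both inputs are consumed lazily, so open file objects can be
    passed directly without loading a whole trace into memory.
    `keep` selects which normalized lines take part in the comparison.
    """
    se = filter(keep, map(normalize, se_lines))
    fs = filter(keep, map(normalize, fs_lines))
    for index, (a, b) in enumerate(itertools.zip_longest(se, fs)):
        if a != b:
            # If one trace ends early, the missing side is None.
            return index, a, b
    return None


# Hypothetical usage; the trace file names are placeholders.
# with open("se-trace.txt") as se, open("fs-trace.txt") as fs:
#     print(first_divergence(se, fs, keep=lambda l: "fsqrt" in l))
```

Restricting the comparison with `keep` matters in FS mode, where kernel instructions are interleaved with the application's; without a filter the two traces would be expected to diverge immediately for reasons unrelated to the bug.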