You mean this bug? Unfortunately not, I've been very busy with the upcoming
gem5 release and haven't had time to investigate this further.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net


On Mon, Oct 31, 2022 at 1:45 AM Νικόλαος Ταμπουρατζής via gem5-users <
gem5-users@gem5.org> wrote:

> Dear Bobby, Jason, all,
>
> Is there any update about the accuracy of RISC-V FS?
>
> Best regards,
> Nikos
>
>
> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>
> > Jason and I had a theory that this may be due to the floating-point
> > "Rounding Mode" being set incorrectly in FS mode. That's set via a macro
> > here:
> > https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36
> >
> > I manually expanded the macro here:
> > https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495
> > (inside the "fsqrt_d" definition), then compiled "build/ALL/gem5.debug". I
> > then used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in
> > the generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).
> >
> > ```
> > gdb build/ALL/gem5.opt
> > break Fsqrt_d::execute
> > run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`
> > ```
> >
> > Stepping through with gdb, I see the rounding mode is `0` for SE mode and
> > `0` for FS mode as well. So, no luck with that theory.
> >
> > My new theory is that this bug has something to do with thread context
> > switching being implemented incorrectly in RISC-V somehow. I find it
> > strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
> > suddenly starts returning zero after a certain point in the execution. In
> > addition, it's odd that the loop is not returning the same value each time
> > despite executing the same code. It'd make sense to me that the thread is
> > being stored and then resumed with some corruption of the floating point
> > data. This would also explain why this bug only occurs in FS mode.
> >
> > I'll try to find time to figure out a good test for this. If anyone has any
> > other theories or ideas then let me know.
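
One possible shape for a test of the context-switch theory, as a sketch only: spin on sqrt(1.0) without making any system calls inside the loop, so that a change in the result has to come from something outside the loop (for example, an interrupt or context switch that fails to preserve floating-point state). The function name first_sqrt_mismatch is hypothetical, and whether this triggers the bug in FS mode is untested.

```cpp
#include <cmath>
#include <cstdio>

// Returns the index of the first iteration where sqrt(1.0) != 1.0,
// or -1 if every iteration produced exactly 1.0.
long first_sqrt_mismatch(long iterations) {
    // volatile forces a fresh load and a real sqrt on every iteration,
    // even at -O3, instead of a constant-folded result.
    volatile double input = 1.0;
    for (long n = 0; n < iterations; ++n) {
        double root = std::sqrt(input);
        if (root != 1.0) {
            // Print only after a failure, so the loop itself stays syscall-free.
            std::printf("mismatch at iteration %ld: sqrt(1) = %f\n", n, root);
            return n;
        }
    }
    return -1;
}
```

On a correct platform this should return -1 for any iteration count. A non-negative return in FS mode would give an iteration index to line up against a trace.
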
> >
> >
> > On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
> > ntampourat...@ece.auth.gr> wrote:
> >>
> >> Dear Jason & Bobby,
> >>
> >> Unfortunately, I have tried my simple example without the sqrt
> >> function and the problem remains. Specifically, I have the following
> >> simple code:
> >>
> >>
> >> #include <cmath>
> >> #include <stdio.h>
> >>
> >> int main(){
> >>
> >>      int dim = 1024;
> >>
> >>      double result;
> >>
> >>      for (int iter = 0; iter < 2; iter++){
> >>          result = 0;
> >>          for (int i = 0; i < dim; i++){
> >>              for (int j = 0; j < dim; j++){
> >>                  result += i * j;
> >>              }
> >>          }
> >>          printf("Final Result: %lf\n", result);
> >>      }
> >> }
> >>
> >>
> >> In the above code, the correct result is 274341298176.000000 (from
> >> RISCV-SE mode and x86), while in FS mode I sometimes get the correct
> >> result and other times a different number.
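
As a sanity check on the expected value: the double loop has a closed form, because the sum of i*j over all i and j below dim factors into (dim*(dim-1)/2)^2. A minimal sketch, with hypothetical function names, that computes both the closed form and the same double-precision accumulation as the loop above:

```cpp
#include <cstdint>

// Closed form: sum_{i<dim} sum_{j<dim} i*j = (dim*(dim-1)/2)^2.
std::uint64_t expected_product_sum(std::uint64_t dim) {
    std::uint64_t triangular = dim * (dim - 1) / 2;
    return triangular * triangular;
}

// Same accumulation as the loop in the message, in double precision.
double accumulated_product_sum(int dim) {
    double result = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += i * j;  // int product, converted to double on the add
        }
    }
    return result;
}
```

For dim = 1024 both functions give 274341298176. Every partial sum is an integer below 2^53, so the double accumulation is exact, and a different printed value cannot be explained by floating-point rounding.
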
> >>
> >> Best regards,
> >> Nikos
> >>
> >>
> >> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >>
> >> > I have an idea...
> >> >
> >> > Have you put a breakpoint in the implementation of the fsqrt_d function? I
> >> > would like to know whether we are using the same rounding mode when
> >> > running in SE mode and in FS mode. My hypothesis is that in FS mode the
> >> > rounding mode is set differently.
> >> >
> >> > Cheers,
> >> > Jason
> >> >
> >> > On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
> >> > ntampourat...@ece.auth.gr> wrote:
> >> >
> >> >> Dear Bobby,
> >> >>
> >> >> Thanks a lot for the effort! I looked into it in detail and I observe
> >> >> that the problem occurs only with float and double variables (with int
> >> >> variables it works properly in FS mode). Specifically, float variables
> >> >> are set to "nan", while double variables are set to 0.000000 (at a
> >> >> random time, probably caused by some instruction of the simulated OS?).
> >> >> You may use a simple C/C++ example in order to get some traces before
> >> >> going to HPCG...
> >> >>
> >> >> Thank you in advance!!
> >> >> Best regards,
> >> >> Nikos
> >> >>
> >> >>
> >> >> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
> >> >>
> >> >> > Hey Niko,
> >> >> >
> >> >> > Thanks for this analysis. I jumped a little into this today but didn't get
> >> >> > as far as you did. I wanted to find a quick way to recreate the following:
> >> >> > https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
> >> >> > free to use this, if it helps any.
> >> >> >
> >> >> > It's very strange to me that this bug hasn't manifested itself before but
> >> >> > it's undeniably there. I'll try to spend more time looking at this tomorrow
> >> >> > with some traces and debug flags and see if I can narrow down the problem.
> >> >> >
> >> >> >
> >> >> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
> >> >> > ntampourat...@ece.auth.gr> wrote:
> >> >> >
> >> >> >> In my previous results, I had used double (not float) for the
> >> >> >> following variables: result, sq_i and sq_j. In the case of float
> >> >> >> instead of double I get "nan" and not 0.000000.
> >> >> >>
> >> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
> >> >> >>
> >> >> >> > Dear Jason, all,
> >> >> >> >
> >> >> >> > I am trying to find the accuracy problem with RISCV-FS and I observe
> >> >> >> > that the problem arises (at least in my dummy example) because
> >> >> >> > the variables (double) are set to zero at random simulated times (for
> >> >> >> > this reason I get different results among executions of the same
> >> >> >> > code). Specifically for the following dummy code:
> >> >> >> >
> >> >> >> >
> >> >> >> > #include <cmath>
> >> >> >> > #include <stdio.h>
> >> >> >> >
> >> >> >> > int main(){
> >> >> >> >
> >> >> >> >     int dim = 10;
> >> >> >> >
> >> >> >> >     float result;
> >> >> >> >
> >> >> >> >     for (int iter = 0; iter < 2; iter++){
> >> >> >> >         result = 0;
> >> >> >> >         for (int i = 0; i < dim; i++){
> >> >> >> >             for (int j = 0; j < dim; j++){
> >> >> >> >                 float sq_i = sqrt(i);
> >> >> >> >                 float sq_j = sqrt(j);
> >> >> >> >                 result += sq_i * sq_j;
> >> >> >> >                 printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
> >> >> >> >                        iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
> >> >> >> >             }
> >> >> >> >         }
> >> >> >> >         printf("Final Result: %lf\n", result);
> >> >> >> >     }
> >> >> >> > }
> >> >> >> >
> >> >> >> >
> >> >> >> > The correct Final Result in both iterations is 372.721656. However,
> >> >> >> > I get the following results in FS:
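
The expected value can be cross-checked independently of the nested loop, since the sum of sqrt(i)*sqrt(j) over all i and j below dim factors into the square of the sum of sqrt(i). A minimal sketch in double precision (expected_sqrt_product_sum is a hypothetical name):

```cpp
#include <cmath>

// sum_{i<dim} sum_{j<dim} sqrt(i)*sqrt(j) = (sum_{i<dim} sqrt(i))^2
double expected_sqrt_product_sum(int dim) {
    double root_sum = 0.0;
    for (int i = 0; i < dim; i++) {
        root_sum += std::sqrt(static_cast<double>(i));
    }
    return root_sum * root_sum;
}
```

For dim = 10 this gives roughly 372.7217, in line with the value quoted above. The float accumulation in the test program may differ from the double result in the last printed digits.
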
> >> >> >> >
> >> >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> >> >> >> > 1.000000): 1.000000
> >> >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> >> >> >> > 1.414214): 2.414214
> >> >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> >> >> >> > 1.732051): 4.146264
> >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> >> >> >> > 1.414214): 1.414214
> >> >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> >> >> >> > 2.000000): 3.414214
> >> >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> >> >> >> > 2.449490): 5.863703
> >> >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> >> >> >> > 2.828427): 8.692130
> >> >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> >> >> >> > 3.162278): 11.854408
> >> >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> >> >> >> > 3.464102): 15.318510
> >> >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> >> >> >> > 3.741657): 19.060167
> >> >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> >> >> >> > 4.000000): 23.060167
> >> >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> >> >> >> > 4.242641): 27.302808
> >> >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 27.302808
> >> >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> >> >> >> > 1.732051): 29.034859
> >> >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> >> >> >> > 2.449490): 31.484348
> >> >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> >> >> >> > 3.000000): 34.484348
> >> >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> >> >> >> > 3.464102): 37.948450
> >> >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> >> >> >> > 3.872983): 41.821433
> >> >> >> > ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> >> >> >> > 4.242641): 46.064074
> >> >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> >> >> >> > 4.582576): 50.646650
> >> >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> >> >> >> > 4.898979): 55.545629
> >> >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> >> >> >> > 5.196152): 60.741782
> >> >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 60.741782
> >> >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> >> >> >> > 2.000000): 62.741782
> >> >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> >> >> >> > 2.828427): 65.570209
> >> >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> >> >> >> > 3.464102): 69.034310
> >> >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> >> >> >> > 4.000000): 73.034310
> >> >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> >> >> >> > 4.472136): 77.506446
> >> >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> >> >> >> > 4.898979): 82.405426
> >> >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> >> >> >> > 5.291503): 87.696928
> >> >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> >> >> >> > 5.656854): 93.353783
> >> >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> >> >> >> > 6.000000): 99.353783
> >> >> >> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 99.353783
> >> >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> >> >> >> > 2.236068): 101.589851
> >> >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> >> >> >> > 3.162278): 104.752128
> >> >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> >> >> >> > 3.872983): 108.625112
> >> >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> >> >> >> > 4.472136): 113.097248
> >> >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> >> >> >> > 5.000000): 118.097248
> >> >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> >> >> >> > 5.477226): 123.574473
> >> >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> >> >> >> > 5.916080): 129.490553
> >> >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> >> >> >> > 6.324555): 135.815108
> >> >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> >> >> >> > 6.708204): 142.523312
> >> >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 142.523312
> >> >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> >> >> >> > 2.449490): 144.972802
> >> >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> >> >> >> > 3.464102): 148.436904
> >> >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> >> >> >> > 4.242641): 152.679544
> >> >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> >> >> >> > 4.898979): 157.578524
> >> >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> >> >> >> > 5.477226): 163.055749
> >> >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> >> >> >> > 6.000000): 169.055749
> >> >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> >> >> >> > 6.480741): 175.536490
> >> >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> >> >> >> > 6.928203): 182.464693
> >> >> >> > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> >> >> >> > 7.348469): 189.813162
> >> >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 189.813162
> >> >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> >> >> >> > 2.645751): 192.458914
> >> >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> >> >> >> > 3.741657): 196.200571
> >> >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> >> >> >> > 4.582576): 200.783147
> >> >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> >> >> >> > 5.291503): 206.074649
> >> >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> >> >> >> > 5.916080): 211.990729
> >> >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> >> >> >> > 6.480741): 218.471470
> >> >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> >> >> >> > 7.000000): 225.471470
> >> >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> >> >> >> > 7.483315): 232.954785
> >> >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> >> >> >> > 7.937254): 240.892039
> >> >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 240.892039
> >> >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> >> >> >> > 2.828427): 243.720466
> >> >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> >> >> >> > 4.000000): 247.720466
> >> >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> >> >> >> > 4.898979): 252.619445
> >> >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> >> >> >> > 5.656854): 258.276300
> >> >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> >> >> >> > 6.324555): 264.600855
> >> >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> >> >> >> > 6.928203): 271.529058
> >> >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> >> >> >> > 7.483315): 279.012373
> >> >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> >> >> >> > 8.000000): 287.012373
> >> >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> >> >> >> > 8.485281): 295.497654
> >> >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 295.497654
> >> >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> >> >> >> > 3.000000): 298.497654
> >> >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> >> >> >> > 4.242641): 302.740295
> >> >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> >> >> >> > 5.196152): 307.936447
> >> >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> >> >> >> > 6.000000): 313.936447
> >> >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> >> >> >> > 6.708204): 320.644651
> >> >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> >> >> >> > 7.348469): 327.993120
> >> >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> >> >> >> > 7.937254): 335.930374
> >> >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> >> >> >> > 8.485281): 344.415656
> >> >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> >> >> >> > 9.000000): 353.415656
> >> >> >> > Final Result: 353.415656
> >> >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
> >> >> >> > 1.000000): 1.000000
> >> >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
> >> >> >> > 1.414214): 2.414214
> >> >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
> >> >> >> > 1.732051): 4.146264
> >> >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
> >> >> >> > 2.000000): 6.146264
> >> >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
> >> >> >> > 2.236068): 8.382332
> >> >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
> >> >> >> > 2.449490): 10.831822
> >> >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
> >> >> >> > 2.645751): 13.477573
> >> >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
> >> >> >> > 2.828427): 16.306001
> >> >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
> >> >> >> > 3.000000): 19.306001
> >> >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 19.306001
> >> >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
> >> >> >> > 1.414214): 20.720214
> >> >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
> >> >> >> > 2.000000): 22.720214
> >> >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
> >> >> >> > 2.449490): 25.169704
> >> >> >> > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
> >> >> >> > 2.828427): 27.998131
> >> >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
> >> >> >> > 3.162278): 31.160409
> >> >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
> >> >> >> > 3.464102): 34.624510
> >> >> >> > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
> >> >> >> > 3.741657): 38.366168
> >> >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
> >> >> >> > 4.000000): 42.366168
> >> >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
> >> >> >> > 4.242641): 46.608808
> >> >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 46.608808
> >> >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
> >> >> >> > 1.732051): 48.340859
> >> >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
> >> >> >> > 2.449490): 50.790349
> >> >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
> >> >> >> > 3.000000): 53.790349
> >> >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
> >> >> >> > 3.464102): 57.254450
> >> >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
> >> >> >> > 3.872983): 61.127434
> >> >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
> >> >> >> > 4.242641): 65.370075
> >> >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
> >> >> >> > 4.582576): 69.952650
> >> >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
> >> >> >> > 4.898979): 74.851630
> >> >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
> >> >> >> > 5.196152): 80.047782
> >> >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 80.047782
> >> >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
> >> >> >> > 2.000000): 82.047782
> >> >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
> >> >> >> > 2.828427): 84.876209
> >> >> >> > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
> >> >> >> > 3.464102): 88.340311
> >> >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
> >> >> >> > 4.000000): 92.340311
> >> >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
> >> >> >> > 4.472136): 96.812447
> >> >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
> >> >> >> > 4.898979): 101.711426
> >> >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
> >> >> >> > 5.291503): 107.002929
> >> >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
> >> >> >> > 5.656854): 112.659783
> >> >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
> >> >> >> > 6.000000): 118.659783
> >> >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 118.659783
> >> >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
> >> >> >> > 2.236068): 120.895851
> >> >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
> >> >> >> > 3.162278): 124.058129
> >> >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
> >> >> >> > 3.872983): 127.931112
> >> >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
> >> >> >> > 4.472136): 132.403248
> >> >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
> >> >> >> > 5.000000): 137.403248
> >> >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
> >> >> >> > 5.477226): 142.880474
> >> >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
> >> >> >> > 5.916080): 148.796553
> >> >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
> >> >> >> > 6.324555): 155.121109
> >> >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
> >> >> >> > 6.708204): 161.829313
> >> >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 161.829313
> >> >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
> >> >> >> > 2.449490): 164.278802
> >> >> >> > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
> >> >> >> > 3.464102): 167.742904
> >> >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
> >> >> >> > 4.242641): 171.985545
> >> >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
> >> >> >> > 4.898979): 176.884524
> >> >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
> >> >> >> > 5.477226): 182.361750
> >> >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
> >> >> >> > 6.000000): 188.361750
> >> >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
> >> >> >> > 6.480741): 194.842491
> >> >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
> >> >> >> > 6.928203): 201.770694
> >> >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
> >> >> >> > 7.348469): 209.119163
> >> >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 209.119163
> >> >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
> >> >> >> > 2.645751): 211.764914
> >> >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
> >> >> >> > 3.741657): 215.506572
> >> >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
> >> >> >> > 4.582576): 220.089147
> >> >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
> >> >> >> > 5.291503): 225.380650
> >> >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
> >> >> >> > 5.916080): 231.296730
> >> >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
> >> >> >> > 6.480741): 237.777470
> >> >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
> >> >> >> > 7.000000): 244.777470
> >> >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
> >> >> >> > 7.483315): 252.260785
> >> >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
> >> >> >> > 7.937254): 260.198039
> >> >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 260.198039
> >> >> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
> >> >> >> > 2.828427): 263.026466
> >> >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
> >> >> >> > 4.000000): 267.026466
> >> >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
> >> >> >> > 4.898979): 271.925446
> >> >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
> >> >> >> > 5.656854): 277.582300
> >> >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
> >> >> >> > 6.324555): 283.906855
> >> >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
> >> >> >> > 6.928203): 290.835059
> >> >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
> >> >> >> > 7.483315): 298.318373
> >> >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
> >> >> >> > 8.000000): 306.318373
> >> >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
> >> >> >> > 8.485281): 314.803655
> >> >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
> >> >> >> > 0.000000): 314.803655
> >> >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
> >> >> >> > 3.000000): 317.803655
> >> >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
> >> >> >> > 4.242641): 322.046295
> >> >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
> >> >> >> > 5.196152): 327.242448
> >> >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
> >> >> >> > 6.000000): 333.242448
> >> >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
> >> >> >> > 6.708204): 339.950652
> >> >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
> >> >> >> > 7.348469): 347.299121
> >> >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
> >> >> >> > 7.937254): 355.236375
> >> >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
> >> >> >> > 8.485281): 363.721656
> >> >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
> >> >> >> > 9.000000): 372.721656
> >> >> >> > Final Result: 372.721656
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > As we can see, in the following iterations sqrt(1) as well as the
> >> >> >> > result is set to zero for some reason.
> >> >> >> >
> >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
> >> >> >> > 0.000000): 0.000000
> >> >> >> >
> >> >> >> > Please help me to resolve the accuracy issue! I think that it will
> >> >> >> > be very useful for the gem5 community.
> >> >> >> >
> >> >> >> > Note that I found the simulated tick at which the application
> >> >> >> > starts in FS (using m5 dumpstats) and passed it to --debug-start,
> >> >> >> > but the generated trace file is 10x larger than in SE mode for the
> >> >> >> > same application. How can I compare them?
> >> >> >> >
> >> >> >> > Thank you in advance!
> >> >> >> > Best regards,
> >> >> >> > Nikos
> >> >> >> >
> >> >> >> > Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
> >> >> >> >
> >> >> >> >> Dear Jason,
> >> >> >> >>
> >> >> >> >> I am trying to use --debug-start, but in FS mode it is very
> >> >> >> >> difficult to find the tick at which the application starts!
> >> >> >> >>
> >> >> >> >> However, I have written the following very simple C++ program:
> >> >> >> >>
> >> >> >> >> #include <cmath>
> >> >> >> >> #include <stdio.h>
> >> >> >> >>
> >> >> >> >> int main(){
> >> >> >> >>
> >> >> >> >>    int dim = 4096;
> >> >> >> >>
> >> >> >> >>    double result;
> >> >> >> >>
> >> >> >> >>    for (int iter = 0; iter < 2; iter++){
> >> >> >> >>        result = 0;
> >> >> >> >>        for (int i = 0; i < dim; i++){
> >> >> >> >>            for (int j = 0; j < dim; j++){
> >> >> >> >>                result += sqrt(i) * sqrt(j);
> >> >> >> >>            }
> >> >> >> >>        }
> >> >> >> >>        printf("Result: %lf\n", result); // Result: 30530733453.127449
> >> >> >> >>    }
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
> >> >> >> >> test_riscv test_riscv.cpp
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> While on x86 (without cross-compilation, of course), QEMU-RISCV,
> >> >> >> >> and gem5 SE the result is the same (30530733453.127449), in gem5 FS
> >> >> >> >> the result is different! In addition, the result also differs
> >> >> >> >> between the 2 iterations.
> >> >> >> >>
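For anyone reproducing this, a small host-side reference can help tell "slightly different rounding" apart from "plainly wrong". The sketch below is not the thread's test program, and the function names are illustrative. It repeats the double loop in Python (whose floats are IEEE doubles) and compares it with the factored form (sum of sqrt(i)) squared, which is mathematically equal to the double sum.

```python
import math


def nested_sum(dim):
    """Same accumulation order as the C++ test program's inner loops."""
    result = 0.0
    for i in range(dim):
        for j in range(dim):
            result += math.sqrt(i) * math.sqrt(j)
    return result


def factored_sum(dim):
    """Cheap reference: sum_{i,j} sqrt(i)*sqrt(j) == (sum_i sqrt(i))**2."""
    total = math.fsum(math.sqrt(i) for i in range(dim))
    return total * total


def agrees(value, dim, rel_tol=1e-6):
    """True if `value` is within a loose relative tolerance of the reference."""
    return math.isclose(value, factored_sum(dim), rel_tol=rel_tol)
```

A value printed by the simulated program can be passed to `agrees(value, 4096)`. The tolerance is deliberately loose because summation order changes the last digits, so this only flags results that are grossly off, such as the zeros seen in FS mode.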
> >> >> >> >> Please reproduce the error, if you want, to verify my result.
> >> >> >> >> How can the issue be resolved?
> >> >> >> >>
> >> >> >> >> Thank you in advance!
> >> >> >> >>
> >> >> >> >> Best regards,
> >> >> >> >> Nikos
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>
> >> >> >> >>> Hi Nikos,
> >> >> >> >>>
> >> >> >> >>> You can use --debug-start to start the debugging after some number
> >> >> >> >>> of ticks. Also, I would expect that the difference should come up
> >> >> >> >>> quickly, so no need to run the program to the end.
> >> >> >> >>>
> >> >> >> >>> For the FS mode one, you will want to just start the trace as the
> >> >> >> >>> application starts. This could be a bit of a pain.
> >> >> >> >>>
> >> >> >> >>> I'm not really sure what fundamentally could be different. FS and SE
> >> >> >> >>> mode use the exact same code for executing instructions, so I don't
> >> >> >> >>> think that's the problem. Have you tried running for smaller inputs
> >> >> >> >>> or just one iteration?
> >> >> >> >>>
> >> >> >> >>> Jason
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
> >> >> >> >>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>
> >> >> >> >>>> Dear Bobby,
> >> >> >> >>>>
> >> >> >> >>>> I am trying to add --debug-flags=Exec (building gem5.opt instead of
> >> >> >> >>>> the gem5.fast I had), but the debug traces exceed 20GB (and the run
> >> >> >> >>>> has not finished yet) for less than 1 simulated second. How can I
> >> >> >> >>>> reduce the size of the debug output (or set something more specific)?
> >> >> >> >>>>
> >> >> >> >>>> Instead, I built the HPCG benchmark with the DHPCG_DEBUG flag. If
> >> >> >> >>>> you want, you can compare these two output files
> >> >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you
> >> >> >> >>>> can see, something goes wrong with the accuracy of calculations in
> >> >> >> >>>> FS mode (the benchmark uses double precision). You can find the
> >> >> >> >>>> files here:
> >> >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
> >> >> >> >>>>
> >> >> >> >>>> Best regards,
> >> >> >> >>>> Nikos
> >> >> >> >>>>
> >> >> >> >>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>>>
> >> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode!
> >> >> >> >>>>>
> >> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and then
> >> >> >> >>>>> performing a diff to see how they differ.
> >> >> >> >>>>>
> >> >> >> >>>>> Cheers,
> >> >> >> >>>>> Jason
> >> >> >> >>>>>
> >> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>>> Dear Bobby,
> >> >> >> >>>>>>
> >> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode
> >> >> >> >>>>>> simulation. I get invalid results in FS simulation (in both
> >> >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
> >> >> >> >>>>>> hardware at this moment; however, if you want, you may execute my
> >> >> >> >>>>>> xhpcg binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
> >> >> >> >>>>>> following configuration:
> >> >> >> >>>>>>
> >> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
> >> >> >> >>>>>>
> >> >> >> >>>>>> Please let me know if you have any updates!
> >> >> >> >>>>>>
> >> >> >> >>>>>> Best regards,
> >> >> >> >>>>>> Nikos
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> >> >> >> >>>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
> >> >> >> >>>>>>
> >> >> >> >>>>>>> Hi Nikos,
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> I notice you said the following in your original email:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>>> In addition, I used the RISCV Ubuntu image
> >> >> >> >>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> >> >> >> >>>>>>>> installed the gcc compiler, compiled it (through qemu), and I get
> >> >> >> >>>>>>>> wrong results too.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug
> >> >> >> >>>>>>> is in GCC or the HPCG workload, not in gem5. If not, I would test
> >> >> >> >>>>>>> in QEMU to make sure the binary works there. Another way you could
> >> >> >> >>>>>>> test to see if the problem is your binary or gem5 would be to run
> >> >> >> >>>>>>> it on real hardware. We have access to some RISC-V hardware here at
> >> >> >> >>>>>>> UC Davis, if you don't have access to it.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Cheers,
> >> >> >> >>>>>>> Jason
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>>>> ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>>> Dear Bobby,
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest
> >> >> >> >>>>>>>> gem5 release.
> >> >> >> >>>>>>>> I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
> >> >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download
> >> >> >> >>>>>>>> the riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
> >> >> >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop
> >> >> >> >>>>>>>> riscv-disk-img /mnt), copy in the xhpcg executable, and make the
> >> >> >> >>>>>>>> following changes in riscv-fs.py to boot the riscv-disk-img with
> >> >> >> >>>>>>>> the executable:
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> image = CustomDiskImageResource(
> >> >> >> >>>>>>>>     local_path="/home/cossim/.cache/gem5/riscv-disk-img",
> >> >> >> >>>>>>>> )
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> # Set the Full System workload.
> >> >> >> >>>>>>>> board.set_kernel_disk_workload(
> >> >> >> >>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
> >> >> >> >>>>>>>>     disk_image=image,
> >> >> >> >>>>>>>> )
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Finally, in
> >> >> >> >>>>>>>> gem5/src/python/gem5/components/boards/riscv_board.py
> >> >> >> >>>>>>>> I change the last line to "return ["console=ttyS0",
> >> >> >> >>>>>>>> "root={root_value}", "rw"]" in order to allow write permissions
> >> >> >> >>>>>>>> in the image.
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> 2) After some iterations, the HPCG benchmark checks whether the
> >> >> >> >>>>>>>> results are valid. In FS mode it reports invalid results. As I
> >> >> >> >>>>>>>> see from the results, one problem (at least) is that it produces
> >> >> >> >>>>>>>> different results in each HPCG execution (with the same
> >> >> >> >>>>>>>> configuration).
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py
> >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce
> >> >> >> >>>>>>>> the results in the video if you use the xhpcg executable
> >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Please help me solve it!
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode too.
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Best regards,
> >> >> >> >>>>>>>> Nikos
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> > I'm going to need a bit more information to help:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > 1. In what way have you modified
> >> >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the
> >> >> >> >>>>>>>> > script here?
> >> >> >> >>>>>>>> > 2. What error are you getting or in what way are the results
> >> >> >> >>>>>>>> > invalid?
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > -
> >> >> >> >>>>>>>> > Dr. Bobby R. Bruce
> >> >> >> >>>>>>>> > Room 3050,
> >> >> >> >>>>>>>> > Kemper Hall, UC Davis
> >> >> >> >>>>>>>> > Davis,
> >> >> >> >>>>>>>> > CA, 95616
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > web: https://www.bobbybruce.net
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
> >> >> >> >>>>>>>> > ntampourat...@ece.auth.gr> wrote:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Dear gem5 community,
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV
> >> >> >> >>>>>>>> >> (serial version, without MPI and OpenMP). While it works properly
> >> >> >> >>>>>>>> >> in gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
> >> >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16
> >> >> >> >>>>>>>> >> --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results
> >> >> >> >>>>>>>> >> in FS simulation using "./build/RISCV/gem5.fast -d
> >> >> >> >>>>>>>> >> ./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py"
> >> >> >> >>>>>>>> >> (I mount the riscv image and put the executable in it).
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Can you help me please?
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image
> >> >> >> >>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> >> >> >> >>>>>>>> >> installed the gcc compiler, compiled it (through qemu), and I get
> >> >> >> >>>>>>>> >> wrong results too.
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
> >> >> >> >>>>>>>> >> (xhpcg), and a video that shows the results
> >> >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> P.S. I use the latest gem5 version.
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Thank you in advance! :)
> >> >> >> >>>>>>>> >>
> >> >> >> >>>>>>>> >> Best regards,
> >> >> >> >>>>>>>> >> Nikos
> >> >> >> >>>>>>>> >> _______________________________________________
> >> >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
> >> >> >> >>>>>>>> >> To unsubscribe send an email to gem5-users-le...@gem5.org
> >> >> >> >>>>>>>> >>
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org