Dear Bobby, Jason, all,
Is there any update about the accuracy of RISC-V FS?
Best regards,
Nikos
Quoting Bobby Bruce <bbr...@ucdavis.edu>:
> Jason and I had a theory that this may be due to the "Rounding Mode" for
> floating point being set incorrectly in FS mode. That's set via a macro
> here:
>
> https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36
>
> I manually expanded the macro inside the "fsqrt_d" definition here:
>
> https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495
>
> then compiled "build/ALL/gem5.debug". Then I used gdb to add a breakpoint
> in the "Fsqrt_d::execute" function (in the generated
> "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).
>
> ```
> gdb build/ALL/gem5.opt
> break Fsqrt_d::execute
> run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`
> ```
>
> Stepping through with gdb, I see the rounding mode is `0` for SE mode and
> `0` for FS mode as well. So, no luck with that theory.
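
The gdb observation could also be cross-checked from inside the guest by having the test program report the rounding mode it sees. The following is a minimal sketch, assuming the guest C library's `std::fegetround()` reflects the dynamic rounding mode in use; `roundingModeName` and `currentRoundingModeName` are hypothetical helper names and not part of gem5. In the RISC-V `frm` encoding, the value `0` corresponds to round-to-nearest, ties-to-even.

```cpp
#include <cfenv>
#include <string>

// Map a <cfenv> rounding-mode constant to a readable name.
// (Hypothetical helper for illustration only.)
std::string roundingModeName(int mode) {
    if (mode == FE_TONEAREST)  return "to-nearest";
    if (mode == FE_TOWARDZERO) return "toward-zero";
    if (mode == FE_UPWARD)     return "upward";
    if (mode == FE_DOWNWARD)   return "downward";
    return "unknown";
}

// Name of the rounding mode the calling program currently runs with.
std::string currentRoundingModeName() {
    return roundingModeName(std::fegetround());
}
```

Printing `currentRoundingModeName()` at the start of `main` and again inside the loop, in both SE and FS mode, would show whether the mode changes while the program runs.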
>
> My new theory is that this bug has something to do with thread context
> switching being implemented incorrectly in RISC-V somehow. I find it
> strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
> suddenly starts returning zero after a certain point in the execution. In
> addition, it's odd that the loop is not returning the same value each time
> despite executing the same code. It'd make sense to me that the thread is
> being stored and then resumed with some corruption of the floating point
> data. This would also explain why this bug only occurs in FS mode.
>
> I'll try to find time to figure out a good test for this. If anyone has any
> other theories or ideas then let me know.
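
One possible shape for such a test is a loop that keeps computing a square root with a known answer and reports the first iteration where the answer changes. This is only a sketch under the assumption that the corruption shows up in the result of `sqrt`; `firstSqrtMismatch` is a hypothetical name, and the iteration count needed to span a context switch in FS mode is not known from this thread.

```cpp
#include <cmath>

// Repeatedly compute sqrt(input) and return the index of the first
// iteration whose result differs from `expected`, or -1 if none does.
long firstSqrtMismatch(double input, double expected, long iterations) {
    // volatile forces a fresh load each iteration, so the compiler
    // cannot fold the sqrt call into a constant.
    volatile double in = input;
    for (long n = 0; n < iterations; ++n) {
        double out = std::sqrt(in);
        if (out != expected) {  // also true when out is NaN
            return n;
        }
    }
    return -1;
}
```

On a correct platform `firstSqrtMismatch(1.0, 1.0, N)` returns -1 for any `N`; a non-negative return value in FS mode would pinpoint the iteration to look for in a trace.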
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
>
> On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
> ntampourat...@ece.auth.gr> wrote:
>>
>> Dear Jason & Bobby,
>>
>> Unfortunately, I have tried my simple example without the sqrt
>> function and the problem remains. Specifically, I have the following
>> simple code:
>>
>>
>> #include <cmath>
>> #include <stdio.h>
>>
>> int main(){
>>
>>     int dim = 1024;
>>
>>     double result;
>>
>>     for (int iter = 0; iter < 2; iter++){
>>         result = 0;
>>         for (int i = 0; i < dim; i++){
>>             for (int j = 0; j < dim; j++){
>>                 result += i * j;
>>             }
>>         }
>>         printf("Final Result: %lf\n", result);
>>     }
>> }
>>
>>
>> In the above code, the correct result is 274341298176.000000 (from
>> RISCV-SE mode and x86), while in FS mode I sometimes get the correct
>> result and other times a different number.
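
The expected value quoted above can be confirmed independently of any simulator, because the double sum factorizes: the sum over i and j of i*j equals (dim*(dim-1)/2)^2. The sketch below compares that closed form with a loop mirroring the test program; the function names are hypothetical and chosen for illustration.

```cpp
#include <cstdint>

// Closed form: sum_{i<dim} sum_{j<dim} i*j = (dim*(dim-1)/2)^2.
constexpr std::int64_t expectedProductSum(std::int64_t dim) {
    const std::int64_t s = dim * (dim - 1) / 2;
    return s * s;
}

// Brute-force version mirroring the loop in the test program,
// accumulating into a double exactly as the original does.
double loopProductSum(int dim) {
    double result = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += i * j;
        }
    }
    return result;
}
```

For dim = 1024 every partial sum is an integer below 2^53, so the double accumulation is exact and any deviation from 274341298176 cannot be explained by ordinary rounding.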
>>
>> Best regards,
>> Nikos
>>
>>
>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>>
>> > I have an idea...
>> >
>> > Have you put a breakpoint in the implementation of the fsqrt_d function? I
>> > would like to know whether we are using the same rounding mode when
>> > running in SE mode and in FS mode. My hypothesis is that in FS mode the
>> > rounding mode is set differently.
>> >
>> > Cheers,
>> > Jason
>> >
>> > On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
>> > ntampourat...@ece.auth.gr> wrote:
>> >
>> >> Dear Bobby,
>> >>
>> >> Thanks a lot for the effort! I looked in detail and I observe that the
>> >> problem occurs only with float and double variables (with int it works
>> >> properly in FS mode). Specifically, in the case of float the variables
>> >> are set to "nan", while in the case of double the variables are set to
>> >> 0.000000 (at a random time, probably by some instruction of the
>> >> simulated OS?). You may use a simple C/C++ example in order to get some
>> >> traces before going to HPCG...
>> >>
>> >> Thank you in advance!!
>> >> Best regards,
>> >> Nikos
>> >>
>> >>
>> >> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>> >>
>> >> > Hey Niko,
>> >> >
>> >> > Thanks for this analysis. I jumped a little into this today but didn't
>> >> > get as far as you did. I wanted to find a quick way to recreate the
>> >> > following:
>> >> > https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
>> >> > free to use this, if it helps any.
>> >> >
>> >> > It's very strange to me that this bug hasn't manifested itself before but
>> >> > it's undeniably there. I'll try to spend more time looking at this
>> >> > tomorrow with some traces and debug flags and see if I can narrow down
>> >> > the problem.
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
>> >> > ntampourat...@ece.auth.gr> wrote:
>> >> >
>> >> >> In my previous results, I had used double (not float) for the
>> >> >> following variables: result, sq_i and sq_j. In the case of float
>> >> >> instead of double I get "nan" and not 0.000000.
>> >> >>
>> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
>> >> >>
>> >> >> > Dear Jason, all,
>> >> >> >
>> >> >> > I am trying to find the accuracy problem with RISCV-FS and I observe
>> >> >> > that the problem arises (at least in my dummy example) because
>> >> >> > the variables (double) are set to zero at random points in simulated
>> >> >> > time (for this reason I get different results among executions of
>> >> >> > the same code). Specifically for the following dummy code:
>> >> >> >
>> >> >> >
>> >> >> > #include <cmath>
>> >> >> > #include <stdio.h>
>> >> >> >
>> >> >> > int main(){
>> >> >> >
>> >> >> >     int dim = 10;
>> >> >> >
>> >> >> >     float result;
>> >> >> >
>> >> >> >     for (int iter = 0; iter < 2; iter++){
>> >> >> >         result = 0;
>> >> >> >         for (int i = 0; i < dim; i++){
>> >> >> >             for (int j = 0; j < dim; j++){
>> >> >> >                 float sq_i = sqrt(i);
>> >> >> >                 float sq_j = sqrt(j);
>> >> >> >                 result += sq_i * sq_j;
>> >> >> >                 printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
>> >> >> >                        iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
>> >> >> >             }
>> >> >> >         }
>> >> >> >         printf("Final Result: %lf\n", result);
>> >> >> >     }
>> >> >> > }
>> >> >> >
>> >> >> >
>> >> >> > The correct Final Result in both iterations is 372.721656. However,
>> >> >> > I get the following results in FS:
>> >> >> >
>> >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
>> >> >> > 1.000000): 1.000000
>> >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
>> >> >> > 1.414214): 2.414214
>> >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
>> >> >> > 1.732051): 4.146264
>> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
>> >> >> > 1.414214): 1.414214
>> >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
>> >> >> > 2.000000): 3.414214
>> >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
>> >> >> > 2.449490): 5.863703
>> >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
>> >> >> > 2.828427): 8.692130
>> >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
>> >> >> > 3.162278): 11.854408
>> >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
>> >> >> > 3.464102): 15.318510
>> >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
>> >> >> > 3.741657): 19.060167
>> >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
>> >> >> > 4.000000): 23.060167
>> >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
>> >> >> > 4.242641): 27.302808
>> >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 27.302808
>> >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
>> >> >> > 1.732051): 29.034859
>> >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
>> >> >> > 2.449490): 31.484348
>> >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
>> >> >> > 3.000000): 34.484348
>> >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
>> >> >> > 3.464102): 37.948450
>> >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
>> >> >> > 3.872983): 41.821433
>> >> >> > ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
>> >> >> > 4.242641): 46.064074
>> >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
>> >> >> > 4.582576): 50.646650
>> >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
>> >> >> > 4.898979): 55.545629
>> >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
>> >> >> > 5.196152): 60.741782
>> >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 60.741782
>> >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
>> >> >> > 2.000000): 62.741782
>> >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
>> >> >> > 2.828427): 65.570209
>> >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
>> >> >> > 3.464102): 69.034310
>> >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
>> >> >> > 4.000000): 73.034310
>> >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
>> >> >> > 4.472136): 77.506446
>> >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
>> >> >> > 4.898979): 82.405426
>> >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
>> >> >> > 5.291503): 87.696928
>> >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
>> >> >> > 5.656854): 93.353783
>> >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
>> >> >> > 6.000000): 99.353783
>> >> >> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 99.353783
>> >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
>> >> >> > 2.236068): 101.589851
>> >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
>> >> >> > 3.162278): 104.752128
>> >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
>> >> >> > 3.872983): 108.625112
>> >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
>> >> >> > 4.472136): 113.097248
>> >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
>> >> >> > 5.000000): 118.097248
>> >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
>> >> >> > 5.477226): 123.574473
>> >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
>> >> >> > 5.916080): 129.490553
>> >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
>> >> >> > 6.324555): 135.815108
>> >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
>> >> >> > 6.708204): 142.523312
>> >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 142.523312
>> >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
>> >> >> > 2.449490): 144.972802
>> >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
>> >> >> > 3.464102): 148.436904
>> >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
>> >> >> > 4.242641): 152.679544
>> >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
>> >> >> > 4.898979): 157.578524
>> >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
>> >> >> > 5.477226): 163.055749
>> >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
>> >> >> > 6.000000): 169.055749
>> >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
>> >> >> > 6.480741): 175.536490
>> >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
>> >> >> > 6.928203): 182.464693
>> >> >> > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
>> >> >> > 7.348469): 189.813162
>> >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 189.813162
>> >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
>> >> >> > 2.645751): 192.458914
>> >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
>> >> >> > 3.741657): 196.200571
>> >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
>> >> >> > 4.582576): 200.783147
>> >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
>> >> >> > 5.291503): 206.074649
>> >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
>> >> >> > 5.916080): 211.990729
>> >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
>> >> >> > 6.480741): 218.471470
>> >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
>> >> >> > 7.000000): 225.471470
>> >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
>> >> >> > 7.483315): 232.954785
>> >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
>> >> >> > 7.937254): 240.892039
>> >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 240.892039
>> >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
>> >> >> > 2.828427): 243.720466
>> >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
>> >> >> > 4.000000): 247.720466
>> >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
>> >> >> > 4.898979): 252.619445
>> >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
>> >> >> > 5.656854): 258.276300
>> >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
>> >> >> > 6.324555): 264.600855
>> >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
>> >> >> > 6.928203): 271.529058
>> >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
>> >> >> > 7.483315): 279.012373
>> >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
>> >> >> > 8.000000): 287.012373
>> >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
>> >> >> > 8.485281): 295.497654
>> >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 295.497654
>> >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
>> >> >> > 3.000000): 298.497654
>> >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
>> >> >> > 4.242641): 302.740295
>> >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
>> >> >> > 5.196152): 307.936447
>> >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
>> >> >> > 6.000000): 313.936447
>> >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
>> >> >> > 6.708204): 320.644651
>> >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
>> >> >> > 7.348469): 327.993120
>> >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
>> >> >> > 7.937254): 335.930374
>> >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
>> >> >> > 8.485281): 344.415656
>> >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
>> >> >> > 9.000000): 353.415656
>> >> >> > Final Result: 353.415656
>> >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
>> >> >> > 1.000000): 1.000000
>> >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
>> >> >> > 1.414214): 2.414214
>> >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
>> >> >> > 1.732051): 4.146264
>> >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
>> >> >> > 2.000000): 6.146264
>> >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
>> >> >> > 2.236068): 8.382332
>> >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
>> >> >> > 2.449490): 10.831822
>> >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
>> >> >> > 2.645751): 13.477573
>> >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
>> >> >> > 2.828427): 16.306001
>> >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
>> >> >> > 3.000000): 19.306001
>> >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 19.306001
>> >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
>> >> >> > 1.414214): 20.720214
>> >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
>> >> >> > 2.000000): 22.720214
>> >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
>> >> >> > 2.449490): 25.169704
>> >> >> > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
>> >> >> > 2.828427): 27.998131
>> >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
>> >> >> > 3.162278): 31.160409
>> >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
>> >> >> > 3.464102): 34.624510
>> >> >> > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
>> >> >> > 3.741657): 38.366168
>> >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
>> >> >> > 4.000000): 42.366168
>> >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
>> >> >> > 4.242641): 46.608808
>> >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 46.608808
>> >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
>> >> >> > 1.732051): 48.340859
>> >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
>> >> >> > 2.449490): 50.790349
>> >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
>> >> >> > 3.000000): 53.790349
>> >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
>> >> >> > 3.464102): 57.254450
>> >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
>> >> >> > 3.872983): 61.127434
>> >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
>> >> >> > 4.242641): 65.370075
>> >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
>> >> >> > 4.582576): 69.952650
>> >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
>> >> >> > 4.898979): 74.851630
>> >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
>> >> >> > 5.196152): 80.047782
>> >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 80.047782
>> >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
>> >> >> > 2.000000): 82.047782
>> >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
>> >> >> > 2.828427): 84.876209
>> >> >> > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
>> >> >> > 3.464102): 88.340311
>> >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
>> >> >> > 4.000000): 92.340311
>> >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
>> >> >> > 4.472136): 96.812447
>> >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
>> >> >> > 4.898979): 101.711426
>> >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
>> >> >> > 5.291503): 107.002929
>> >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
>> >> >> > 5.656854): 112.659783
>> >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
>> >> >> > 6.000000): 118.659783
>> >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 118.659783
>> >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
>> >> >> > 2.236068): 120.895851
>> >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
>> >> >> > 3.162278): 124.058129
>> >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
>> >> >> > 3.872983): 127.931112
>> >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
>> >> >> > 4.472136): 132.403248
>> >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
>> >> >> > 5.000000): 137.403248
>> >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
>> >> >> > 5.477226): 142.880474
>> >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
>> >> >> > 5.916080): 148.796553
>> >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
>> >> >> > 6.324555): 155.121109
>> >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
>> >> >> > 6.708204): 161.829313
>> >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 161.829313
>> >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
>> >> >> > 2.449490): 164.278802
>> >> >> > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
>> >> >> > 3.464102): 167.742904
>> >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
>> >> >> > 4.242641): 171.985545
>> >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
>> >> >> > 4.898979): 176.884524
>> >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
>> >> >> > 5.477226): 182.361750
>> >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
>> >> >> > 6.000000): 188.361750
>> >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
>> >> >> > 6.480741): 194.842491
>> >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
>> >> >> > 6.928203): 201.770694
>> >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
>> >> >> > 7.348469): 209.119163
>> >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 209.119163
>> >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
>> >> >> > 2.645751): 211.764914
>> >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
>> >> >> > 3.741657): 215.506572
>> >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
>> >> >> > 4.582576): 220.089147
>> >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
>> >> >> > 5.291503): 225.380650
>> >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
>> >> >> > 5.916080): 231.296730
>> >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
>> >> >> > 6.480741): 237.777470
>> >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
>> >> >> > 7.000000): 244.777470
>> >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
>> >> >> > 7.483315): 252.260785
>> >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
>> >> >> > 7.937254): 260.198039
>> >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 260.198039
>> >> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
>> >> >> > 2.828427): 263.026466
>> >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
>> >> >> > 4.000000): 267.026466
>> >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
>> >> >> > 4.898979): 271.925446
>> >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
>> >> >> > 5.656854): 277.582300
>> >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
>> >> >> > 6.324555): 283.906855
>> >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
>> >> >> > 6.928203): 290.835059
>> >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
>> >> >> > 7.483315): 298.318373
>> >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
>> >> >> > 8.000000): 306.318373
>> >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
>> >> >> > 8.485281): 314.803655
>> >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
>> >> >> > 0.000000): 314.803655
>> >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
>> >> >> > 3.000000): 317.803655
>> >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
>> >> >> > 4.242641): 322.046295
>> >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
>> >> >> > 5.196152): 327.242448
>> >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
>> >> >> > 6.000000): 333.242448
>> >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
>> >> >> > 6.708204): 339.950652
>> >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
>> >> >> > 7.348469): 347.299121
>> >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
>> >> >> > 7.937254): 355.236375
>> >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
>> >> >> > 8.485281): 363.721656
>> >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
>> >> >> > 9.000000): 372.721656
>> >> >> > Final Result: 372.721656
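
The reference value for this run can be cross-checked on any host, since the double sum of sqrt(i)*sqrt(j) factorizes into the square of the sum of sqrt(i). The following is a minimal sketch of that check; `expectedSqrtProductSum` is a hypothetical name, and the computation is done in double precision, so for large dimensions the last digits may differ from a nested-loop accumulation.

```cpp
#include <cmath>

// sum_{i<dim} sum_{j<dim} sqrt(i)*sqrt(j) == (sum_{i<dim} sqrt(i))^2
double expectedSqrtProductSum(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; ++i) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}
```

For dim = 10 this gives approximately 372.7217, in line with the second iteration of the log, while the first iteration's 353.415656 falls short because the accumulator was reset to zero partway through.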
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > As we can see in the following iterations, sqrt(1) as well as the
>> >> >> > result is set to zero for some reason.
>> >> >> >
>> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
>> >> >> > 0.000000): 0.000000
>> >> >> >
>> >> >> > Please help me to resolve the accuracy issue! I think that it will
>> >> >> > be very useful for the gem5 community.
>> >> >> >
>> >> >> > Note that I found the correct simulated tick at which the
>> >> >> > application starts in FS (using m5 dumpstats) and passed it to
>> >> >> > --debug-start, but the generated trace file is 10x larger
>> >> >> > than in SE mode for the same application. How can I compare them?
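
One obstacle when diffing two traces is that the ticks never line up between SE and FS runs. A small preprocessing step can drop the tick so that only the remaining text of each line is compared. The sketch below assumes each trace line begins with a decimal tick followed by a colon; that layout, the sample lines in the usage note, and the name `stripTick` are assumptions for illustration. It does not filter out kernel instructions, which would need a separate criterion.

```cpp
#include <cctype>
#include <string>

// Drop a leading "<digits>:" tick field (plus following spaces), if present,
// so lines from two runs can be compared without their timestamps.
// Lines that do not start with such a field are returned unchanged.
std::string stripTick(const std::string& line) {
    std::size_t pos = 0;
    while (pos < line.size() &&
           std::isspace(static_cast<unsigned char>(line[pos]))) {
        ++pos;
    }
    const std::size_t digitsStart = pos;
    while (pos < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[pos]))) {
        ++pos;
    }
    if (pos == digitsStart || pos >= line.size() || line[pos] != ':') {
        return line;  // no tick prefix found
    }
    ++pos;  // skip the ':'
    while (pos < line.size() && line[pos] == ' ') {
        ++pos;
    }
    return line.substr(pos);
}
```

Applying `stripTick` to every line of both trace files before running a diff would turn, for example, a hypothetical line `1000: system.cpu: fsqrt_d` into `system.cpu: fsqrt_d`.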
>> >> >> >
>> >> >> > Thank you in advance!
>> >> >> > Best regards,
>> >> >> > Nikos
>> >> >> >
>> >> >> > Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
>> >> >> >
>> >> >> >> Dear Jason,
>> >> >> >>
>> >> >> >> I am trying to use --debug-start but in FS mode it is very
>> >> >> >> difficult to find the tick on which the application is started!
>> >> >> >>
>> >> >> >> However, I wrote the following very simple C++ program:
>> >> >> >>
>> >> >> >> #include <cmath>
>> >> >> >> #include <stdio.h>
>> >> >> >>
>> >> >> >> int main(){
>> >> >> >>
>> >> >> >>     int dim = 4096;
>> >> >> >>
>> >> >> >>     double result;
>> >> >> >>
>> >> >> >>     for (int iter = 0; iter < 2; iter++){
>> >> >> >>         result = 0;
>> >> >> >>         for (int i = 0; i < dim; i++){
>> >> >> >>             for (int j = 0; j < dim; j++){
>> >> >> >>                 result += sqrt(i) * sqrt(j);
>> >> >> >>             }
>> >> >> >>         }
>> >> >> >>         printf("Result: %lf\n", result); //Result: 30530733453.127449
>> >> >> >>     }
>> >> >> >> }
>> >> >> >>
>> >> >> >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
>> >> >> >> test_riscv test_riscv.cpp
>> >> >> >>
>> >> >> >>
>> >> >> >> While in X86 (without cross-compilation of course), QEMU-RISCV, and
>> >> >> >> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
>> >> >> >> result is different! In addition, the result also differs
>> >> >> >> between the 2 iterations.
>> >> >> >>
>> >> >> >> Please reproduce the error if you want in order to verify my result.
>> >> >> >> How can the issue be resolved?
>> >> >> >>
>> >> >> >> Thank you in advance!
>> >> >> >>
>> >> >> >> Best regards,
>> >> >> >> Nikos
>> >> >> >>
>> >> >> >>
>> >> >> >> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>> >> >> >>
>> >> >> >>> Hi Nikos,
>> >> >> >>>
>> >> >> >>> You can use --debug-start to start the debugging after some number of
>> >> >> >>> ticks. Also, I would expect that the difference should come up quickly,
>> >> >> >>> so no need to run the program to the end.
>> >> >> >>>
>> >> >> >>> For the FS mode one, you will want to just start the trace as the
>> >> >> >>> application starts. This could be a bit of a pain.
>> >> >> >>>
>> >> >> >>> I'm not really sure what fundamentally could be different. FS and SE
>> >> >> >>> mode use the exact same code for executing instructions, so I don't
>> >> >> >>> think that's the problem. Have you tried running for smaller inputs or
>> >> >> >>> just one iteration?
>> >> >> >>>
>> >> >> >>> Jason
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
>> >> >> >>> ntampourat...@ece.auth.gr> wrote:
>> >> >> >>>
>> >> >> >>>> Dear Bobby,
>> >> >> >>>>
>> >> >> >>>> I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
>> >> >> >>>> rather than the gem5.fast I had), but the debug traces exceed 20GB
>> >> >> >>>> (and it has not finished yet) for less than 1 simulated second. How
>> >> >> >>>> can I reduce the size of the debug traces (or set something more
>> >> >> >>>> specific)?
>> >> >> >>>>
>> >> >> >>>> In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If
>> >> >> >>>> you want, you can compare these two output files
>> >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
>> >> >> >>>> see, something goes wrong with the accuracy of calculations in FS mode
>> >> >> >>>> (the benchmark uses double precision). You can find the files here:
>> >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Nikos
>> >> >> >>>>
>> >> >> >>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>> >> >> >>>>
>> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode!
>> >> >> >>>>>
>> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and then
>> >> >> >>>>> performing a diff to see how they differ.
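
A rough sketch of the trace comparison suggested above. It assumes each trace line starts with a `<tick>:` field, which will differ between runs and so is stripped before comparing; the function names and the sample lines are invented for illustration and are not real gem5 output. Since an FS trace also contains kernel activity, both traces would likely need trimming to the application region first.

```python
import re
from itertools import zip_longest

# Assumed line layout: "<tick>: <rest of the trace record>".
_TICK_PREFIX = re.compile(r"^\s*\d+:\s*")


def strip_tick(line):
    """Drop the leading tick field so timing differences are ignored."""
    return _TICK_PREFIX.sub("", line.rstrip("\n"), count=1)


def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first differing record.

    Returns None when both traces match after tick stripping. A missing
    line (one trace is shorter) is reported as None on that side.
    """
    for index, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a is None or b is None or strip_tick(a) != strip_tick(b):
            return index, a, b
    return None


if __name__ == "__main__":
    # Made-up records, only to show the call shape.
    se_trace = ["100: cpu: fsqrt.d fa0, fa1\n", "200: cpu: fadd.d fa0, fa0, fa1\n"]
    fs_trace = ["150: cpu: fsqrt.d fa0, fa1\n", "275: cpu: fsub.d fa0, fa0, fa1\n"]
    print(first_divergence(se_trace, fs_trace))
```

Reporting only the first mismatch keeps the output readable when the traces run to gigabytes; passing open file objects instead of lists avoids loading them into memory.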
>> >> >> >>>>>
>> >> >> >>>>> Cheers,
>> >> >> >>>>> Jason
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
>> >> >> >>>>> ntampourat...@ece.auth.gr> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Dear Bobby,
>> >> >> >>>>>>
>> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode
>> >> >> >>>>>> simulation. I get invalid results in FS simulation (in both
>> >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
>> >> >> >>>>>> hardware at this moment, however, if you want you may execute my
>> >> >> >>>>>> xhpcg binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
>> >> >> >>>>>> following configuration:
>> >> >> >>>>>>
>> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
>> >> >> >>>>>>
>> >> >> >>>>>> Please let me know if you have any updates!
>> >> >> >>>>>>
>> >> >> >>>>>> Best regards,
>> >> >> >>>>>> Nikos
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>> Quoting Jason Lowe-Power <ja...@lowepower.com>:
>> >> >> >>>>>>
>> >> >> >>>>>>> Hi Nikos,
>> >> >> >>>>>>>
>> >> >> >>>>>>> I notice you said the following in your original email:
>> >> >> >>>>>>>
>> >> >> >>>>>>>> In addition, I used the RISCV Ubuntu image
>> >> >> >>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>> >> >> >>>>>>>> I installed the gcc compiler, compiled it (through QEMU), and I
>> >> >> >>>>>>>> get wrong results too.
>> >> >> >>>>>>>
>> >> >> >>>>>>>
>> >> >> >>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is
>> >> >> >>>>>>> in GCC or the HPCG workload, not in gem5. If not, I would test in
>> >> >> >>>>>>> QEMU to make sure the binary works there. Another way you could test
>> >> >> >>>>>>> to see if the problem is your binary or gem5 would be to run it on
>> >> >> >>>>>>> real hardware. We have access to some RISC-V hardware here at UC
>> >> >> >>>>>>> Davis, if you don't have access to it.
>> >> >> >>>>>>>
>> >> >> >>>>>>> Cheers,
>> >> >> >>>>>>> Jason
>> >> >> >>>>>>>
>> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
>> >> >> >>>>>>> ntampourat...@ece.auth.gr> wrote:
>> >> >> >>>>>>>
>> >> >> >>>>>>>> Dear Bobby,
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> 1) I use the original riscv-fs.py that is provided in the latest
>> >> >> >>>>>>>> gem5 release.
>> >> >> >>>>>>>> I ran gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
>> >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download
>> >> >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
>> >> >> >>>>>>>> After this, I mounted riscv-disk-img (sudo mount -o loop
>> >> >> >>>>>>>> riscv-disk-img /mnt), copied in the xhpcg executable, and made the
>> >> >> >>>>>>>> following changes in riscv-fs.py to boot riscv-disk-img with the
>> >> >> >>>>>>>> executable:
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> image = CustomDiskImageResource(
>> >> >> >>>>>>>>     local_path="/home/cossim/.cache/gem5/riscv-disk-img",
>> >> >> >>>>>>>> )
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> # Set the Full System workload.
>> >> >> >>>>>>>> board.set_kernel_disk_workload(
>> >> >> >>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
>> >> >> >>>>>>>>     disk_image=image,
>> >> >> >>>>>>>> )
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
>> >> >> >>>>>>>> I changed the last line to
>> >> >> >>>>>>>> "return ["console=ttyS0", "root={root_value}", "rw"]" to allow
>> >> >> >>>>>>>> write access to the image.
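
A small sketch of what the kernel-argument change described above amounts to. The function name `kernel_args`, the `writable` switch, and the `/dev/vda` device in the usage line are hypothetical; the quoted line in riscv_board.py keeps `root={root_value}` as a template, whereas this sketch fills it in directly to stay self-contained.

```python
def kernel_args(root_value, writable=True):
    """Build the kernel command-line arguments quoted in the message above."""
    args = ["console=ttyS0", f"root={root_value}"]
    # "rw" asks the kernel to mount the root filesystem read-write;
    # "ro" is the read-only counterpart.
    args.append("rw" if writable else "ro")
    return args


if __name__ == "__main__":
    # "/dev/vda" is a placeholder root device, not taken from the thread.
    print(" ".join(kernel_args("/dev/vda")))
```

Mounting the root filesystem read-write lets the benchmark write its output files inside the guest image.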
>> >> >> >>>>>>>>
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> 2) After some iterations, the HPCG benchmark checks whether the
>> >> >> >>>>>>>> results are valid. In the case of FS it reports invalid results. As
>> >> >> >>>>>>>> I see from the results, one problem (at least) is that it produces
>> >> >> >>>>>>>> different results in each HPCG execution (with the same
>> >> >> >>>>>>>> configuration).
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py
>> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
>> >> >> >>>>>>>> results in the video if you use the xhpcg executable
>> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Please help me solve it!
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode too.
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Best regards,
>> >> >> >>>>>>>> Nikos
>> >> >> >>>>>>>>
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> > I'm going to need a bit more information to help:
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > 1. In what way have you modified
>> >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the
>> >> >> >>>>>>>> > script here?
>> >> >> >>>>>>>> > 2. What error are you getting or in what way are the results
>> >> >> >>>>>>>> > invalid?
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > -
>> >> >> >>>>>>>> > Dr. Bobby R. Bruce
>> >> >> >>>>>>>> > Room 3050,
>> >> >> >>>>>>>> > Kemper Hall, UC Davis
>> >> >> >>>>>>>> > Davis,
>> >> >> >>>>>>>> > CA, 95616
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > web: https://www.bobbybruce.net
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
>> >> >> >>>>>>>> > ntampourat...@ece.auth.gr> wrote:
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> Dear gem5 community,
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV
>> >> >> >>>>>>>> >> (serial version, without MPI and OpenMP). While it works properly
>> >> >> >>>>>>>> >> in gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
>> >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16
>> >> >> >>>>>>>> >> --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results
>> >> >> >>>>>>>> >> in FS simulation using "./build/RISCV/gem5.fast -d
>> >> >> >>>>>>>> >> ./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py" (I
>> >> >> >>>>>>>> >> mount the riscv image and copy the executable into it).
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> Can you help me please?
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image
>> >> >> >>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>> >> >> >>>>>>>> >> I installed the gcc compiler, compiled it (through QEMU), and I
>> >> >> >>>>>>>> >> get wrong results too.
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
>> >> >> >>>>>>>> >> (xhpcg), and a video that shows the results
>> >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> P.S. I use the latest gem5 version.
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> Thank you in advance! :)
>> >> >> >>>>>>>> >>
>> >> >> >>>>>>>> >> Best regards,
>> >> >> >>>>>>>> >> Nikos
>> >> >> >>>>>>>> >> _______________________________________________
>> >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
>> >> >> >>>>>>>> >> To unsubscribe send an email to gem5-users-le...@gem5.org