Dear Jason, all,
I am trying to track down the accuracy problem with RISCV-FS, and I observe
that (at least in my dummy example) it arises because floating-point
variables are set to zero at random points in simulated time (which is
why I get different results across executions of the same code).
Specifically, for the following dummy code:
#include <cmath>
#include <stdio.h>

int main() {
    int dim = 10;
    float result;
    for (int iter = 0; iter < 2; iter++) {
        result = 0;
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                float sq_i = sqrt(i);
                float sq_j = sqrt(j);
                result += sq_i * sq_j;
                // One line per step so a corrupted value can be located.
                printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                       iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
            }
        }
        printf("Final Result: %lf\n", result);
    }
    return 0;
}
The correct Final Result in both iterations is 372.721656. However, I
get the following results in FS:
ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
9.000000): 372.721656
Final Result: 372.721656
As the following lines show, in some steps both sqrt(1) and the
accumulated result are unexpectedly set to zero:
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
Please help me resolve this accuracy issue! I think it would be
very useful for the gem5 community.
Note that I found the simulated tick at which the application starts
in FS mode (using m5 dumpstats) and passed it to --debug-start, but the
generated trace file is 10x larger than in SE mode for the same
application. How can I compare them?
Thank you in advance!
Best regards,
Nikos
Quoting Νικόλαος Ταμπουρατζής <ntampourat...@ece.auth.gr>:
Dear Jason,
I am trying to use --debug-start, but in FS mode it is very difficult
to find the tick at which the application starts!
However, I wrote the following very simple C++ program:
#include <cmath>
#include <stdio.h>

int main() {
    int dim = 4096;
    double result;
    for (int iter = 0; iter < 2; iter++) {
        result = 0;
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }
}
I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp
While on x86 (without cross-compilation, of course), QEMU-RISCV, and
gem5 SE the result is the same (30530733453.127449), in gem5 FS the
result is different! In addition, the result also differs
between the 2 iterations.
Please reproduce the error, if you like, to verify my result.
How can the issue be resolved?
Thank you in advance!
Best regards,
Nikos
Quoting Jason Lowe-Power <ja...@lowepower.com>:
Hi Nikos,
You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.
For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.
I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?
Jason
On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:
Dear Bobby,
I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
rather than the gem5.fast that I had), but the debug traces exceed 20GB
(and the run has not finished yet) for less than 1 simulated second. How can
I reduce the size of the debug trace (or select something more specific)?
Instead, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/
Best regards,
Nikos
Quoting Jason Lowe-Power <ja...@lowepower.com>:
That's quite odd that it works in SE mode but not FS mode!
I would suggest running with --debug-flags=Exec for both and then
performing a diff to see how they differ.
Cheers,
Jason
On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampourat...@ece.auth.gr> wrote:
Dear Bobby,
In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (with both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at the moment; however, if you want, you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:
./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
Please let me know if you have any updates!
Best regards,
Nikos
Quoting Jason Lowe-Power <ja...@lowepower.com>:
> Hi Nikos,
>
> I notice you said the following in your original email:
>
>> In addition, I used the RISCV Ubuntu image
>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>> I installed the gcc compiler, compile it (through qemu) and I get
>> wrong results too.
>
>
> Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
> or the HPCG workload, not in gem5. If not, I would test in QEMU to make
> sure the binary works there. Another way you could test to see if the
> problem is your binary or gem5 would be to run it on real hardware. We have
> access to some RISC-V hardware here at UC Davis, if you don't have access
> to it.
>
> Cheers,
> Jason
>
> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
> ntampourat...@ece.auth.gr> wrote:
>
>> Dear Bobby,
>>
>> 1) I use the original riscv-fs.py provided in the latest gem5
>> release.
>> I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
>> ./configs/example/gem5_library/riscv-fs.py) in order to download the
>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
>> After this I mount the riscv-disk-img (sudo mount -o loop
>> riscv-disk-img /mnt), copy the xhpcg executable into it, and make the
>> following changes in riscv-fs.py to boot the riscv-disk-img with the
>> executable:
>>
>> image = CustomDiskImageResource(
>>     local_path="/home/cossim/.cache/gem5/riscv-disk-img",
>> )
>>
>> # Set the Full System workload.
>> board.set_kernel_disk_workload(
>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
>>     disk_image=image,
>> )
>>
>> Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
>> I change the last line to "return ["console=ttyS0",
>> "root={root_value}", "rw"]" in order to allow write permissions in
>> the image.
>>
>>
>> 2) After some iterations, the HPCG benchmark checks whether the results
>> are valid. In the case of FS it reports invalid results. As far as
>> I can see from the results, one problem (at least) is that it produces
>> different results in each HPCG execution (with the same configuration).
>>
>> Here is the HPCG output and riscv-fs.py
>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
>> results in the video if you use the xhpcg executable
>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
>>
>> Please help me in order to solve it!
>>
>> Finally, I get invalid results in the HPL benchmark in FS mode too.
>>
>> Best regards,
>> Nikos
>>
>>
>> Quoting Bobby Bruce <bbr...@ucdavis.edu>:
>>
>> > I'm going to need a bit more information to help:
>> >
>> > 1. In what way have you modified
>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
>> > here?
>> > 2. What error are you getting or in what way are the results invalid?
>> >
>> > -
>> > Dr. Bobby R. Bruce
>> > Room 3050,
>> > Kemper Hall, UC Davis
>> > Davis,
>> > CA, 95616
>> >
>> > web: https://www.bobbybruce.net
>> >
>> >
>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
>> > ntampourat...@ece.auth.gr> wrote:
>> >
>> >>
>> >> Dear gem5 community,
>> >>
>> >> I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
>> >> version, without MPI and OpenMP). While it works properly in gem5 SE
>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
>> >> and put the executable in it).
>> >>
>> >> Can you help me please?
>> >>
>> >> In addition, I used the RISCV Ubuntu image
>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>> >> I installed the gcc compiler, compile it (through qemu) and I get
>> >> wrong results too.
>> >>
>> >> Here is the Makefile which I use, the hpcg executable for RISCV
>> >> (xhpcg), and a video that shows the results
>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>> >>
>> >> P.S. I use the latest gem5 version.
>> >>
>> >> Thank you in advance! :)
>> >>
>> >> Best regards,
>> >> Nikos
>> >> _______________________________________________
>> >> gem5-users mailing list -- gem5-users@gem5.org
>> >> To unsubscribe send an email to gem5-users-le...@gem5.org
>> >>
>>
>>