[gem5-users] Re: ARM SVE ISA

Giacomo Travaglini via gem5-users Mon, 15 Jan 2024 06:56:10 -0800

Hi Nazmus

On 15/01/2024 14:32, Nazmus Sakib wrote:
Hello. Thanks for your response.
I am running the O3 CPU (ArmO3CPU), not Minor.


The O3 LSQ does the same thing:

https://github.com/gem5/gem5/blob/stable/src/cpu/o3/lsq.cc#L816


Also, I understand that the LSQ unit can do this.
But doesn't the cache need separate logic for scalar and vector reads/writes, 
since the event scheduling needed to model the timing of vector loads/stores 
must be different?


A gem5 cache only reasons in terms of cachelines (64 bytes here), and the same 
goes for a coherent interconnect, regardless of whether the access is vector or 
scalar.



Also, doesn't the interconnect (bus, crossbar, or whatever) have to be wide 
enough to support vector reads/writes?


As I mentioned earlier, memory requests bigger than a cacheline are split into 
fragments at the LSQ. To give you a more concrete example: say that you have a 
1024-bit vector (128 bytes). A single vector load will be split into two 
64-byte memory requests. The D-cache will see two requests to two consecutive 
cachelines. On a miss it will issue a GetS for each line; on a hit it will 
return the data directly.

The LSQ waits for both requests to return their data and coalesces them before 
writing the result back to the destination vector register.
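
The splitting described above can be sketched in a few lines of Python. This is 
only an illustration of the arithmetic, not gem5's LSQ code: the function name 
split_into_cacheline_fragments and the example base address are made up for the 
sketch, and a 64-byte cacheline is assumed.

```python
CACHELINE_BYTES = 64  # assumed line size, matching --cacheline_size=64


def split_into_cacheline_fragments(addr, size, line=CACHELINE_BYTES):
    """Split the access [addr, addr + size) so no fragment crosses a line boundary.

    Hypothetical helper for illustration; returns a list of (addr, size) tuples.
    """
    fragments = []
    end = addr + size
    while addr < end:
        # First address past the cacheline that contains addr.
        line_end = (addr // line + 1) * line
        frag_end = min(end, line_end)
        fragments.append((addr, frag_end - addr))
        addr = frag_end
    return fragments


# A 1024-bit (128-byte) vector load from a line-aligned address
# becomes two 64-byte requests to consecutive cachelines.
for frag_addr, frag_size in split_into_cacheline_fragments(0x1000, 128):
    print(hex(frag_addr), frag_size)
```

An access that fits inside one line comes back as a single fragment, so scalar 
and short vector accesses take the same path in this model.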


I hope this helps


Giacomo



________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 15 January 2024 03:30
To: Nazmus Sakib <nsak...@nmsu.edu>; The gem5 Users mailing list 
<gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

WARNING This email originated external to the NMSU email system. Do not click 
on links or open attachments unless you are sure the content is safe.

Hi Nazmus,


On 15/01/2024 02:41, Nazmus Sakib wrote:
Thank you. I will try to switch to starter_se.py.
I still have some questions regarding SVE.
1. When I compile with -msve-vector-bits=512, I see a PTRUE instruction, which 
is replaced by whilelo when I compile without setting the vector length. In 
gem5, whilelo and the corresponding incw instructions seem to work fine: with 
sve_vl=1, incw increments by 0x4 (128 bits), and with sve_vl=4 it increments by 
0x10 (512 bits). What I am curious about is whether there is anything wrong 
with the implementation of the PTRUE instruction in gem5.


Without inspecting the disassembled program, my guess is that 
-msve-vector-bits=512 forces the code to be non-VL-agnostic and hardcodes the 
vector length to 512 bits. So there is nothing surprising in the run failing on 
hardware with a non-matching vector length.

I believe the proof that there is nothing inherently wrong with ptrue in gem5 
is that, keeping the 512-bit binary untouched (with ptrue) and only setting 
VL=4, you get a successful run.
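
The incw values reported above follow directly from the vector length. As a 
rough sketch (the helper name is made up here; it assumes the gem5 vector 
length parameter counts 128-bit quadwords, and that incw with its default 
pattern and multiplier adds the number of 32-bit elements in one vector):

```python
QUADWORD_BITS = 128  # the gem5 VL parameter is expressed in 128-bit units
WORD_BITS = 32       # incw counts 32-bit (word) elements


def incw_step(sve_vl):
    """Hypothetical helper: number of 32-bit elements in one vector."""
    return sve_vl * QUADWORD_BITS // WORD_BITS


for vl in (1, 4):
    print(f"sve_vl={vl}: {vl * QUADWORD_BITS} bits, incw adds {incw_step(vl)}")
```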


2. As shown in my first email, my data arrays are 64 bytes in size. With 
sve_vl=4, a single SVE ld1w instruction can load all 64 bytes (at least in 
theory, on a real CPU). From the output of the LSQUnit and CacheAll debug flags 
I can see that all 64 bytes are indeed accessed by one instruction. For example:
system.cpu.dcache: access for WriteReq [81010:8104f]
The address range here covers 64 bytes (16 integers of 4 bytes each in my test 
code).
But without support in the bus/interconnect between the CPU and the cache for 
64-byte transfers (or whatever the vector length is), and without additional 
code in gem5 to support multi-word reads/writes, shouldn't only one word (4 
bytes for Arm in gem5, I am guessing) be readable from the cache at a time? In 
that case, how are all 64 bytes requested and read from the cache by a single 
instruction in gem5? Is there some underlying mechanism, like micro-ops or some 
architectural feature, that operates transparently? Or maybe a simple loop that 
does not show up in the debug output? I looked in src/mem/cache/base.cc and 
cache.cc but could not find an answer.


Simply put, the O3/Minor LSQ issues any request that does not cross a cacheline 
boundary as a single access. If a memory request spans two or more cachelines, 
it is split into two (or more) fragments [1].


Hope this helps


Giacomo


[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/minor/lsq.cc#L1632



________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 12 January 2024 03:56
To: Nazmus Sakib <nsak...@nmsu.edu>; The gem5 Users mailing list 
<gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA




You are right, I created a PR to fix this:



https://github.com/gem5/gem5/pull/764



Kind Regards



Giacomo



From: Nazmus Sakib <nsak...@nmsu.edu>
Date: Thursday, 11 January 2024 at 19:34
To: Giacomo Travaglini <giacomo.travagl...@arm.com>, The gem5 Users mailing 
list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

Not compiling with -msve-vector-bits did the trick. It runs perfectly whether 
I set cpu[0].isa[0].sve_vl_se to 4 or keep it at 1.
Thank you for the suggestions!
One last thing: starter_se.py does not seem to support --cpu-type=ArmO3CPU (or 
am I missing something)?

________________________________

From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 11 January 2024 12:16
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>; Nazmus Sakib 
<nsak...@nmsu.edu>
Subject: Re: ARM SVE ISA







Hi Nazmus,



I can see from what you posted that you are compiling the test case with a 
512-bit vector width. I believe you should set the gem5 VL accordingly, i.e. 
add the following to the gem5 config:



cpu.isa[0].sve_vl_se = 4



The parameter is defined in [1].

This should fix your problem. Another solution, I believe, would be to compile 
without specifying the VL; the generated code should then be VL-agnostic.
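
To make the relationship between the compiler flag and the gem5 parameter 
explicit, here is a small sketch. The function name is hypothetical; it assumes 
sve_vl_se is given in 128-bit units (so 512 bits corresponds to 4) and uses the 
architectural SVE limits of 128 to 2048 bits in multiples of 128.

```python
def sve_vl_for_vector_bits(vector_bits):
    """Hypothetical helper: map an -msve-vector-bits value to a quadword count."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("SVE vector lengths are multiples of 128 bits, 128..2048")
    return vector_bits // 128


print(sve_vl_for_vector_bits(512))
```

In a gem5 config script the result would then be assigned to the parameter 
named in this thread, e.g. cpu.isa[0].sve_vl_se = sve_vl_for_vector_bits(512).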



In any case, I also recommend using configs/example/arm/starter_se.py, as 
se.py is deprecated.



Kind Regards



Giacomo



[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/ArmISA.py#L179



From: Nazmus Sakib via gem5-users <gem5-users@gem5.org>
Date: Thursday, 11 January 2024 at 17:54
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>, Nazmus Sakib 
<nsak...@nmsu.edu>
Subject: [gem5-users] ARM SVE ISA

Hello.
I am trying to run a simple program with SVE instructions on gem5. However, the 
output with the ExecAll debug flag suggests there is an issue with the decoder.
Here is the test code:

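#include <stdio.h>

#define STREAM_ARRAY_SIZE 16

/* The array declarations were not shown in the original post;
   global int arrays are assumed here. */
int A[STREAM_ARRAY_SIZE], B[STREAM_ARRAY_SIZE];

int add(int * restrict p, int * restrict q);

int main(void)
{
    for (int j = 0; j < STREAM_ARRAY_SIZE; j++)
    {
        A[j] = 3; B[j] = 2;
    }

    int x = add(A, B);
    (void)x;  /* the return value is not used further */

    printf("return %d \n", A[3]);  // should print 6, does not in gem5
    return 0;
}

int add(int * restrict p, int * restrict q)
{
    for (int i = 0; i < STREAM_ARRAY_SIZE; i += 1)
    {
        *(p + i) = *(q + i) + 4;
    }

    // should print 6 and 2, does not in gem5
    printf("dummy %d %d \n", *(p + 3), *(q + 3));

    return *(p + 3);
}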
I compiled it with the GCC cross compiler for Arm using the following command:

aarch64-linux-gnu-gcc-11 -O3 -static -mcpu=a64fx+sve2 -msve-vector-bits=512 -o test test.c

Without -mcpu=a64fx+sve2, SVE instructions are not generated.
Here is the gem5 command I used:
./build/ARM/gem5.opt ./configs/deprecated/example/se.py --cpu-type=ArmO3CPU --caches --cacheline_size=64 --mem-size=8GB --arm-iset=aarch64 -c ./test
I have also used ./configs/example/arm/starter_se.py, but the results are the 
same.
When I use --debug-flags=ExecAll, I see the following issues:
1) 12589500: system.cpu: A0 T0 : 0x400524 @main+4    :   ptrue   p0, VL64       
  : SimdPredAlu
:  D=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]  FetchSeq=14292  CPSeq=4962  flags=()

The D=[] should not be all zeros.

2)

12591000: system.cpu: A0 T0 : 0x400550 @main+48    :   st1   {z1}, p0/z, , 
[x19] : MemWrite :
A=0x491040  FetchSeq=14305  CPSeq=4975  flags=(IsInteger|IsVector|IsStore)

12591000: system.cpu: A0 T0 : 0x400554 @main+52    :   st1   {z0}, p0/z, , 
[x19, #1, mul vl] : MemWrite : A=0x491050  FetchSeq=14306  CPSeq=4976  
flags=(IsInteger|IsVector|IsStore)

The second A should be 0x491080, not 0x491050.

I have run the same binary on the RIKEN simulator, which was built on top of 
gem5 for the Fujitsu A64FX.
Here are the same instructions as seen in the RIKEN simulator.
1) 15322000: system.cpu A0 T0 : @main+4    :   ptrue   p0, VL64         : 
SimdPredAlu :  
D=0b[0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111]
  FetchSeq=18146  CPSeq=5254  flags=()
As you can see, my data arrays are 64 bytes and the corresponding bits in the 
predicate register are set to 1.
2)
15323000: system.cpu A0 T0 : @main+48    :   st1   {z1}, p0/z, , [x19] : 
SveMemWrite :
A=0x491040  FetchSeq=18159  CPSeq=5267  
flags=(IsInteger|IsVector|IsMemRef|IsStore)

15323000: system.cpu A0 T0 : @main+52    :   st1   {z0}, p0/z, , [x19, #1, mul 
vl] : SveMemWrite :

 A=0x491080  FetchSeq=18160  CPSeq=5268

The second address is calculated as 0x491080, which is the correct result for 
[x19, #1, mul vl], as VL = 64 bytes.
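
Both differences between the two traces can be reproduced with simple 
arithmetic on the vector length. The sketch below is a simplified model, not 
code from gem5 or the RIKEN simulator, and both function names are made up. It 
assumes that a fixed-count pattern such as VL64 selects exactly 64 elements 
when the vector holds at least 64 elements and none otherwise, that the ptrue 
uses byte-sized elements (the trace does not show the element suffix), and that 
the immediate in a "mul vl" address is scaled by the vector length in bytes.

```python
def ptrue_fixed_pattern(count, vector_bits, elem_bytes=1):
    """Hypothetical model of `ptrue` with a VL<count> pattern.

    Returns the predicate as a list of bits, lowest-numbered bit first,
    with one predicate bit per byte of the vector.
    """
    vector_bytes = vector_bits // 8
    num_elems = vector_bytes // elem_bytes
    # A fixed-count pattern that does not fit selects no elements at all.
    active = count if count <= num_elems else 0
    pred = [0] * vector_bytes
    for e in range(active):
        # Only the lowest predicate bit of each element is set.
        pred[e * elem_bytes] = 1
    return pred


def mul_vl_address(base, imm, vector_bits):
    """Hypothetical helper: address for `[base, #imm, mul vl]`."""
    return base + imm * (vector_bits // 8)


BASE = 0x491040  # value of x19 in both traces
for vector_bits in (512, 128):
    active = sum(ptrue_fixed_pattern(64, vector_bits))
    addr = mul_vl_address(BASE, 1, vector_bits)
    print(f"VL={vector_bits} bits: ptrue VL64 sets {active} bits, "
          f"second store at {hex(addr)}")
```

Under these assumptions, the all-zero predicate and the 0x491050 address are 
what a 128-bit vector length would produce, while 64 set bits and 0x491080 
correspond to 512 bits.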

I tried to compare the files in src/arch/arm/ISA from the RIKEN simulator with 
those in current gem5. Since the RIKEN simulator is based on an old gem5, there 
are obvious syntax differences. Other than that, I have found two things:
1) In ArmISA.py in the RIKEN simulator, there is this:

    id_aa64pfr0_el1 = Param.UInt64(0x0000000100000022, "AArch64 Processor Feature Register 0")

I did not find anything similar in gem5. I did find id_aa64pfr0_el1 in 
arch/arm/regs/misc.hh, but its value was not set anywhere.

2) In ArmISA.py in current gem5, there is a "FEAT_SVE" extension in class 
ArmDefaultSERelease. However, this is for Armv8.2, and I don't know how to 
specify this architecture on the command line.

What I am trying to find out is which of the following applies: am I missing a 
runtime flag that would enable the proper SVE behaviour in gem5; is it caused 
by a compile-time flag, since I am setting -mcpu to a64fx (setting -march to 
armv8.2-a+sve or similar does not produce SVE instructions, it has to be 
-mcpu=a64fx+sve); or is it a possible bug in the new gem5 itself? Any 
suggestions would be appreciated.
Thank you.

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
