Hi Nazmus
On 15/01/2024 14:32, Nazmus Sakib wrote:

> Hello. Thanks for your response. I am running O3 cpu (ARMO3CPU), not minor.

It's the same: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/lsq.cc#L816

> Also, I get it that LSQ unit can do this. But a cache must have separate logic for scalar and vector read/writes, as scheduling events to support a timing model for vector load/store must be different?

A gem5 cache only reasons in terms of cachelines (64 bytes), and the same goes for a coherent interconnect, regardless of vector vs scalar.

> Also, the interconnection (bus or crossbar or whatever) must be large enough to support vector read/writes?

As I mentioned earlier, memory requests bigger than a cacheline will be split into fragments at the LSQ. To give you a more concrete example: say that you have a 1024-bit vector (128 bytes). A single vector load will be split into two 64-byte memory requests. The D-cache will see two requests to two consecutive cachelines. It will produce two GetS if it is a miss, or it will return them if present. The LSQ will wait for both requests to return with data and will coalesce them before returning data to the writeback vector register.

I hope this helps

Giacomo

________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 15 January 2024 03:30
To: Nazmus Sakib <nsak...@nmsu.edu>; The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

WARNING This email originated external to the NMSU email system. Do not click on links or open attachments unless you are sure the content is safe.

Hi Nazmus,

On 15/01/2024 02:41, Nazmus Sakib wrote:

> Thank you. I will try to switch to starter_se.py. I still had some questions regarding SVE.
>
> 1.
> When I compile with -msve-vector-bits set to 512, I can see the PTRUE instruction, which is replaced by whilelo when I compile without setting the vector bits value. Now on gem5, it seems whilelo and the corresponding incw instructions work fine, because when I keep sve_vl=1 in gem5, incw increments by 0x4 (128 bits), and when I set sve_vl=4, incw increments by 0x10 (512 bits). But what I am curious about is whether there is anything wrong with the implementation of the PTRUE instruction in gem5.

Without inspecting the disassembled program, my guess is that using -msve-vector-bits=512 forces the code to not be VL agnostic and hardcodes the vector length to 512. So there's nothing surprising in the run failing on non-matching hardware. I believe the proof that there is nothing inherently wrong with ptrue in gem5 comes from the fact that, keeping the 512b binary untouched (with ptrue) and only setting VL=4, you have a successful run.

> 2. As shown in my first email, my data arrays are 64 bytes in size. An SVE load instruction with sve_vl=4 will allow all 64 bytes to be loaded by one ld1w instruction (theoretically, at least, in an actual CPU). I can see from the output generated by the debug flags LSQUnit and CacheAll that indeed all 64 bytes are accessed by one instruction. For example:
>
> system.cpu.dcache: access for WriteReq [81010:8104f]
>
> The address range here is for 64 bytes (16 integers of 4 bytes each in my test code). But without support in the bus/interconnect connected to the CPU to deal with 64 bytes (or whatever the vector length is), and additional code in gem5 to support multi-word reads/writes, shouldn't only one word (I am guessing that is 4 bytes in gem5 for Arm) be readable from cache to CPU? In that case, how are all 64 bytes requested and read from cache to CPU in gem5 with one instruction? Is there some underlying mechanism, like micro-ops or some architectural feature, that is taking place transparently? Or maybe a simple loop that is not part of the debug flag output?
> I tried to look in src/mem/cache/base.cc and cache.cc but could not get an answer.

Simply put, the O3/Minor LSQ will allow every request which does not cross a cacheline boundary. If a memory request spans two cachelines, the request will be split into two (or more) fragments [1].

Hope this helps

Giacomo

[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/minor/lsq.cc#L1632

________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 12 January 2024 03:56
To: Nazmus Sakib <nsak...@nmsu.edu>; The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

You are right, I created a PR to fix this: https://github.com/gem5/gem5/pull/764

Kind Regards

Giacomo

From: Nazmus Sakib <nsak...@nmsu.edu>
Date: Thursday, 11 January 2024 at 19:34
To: Giacomo Travaglini <giacomo.travagl...@arm.com>, The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

Not compiling with -msve-vector-bits did the trick. It runs perfectly, whether I set cpu[0].isa[0].sve_vl_se to 4 or keep it at 1. Thank you for the suggestions!!

One last thing: starter_se.py does not seem to have support for --cpu-type=ArmO3CPU (or am I missing something)?
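The cache-line splitting Giacomo describes in this thread (a request that crosses a cacheline boundary is broken into fragments at the LSQ) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not gem5's actual LSQ code: the function name split_at_cachelines and the example addresses are hypothetical, and a 64-byte cacheline is assumed, as in the thread.

```python
from typing import List, Tuple


def split_at_cachelines(addr: int, size: int,
                        line_size: int = 64) -> List[Tuple[int, int]]:
    """Split [addr, addr + size) into (address, size) fragments such that
    no fragment crosses a cache-line boundary.

    Hypothetical helper for illustration; it is not part of gem5.
    """
    fragments = []
    end = addr + size
    while addr < end:
        # First byte of the cache line that follows the one holding addr.
        next_line = (addr // line_size + 1) * line_size
        frag_end = min(end, next_line)
        fragments.append((addr, frag_end - addr))
        addr = frag_end
    return fragments


# A 1024-bit (128-byte) vector access starting on a line boundary:
# two 64-byte fragments to consecutive cachelines.
print([(hex(a), s) for a, s in split_at_cachelines(0x1000, 128)])
# → [('0x1000', 64), ('0x1040', 64)]

# A 64-byte access that is line-aligned stays a single request.
print([(hex(a), s) for a, s in split_at_cachelines(0x491040, 64)])
# → [('0x491040', 64)]
```

Under these assumptions, the cache and interconnect only ever see requests of at most one cacheline, which is why no vector-specific logic is needed below the LSQ.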
________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 11 January 2024 12:16
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>; Nazmus Sakib <nsak...@nmsu.edu>
Subject: Re: ARM SVE ISA

Hi Nazmus,

I can see from what you posted that you are compiling the testcase with a 512b vector width. I believe you should amend the gem5 VL accordingly, basically by writing in the gem5 config:

    cpu.isa[0].sve_vl_se = 4

according to [1]. This should fix your problem.

Another solution, I believe, would be to compile without specifying the VL. Then it should be VL-agnostic code, I presume.

Anyway, I also recommend you use configs/example/arm/starter_se.py, as se.py is per se deprecated.

Kind Regards

Giacomo

[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/ArmISA.py#L179

From: Nazmus Sakib via gem5-users <gem5-users@gem5.org>
Date: Thursday, 11 January 2024 at 17:54
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>, Nazmus Sakib <nsak...@nmsu.edu>
Subject: [gem5-users] ARM SVE ISA

Hello. I am trying to run a simple program with SVE instructions on gem5. However, the output with the debug flag ExecAll suggests there is an issue with the decoder.
Here is the test code: #define STREAM_ARRAY_SIZE 16 void main() { for (int j=0; j<STREAM_ARRAY_SIZE; j++) { A[j]=3; B[j]=2; } int x=add(A,B); printf("return %d \n",A[3]); // should print 6, does not in gem5 } int add(int * restrict p, int * restrict q) { for (int i=0; i<STREAM_ARRAY_SIZE; i+=1) { *(p+i)=*(q+i)+4; } printf("dummy %d %d \n", *(p+3), *(q+3)); // should print 6 and 2, does not in gem5 return *(p+3); } I compiled it with gcc cross compiler for arm with following command: aarch64-linux-gnu-gcc-11 -O3 -static -mcpu=a64fx+sve2 -msve-vector-bits=512 -o test test.c Without the-mcpu=a64fx+sve2, SVE instructions are not generated. Here is the command I used: ./build/ARM/gem5.opt ./configs/deprecated/example/se.py --cpu-type=ArmO3CPU --caches --cacheline_size=64 --mem-size=8GB --arm-iset=aarch64 -c ./test I have also used "./configs/example/arm/starter_se.py", but the results are same. When I use --debug-flag=Execall, I see the following isssues: 1) 12589500: system.cpu: A0 T0 : 0x400524 @main+4 : ptrue p0, VL64 : SimdPredAlu : D=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] FetchSeq=14292 CPSeq=4962 flags=() The D=[] should not be all zeros. 
2)

12591000: system.cpu: A0 T0 : 0x400550 @main+48 : st1 {z1}, p0/z, , [x19] : MemWrite : A=0x491040 FetchSeq=14305 CPSeq=4975 flags=(IsInteger|IsVector|IsStore)
12591000: system.cpu: A0 T0 : 0x400554 @main+52 : st1 {z0}, p0/z, , [x19, #1, mul vl] : MemWrite : A=0x491050 FetchSeq=14306 CPSeq=4976 flags=(IsInteger|IsVector|IsStore)

The second A should be 0x491080, not 0x491050.

I have run the same thing on the RIKEN simulator, which was built on top of gem5 for the Fujitsu A64FX. Here are the same instructions as seen in RIKEN.

1)

15322000: system.cpu A0 T0 : @main+4 : ptrue p0, VL64 : SimdPredAlu : D=0b[0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111] FetchSeq=18146 CPSeq=5254 flags=()

As you can see, my data arrays are 64 bytes and the appropriate bits in the predicate register are set to 1.

2)

15323000: system.cpu A0 T0 : @main+48 : st1 {z1}, p0/z, , [x19] : SveMemWrite : A=0x491040 FetchSeq=18159 CPSeq=5267 flags=(IsInteger|IsVector|IsMemRef|IsStore)
15323000: system.cpu A0 T0 : @main+52 : st1 {z0}, p0/z, , [x19, #1, mul vl] : SveMemWrite : A=0x491080 FetchSeq=18160 CPSeq=5268

The second address is calculated as 0x491080, which is the correct result for [x19, #1, mul vl], as vl=64.

I tried to compare the files in src/arch/arm/ISA from RIKEN with current gem5. Since RIKEN is based on an old gem5, there are obvious syntax differences. Other than that, I have found 2 things:

1) In ArmISA.py in RIKEN, there is this:

    id_aa64pfr0_el1 = Param.UInt64(0x0000000100000022, "AArch64 Processor Feature Register 0")

I did not find anything similar in gem5. I did find id_aa64pfr0_el1 in arch/arm/regs/misc.hh, but its value wasn't set anywhere.
2) In ArmISA.py in current gem5, there is this "FEAT_SVE" extension in class ArmDefaultSERelease. However, this is for armv8.2, and I don't know how to specify this architecture on the command line.

What I am trying to find out is: am I missing any runtime flags that would enable the proper SVE instructions in gem5, or is it due to compile-time flags, since I am setting -mcpu to a64fx (setting -march to armv8.2-a+sve or whatever does not produce SVE instructions; it has to be -mcpu=a64fx+sve), or is it a possible issue/bug in the new gem5 itself?

Any suggestions would be appreciated.

Thank you.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
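For reference, the arithmetic behind the addresses and incw steps reported in this thread can be checked with a short Python sketch. It assumes, as the thread indicates, that sve_vl_se counts 128-bit units (sve_vl=1 is 128 bits, sve_vl=4 is 512 bits) and that "[base, #imm, mul vl]" adds imm times the vector length in bytes. The function names are hypothetical and are not gem5 APIs.

```python
SVE_UNIT_BITS = 128  # assumption from the thread: sve_vl=1 means 128 bits


def vl_bytes(sve_vl: int) -> int:
    """Vector length in bytes for a given sve_vl setting."""
    return sve_vl * SVE_UNIT_BITS // 8


def mul_vl_address(base: int, imm: int, sve_vl: int) -> int:
    """Effective address of an access of the form [base, #imm, mul vl]."""
    return base + imm * vl_bytes(sve_vl)


def incw_step(sve_vl: int) -> int:
    """Number of 32-bit words per vector, i.e. the amount incw adds."""
    return vl_bytes(sve_vl) // 4


# Second store from the traces: st1 {z0}, ..., [x19, #1, mul vl], x19 = 0x491040
print(hex(mul_vl_address(0x491040, 1, sve_vl=1)))  # → 0x491050
print(hex(mul_vl_address(0x491040, 1, sve_vl=4)))  # → 0x491080

# incw steps reported in the thread for the two settings
print(incw_step(1), incw_step(4))  # → 4 16
```

Under these assumptions, the 0x491050 address in the gem5 trace is what a 128-bit vector length produces, while 0x491080 corresponds to the 512-bit length the binary was compiled for, which is consistent with the suggestion to set sve_vl_se to 4.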
_______________________________________________ gem5-users mailing list -- gem5-users@gem5.org To unsubscribe send an email to gem5-users-le...@gem5.org