Hi Atul,

Just to clarify your setup:

Are you compiling your benchmark for Linux and running Linux + benchmark on 
gem5? Or are you running it bare-metal on the simulated platform? There’s 
nothing inherently wrong with observing SVCs and MSR/MRS if your application 
makes some syscalls. If you really want to know why the CPU is going to EL2, 
you need to check your syscall trace and see if there’s anything weird.


  1.  gem5 + Linux + benchmark -> Run your application under strace.
  2.  gem5 + benchmark (bare-metal) -> If you run gem5 with the Exec tracer 
("--debug-flags=Exec"), symbols will be printed in the instruction trace, so 
you will be able to tell from the function names why there are so many 
syscalls.
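
For case 2, a rough way to summarize the Exec trace is a small script that 
tallies the system-call and system-register instructions per function symbol. 
The sketch below is only an illustration: the function name tally() and the 
assumed line layout (colon-separated fields, with the symbol written as 
@name+offset and the disassembly in a field of its own) are my assumptions, so 
adjust the parsing to whatever your gem5 version actually prints. It expects 
the trace to have been saved to a file, for example with gem5's --debug-file 
option.

```python
import re
import sys
from collections import Counter

# Instructions of interest: exception-generating calls and system-register accesses.
MNEMONICS = {"svc", "hvc", "smc", "msr", "mrs"}

# ASSUMPTION: symbols appear in the trace as "@name+offset".
SYMBOL_RE = re.compile(r"@([^\s:+]+)")


def tally(lines):
    """Count the instructions in MNEMONICS, in total and per (symbol, mnemonic)."""
    by_mnemonic = Counter()
    by_symbol = Counter()
    for line in lines:
        mnemonic = None
        # ASSUMPTION: fields are colon-separated and the disassembly field
        # starts with the mnemonic.
        for field in line.split(":"):
            tokens = field.split()
            if tokens and tokens[0].lower() in MNEMONICS:
                mnemonic = tokens[0].lower()
                break
        if mnemonic is None:
            continue  # not an instruction we are tracking
        match = SYMBOL_RE.search(line)
        symbol = match.group(1) if match else "<no symbol>"
        by_mnemonic[mnemonic] += 1
        by_symbol[(symbol, mnemonic)] += 1
    return by_mnemonic, by_symbol


if __name__ == "__main__" and len(sys.argv) > 1:
    # The trace file path is given as the first command-line argument.
    with open(sys.argv[1]) as trace:
        totals, per_symbol = tally(trace)
    print("totals:", dict(totals))
    for (symbol, mnemonic), count in per_symbol.most_common(20):
        print(f"{count:8d}  {mnemonic:4s}  {symbol}")
```

Saved under a hypothetical name such as tally_trace.py, it would be run as 
"python3 tally_trace.py trace.out". The symbols with the highest counts are 
the first places to look.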

This may not be the reason for the divergence in performance from real 
hardware. However, I am curious to know why you are going to EL2 rather than 
EL1 with SVC. I would check whether FEAT_VHE is enabled in your simulation. If 
it is, try disabling it and let me know if you observe a difference (in 
performance). Otherwise, disable virtualization as a whole.

Kind Regards

Giacomo


From: Atul Rahman via gem5-users <gem5-users@gem5.org>
Date: Thursday, 31 August 2023 at 17:44
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Atul Rahman <atul.rah...@outlook.com>
Subject: [gem5-users] Help needed regarding EL2 MSR MRS instruction call 
(Arm-v8a aarch64) in gem5
Hello,
I am running a benchmark binary compiled with clang with the 
armv8a+fp+simd+crypto options.
All the workloads of this benchmark perform similarly in gem5 and on an actual 
mobile device, except for one workload (a fairly simple one: a convolutional 
neural network written from scratch in C++ without any external library).
I generated a Tarmac trace for the first few thousand instructions starting 
from the ROI.
I see SVC instructions, and MSR and MRS instructions executed at EL2. I do not 
understand why there are no HVC instructions in the Tarmac trace log, yet so 
many MSR and MRS instructions are executed at EL2! I think this is causing 
this particular workload to perform poorly. I don’t see any such EL2 
instructions for the other workloads of the same benchmark.
I am using gem5’s fs_bigLITTLE.py script (O3 ARM CPU tuned) for FS simulation.

Any insight on this topic would be very helpful. Thanks.