Hi,


No worries about the questions! I will try to answer them all, so this will be 
a long email 😊:

The disconnected (or disjoint) Ruby network is essentially the same as the APU 
Ruby network used in SE mode. That is, it combines two Ruby protocols 
(MOESI_AMD_Base and GPU_VIPER) into one protocol. The network is disjoint 
because there are no paths or network links between the GPU side and the CPU 
side, which simulates a discrete GPU. These protocols work together because 
they use the same network messages and virtual channels to the directory; in 
other words, you cannot simply drop in another CPU protocol and expect it to 
work.
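
To make "disjoint" concrete, one way to see it is to walk the Ruby network's 
links from the config script after the system has been built. A minimal sketch 
(the ext_links / int_links / ext_node / int_node / src_node / dst_node 
attribute names are from memory, so check src/mem/ruby/network/Network.py and 
BasicLink.py in your checkout, and the network object may be named differently 
in the GPU-FS config):

    # Hedged sketch: dump the Ruby topology to see that no link ever joins a
    # CPU-side controller or router to a GPU-side one. Attribute names are
    # assumed from memory and may differ slightly between gem5 versions.
    def dump_ruby_topology(network):
        # Each external link attaches one controller (CorePair, TCP, TCC,
        # directory, ...) to one router.
        for link in network.ext_links:
            print(type(link.ext_node).__name__, "<-> router",
                  link.int_node.router_id)
        # Internal links join routers; in the disjoint network there is no
        # router-to-router path from the CPU side to the GPU side.
        for link in network.int_links:
            print("router", link.src_node.router_id,
                  "-> router", link.dst_node.router_id)

Calling that on the network object right before m5.instantiate() should show 
two clusters of routers with no links between them.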

Atomic CPU support started working *very* recently, as in this week. It is in 
code review right now, and I believe it might be part of the gem5 v23.0 
release. However, the reason the Atomic and KVM CPUs are required is that they 
use the atomic_noncaching memory mode and essentially bypass the CPU caches. 
The timing CPUs (Timing and O3) try to generate routes to the GPU side, which 
causes deadlocks. I have not had any time to look into this further, but that 
is the current status.
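
To make the memory-mode point concrete, here is a minimal sketch (this is not 
the actual configs/example/gpufs script, just the one parameter that matters):

    # Minimal sketch, not the real GPU-FS config: the System SimObject's
    # mem_mode parameter is what separates the two cases. KVM and Atomic CPU
    # runs use 'atomic_noncaching', so CPU requests bypass the Ruby caches
    # entirely; Timing/O3 runs use 'timing' and go through Ruby, which is
    # where the routing/deadlock problem shows up today.
    from m5.objects import System

    system = System()
    system.mem_mode = 'atomic_noncaching'   # KVM / Atomic CPU (works)
    # system.mem_mode = 'timing'            # Timing / O3 (currently deadlocks)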

| are the GPU applications run on KVM?

The CPU portion of GPU applications runs on KVM. The GPU is simulated in 
timing mode, so the compute units, caches, memory, etc. are all modeled with 
events. For an application that simply launches GPU kernels, the CPU is just 
waiting for the kernels to finish.

For your other questions:
1.  Unfortunately no, it is not that easy. There is still an outstanding bug 
with the timing CPUs. We focused on the Atomic CPU recently as a way to let 
users who cannot use KVM still use the GPU model.
2.  KVM exits whenever there is a memory request outside of its VM range. The 
PCI address range is outside the VM range, so, for example, when the CPU 
writes to PCI space it triggers an event for the GPU. The only Ruby 
involvement here is that Ruby sends all requests outside of its memory range 
to the IO bus, KVM or not (there is a small illustrative sketch of this after 
the list).
3.  The MMIO trace is only used to load the GPU driver and is not used by 
applications. It basically contains reasonable register values for anything 
that is not modeled in gem5, so that we do not have to model those blocks 
(e.g., graphics, power management, video encode/decode, etc.). It is not 
required for compute-only GPU variants, but that is a different topic.
4.  I’m not familiar enough with this particular application to answer this 
question.
5.  I think you will need to use SE mode to do what you are trying to do. 
Full system mode uses the real GPU driver, ROCm stack, etc., which currently 
do not support any APU-like devices. SE mode can do this because it uses an 
emulated driver.
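
For question 2, here is a small, purely illustrative sketch of that routing 
decision (plain Python, not gem5 code, and the addresses are made up):

    # Purely illustrative, with made-up addresses: accesses inside the guest's
    # DRAM range stay inside the VM (no KVM exit) and, on the Ruby side,
    # inside the cache hierarchy; anything outside it, such as the GPU's PCI
    # BAR, causes a KVM exit and is forwarded to the IO bus where the GPU
    # device model sees it.
    DRAM_RANGE = (0x0000_0000, 0x2000_0000)   # hypothetical 512 MiB guest DRAM
    GPU_BAR    = (0xC000_0000, 0xC100_0000)   # hypothetical GPU PCI BAR

    def handled_by(addr):
        if DRAM_RANGE[0] <= addr < DRAM_RANGE[1]:
            return "guest memory (no KVM exit)"
        if GPU_BAR[0] <= addr < GPU_BAR[1]:
            return "GPU PCI BAR (KVM exit; Ruby forwards to the IO bus)"
        return "other IO (KVM exit; IO bus)"

    print(hex(0x1000_0000), "->", handled_by(0x1000_0000))   # guest memory
    print(hex(0xC000_2000), "->", handled_by(0xC000_2000))   # GPU MMIO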


-Matt

From: Anoop Mysore via gem5-users <gem5-users@gem5.org>
Sent: Friday, June 30, 2023 8:43 AM
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Anoop Mysore <mysan...@gmail.com>
Subject: [gem5-users] Re: Replacing CPU model in GPU-FS


It appears the host part of GPU applications is indeed executed on KVM, from: 
https://www.gem5.org/assets/files/workshop-isca-2023/slides/improving-gem5s-gpufs-support.pdf.
A few more questions:
1. I missed that it isn't mentioned that O3 CPU models aren't supported -- 
would that be as easy as changing the `cpu_type` in the config file and 
running? I intend to run with the latest O3 CPU config I have (an Intel CPU).
2. The Ruby network that's used -- is it intercepting (perhaps just MMIO) 
memory operations from the KVM CPU? Could you please briefly describe how Ruby 
is working with both KVM and GPU (or point me to any document)?
3. The GPU MMIO trace we pass during simulator invocation -- what exactly is 
this? If it's a trace of the kernel driver/CPU's MMIO calls into GPU, how is it 
portable across different programs within a benchmark-suite -- HeteroSync, for 
example?
4. In HeteroSync, there's fine-grain synchronization between CPU and GPU in 
many apps. If I use the vega10_kvm.py, which has a discrete GPU with a KVM CPU, 
where do the synchronizations happen?
5. If I want to move to an integrated GPU model with an O3 CPU (the only 
requirement is the shared LLC) -- are there any resources that can help me? I 
do see a bootcamp that uses the apu_se.py -- can this be utilized at least 
partially to support full system O3 CPU + integrated GPU? Are there any 
modifications that need to be made to support synchronizations in L3?

Please excuse the jumbled questions, I am in the process of gaining more 
clarity.

On Fri, Jun 30, 2023 at 12:10 PM Anoop Mysore <mysan...@gmail.com> wrote:
According to the GPU-FS blog 
<https://www.gem5.org/2023/02/13/moving-to-full-system-gpu.html>,
    "Currently KVM and X86 are required to run full system. Atomic and Timing 
CPUs are not yet compatible with the disconnected Ruby network required for 
GPUFS and is a work in progress."
My understanding is that KVM is used to boot Ubuntu; so, are the GPU 
applications run on KVM? Also, what does "disconnected" Ruby network mean there?
If so, is there any work in progress that I can use to develop on, or 
(noob-friendly) documentation of what needs to be done to extend the support 
to Atomic/O3 CPUs?
For a project I'm working on, I need complete visibility into the CPU+GPU cache 
hierarchy + perhaps a few more custom probes; could you comment on whether this 
would be restrictive if going with KVM in the meantime given that it leverages 
the host for the virtualized HW?

Please let me know if I have got any of this wrong or if there are other 
details you think would be useful.
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
