As a follow-up to this: in the config.dot and config.ini files generated in the m5out folder, I saw that there is only one memory controller in the dGPU case, but separate memory pools for the CPU and the dGPU. In that case, what is the real difference between the APU and dGPU models? Host-to-device (and device-to-host) transfers usually represent a big overhead, so how are they modeled here, where the transfers effectively happen within the same memory (just across different pools)?
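For reference, something like the minimal script below is roughly how I am counting things. It only assumes that config.ini is INI-formatted with a `type=` entry per SimObject section; the `range`/`addr_ranges` key names are guesses on my part, so please correct me if I am looking at the wrong objects:

```python
#!/usr/bin/env python3
"""Tally SimObject types in an m5out/config.ini and list the
controller-like sections with the address ranges they claim.

Assumes only that config.ini is INI-formatted with a 'type=' entry
per SimObject section; the 'range'/'addr_ranges' keys probed below
are guesses, not authoritative."""
import sys
from collections import Counter
from configparser import ConfigParser

path = sys.argv[1] if len(sys.argv) > 1 else "m5out/config.ini"
cfg = ConfigParser(strict=False, interpolation=None)
cfg.read(path)

# Histogram of SimObject types: memory controllers, TLBs, walkers, etc.
types = Counter(cfg.get(s, "type", fallback="<untyped>") for s in cfg.sections())
for obj_type, count in types.most_common():
    print(f"{count:4d}  {obj_type}")

# Controller/directory-looking sections and their address range(s),
# to see how the CPU and dGPU pools are carved out of simulated memory.
for s in cfg.sections():
    t = cfg.get(s, "type", fallback="")
    if "Ctrl" in t or "Directory" in t:
        rng = cfg.get(s, "range", fallback=None) or \
              cfg.get(s, "addr_ranges", fallback="<no range key>")
        print(f"{s} ({t}): {rng}")
```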
Thanks again,
Imad

On Oct 29 2021, at 6:02 pm, Imad Al Assir <imad.al.as...@upc.edu> wrote:
> Hello,
> I have been looking at the source code of the GPU model for the past few
> weeks, and I had some doubts about the virtual memory system for discrete
> GPUs (and APUs, if there are any differences). I will include my questions
> and partial answers below, and I hope you can correct me if I'm wrong. It
> would also be great if you could point me to the documentation/source code
> where each of these answers can be found.
>
> 1- Where are the page tables located exactly? Who manages them?
> I saw that the page tables are emulated (i.e. with the EmulationPageTable
> structure) and that the GPU uses the host x86 page tables. But since there
> is no OS, who manages them and where exactly are they located? In the Ruby
> memory of the CPU?
>
> 2- How do page walks happen?
> I saw a comment saying that they are not real page walks, and that the
> CPU's x86 page table walkers (PTWs) are used. But how is the translation
> actually fetched from the page table if the walk is not real? Don't the
> page walkers still have to walk the tables in memory?
>
> 3- How are page faults handled if there is no OS?
>
> 4- What components of the VM hierarchy are already present: IOMMU, TLBs,
> PWCs, PTWs?
> What I am sure of is that there is a customizable TLB hierarchy and TLB
> coalescers. As for the IOMMU, I was not able to figure out what it
> consists of. I know that there is a PTW and that the model uses the CPU's
> x86 page tables to do the translations. But how many PTWs are there? GPUs
> usually require multiple PTWs, so is this number customizable? Also, I did
> not see any page walk caches or IOMMU TLBs. Are these not present in the
> current model? If I am wrong, please point me to the source code of each
> component (and where they are instantiated).
>
> I saw that a paper published by AMD at the latest MICRO
> (https://dl.acm.org/doi/10.1145/3466752.3480105) used the GPU model, and
> that they had all of the components mentioned in question 4, so are these
> publicly available to everyone, or should I implement them myself?
>
> 5- I saw a comment in gpu_compute_driver.cc saying: "TODO: IOMMU and
> GPUTLBs do not seem to correctly support shootdown". Does this mean that
> TLB shootdown is not working at all? And when you say IOMMU, what do you
> mean exactly (since there is no concrete IOMMU component), i.e. what does
> it consist of?
>
> Sorry for the long e-mail, and thank you in advance for your help,
> Imad Al Assir
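P.S. Regarding question 4 above: the sketch below is my current (quite possibly wrong) reading of how the TLB levels get built in configs/common/GPUTLBConfig.py. The X86GPUTLB and TLBCoalescer class names are the ones I see imported there, but the parameter names and the per-level structure are my own assumptions, so corrections are very welcome.

```python
# A rough sketch of how I currently read configs/common/GPUTLBConfig.py:
# each TLB level is a set of X86GPUTLB objects with a TLBCoalescer in
# front of each one.  The class names are the ones imported there, but
# the parameter names (size/assoc) and the wiring are my assumptions.
from m5.objects import X86GPUTLB, TLBCoalescer

def build_tlb_level(num_tlbs, entries, assoc):
    """Instantiate the TLBs and matching coalescers for one level."""
    tlbs = [X86GPUTLB(size=entries, assoc=assoc)        # per-TLB capacity/assoc
            for _ in range(num_tlbs)]
    coalescers = [TLBCoalescer() for _ in range(num_tlbs)]  # merge duplicate
                                                            # translation reqs
    # I am leaving out the port connections between the coalescers, the TLBs
    # and the CPU-side page-table walkers because I am not sure of the port
    # names; config_tlb_hierarchy() seems to be where that wiring happens.
    return tlbs, coalescers

# e.g. an L1 level with one 32-entry, fully-associative TLB per CU:
l1_tlbs, l1_coalescers = build_tlb_level(num_tlbs=4, entries=32, assoc=32)
```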