Hi Matt S.,

Thanks again for your quick reply with useful information. I will rerun with --reg-alloc-policy=dynamic in my mini regression to see if it makes a difference.
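For reference, this is where the flag goes; the rest of the invocation is the one from my original post at the bottom of this thread (shown with the LRN paths, but the same applies to every test in the regression):

    # same invocation as in my original post, with only the register-allocation
    # policy flag added; paths, sizes, and log capture are unchanged
    docker run --rm -v ${PWD}:${PWD} \
        -v ${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
        -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 \
        gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py \
        --reg-alloc-policy=dynamic \
        --mem-size 1536GB --num-compute-units 256 -n3 \
        --benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn \
        -cdnnmark_test_fwd_lrn \
        --options="-config gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"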
As for LRN, I won't make modifications to lrn_config.dnnmark unless it's required to run additional DNN tests. The four tests test_fwd_softmax, test_bwd_softmax, test_fwd_pool, and test_bwd_bn are good enough for now.

Matt S and Matt P: are the "mem_req_latency" and "mem_resp_latency" parameters valid for both the APU (Carrizo) and the GPU (VEGA)?

gem5/src/gpu-compute/GPU.py:

    mem_req_latency = Param.Int(40, "Latency for request from the cu to ruby. "\
                                    "Represents the pipeline to reach the TCP "\
                                    "and specified in GPU clock cycles")

    mem_resp_latency = Param.Int(40, "Latency for responses from ruby to the "\
                                     "cu. Represents the pipeline between the "\
                                     "TCP and cu as well as TCP data array "\
                                     "access. Specified in GPU clock cycles")

It seems to me that the GPU (VEGA), with its dedicated memory (GDDR5), should be using different parameters for its memory access latencies. My company's IP could be used to reduce interconnect latencies for the APU and GPU, and we would like to quantify this at the system level with benchmarks. We would like to determine whether the GPU can get a performance boost from reduced memory access latencies. Please confirm which memory latency parameters to modify and use for the GPU (VEGA).

Thanks,
David

From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Tuesday, March 15, 2022 1:08 PM
To: David Fong <da...@chronostech.com>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,

The dynamic register allocation policy allows the GPU to schedule as many wavefronts as there is register space for on a CU. By default, the original register allocator released with this GPU model ("simple") only allowed 1 wavefront per CU at a time, because the publicly available dependence modeling was fairly primitive. However, this was not very realistic relative to how a real GPU performs, so my group has added better dependence tracking support (more could probably still be done, but it reduced stalls by up to 42% relative to simple) and a register allocation scheme that allows multiple wavefronts to run concurrently per CU ("dynamic"). By default, the GPU model assumes the simple policy is used unless otherwise specified. I have a patch in progress to change that, though: https://gem5-review.googlesource.com/c/public/gem5/+/57537.

Regardless, if applications are failing with the simple register allocation scheme, I wouldn't expect a more complex scheme to fix the issue. But I do strongly recommend you use the dynamic policy for all experiments - otherwise you are using a very simple, less realistic GPU model.

Setting all of that aside, I looked up the perror message you sent last night, and it appears that it happens when your physical machine has run out of memory (which means we can't do much to fix gem5, since the machine itself wouldn't allocate as much memory as you requested).
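If you want to double-check that from the host side, plain Linux tooling is enough (nothing gem5-specific here - adjust the pgrep pattern if your binary name differs):

    # Watch how much memory the host has left while the gem5 run is active;
    # if "available" collapses right before the fatal, the host is the limit.
    free -h
    # The simulator's own footprint (VmSize vs. VmRSS in /proc) shows how much
    # of the backing store has actually been touched:
    grep -i '^Vm' /proc/$(pgrep -f gem5.opt | head -n 1)/status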
So, if you want to run LRN and can't run on a machine with more memory, one thing you could do is change the LRN config file to use smaller NCHW values (e.g., reduce the batch size, N, from 100 to something smaller that fits on your machine): https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/lrn_config.dnnmark#6. If you do this, though, you will likely need to re-run generate_cachefile to generate the MIOpen binaries for this differently sized LRN.

Hope this helps,
Matt

From: David Fong <da...@chronostech.com>
Sent: Tuesday, March 15, 2022 2:58 PM
To: Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt S.,

Thanks for the detailed reply. I looked at the link you sent me for the weekly run. I see an additional parameter which I didn't use: --reg-alloc-policy=dynamic. What does this do?

I was able to run the two other tests you use in your weekly runs, test_fwd_pool and test_bwd_bn, with CUs=4.

David

From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Monday, March 14, 2022 7:41 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>; Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,

I have not seen this mmap error before, and my initial guess was that the mmap error is happening because you are trying to allocate more memory than we created when mmap'ing the inputs for the applications (we do this to speed up SE mode, because otherwise initializing arrays can take several hours). However, the fact that it is failing in physical.cc and not in the application itself is throwing me off there. Looking at where the failure is occurring, it seems the backing store code itself is failing here (from such a large allocation). Since the failure is with a C++ mmap call itself, that is perhaps more problematic - is "Cannot allocate memory" the failure from the perror() call on the line above the fatal() print?

Regarding the other question, and the failures more generally: we have never tested with > 64 CUs before, so certainly you are stressing the system and encountering different kinds of failures than we have seen previously. In terms of applications, I had thought most/all of them passed previously, but we do not test each and every one all the time, because this would make our weekly regressions run for a very long time.
You can see which ones we run on a weekly basis here: https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176. I expect all of those to pass (although your comment seems to indicate that is not always true?). Your issues are exposing that perhaps we need to test more of them beyond these three - perhaps on a quarterly basis or something, though, to avoid inflating the weekly runtime.

Having said that, I have not run LRN in a long time, as some ML people told me that LRN was not widely used anymore. But when I did run it, I do remember it requiring a large amount of memory - which squares with what you are seeing here. I thought LRN needed --mem-size=32GB to run, but based on your message it seems that is not the case. @Matt P: have you tried LRN lately? If so, have you run into the same OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting an error related to memory for test_fwd_lrn. I increased the memory size from 4GB to 512GB and still got an "out of memory" failure:

    build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
    build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
    build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 occurred: Out of memory, please increase size of physical memory.

But once I increased the memory size to 1024GB, 1536GB, or 2048GB, I get this DRAM device capacity issue instead:

    docker run --rm -v ${PWD}:${PWD} \
        -v ${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
        -w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 \
        gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py \
        --mem-size 1536GB --num-compute-units 256 -n3 \
        --benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn \
        -cdnnmark_test_fwd_lrn \
        --options="-config gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" \
        |& tee gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log

    Global frequency set at 1000000000000 ticks per second
    build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (2097152 Mbytes)
    mmap: Cannot allocate memory
    build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes for range [0:0x18000000000]!

Smaller numbers of CUs, such as 4, hit the same type of error.

Is there a regression script or regression log for DNNMark that shows the mem-size or configurations known to work for the DNNMark tests, so I can use the same setup to run a few of them? Only test_fwd_softmax and test_bwd_softmax are working for CU counts in {4, 8, 16, 32, 64, 128, 256}.

Thanks,
David
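P.S. One observation on the numbers, in case it is useful: the byte count in the fatal is exactly the --mem-size I passed, interpreted as a binary unit, so it looks like the entire simulated memory has to be backed on the host:

    # 1536GB as reported in the fatal = 1536 GiB:
    echo $((1536 * 1024 * 1024 * 1024))   # prints 1649267441664, the mmap size in the fatal above
    # what the host can actually back (RAM + swap), for comparison:
    free -b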