Hi all,

A bit of research led me to this (old) page, which indicated that there are some environment variables I could use to control memory allocation in AMD's OpenCL implementation:
https://community.amd.com/t5/drivers-software/solved-clinfo-reports-error-33-of-quot-global-free-memory-amd/td-p/172760

So I decided to give that a try. First step: ensure I can run an OpenCL program from the shell. I used the PrimeGrid binary for that:
# /var/lib/boinc-client/projects/www.primegrid.com/genefer22g_linux64_22.12.02 -h
geneferg version 22.12.2 (linux x64, gcc-7.5.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer is free source code, under the MIT license.
Command line: '-h'
Running on device 'gfx1031', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 2.0 ', driver '3513.0 (HSA1.1,LC)'.
330000000^{2^15} + 1: 00:00:35, 0.0386 ms/bit, data size: 1.12 MB.
200000000^{2^16} + 1: 00:01:28, 0.0489 ms/bit, data size: 2.25 MB.
120000000^{2^17} + 1: 00:04:35, 0.0784 ms/bit, data size: 4.5 MB.
18000000^{2^18} + 1: 00:14:01, 0.133 ms/bit, data size: 9 MB.
5500000^{2^19} + 1: 00:49:15, 0.252 ms/bit, data size: 18 MB.
2000000^{2^20} + 1: 01:58:46, 0.325 ms/bit, data size: 24 MB.
910000^{2^21} + 1: 07:04:18, 0.613 ms/bit, data size: 48 MB.
270000^{2^22} + 1: 25:34:55, 1.22 ms/bit, data size: 96 MB.
1000000^{2^22} + 1: 30:51:50, 1.33 ms/bit, data size: 96 MB.
500000^{2^23} + 1: 116:24:31, 2.64 ms/bit, data size: 192 MB.
So that worked. It also did not cause an error, but of course, test data != real workload, right?
Next step: ensure I can reproduce the error:
# /var/lib/boinc-client/projects/www.primegrid.com/genefer22g_linux64_22.12.02 -p -n 22 -b 1053460 -f gproof
geneferg version 22.12.2 (linux x64, gcc-7.5.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer is free source code, under the MIT license.
Command line: '-p -n 22 -b 1053460 -f gproof'
Running on device 'gfx1031', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 2.0 ', driver '3513.0 (HSA1.1,LC)', data size: 96 MB.
0.0202% done, 28:15:37 remaining, 1.21 ms/bit.

Interestingly, this seems to work without any problem. Right now, it is at 7.29% done, 26:21:57 remaining, 1.22 ms/bit, which is much longer than I have seen it run before.

My conclusion for now: the BOINC service must have some limits set that my shell does not have. systemctl show gives me, among others:

LimitCPU=infinity LimitCPUSoft=infinity
LimitFSIZE=infinity LimitFSIZESoft=infinity
LimitDATA=infinity LimitDATASoft=infinity
LimitSTACK=infinity LimitSTACKSoft=8388608
LimitCORE=infinity LimitCORESoft=0
LimitRSS=infinity LimitRSSSoft=infinity
LimitNOFILE=524288 LimitNOFILESoft=1024
LimitAS=infinity LimitASSoft=infinity
LimitNPROC=253399 LimitNPROCSoft=253399
LimitMEMLOCK=8388608 LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity LimitLOCKSSoft=infinity
LimitSIGPENDING=253399 LimitSIGPENDINGSoft=253399
LimitMSGQUEUE=819200 LimitMSGQUEUESoft=819200
LimitNICE=0 LimitNICESoft=0
LimitRTPRIO=0 LimitRTPRIOSoft=0
LimitRTTIME=infinity LimitRTTIMESoft=infinity

My first candidate would be LimitMEMLOCK, as I suspect that shared, locked memory is a likely mechanism for the interaction between GPU and CPU. (As you can tell, I know next to nothing about OpenCL...)
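One way to check that suspicion is to put the shell's locked-memory limit next to the service's. A minimal sketch, assuming the unit is called boinc-client.service and the client process is named boinc (both are assumptions based on the Debian-style /var/lib/boinc-client path, not something confirmed above). Note that ulimit -l reports KiB while systemctl show and /proc/<pid>/limits report bytes, so the service's 8388608 corresponds to 8192 in ulimit terms.

```shell
#!/bin/sh
# ASSUMPTION: unit and process names; adjust to the local installation.
UNIT=boinc-client.service
PROC=boinc

# Soft locked-memory limit of this interactive shell, in KiB
# (prints "unlimited" if there is no limit).
SHELL_MEMLOCK_KIB=$(ulimit -S -l)
echo "shell memlock (KiB): $SHELL_MEMLOCK_KIB"

# Limits configured for the service, in bytes, as systemd reports them.
# Only attempted when the machine was actually booted with systemd.
if command -v systemctl >/dev/null 2>&1 && [ -d /run/systemd/system ]; then
    systemctl show -p LimitMEMLOCK -p LimitMEMLOCKSoft "$UNIT"
fi

# Limits actually in effect for the running client process, if there is one.
if command -v pgrep >/dev/null 2>&1; then
    PID=$(pgrep -x "$PROC" | head -n 1)
    if [ -n "$PID" ]; then
        grep 'Max locked memory' "/proc/$PID/limits"
    fi
fi
```

If the shell shows a much larger (or unlimited) value than the running client, that difference is a candidate for the failure; if they match, the limit is probably not the culprit.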
I do know how 'systemctl edit' works, though, and used it to set the limit to 1 GiB:

LimitMEMLOCK=1073741824
LimitMEMLOCKSoft=1073741824

That did not help: same error after a similar amount of time. Still, seeing that I can run the binary in question from the shell, I'm fairly confident this should be solvable with the right unit configuration.
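For reference, the override that 'systemctl edit' creates is just a small drop-in file in the unit's .d directory under /etc/systemd/system. A minimal sketch that writes such a drop-in by hand; the helper name write_memlock_dropin and the file name memlock.conf are hypothetical. A single LimitMEMLOCK= value sets both the soft and the hard limit; systemd also accepts infinity or a soft:hard pair.

```shell
#!/bin/sh
# write_memlock_dropin DIR VALUE  (hypothetical helper)
# Creates DIR/memlock.conf with a [Service] section that sets the
# locked-memory limit for the unit; VALUE is in bytes (or "infinity").
write_memlock_dropin() {
    dir=$1
    value=$2
    mkdir -p "$dir" || return 1
    # A single value sets the soft and the hard limit alike.
    cat > "$dir/memlock.conf" <<EOF
[Service]
LimitMEMLOCK=$value
EOF
}
```

Usage, as root and with the unit name again assumed: write_memlock_dropin /etc/systemd/system/boinc-client.service.d 1073741824, then systemctl daemon-reload and a restart of the service; systemctl show -p LimitMEMLOCK -p LimitMEMLOCKSoft should then report the new value.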
Which leaves me with one question for this mail thread: Can anybody recommend a test program for OpenCL functionality?
Thanks,
Arno

--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück

