Was this "illegal mem access" with namd12 resolved?

ISSUE

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Norman Geist
Sent: Friday, 10 March 2017 10:16

Somehow I have a lot of problems with the NAMD-2.12 version. All CUDA jobs will:

1. Immediately fail for SMP single process runs when having more than 1 thread via ++ppn:

FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists
 on Pe 4 (gpu5 device 1): an illegal memory access was encountered
------------- Processor 4 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists
 on Pe 4 (gpu5 device 1): an illegal memory access was encountered

This happens for my own compiled versions (CUDA-7.5) as well as for the precompiled multicore version (CUDA-6.5).

From: Ajasja Ljubetič (ajasja.ljubetic_at_gmail.com)
Date: Fri Mar 10 2017 - 05:14:10 CST

Are you sure your graphics card is OK? Have you tried any of the available memory checkers?

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Mar 10 2017 - 05:41:10 CST

Yes, since it works with gromacs, cp2k and namd versions < 2.12. Maybe I should also mention that I'm using amber FF and files.

..............................

Actually, as far as I understand, "illegal mem access" is a software problem, not a hardware problem. What could I do? Perhaps run something other than NAMD, maybe a game involving the GPUs?

Thanks for advice
francesco

---------- Forwarded message ---------
From: Francesco Pietra <chiendar...@gmail.com>
Date: Mon, Jan 17, 2022 at 3:50 PM
Subject: Re: namd-l: Fwd: nvidia issue with namd12 Debian 11
To: Vermaas, Josh <verma...@msu.edu>
Cc: nam...@ks.uiuc.edu <nam...@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>

Hi Josh,
no big system:

Info) Analyzing structure ...
Info)    Atoms: 107292
Info)    Bonds: 77829
Info)    Angles: 61441  Dihedrals: 46455  Impropers: 1604  Cross-terms: 158
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 31152
Info)    Waters: 30102
Info)    Segments: 128
Info)    Fragments: 30587   Protein: 9   Nucleic: 25

Following your hint, I tried MD with a very small system:

Info) Analyzing structure ...
Info)    Atoms: 1448
Info)    Bonds: 1187
Info)    Angles: 1618  Dihedrals: 699  Impropers: 0  Cross-terms: 0
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 261
Info)    Waters: 0
Info)    Segments: 33
Info)    Fragments: 261   Protein: 0   Nucleic: 0

Exactly the same error messages that I reported for the bigger system. So it is not a problem of insufficient memory on the GTX.

My very feeble guess is that there is a mismatch between the linux kernel and the nvidia driver, but they were selected by the Debian code, so other people should have met the issue. I am not sure that Debian 11 could work correctly with a downgraded linux kernel/nvidia driver pair. Perhaps it could be easier to downgrade to Debian 10, which worked correctly on my raid1 box.

thanks
francesco

Incidentally, I said namd12, while it is 14.

On Mon, Jan 17, 2022 at 1:24 PM Vermaas, Josh <verma...@msu.edu> wrote:

> How big is your system? The error being tossed back is that you are out of
> memory. The GTX 680 only has 2GB of memory, and so depending on your system
> size you may run yourself out of memory.
>
> -Josh
>
> From: <owner-nam...@ks.uiuc.edu> on behalf of Francesco Pietra <chiendar...@gmail.com>
> Reply-To: "nam...@ks.uiuc.edu" <nam...@ks.uiuc.edu>, Francesco Pietra <chiendar...@gmail.com>
> Date: Monday, January 17, 2022 at 4:40 AM
> To: NAMD <nam...@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>
> Subject: namd-l: Fwd: nvidia issue with namd12 Debian 11
>
> I forgot to add that the commands 'nvidia-detect' and 'nvidia-smi' detect both
> GTX 680 cards as activated and report that they are supported by all driver
> versions, including those for Tesla 450.
>
> Actually, legacy nvidia drivers are only required for very old nvidia
> graphics cards, from 400 downwards.
>
> I also add that the box is at CUDA 11.2
>
> ---------- Forwarded message ---------
> From: Francesco Pietra <chiendar...@gmail.com>
> Date: Mon, Jan 17, 2022 at 4:15 AM
> Subject: nvidia issue with namd12 Debian 11
> To: NAMD <nam...@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>
>
> With a Debian 11 box with two GTX 680 I am unable to get them working.
> The problem occurred with upgrading from debian 10 to 11 and from namd 11 to
> 12 (/NAMD_Git-2021-11-27_Linux-x86_64-multicore-CUDA)
>
> nvidia-driver 460.91.03-1
> linux-image-amd64 5.10.84-1
> linux kernel 5.10.0-10-amd64
>
> Error when trying a minimization:
>
> TCL: Minimizing for 3000 steps
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
> [Partition 0][Node 0] End of program
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
>
> I have also reconfigured the xserver, to no avail.
>
> I have noticed issues about namd12/nvidia on the web, apparently
> unresolved.
>
> Thanks for advice
>
> francesco pietra
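
For anyone comparing reports like the ones in this thread: the same FATAL ERROR text is repeated per Pe and per device, which makes it hard to see at a glance whether one GPU or all of them fail. Below is a minimal sketch, assuming the error text has the same shape as the lines quoted above, that collapses the repeats into one count per host, device and Pe. The function name `summarize_cuda_errors` and the regular expression are illustrative assumptions and not part of NAMD; other NAMD versions may word or wrap the message differently.

```python
import re
from collections import Counter

# Pattern modelled only on the error lines quoted in this thread
# (assumption: other NAMD builds may format the message differently).
ERROR_RE = re.compile(
    r"FATAL ERROR: CUDA error (?P<call>\S+) in file (?P<file>\S+), "
    r"function (?P<function>\w+)(?:, line (?P<line>\d+))? "
    r"on Pe (?P<pe>\d+) \((?P<host>\S+) device (?P<device>\d+)[^)]*\): "
    r"(?P<message>an illegal memory access was encountered)"
)


def summarize_cuda_errors(log_text):
    """Count matching CUDA errors per (host, device, Pe, function)."""
    # Strip mail quote markers at line starts, then re-join wrapped lines
    # so that each error message sits on one logical line.
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    counts = Counter()
    for m in ERROR_RE.finditer(flat):
        key = (m["host"], int(m["device"]), int(m["pe"]), m["function"])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample text copied from the quoted minimization log above.
    sample = (
        "> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file\n"
        "> src/CudaTileListKernel.cu, function sortTileLists, line 1577\n"
        ">  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was\n"
        "> encountered\n"
        "> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file\n"
        "> src/CudaTileListKernel.cu, function sortTileLists, line 1577\n"
        ">  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was\n"
        "> encountered\n"
    )
    for key, n in sorted(summarize_cuda_errors(sample).items()):
        print(key, n)
```

If every device on the box shows up in the summary, as in the log quoted here, that is at least consistent with the view expressed earlier in the thread that the failure is not specific to a single card.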