Hello,

Main board GA-X79-UD3 with two 680 GPUs
Debian 10 Linux, kernel 5.10.0-19-amd64
OpenGL 4.6.0
nvidia driver 470.141.03
------------------------------

Months ago, after an update/upgrade of the amd64 system, the GPUs, while still rendering correctly, became unable to run classical molecular dynamics simulations.

I launch a minimization with NAMD using both GPUs, or only one of them (selected in software, or even by physically removing one GPU):

    namd2 +idlepoll +p12 +devices 0,1 min.conf
    namd2 +idlepoll +p12 +devices 0 min.conf
    namd2 +idlepoll +p12 +devices 1 min.conf

NAMD sets up the simulation correctly, but when the computation starts and memory is accessed, it crashes with this error:

TCL: Minimizing for 3000 steps
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function buildTileLists, line 1136
> on Pe 4 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling
> 48 times over 0.005047 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal
> memory access was encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function buildTileLists, line 1136
> on Pe 4 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error in ComputeBondedCUDA::forceDoneCheck after polling
> 48 times over 0.005047 s on Pe 8 (gig64 device 1 pci 0:3:0): an illegal
> memory access was encountered
> [Partition 0][Node 0] End of program
>

"illegal memory access" is a software error (as also shown by the same failure occurring when either of the two GPUs is used alone), and its origin has escaped all my attempts to track it down. I got no clues from the NAMD forum, so I hope for better luck here.

Thanks for your kind attention,
francesco pietra