Hello,
This is the output from the deviceQuery command:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 4 devices supporting CUDA
Device 0: "Tesla T10 Processor"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294770688 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: "Tesla T10 Processor"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294770688 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 2: "Tesla T10 Processor"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294770688 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 3: "Tesla T10 Processor"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294770688 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4243455, CUDA Runtime Version = 3.20, NumDevs = 4, Device = Tesla T10 Processor, Device = Tesla T10 Processor
PASSED
----------------------------------------------------------------------------
This simulation had already been run on the CPU first, and I then tried to run the second one on the GPU (rough command lines below).
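Roughly, the two runs were started like this; the CPU line is from memory, so treat it as approximate, while the GPU line is the exact command quoted further down in this thread:

  # CPU run (assumed plain mdrun invocation with the same -deffnm)
  mdrun -deffnm md_0_2

  # GPU run (the command actually used, quoted below)
  mdrun-gpu -deffnm md_0_2 -device "OpenMM:platform=Cuda,deviceid=1,force-device=yes"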
Thanks.
Bongkeun Kim
Quoting Szilard Pall <szilard.p...@cbr.su.se>:
Hi,
Tesla C1060 and S1070 are definitely supported, so it's strange that you get that warning. The only thing I can think of is that for some reason the CUDA runtime reports the GPU name as something other than C1060/S1070. Could you please run deviceQuery from the SDK (a quick build/run sketch below) and post the output here?
However, that should not be causing the NaN issue. Does the same
simulation run on the CPU?
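If it helps, deviceQuery can usually be built and run from the GPU Computing SDK roughly like this; the SDK path below is just an assumed default install location:

  cd ~/NVIDIA_GPU_Computing_SDK/C
  make -C src/deviceQuery          # builds into bin/linux/release/ by default
  ./bin/linux/release/deviceQuery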
Cheers,
--
Szilard
2010/12/15 Bongkeun Kim <b...@chem.ucsb.edu>:
Hello,
I tried using a 1 fs timestep and it did not work.
I'm using NVIDIA T10 GPUs (C1060 or S1070), and mdrun-gpu said it is not a supported GPU, so I had to use "force-device=yes". Do you think this could be the reason for the error?
Thanks.
Bongkeun Kim
Quoting Emanuel Peter <emanuel.pe...@chemie.uni-regensburg.de>:
Hello,
If you use a timestep of 1 fs instead of 2 fs, it might run better (see the mdp sketch below).
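Only two mdp lines would need to change. A minimal sketch, assuming you want to keep the same total simulated time (the nsteps value below is just that assumption worked out: 100000000 * 1 fs = 100 ns):

  dt      = 0.001       ; 1 fs timestep instead of 2 fs
  nsteps  = 100000000   ; doubled so 1 fs * 100000000 steps = 100 ns as before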
Bests,
Emanuel
Bongkeun Kim 15.12.10 8:36 >>>
Hello,
I got an error log when I used gromacs-gpu on npt simulation.
The error is like:
---------------------------------------------------------------
Input Parameters:
integrator = md
nsteps = 50000000
init_step = 0
ns_type = Grid
nstlist = 5
ndelta = 2
nstcomm = 10
comm_mode = Linear
nstlog = 1000
nstxout = 1000
nstvout = 1000
nstfout = 0
nstcalcenergy = 5
nstenergy = 1000
nstxtcout = 1000
init_t = 0
delta_t = 0.002
xtcprec = 1000
nkx = 32
nky = 32
nkz = 32
pme_order = 4
ewald_rtol = 1e-05
ewald_geometry = 0
epsilon_surface = 0
optimize_fft = FALSE
ePBC = xyz
bPeriodicMols = FALSE
bContinuation = TRUE
bShakeSOR = FALSE
etc = V-rescale
nsttcouple = 5
epc = Parrinello-Rahman
epctype = Isotropic
nstpcouple = 5
tau_p = 2
ref_p (3x3):
ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
compress (3x3):
compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compress[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
refcoord_scaling = No
posres_com (3):
posres_com[0]= 0.00000e+00
posres_com[1]= 0.00000e+00
posres_com[2]= 0.00000e+00
posres_comB (3):
posres_comB[0]= 0.00000e+00
posres_comB[1]= 0.00000e+00
posres_comB[2]= 0.00000e+00
andersen_seed = 815131
rlist = 1
rlistlong = 1
rtpi = 0.05
coulombtype = PME
rcoulomb_switch = 0
rcoulomb = 1
vdwtype = Cut-off
rvdw_switch = 0
rvdw = 1
epsilon_r = 1
epsilon_rf = 1
tabext = 1
implicit_solvent = No
gb_algorithm = Still
gb_epsilon_solvent = 80
nstgbradii = 1
rgbradii = 1
gb_saltconc = 0
gb_obc_alpha = 1
gb_obc_beta = 0.8
gb_obc_gamma = 4.85
gb_dielectric_offset = 0.009
sa_algorithm = Ace-approximation
sa_surface_tension = 2.05016
DispCorr = EnerPres
free_energy = no
init_lambda = 0
delta_lambda = 0
n_foreign_lambda = 0
sc_alpha = 0
sc_power = 0
sc_sigma = 0.3
sc_sigma_min = 0.3
nstdhdl = 10
separate_dhdl_file = yes
dhdl_derivatives = yes
dh_hist_size = 0
dh_hist_spacing = 0.1
nwall = 0
wall_type = 9-3
wall_atomtype[0] = -1
wall_atomtype[1] = -1
wall_density[0] = 0
wall_density[1] = 0
wall_ewald_zfac = 3
pull = no
disre = No
disre_weighting = Conservative
disre_mixed = FALSE
dr_fc = 1000
dr_tau = 0
nstdisreout = 100
orires_fc = 0
orires_tau = 0
nstorireout = 100
dihre-fc = 1000
em_stepsize = 0.01
em_tol = 10
niter = 20
fc_stepsize = 0
nstcgsteep = 1000
nbfgscorr = 10
ConstAlg = Lincs
shake_tol = 0.0001
lincs_order = 4
lincs_warnangle = 30
lincs_iter = 1
bd_fric = 0
ld_seed = 1993
cos_accel = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 24715
ref_t: 325
tau_t: 0.1
anneal: No
ann_npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp_flags[ 0]: 0
efield-x:
n = 0
efield-xt:
n = 0
efield-y:
n = 0
efield-yt:
n = 0
efield-z:
n = 0
efield-zt:
n = 0
bQMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
qm_opts:
ngQM = 0
Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Will do PME sum in reciprocal space.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's: NS: 1 Coulomb: 1 LJ: 1
Long Range LJ corr.: 2.9723e-04
System total charge: 0.000
Generated table with 1000 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Enabling SPC-like water optimization for 3910 molecules.
Configuring nonbonded kernels...
Configuring standard C nonbonded kernels...
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
The number of constraints is 626
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
Max number of connections per atom is 103
Total number of connections is 37894
Max number of graph edges per atom is 4
Total number of graph edges is 16892
OpenMM plugins loaded from directory /home/bkim/packages/openmm/lib/plugins:
libOpenMMCuda.so, libOpenMMOpenCL.so
The combination rule of the used force field matches the one used by OpenMM.
Gromacs will use the OpenMM platform: Cuda
Non-supported GPU selected (#1, Tesla T10 Processor), forced continuing. Note that the simulation can be slow or it might even crash.
Pre-simulation ~15s memtest in progress...
Memory test completed without errors.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
Entry Friedrichs2009 not found in citation database
-------- -------- --- Thank You --- -------- --------
Initial temperature: 0 K
Started mdrun on node 0 Tue Dec 14 23:10:20 2010
Step Time Lambda
0 0.00000 0.00000
Energies (kJ/mol)
    Potential  Kinetic En.  Total Energy  Temperature  Constr. rmsd
 -1.40587e+05  3.36048e+04  -1.06982e+05  3.27065e+02   0.00000e+00
Step Time Lambda
1000 2.00000 0.00000
Energies (kJ/mol)
    Potential  Kinetic En.  Total Energy  Temperature  Constr. rmsd
          nan          nan           nan          nan   0.00000e+00
Received the second INT/TERM signal, stopping at the next step
Step Time Lambda
1927 3.85400 0.00000
Energies (kJ/mol)
    Potential  Kinetic En.  Total Energy  Temperature  Constr. rmsd
          nan          nan           nan          nan   0.00000e+00
Writing checkpoint, step 1927 at Tue Dec 14 23:12:07 2010
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 3 steps using 3 frames
Energies (kJ/mol)
    Potential  Kinetic En.  Total Energy  Temperature  Constr. rmsd
          nan          nan           nan          nan   0.00000e+00
Box-X Box-Y Box-Z
3.91363e-24 6.72623e-44 -1.71925e+16
Total Virial (kJ/mol)
0.00000e+00 0.00000e+00 0.00000e+00
0.00000e+00 0.00000e+00 0.00000e+00
0.00000e+00 0.00000e+00 0.00000e+00
Pressure (bar)
0.00000e+00 0.00000e+00 0.00000e+00
0.00000e+00 0.00000e+00 0.00000e+00
0.00000e+00 0.00000e+00 0.00000e+00
Total Dipole (D)
0.00000e+00 0.00000e+00 0.00000e+00
------------------------------------------------------------------------
The input mdp file is given by
========================================================
title = OPLS Lysozyme MD
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 50000000 ; 2 fs * 50000000 steps = 100000 ps (100 ns)
dt = 0.002 ; 2 fs
; Output control
nstxout = 1000 ; save coordinates every 2 ps
nstvout = 1000 ; save velocities every 2 ps
nstxtcout = 1000 ; xtc compressed trajectory output every 2 ps
nstenergy = 1000 ; save energies every 2 ps
nstlog = 1000 ; update log file every 2 ps
; Bond parameters
continuation = yes ; Restarting after NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 1.0 ; short-range neighborlist cutoff (in nm)
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = System ; single coupling group (whole system)
tau_t = 0.1 ; time constant, in ps
ref_t = 325 ; reference temperature, one for each group, in K
; Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; Velocity generation is off
=========================================================================
It worked with the generic CPU mdrun but gave this error when mdrun-gpu was used with:
mdrun-gpu -deffnm md_0_2 -device "OpenMM:platform=Cuda,deviceid=1,force-device=yes"
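For reference, the same command pointed at a different card would look like this (just a sketch reusing the option names above; I don't know whether picking a different device index changes anything):

  mdrun-gpu -deffnm md_0_2 -device "OpenMM:platform=Cuda,deviceid=0,force-device=yes"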
If you have any idea how to avoid this problem, I would really appreciate it.
Thank you.
Bongkeun Kim
--
gmx-users mailing list gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists