As Mark said, please share the *entire* log file. Among other
important things, the result of PP-PME tuning is not included above.
However, I suspect that in this case scaling is strongly affected by
the small size of the system you are simulating.
--
Szilárd
On Sun, Nov 10, 2013 at 5:28 AM,
On Thu, Nov 7, 2013 at 6:34 AM, James Starlight wrote:
> I've come to the conclusion that simulations with 1 or 2 GPUs simultaneously give
> me the same performance
> mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test,
>
> mdrun -ntmpi 2 -ntomp 6 -gpu_id 0 -v -deffnm md_CaM_test,
>
> Does it b
Let's not hijack James' thread as your hardware is different from his.
On Tue, Nov 5, 2013 at 11:00 PM, Dwey Kauffman wrote:
> Hi Szilard,
>
> Thanks for your suggestions. I am indeed aware of this page. In an 8-core
> AMD with 1GPU, I am very happy about its performance. See below. My
Actual
On Tue, Nov 5, 2013 at 9:55 PM, Dwey Kauffman wrote:
> Hi Timo,
>
> Can you provide a benchmark with "1" Xeon E5-2680 with "1" Nvidia
> K20X GPGPU on the same test of 29420 atoms?
>
> Are these two GPU cards (within the same node) connected by a SLI (Scalable
> Link Interface) ?
Note that
> threads, hence a total of 24 threads however even with hyper threading
>>> > enabled there are only 12 threads on your machine. Therefore, only
>>> allocate
>>> > 12. Try
>>> >
>>> > mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
Timo,
Have you used the default settings, that is one rank/GPU? If that is
the case, you may want to try using multiple ranks per GPU, this can
often help when you have >4-6 cores/GPU. Separate PME ranks are not
switched on by default with GPUs; have you tried using any?
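For example (an illustrative sketch, assuming a single node with two
GPUs and plenty of cores; scale -np with the node count), two ranks
per GPU, or dedicated PME ranks, could be requested as:
mpirun -np 4 mdrun_mpi -gpu_id 0011
mpirun -np 4 mdrun_mpi -npme 2 -gpu_id 01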
Cheers,
--
Szilárd Páll
You can use the "-march=native" flag with gcc to optimize for the CPU
you are building on or e.g. -march=corei7-avx-i for Intel Ivy Bridge
CPUs.
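With a CMake-based GROMACS build, one way (a sketch, not the only one)
to pass such a flag is via the compiler flag variables at configure
time:
cmake .. -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"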
--
Szilárd Páll
On Mon, Nov 4, 2013 at 12:37 PM, James Starlight wrote:
> Szilárd, thanks for suggestion!
>
> What kind of CPU op
hine configurations before buying. (Note
that I have never tried it myself, so I can't provide more details or
vouch for it in any way.)
Cheers,
--
Szilárd Páll
On Fri, Nov 1, 2013 at 3:08 AM, David Chalmers
wrote:
> Hi All,
>
> I am considering setting up a small cluster to run Gr
Brad,
These numbers seem rather low for a standard simulation setup! Did
you use a particularly long cut-off or short time-step?
Cheers,
--
Szilárd Páll
On Fri, Nov 1, 2013 at 6:30 PM, Brad Van Oosten wrote:
> I'm not sure on the prices of these systems any more, they are getting dated
That should be enough. You may want to use the -march (or equivalent)
compiler flag for CPU optimization.
Cheers,
--
Szilárd Páll
On Sun, Nov 3, 2013 at 10:01 AM, James Starlight wrote:
> Dear Gromacs Users!
>
> I'd like to compile the latest 4.6 Gromacs with native GPU support
Hi Carsten,
On Thu, Oct 24, 2013 at 4:52 PM, Carsten Kutzner wrote:
> On Oct 24, 2013, at 4:25 PM, Mark Abraham wrote:
>
>> Hi,
>>
>> No. mdrun reports the stride with which it moves over the logical cores
>> reported by the OS, setting the affinity of GROMACS threads to logical
>> cores, and wa
There are
a few analysis tools that support OpenMP and even with those I/O will
be a severe bottleneck if you were considering using the Phi-s for
analysis.
So for now, I would stick to using only the CPUs in the system.
Cheers,
--
Szilárd Páll
On Thu, Oct 10, 2013 at 12:58 PM, Arun Sharma
Hi,
Admittedly, both the documentation on these features and the
communication on the known issues with these aspects of GROMACS have
been lacking.
Here's a brief summary/explanation:
- GROMACS 4.5: implicit solvent simulations possible using mdrun-gpu
which is essentially mdrun + OpenMM, hence it
On Mon, Sep 16, 2013 at 7:04 PM, PaulC wrote:
> Hi,
>
>
> I'm attempting to build GROMACS 4.6.3 to run entirely within a single Xeon
> Phi (i.e. native) with either/both Intel MPI/OpenMP for parallelisation
> within the single Xeon Phi.
>
> I followed these instructions from Intel for cross compil
Looks like you are compiling 4.5.1. You should try compiling the
latest version in the 4.5 series, 4.5.7.
--
Szilárd
On Sun, Sep 15, 2013 at 6:39 PM, Muthukumaran R wrote:
> hello,
>
> I am trying to install gromacs in cygwin but after issuing "make",
> installation stops with the following erro
le to judge what is causing
the problem.
Cheers,
--
Szilárd
> Best regards,
> Guanglei
>
>
> On Mon, Sep 9, 2013 at 4:35 PM, Szilárd Páll wrote:
>
>> Hi,
>>
>> First of all, icc 11 is not well tested and there have been reports
>> about it compiling broken
FYI, I've filed a bug report which you can track if interested:
http://redmine.gromacs.org/issues/1334
--
Szilárd
On Sun, Sep 1, 2013 at 9:49 PM, Szilárd Páll wrote:
> I may have just come across this issue as well. I have no time to
> investigate, but my guess is that it's
Hi,
First of all, icc 11 is not well tested and there have been reports
about it compiling broken code. This could explain the crash, but
you'd need to do a bit more testing to confirm. Regarding the GPU
detection error, if you use a driver which is incompatible with the
CUDA runtime (at least as h
On Tue, Sep 3, 2013 at 9:50 PM, Guanglei Cui
wrote:
> Hi Mark,
>
> I agree with you and Justin, but let's just say there are things that are
> out of my control ;-) I just tried SSE2 and NONE. Both failed the
> regression check.
That's alarming; with GMX_CPU_ACCELERATION=None only the plain C
ker
On Thu, Aug 29, 2013 at 7:18 AM, Gianluca Interlandi
wrote:
> Justin,
>
> I respect your opinion on this. However, in the paper indicated below by BR
> Brooks they used a cutoff of 10 A on LJ when testing IPS in CHARMM:
>
> Title: Pressure-based long-range correction for Lennard-Jones interactions
I may have just come across this issue as well. I have no time to
investigate, but my guess is that it's related to some thread-safety
issue with thread-MPI.
Could one of you please file a bug report on redmine.gromacs.org?
Cheers,
--
Szilárd
On Thu, Aug 8, 2013 at 5:52 PM, Brad Van Oosten wro
That should never happen. If mdrun is compiled with GPU support and
GPUs are detected, the detection stats should always get printed.
Can you reliably reproduce the issue?
--
Szilárd
On Fri, Aug 2, 2013 at 9:50 AM, Jernej Zidar wrote:
> Hi there.
> Lately I've been running simulations using G
erties to the MB should I consider for such system ?
>>
>> James
>>
>>
>> 2013/5/28 lloyd riggs
>>
>>> Dear Dr. Pali,
>>>
>>> Thank you,
>>>
>>> Stephan Watkins
>>>
>>> *Sent:* Tuesday, 28 May 2013 at
Hi,
The Intel compilers are only recommended for pre-Bulldozer AMD
processors (K10: Magny-Cours, Istanbul, Barcelona, etc.). On these,
PME non-bonded kernels (not the RF or plain cut-off!) are 10-30%
slower with gcc than with icc. The icc-gcc difference is the smallest
with gcc 4.7, typically arou
Dear Ramon,
Thanks for the kind words!
On Tue, Jun 18, 2013 at 10:22 AM, Ramon Crehuet Simon
wrote:
> Dear Szilard,
> Thanks for your message. Your help is priceless and helps advance science
> more than many publications. I extend that to many experts who kindly and
> promptly answer question
On Fri, Jul 19, 2013 at 6:59 PM, gigo wrote:
> Hi!
>
>
> On 2013-07-17 21:08, Mark Abraham wrote:
>>
>> You tried ppn3 (with and without --loadbalance)?
>
>
> I was testing on 8-replicas simulation.
>
> 1) Without --loadbalance and -np 8.
> Excerpts from the script:
> #PBS -l nodes=8:ppn=3
> seten
On Thu, Jul 25, 2013 at 5:55 PM, Mark Abraham wrote:
> That combo is supposed to generate a CMake warning.
>
> I also get a warning during linking that some shared library will have
> to provide some function (getpwuid?) at run time, but the binary is
> static.
That warning has always popped up f
The message is perfectly normal. When you do not use all available
cores/hardware threads (seen as "CPUs" by the OS), to avoid potential
clashes, mdrun does not pin threads (i.e. it lets the OS migrate
threads). On NUMA systems (most multi-CPU machines), this will cause
performance degradation as w
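If you do want pinning in such a partial-node run, you can request it
explicitly (a sketch; the thread count and offset are illustrative),
and offset a second run sharing the node so the two do not overlap:
mdrun -ntomp 6 -pin on
mdrun -ntomp 6 -pin on -pinoffset 6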
Depending on the level of parallelization (number of nodes and number
of particles/core) you may want to try:
- 2 ranks/node: 8 cores + 1 GPU, no separate PME (default):
mpirun -np 2*Nnodes mdrun_mpi [-gpu_id 01 -npme 0]
- 4 ranks per node: 4 cores + 1 GPU (shared between two ranks), no separate PME:
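Following the pattern above (a sketch, assuming the same two GPUs per
node; the bracketed options are optional):
mpirun -np 4*Nnodes mdrun_mpi [-gpu_id 0011 -npme 0]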
FYI: The MKL FFT has been shown to be as much as 30% slower (sometimes more) than FFTW 3.3.
--
Szilárd
On Thu, Jul 11, 2013 at 1:17 AM, Éric Germaneau wrote:
> I have the same feeling too but I'm not in charge of it unfortunately.
> Thank you, I appreciate.
>
>
> On 07/11/2013 07:15 AM, Mark Abraham wrote:
>>
>> No
Just a note regarding the performance "issues" mentioned. You are
using reaction-field electrostatics, a case in which by default there is
very little force workload left for the CPU (only the bondeds) and
therefore the CPU idles most of the time. To improve performance, use
-nb gpu_cpu with multiple
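For example, something like the following (a sketch; the thread/rank
counts and file name are illustrative) splits the non-bonded work
between GPU and CPU so the CPU cores do not idle:
mdrun -ntmpi 2 -ntomp 6 -nb gpu_cpu -deffnm md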
Hi,
Is affinity setting (pinning) on? What compiler are you using? There
are some known issues with Intel OpenMP getting in the way of the
internal affinity setting. To verify whether this is causing a
problem, try turning off pinning (-pin off).
Cheers,
--
Szilárd
On Tue, Jul 9, 2013 at 5:29 PM
On Tue, Jul 9, 2013 at 11:20 AM, Albert wrote:
> On 07/09/2013 11:15 AM, Szilárd Páll wrote:
>>
>> Tesla C1060 is not compatible - which should be shown in the log and
>> standard output.
>>
>> Cheers,
>> --
>> Szilárd
>
>
> THX for kind comme
Tesla C1060 is not compatible - which should be shown in the log and
standard output.
Cheers,
--
Szilárd
On Tue, Jul 9, 2013 at 10:54 AM, Albert wrote:
> Dear:
>
> I've installed a gromacs-4.6.3 in a GPU cluster, and I obtained the
> following information for testing:
>
> NOTE: Using a GPU wit
PS: the error message is referring to the *driver* version, not the
CUDA toolkit/runtime version.
--
Szilárd
On Tue, Jul 9, 2013 at 11:15 AM, Szilárd Páll wrote:
> Tesla C1060 is not compatible - which should be shown in the log and
> standard output.
>
> Cheers,
> --
> Sz
On Mon, Jun 24, 2013 at 4:43 PM, Szilárd Páll wrote:
> On Sat, Jun 22, 2013 at 5:55 PM, Mirco Wahab
> wrote:
>> On 22.06.2013 17:31, Mare Libero wrote:
>>>
>>> I am assembling a GPU workstation to run MD simulations, and I was
>>> wondering if anyone has a
autocomplete).
>
> I am still trying to fix the issues with the intel compiler. The gcc
> compiled version benchmark at 52ns/day with the lysozyme in water tutorial.
icc 12 and 13 should just work with CUDA 5.0.
Cheers,
--
Szilárd
>
> Thanks again.
>
>
FYI: 4.6.2 contains a bug related to thread affinity setting which
will lead to a considerable performance loss (I've seen 35%) as well
as often inconsistent performance - especially with GPUs (case in
which one would run many OpenMP threads/rank). My advice is that you
either use the code from git
On Thu, Jun 27, 2013 at 12:57 PM, Mare Libero wrote:
> Hello everybody,
>
> Does anyone have any recommendation regarding the installation of gromacs 4.6
> on Ubuntu 12.04? I have the nvidia-cuda-toolkit that comes in synaptic
> (4.0.17-3ubuntu0.1 installed in /usr/lib/nvidia-cuda-toolkit) and t
Thanks Mirco, good info, your numbers look quite consistent. The only
complicating factor is that your CPUs are overclocked by different
amounts, which changes the relative performances somewhat compared to
non-overclocked parts.
However, let me list some prices to show that the top-of-the line AM
If you have a solid example that reproduces the problem, feel free to
file an issue on redmine.gromacs.org ASAP. Briefly documenting your
experiments and verification process on the issue report page can
help developers in giving you faster feedback as well as with
accepting the report as a bu
On Sat, Jun 22, 2013 at 5:55 PM, Mirco Wahab
wrote:
> On 22.06.2013 17:31, Mare Libero wrote:
>>
>> I am assembling a GPU workstation to run MD simulations, and I was
>> wondering if anyone has any recommendation regarding the GPU/CPU
>> combination.
>> From what I can see, the GTX690 could be th
I strongly suggest that you consider the single-chip GTX cards instead
of a dual-chip one; from the point of view of price/performance you'll
probably get the most from a 680 or 780.
You could ask why, so here are the reasons:
- The current parallelization scheme requires domain-decomposition to
u
Dear Ramon,
Compute capability does not reflect the performance of a card, but it
is an indicator of what functionalities the GPU provides - more
like a generation number or feature set version.
Quadro cards are typically quite close in performance/$ to Teslas with
roughly 5-8x *lower* "GROMA
-missing-field-initializers
> -Wno-sign-compare -Wall -Wno-unused -Wunused-value -fomit-frame-pointer
> -funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
>
>
> All the regressiontests failed. So it appears that, at least for my system,
> I need to include the direc
Amil,
It looks like there is a mixup in your software configuration and
mdrun is linked against libguide.so, the OpenMP runtime that is part of the
Intel compiler v11, which gets loaded early and is probably causing the
crash. This library was probably pulled in implicitly by MKL which the
build system det
On Wed, Jun 5, 2013 at 4:35 PM, João Henriques
wrote:
> Just to wrap up this thread, it does work when the mpirun is properly
> configured. I knew it had to be my fault :)
>
> Something like this works like a charm:
> mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
That is indeed t
On Sat, Jun 8, 2013 at 9:21 PM, Albert wrote:
> Hello:
>
> Recently I found a strange question about Gromacs-4.6.2 on GPU workstation.
> In my GTX690 machine, when I run md production I found that the ECC is on.
> However, in my another GTX590 machine, I found the ECC was off:
>
> 4 GPUs detected:
Just a few minor details:
- You can set the affinities yourself through the job scheduler which
should give nearly identical results compared to the mdrun internal
affinity if you simply assign cores to mdrun threads in a sequential
order (or with an #physical cores stride if you want to use
Hyper
"-nt" is mostly a backward compatibility option and sets the total
number of threads (per rank). Instead, you should set both "-ntmpi"
(or -np with MPI) and "-ntomp". However, note that unless a single
mdrun uses *all* cores/hardware threads on a node, it won't pin the
threads to cores. Failing to
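As an illustration (assuming a 16-core node with two GPUs; adjust the
numbers to your machine), instead of -nt 16 you would specify the
decomposition explicitly:
mdrun -ntmpi 2 -ntomp 8 -gpu_id 01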
mdrun is not blind, it's just that the current design does not report the
hardware of all compute nodes used. Whatever CPU/GPU hardware mdrun reports in
the log/std output is *only* what rank 0, i.e. the first MPI process,
detects. If you have a heterogeneous hardware configuration, in most
cases you should be a
There's no ibverbs support, so pick your favorite/best MPI
implementation; beyond that there is not much you can do.
--
Szilárd
On Mon, Jun 3, 2013 at 2:54 PM, Bert wrote:
> Dear all,
>
> My cluster has a FDR (56 Gb/s) Infiniband network. It is well known that
> there is a big difference between using IPoIB
gner
> PhD Student, MBM Group
>
> Klaus Tschira Lab (KTL)
> Max Planck Partner Institut for Computational Biology (PICB)
> 320 YueYang Road
> 200031 Shanghai, China
>
> phone: +86-21-54920475
> email: johan...@picb.ac.cn
>
> and
>
> Heidelberg Institut for Theore
Thanks for reporting this.
The best would be a redmine bug with a tpr, the command line invocation for
reproduction, as well as the log output, so we can see what software and
hardware configuration you are using.
Cheers,
--
Szilárd
On Mon, Jun 3, 2013 at 2:46 PM, Johannes Wagner
wrote:
> Hi there,
> trying to set
On Tue, May 28, 2013 at 10:14 AM, James Starlight
wrote:
> I've found GTX Titan with 6 GB of RAM and 384 bit. The price of such card is
> equal to the price of the latest TESLA cards.
Nope!
Titan: $1000
Tesla K10: $2750
Tesla K20(c): $3000
TITAN is cheaper than any Tesla and the fastest of all N
On Sat, May 25, 2013 at 2:16 PM, Broadbent, Richard
wrote:
> I've been running on my University's GPU nodes; these have one Xeon E5 (6 cores,
> 12 threads) and 4 Nvidia GTX 690s. My system is 93 000 atoms of DMF
> under NVE. The performance has been a little disappointing
That sounds like a
Dear all,
As far as I understand, the OP is interested in hardware for *running*
GROMACS 4.6 rather than developing code or running LINPACK.
To get best performance it is important to use a machine with hardware
balanced for GROMACS' workloads. Too little GPU resources will result
in CPU idling
10.04 comes with gcc 4.3 and 4.4 which should both work (we even test
them with Jenkins).
Still, you should really get a newer gcc, especially if you have an
8-core AMD CPU (=> either Bulldozer or Piledriver) both of which are
fully supported only by gcc 4.7 and later. Additionally, AFAIK the
2.6.
With the verlet cutoff scheme (new in 4.6) you get much better control
over the drift caused by (missed) short range interactions; you just
set a maximum allowed target drift and the buffer will be calculated
accordingly. Additionally, with the verlet scheme you are free to
tweak the neighbor searc
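In the mdp file this looks roughly like the following (a sketch; the
values shown are illustrative, tune them to your needs):
cutoff-scheme       = Verlet
verlet-buffer-drift = 0.005   ; target max. energy drift per atom, buffer sized from this
nstlist             = 20      ; with the Verlet scheme this can be increased freely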
The thread-MPI library provides the thread affinity setting
functionality to mdrun, hence certain parts of it will always be
compiled in, even with GMX_MPI=ON. Apparently, the Cray compiler does
not like some of the thread-MPI headers. Feel free to file a bug
report on redmine.gromacs.org, but *don
On Fri, May 17, 2013 at 2:48 PM, Djurre de Jong-Bruinink
wrote:
>
>
>>The answer is in the log files, in particular the performance summary
>>should indicate where is the performance difference. If you post your
>>log files somewhere we can probably give further tips on optimizing
>>your run confi
The answer is in the log files, in particular the performance summary
should indicate where is the performance difference. If you post your
log files somewhere we can probably give further tips on optimizing
your run configurations.
Note that with such a small system the scaling with the group sch
PS: if your compute-nodes are Intel of some recent architecture
OpenMP-only parallelization can be considerably more efficient.
For more details see
http://www.gromacs.org/Documentation/Acceleration_and_parallelization
--
Szilárd
On Thu, May 16, 2013 at 7:26 PM, Szilárd Páll wrote:
> I
I'm not sure what you mean by "threads". In GROMACS this can refer to
either thread-MPI or OpenMP multi-threading. To run within a single
compute node a default GROMACS installation using either of the two
aforementioned parallelization methods (or a combination of the two)
can be used.
--
Szilárd
Hi,
Such an issue typically indicates a GPU kernel crash. This can be
caused by a large variety of factors from program bug to GPU hardware
problem. To do a simple check for the former please run with the CUDA
memory checker, e.g:
/usr/local/cuda/bin/cuda-memcheck mdrun [...]
Additionally, as you
On Mon, Apr 29, 2013 at 3:51 PM, Albert wrote:
> On 04/29/2013 03:47 PM, Szilárd Páll wrote:
>>
>> In that case, while it isn't very likely, the issue could be caused by
>> some implementation detail which aims to avoid performance loss caused
>> by an issue
e GPU while mdrun was running?
Cheers,
--
Szilárd
On Mon, Apr 29, 2013 at 3:32 PM, Albert wrote:
> On 04/29/2013 03:31 PM, Szilárd Páll wrote:
>>
>> The segv indicates that mdrun crashed and not that the machine was
>> restarted. The GPU detection output (both on stderr and l
On Mon, Apr 29, 2013 at 2:41 PM, Albert wrote:
> On 04/28/2013 05:45 PM, Justin Lemkul wrote:
>>
>>
>> Frequent failures suggest instability in the simulated system. Check your
>> .log file or stderr for informative Gromacs diagnostic information.
>>
>> -Justin
>
>
>
> my log file didn't have any
Have you tried running on CPUs only just to see if the issue persists?
Unless the issue disappears when running the same binary on the same
hardware using CPUs only, I doubt it's a problem in the code.
Do you have ECC on?
--
Szilárd
On Sun, Apr 28, 2013 at 5:27 PM, Albert wrote:
> Dear:
>
>
This error means that your binaries contain machine instructions that
the processor you run them on does not support. The most probable
cause is that you compiled the binaries on a machine with different
architecture than the one you are running on.
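A typical fix (a sketch) is to rebuild either directly on the target
machine or with an acceleration level the target actually supports,
e.g.:
cmake .. -DGMX_CPU_ACCELERATION=SSE4.1
(other valid values include None, SSE2, AVX_128_FMA and AVX_256)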
Cheers,
--
Szilárd
On Mon, Apr 29, 2013 at 11
You got a warning at configure-time that the nvcc host compiler can't
be set because the MPI compiler wrappers are used. Because of this,
nvcc is using gcc to compile CPU code, which chokes on the icc flags.
You can:
- set CUDA_HOST_COMPILER to the mpicc backend, i.e. icc, or
- let cmake detect MPI an
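The first option would look roughly like this (a sketch; the exact
path to icc depends on your setup):
cmake .. -DGMX_GPU=ON -DCUDA_HOST_COMPILER=$(which icc)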
Hi,
You should really check out the documentation on how to use mdrun 4.6:
http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Running_simulations
Brief summary: when running on GPUs every domain is assigned to a set
of CPU cores and a GPU, hence you need to start as many PP MPI
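For instance, on a single node with two GPUs (illustrative numbers),
you would start two PP ranks and map them to the GPUs, e.g.:
mdrun -ntmpi 2 -gpu_id 01
or, with a real MPI build:
mpirun -np 2 mdrun_mpi -gpu_id 01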
On Mon, Apr 22, 2013 at 8:49 AM, Albert wrote:
> On 04/22/2013 08:40 AM, Mikhail Stukan wrote:
>>
>> Could you explain which hardware do you mean? As far as I know, K20X
>> supports double precision, so I would assume that double precision GROMACS
>> should be realizable on it.
>
>
> Really? But m
On Tue, Apr 9, 2013 at 6:52 PM, David van der Spoel wrote:
> On 2013-04-09 18:06, Mikhail Stukan wrote:
>
>> Dear experts,
>>
>> I have the following question. I am trying to compile GROMACS 4.6.1 with
>> GPU acceleration and have the following diagnostics:
>>
>> # cmake .. -DGMX_DOUBLE=ON -DGMX_B
Hi,
Your problem will likely be solved by not writing the rpath to the
binaries, which can be accomplished by setting -DCMAKE_SKIP_RPATH=ON.
This will mean that you will have to make sure that the library path
is set for mdrun to work.
If that does not fully solve the problem, you might have to b
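Without the rpath you will typically have to point the dynamic linker
at the GROMACS (and CUDA/FFTW) libraries yourself, e.g. something like
(the path below is just a placeholder):
export LD_LIBRARY_PATH=/path/to/gromacs/lib:$LD_LIBRARY_PATH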
On Thu, Apr 18, 2013 at 6:17 PM, Mike Hanby wrote:
> Thanks for the reply, so the next question, after I finish building single
> precision non parallel, is there an efficient way to kick off the double
> precision build, then the single precision mpi and so on?
>
> Or do I need to delete everyt
On Sat, Apr 13, 2013 at 5:27 PM, Szilárd Páll wrote:
> On Sat, Apr 13, 2013 at 3:30 PM, Mirco Wahab
> wrote:
>> On 12.04.2013 20:20, Szilárd Páll wrote:
>>>
>>> On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 wrote:
>>>>
>>>> Can cygwin recog
On Sat, Apr 13, 2013 at 3:30 PM, Mirco Wahab
wrote:
> On 12.04.2013 20:20, Szilárd Páll wrote:
>>
>> On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 wrote:
>>>
>>> Can cygwin recognize the CUDA installed in win 7? if so, how do i link
>>> them ?
>>
>&
On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 wrote:
> Thanks for your answers. I have uninstalled the mpi, have also reinstalled
> the CUDA and got the same issue. As you have mentioned before I noticed that
> it struggle to detect the CUDA.
Do you mean that you reconfigured without MPI and with CUDA
Indeed it's strange. In fact, it seems that CUDA detection did not
even run; there should be a message on whether it found the toolkit or
not just before the "Enabling native GPU acceleration" - and the
enabling should not even happen without CUDA detected.
Unrelated, but do you really need MPI with
Hi,
No, it just means that *your simulation* does not scale. The question
is very vague, hence impossible to answer without more details.
However, assuming that you are not running a, say, 5000 atom system
over 6 nodes, the most probable reason is that you have 6 Sandy Bridge
nodes with 12-16 core
On Wed, Apr 10, 2013 at 4:24 PM, Szilárd Páll wrote:
> Hi Andrew,
>
> As others have said, 40x speedup with GPUs is certainly possible, but more
> often than not comparisons leading to such numbers are not entirely fair -
> at least from a computational perspective. The most comm
On Wed, Apr 10, 2013 at 4:50 PM, 申昊 wrote:
> Hello,
>I wanna ask some questions about load imbalance.
> 1> Here are the messages resulting from grompp -f md.mdp -p topol.top -c
> npt.gro -o md.tpr
>
>NOTE 1 [file md.mdp]:
> The optimal PME mesh load for parallel simulations is below 0.5
On Wed, Apr 10, 2013 at 4:48 PM, 陈照云 wrote:
> I have tested gromacs-4.6.1 with k20.
> But when I run mdrun, I met some problems.
> 1. Does the GPU only support float (single-precision) acceleration?
>
Yes.
> 2. Configure options are -DGMX_MPI, -DGMX_DOUBLE.
> But if I run in parallel with mpirun, it goes wrong with
Hi Andrew,
As others have said, 40x speedup with GPUs is certainly possible, but more
often than not comparisons leading to such numbers are not entirely fair -
at least from a computational perspective. The most common case is when
people compare legacy, poorly (SIMD)-optimized codes with some ne
On Wed, Apr 10, 2013 at 3:34 AM, Benjamin Bobay wrote:
> Szilárd -
>
> First, many thanks for the reply.
>
> Second, I am glad that I am not crazy.
>
> Ok so based on your suggestions, I think I know what the problem is/was.
> There was a sander process running on 1 of the CPUs. Clearly GROMACS
Hi Ben,
That performance is not reasonable at all - neither for a CPU-only run on
your quad-core Sandy Bridge, nor for the CPU+GPU run. For the latter you
should be getting more like 50 ns/day or so.
What's strange about your run is that the CPU-GPU load balancing is picking
a *very* long cut-off w
On Mon, Apr 8, 2013 at 1:37 PM, Justin Lemkul wrote:
> On Mon, Apr 8, 2013 at 2:28 AM, Hrachya Astsatryan wrote:
>
> > Dear all,
> >
> > We have installed the latest version of Gromacs (version 4.6) on our
> > cluster by the following step:
> >
> > * cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX
Hi,
As the error message states, the reason for the failed configuration is
that CMake can't auto-detect MPI, which is needed when you are not providing
the MPI compiler wrapper as the compiler.
If you want to build with MPI you can either let CMake auto-detect MPI and
just compile with the C compiler
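Concretely, either of these should work (a sketch; the wrapper names
depend on your MPI installation):
cmake .. -DGMX_MPI=ON
cmake .. -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx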
Hi,
You can certainly use your hardware setup. I assume you've been looking at
the log/console output based on which it might seem that mdrun is only
using the GPUs in the first (=master) node. However, that is not the case,
it's just that the current hardware and launch configuration reporting is
>
>
> --
> Chandan kumar Choudhury
> NCL, Pune
> INDIA
>
>
> On Thu, Mar 28, 2013 at 4:26 PM, Chandan Choudhury wrote:
>
> >
> >> On Thu, Mar 28, 2013 at 4:09 PM, Szilárd Páll wrote:
> >
> >> Hi,
> >>
> >> If mdrun s
Hi,
If mdrun says that it could not detect GPUs it simply means that the GPU
enumeration found no GPUs, otherwise it would have printed what was found.
This is rather strange because mdrun uses the same mechanism the
deviceQuery SDK example. I really don't have a good idea what could be the
issue,
Hi,
Actually, if you don't want to run across the network, with those Westmere
processors you should be fine with running OpenMP across the two sockets,
i.e.
mdrun -ntomp 24
or to run without HyperThreading (which can be sometimes faster) just use
mdrun -ntomp 12 -pin on
Now, when it comes to GPU
FYI: On your machine, running OpenMP across two sockets will probably not be
very efficient. Depending on the input and on how high a parallelization you
are running at, you may be better off running multiple MPI ranks per
GPU. This is a bit of an unexplained feature due to it being complicated to
Hi Quentin,
That's just a way of saying that something is wrong with either of the
following (in order of possibility of the event):
- your GPU driver is too old, hence incompatible with your CUDA version;
- your GPU driver installation is broken;
- your GPU is behaving in an unexpected/strange ma
FYI: As much as Intel likes to say that you can "just run" MPI/MPI+OpenMP
code on MIC, you will probably not be impressed with the performance (it
will be *much* slower than a Xeon CPU).
If you want to know why and what/when are we doing something about it,
please read my earlier comments on MIC p
Hi Chris,
You should be able to run on MIC/Xeon Phi as these accelerators, when used
in symmetric mode, behave just like a compute node. However, for two main
reasons the performance will be quite bad:
- no SIMD accelerated kernels for MIC;
- no accelerator-specific parallelization implemented (as
As Mark said, we need concrete details to answer the question:
- log files (all four of them: 1/2 nodes, 4.5/4.6)
- hardware (CPUs, network)
- compilers
The 4.6 log files contain much of the second and third point except the
network.
Note that you can compare the performance summary table's entrie
elp out other
> users!
> >
> > As an aside, I found that the OpenMP + Verlet combination was slower for
> > this particular system, but I suspect that it's because it's almost
> > entirely water and hence probably benefits from the Group scheme
> > optimi
On Thu, Mar 7, 2013 at 2:02 PM, Berk Hess wrote:
>
> Hi,
>
> This was only a note, not a fix.
> I was just trying to say that what linear algebra library you use for
> Gromacs is irrelevant in more than 99% of the cases.
> But having said that, the choice of library should not complicate the
> co