You tried ppn 3 (with and without --loadbalance)?

Mark
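For concreteness, a minimal Torque script along those lines might look something like the sketch below. It is untested and assumes 12-core nodes, an OpenMPI build with Torque support (so mpiexec picks up the allocated slots by itself), and an mdrun_mpi built with OpenMP; whether exactly 3 ranks land on each node depends on that integration.

    #!/bin/bash
    #PBS -l nodes=48:ppn=3          # 3 slots per 12-core node -> 144 slots total
    #PBS -l walltime=24:00:00

    cd $PBS_O_WORKDIR

    # 4 OpenMP threads per MPI rank fills each 12-core node: 3 ranks x 4 threads
    export OMP_NUM_THREADS=4

    # With Torque integration, mpiexec should start one rank per allocated slot,
    # i.e. 144 ranks = one per replica; -ntomp 4 tells mdrun the same thing and
    # -pin on lets it try to set thread affinities itself.
    mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -ntomp 4 -pin on -cpi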
On Wed, Jul 17, 2013 at 6:30 PM, gigo <g...@ibb.waw.pl> wrote:
> On 2013-07-13 11:10, Mark Abraham wrote:
>>
>> On Sat, Jul 13, 2013 at 1:24 AM, gigo <g...@ibb.waw.pl> wrote:
>>>
>>> On 2013-07-12 20:00, Mark Abraham wrote:
>>>>
>>>> On Fri, Jul 12, 2013 at 4:27 PM, gigo <g...@ibb.waw.pl> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> On 2013-07-12 11:15, Mark Abraham wrote:
>>>>>>
>>>>>> What does --loadbalance do?
>>>>>
>>>>> It balances the total number of processes across all allocated nodes.
>>>>
>>>> OK, but using it means you are hostage to its assumptions about balance.
>>>
>>> That's true, but as long as I do not try to use more resources than Torque
>>> gives me, everything is OK. The question is: what is the proper way of
>>> running multiple simulations in parallel with MPI that are further
>>> parallelized with OpenMP, when pinning fails? I could not find any other.
>>
>> I think pinning fails because you are double-crossing yourself. You do
>> not want 12 MPI processes per node, and that is likely what ppn is
>> setting. AFAIK your setup should work, but I haven't tested it.
>>
>>>>> The thing is that mpiexec does not know that I want each replica to fork
>>>>> into 4 OpenMP threads. Thus, without this option and without affinities
>>>>> (more on that in a second) mpiexec starts too many replicas on some
>>>>> nodes - Gromacs then complains about the overload - while some cores on
>>>>> other nodes are not used. It is possible to run my simulation like this:
>>>>>
>>>>> mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi
>>>>> (without --loadbalance for mpiexec and without -ntomp for mdrun)
>>>>>
>>>>> Then each replica runs on 4 MPI processes (I allocate 4 times more cores
>>>>> than replicas and mdrun sees it). The problem is that it is much slower
>>>>> than using OpenMP for each replica. I did not find any other way than
>>>>> --loadbalance in mpiexec plus -multi 144 -ntomp 4 in mdrun to use MPI
>>>>> and OpenMP at the same time on the Torque-controlled cluster.
>>>>
>>>> That seems highly surprising. I have not yet encountered a job
>>>> scheduler that was completely lacking a "do what I tell you" layout
>>>> scheme. More importantly, why are you using #PBS -l nodes=48:ppn=12?
>>>
>>> I think that Torque is very similar to all PBS-like resource managers in
>>> this regard. It actually does what I tell it to do. There are 12-core
>>> nodes, I ask for 48 of them - I get them (a simple #PBS -l ncpus=576 does
>>> not work), end of story. Now, the program that I run is responsible for
>>> populating the resources that I got.
>>
>> No, that's not the end of the story. The scheduler and the MPI system
>> typically cooperate to populate the MPI processes on the hardware, set
>> OMP_NUM_THREADS, set affinities, etc. mdrun honours those if they are set.
>
> I was able to run what I wanted flawlessly on another cluster with PBS Pro.
> The Torque cluster seems to work like I said (the "end of story" behaviour).
> REMD runs well on Torque when I give a whole physical node to one replica.
> Otherwise the simulation does not go, or the pinning fails (sometimes
> partially). I have run out of options; I did not find any working
> example/documentation on running hybrid MPI/OpenMP jobs under Torque. It
> seems that I stumbled upon limitations of this resource manager, and it is
> not really a Gromacs issue.
> Best Regards,
> Grzegorz
>
>> You seem to be using 12 because you know there are 12 cores per node.
>> The scheduler should know that already. ppn should be a command about
>> what to do with the hardware, not a description of what it is. More to
>> the point, you should read the docs and be sure what it does.
>>
>>>> Surely you want 3 MPI processes per 12-core node?
>>>
>>> Yes - I want each node to run 3 MPI processes. Preferably, I would like
>>> to run each MPI process on a separate node (spread over 12 cores with
>>> OpenMP), but I will not get that much in the way of resources. But again,
>>> without the --loadbalance hack I would not be able to populate the nodes
>>> properly...
>>
>> So try ppn 3!
>>
>>>>>> What do the .log files say about
>>>>>> OMP_NUM_THREADS, thread affinities, pinning, etc?
>>>>>
>>>>> Each replica logs:
>>>>> "Using 1 MPI process
>>>>> Using 4 OpenMP threads",
>>>>> which is correct. As I said, the threads are forked, but 3 out of 4 do
>>>>> nothing, and the simulation does not progress at all.
>>>>>
>>>>> About affinities Gromacs says:
>>>>> "Can not set thread affinities on the current platform. On NUMA systems
>>>>> this can cause performance degradation. If you think your platform
>>>>> should support setting affinities, contact the GROMACS developers."
>>>>>
>>>>> Well, the "current platform" is an ordinary x86_64 cluster, but all the
>>>>> information about resources is passed by Torque to the OpenMPI-linked
>>>>> Gromacs. Can it be that mdrun sees the resources allocated by Torque as
>>>>> one big pool of CPUs and misses the information about the node topology?
>>>>
>>>> mdrun gets its processor topology from the MPI layer, so that is where
>>>> you need to focus. The error message confirms that GROMACS sees things
>>>> that seem wrong.
>>>
>>> Thank you, I will take a look. But the first thing I want to do is find
>>> the reason why Gromacs 4.6.3 is not able to run on my (slightly weird, I
>>> admit) setup, while 4.6.2 does it very well.
>>
>> 4.6.2 had a bug that inhibited any MPI-based mdrun from attempting to
>> set affinities. It's still not clear why ppn 12 worked at all.
>> Apparently mdrun was able to float some processes around to get
>> something that worked. The good news is that when you get it working
>> in 4.6.3, you will see a performance boost.
>>
>> Mark
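P.S. On the affinity warning discussed above, two things may be worth trying. This is only a sketch, assuming an OpenMPI 1.6-era mpiexec and GROMACS 4.6.x; the placement and binding option names differ in newer OpenMPI releases.

    # Ask OpenMPI to report where each rank is bound, and force the per-node
    # layout explicitly instead of relying on --loadbalance:
    mpiexec -npernode 3 --report-bindings \
        mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -ntomp 4 -cpi

    # If the MPI layer does no binding at all, mdrun 4.6 can pin threads
    # itself: -pin on forces it, and -pinoffset/-pinstride adjust where each
    # rank's threads land on the node if the automatic choice is wrong.
    mpiexec -npernode 3 \
        mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -ntomp 4 -pin on -cpi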