Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-18 Thread Husen R
Dear all, Thanks for the replies and the valuable information. I have configured MVAPICH2 following the instructions in a resource provided by Xavier. I have also installed FTB (Fault-Tolerant Backplane) so that MVAPICH2 supports process migration. However, I got the following error

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-18 Thread Thomas Jahns
Hi, On 03/17/2016 10:00 AM, Rainer Koenig wrote: I'm experiencing a strange problem running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS, then everything looks fine: $ mpirun -np 48 liggghts < in.chu

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-18 Thread Ralph Castain
I don’t believe OMPI supports FTB, I’m afraid. You might want to post your question on an FTB mailing list (I don’t recall whether that project is even active any more). > On Mar 18, 2016, at 3:24 AM, Husen R wrote: > > Dear all, > > Thanks for the reply and valuable informations. > > I have c

Re: [OMPI users] Dynamically throttle/scale processes

2016-03-18 Thread Andrus, Brian Contractor
Gilles, Yep, that is what I am doing. I have the child processes wait for a message from the master before doing anything. Now I have to find a creative way to determine which processes to scale back as I allocate the work. I was just using modulo on the line number from the input, but if my di
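The modulo scheme Brian describes can be sketched as follows. This is a minimal illustration, not his code. It assumes rank 0 is the master and the workers are ranks 1..N, and the helper names `owner_of_line` and `assign_lines` are hypothetical. MPI calls are left out so the assignment logic runs on its own.

```python
from __future__ import annotations


def owner_of_line(line_number: int, active_workers: int) -> int:
    """Return the worker rank (1..active_workers) that handles a given input line.

    Hypothetical helper; assumes rank 0 is the master, so worker ranks start at 1.
    """
    if active_workers < 1:
        raise ValueError("need at least one active worker")
    return line_number % active_workers + 1


def assign_lines(num_lines: int, active_workers: int) -> dict[int, list[int]]:
    """Group input line numbers (0-based) by the worker rank that owns them."""
    plan: dict[int, list[int]] = {rank: [] for rank in range(1, active_workers + 1)}
    for line in range(num_lines):
        plan[owner_of_line(line, active_workers)].append(line)
    return plan


if __name__ == "__main__":
    # Scaling back from 4 to 2 workers only changes the modulus.
    print(assign_lines(8, 4))  # {1: [0, 4], 2: [1, 5], 3: [2, 6], 4: [3, 7]}
    print(assign_lines(8, 2))  # {1: [0, 2, 4, 6], 2: [1, 3, 5, 7]}
```

Because ownership depends only on the line number and the current worker count, scaling back amounts to passing a smaller `active_workers` for the lines still to be dispatched. The scheme does nothing to balance lines that differ in processing cost.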

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-18 Thread Lane, William
Ralph, For the following openMPI job submission: qsub -q short.q -V -pe make 84 -b y mpirun -np 84 --prefix /hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes --mca btl ^sm --mca plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out I have some more information on this issue. All the server da
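To check whether a node meets the 4096 threshold from the subject line, the soft and hard `RLIMIT_NOFILE` values (what `ulimit -Sn` and `ulimit -Hn` report in bash) can be read with Python's `resource` module. This is a minimal, Unix-only sketch, and the helper names are hypothetical. It sees only the limits of the process that runs it. Under SGE it would therefore have to be launched inside the job on each execution host to reflect what the MPI processes actually get.

```python
from __future__ import annotations

import resource

REQUIRED_NOFILE = 4096  # threshold taken from the thread's subject line


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def meets_requirement(limit: int, required: int = REQUIRED_NOFILE) -> bool:
    """True if `limit` is unlimited (RLIM_INFINITY) or at least `required`."""
    return limit == resource.RLIM_INFINITY or limit >= required


if __name__ == "__main__":
    soft, hard = nofile_limits()
    for name, value in (("soft", soft), ("hard", hard)):
        verdict = "ok" if meets_requirement(value) else "too low"
        print(f"{name} open-files limit: {value} ({verdict}, need >= {REQUIRED_NOFILE})")
```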

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-18 Thread Ralph Castain
I forget - have you tried this launch with "-mca plm_rsh_no_tree_spawn 1"? It might let you progress on the launch, but I suspect you will then hit the shared memory error again. The problem is that your tmp file system is flooded and so we are hitting either qrsh being unable to launch on
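A quick way to see whether a temporary file system is full is `os.statvfs`. This is a minimal, Unix-only sketch under stated assumptions. It inspects the directory returned by `tempfile.gettempdir()`, which is only a guess at where Open MPI puts its session files on these nodes. Because "flooded" could mean out of space or out of inodes, it reports both. The 95% cut-off in the hypothetical `looks_flooded` helper is arbitrary.

```python
import os
import tempfile
from typing import NamedTuple, Optional


class FsUsage(NamedTuple):
    space_used: float   # fraction of blocks not available to ordinary users
    inodes_used: float  # fraction of inodes not available (0.0 if none reported)


def fs_usage(path: Optional[str] = None) -> FsUsage:
    """Report space and inode usage of the file system holding `path`.

    Defaults to tempfile.gettempdir(), an assumption about where the
    session files live.
    """
    st = os.statvfs(path or tempfile.gettempdir())
    space = 1.0 - st.f_bavail / st.f_blocks if st.f_blocks else 0.0
    inodes = 1.0 - st.f_favail / st.f_files if st.f_files else 0.0
    return FsUsage(space, inodes)


def looks_flooded(usage: FsUsage, threshold: float = 0.95) -> bool:
    """Hypothetical cut-off: flag the file system if either fraction reaches it."""
    return max(usage.space_used, usage.inodes_used) >= threshold


if __name__ == "__main__":
    u = fs_usage()
    print(f"space {u.space_used:.1%} used, inodes {u.inodes_used:.1%} used, "
          f"flooded={looks_flooded(u)}")
```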