Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Gilles Gouaillardet
Rainer, what if you explicitly bind tasks to cores? mpirun -bind-to core ... note this is v1.8 syntax ... v1.6 is now obsolete (the Debian folks are working on upgrading it...). Out of curiosity, did you try another distro such as Red Hat and the likes, SUSE ..., and do you observe the same behavior?
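For reference, the binding flag differs between the two series; a minimal sketch, reusing the in.chute_wear input from the original report:

  $ mpirun --bind-to core -np 48 liggghts < in.chute_wear    # v1.8 syntax
  $ mpirun --bind-to-core -np 48 liggghts < in.chute_wear    # v1.6 syntax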

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Rainer Koenig
Gilles, I managed to get snapshots of all the /proc/<pid>/status entries for all liggghts jobs, but Cpus_allowed is similar no matter whether the system was cold or warm booted. Then I looked around in /proc/ and found sched_debug. This at least shows that the liggghts processes are not spread over ...

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-22 Thread Gilles Gouaillardet
Rainer, a first step could be to gather /proc/<pid>/status for your 48 tasks. Then you can grep Cpus_allowed_list and see if you find something suspicious. If your processes are idling, then the scheduler might assign them to the same core. In this case, your processes not being spread is a consequ...
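A minimal sketch of that check, assuming the tasks can be found by process name with pgrep:

  $ for pid in $(pgrep liggghts); do grep Cpus_allowed_list /proc/$pid/status; done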

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-22 Thread Rainer Koenig
On 17.03.2016 at 10:40, Ralph Castain wrote: > Just some thoughts offhand: > > * what version of OMPI are you using? dpkg -l openmpi-bin says 1.6.5-8 from Ubuntu 14.04. > > * are you saying that after the warm reboot, all 48 procs are running on a > subset of cores? Yes. After a cold boot all ...

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-18 Thread Thomas Jahns
Hi, On 03/17/2016 10:00 AM, Rainer Koenig wrote: I'm experiencing a strange problem with running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS then everything looks fine: $ mpirun -np 48 liggghts < in.chu ...

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Ralph Castain
Just some thoughts offhand: * what version of OMPI are you using? * are you saying that after the warm reboot, all 48 procs are running on a subset of cores? * it sounds like some of the cores have been marked as "offline" for some reason. Make sure you have hwloc installed on the machine, and ...
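hwloc's lstopo shows the topology as the machine currently reports it; a quick sketch of two checks (exact package and command names may vary by distro):

  $ lstopo-no-graphics                     # hwloc's view of sockets and cores
  $ cat /sys/devices/system/cpu/online     # CPUs the kernel considers online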

[OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Rainer Koenig
Hi, I'm experiencing a strange problem with running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS then everything looks fine: $ mpirun -np 48 liggghts < in.chute_wear launches the example on all 48 cores, ...
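One way to verify which core each rank actually lands on is the PSR column of ps; a minimal sketch:

  $ ps -C liggghts -o pid,psr,comm    # PSR = processor the task last ran on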

Re: [OMPI users] Strange problem with SSH

2015-11-19 Thread Federico Reghenzani
Thank you for the fix. I was only able to try it today; I confirm it works both with the patch and with the mca option. Cheers, Federico Reghenzani 2015-11-18 6:15 GMT+01:00 Gilles Gouaillardet: > Federico, > > i made PR #772 https://github.com/open-mpi/ompi-release/pull/772 > > feel free to manually ...

Re: [OMPI users] Strange problem with SSH

2015-11-18 Thread Gilles Gouaillardet
Federico, I made PR #772: https://github.com/open-mpi/ompi-release/pull/772 Feel free to manually patch your ompi install, or use the workaround I previously described. Cheers, Gilles On 11/18/2015 11:31 AM, Gilles Gouaillardet wrote: Federico, thanks for the report, I will push a fix shortl...

Re: [OMPI users] Strange problem with SSH

2015-11-17 Thread Gilles Gouaillardet
Federico, thanks for the report, I will push a fix shortly. Meanwhile, as a workaround, you can add --mca orte_keep_fqdn_hostnames true to your mpirun command line when using --host user@ip. Cheers, Gilles On 11/17/2015 7:19 PM, Federico Reghenzani wrote: I'm trying to execute this com...
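Applied to the command from the original report below, the workaround would look like this sketch (addresses and excluded interfaces as in that report):

  $ mpirun -np 8 --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 \
      --mca orte_keep_fqdn_hostnames true \
      --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info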

Re: [OMPI users] Strange problem with SSH

2015-11-17 Thread Ralph Castain
I can't check it this week due to the Supercomputing project. It looks like you are feeding us a hostfile that contains a userid and a hostname expressed as an IP address. Can you convert the IP address to a name? I think that might be a workaround until I can address it. On Tue, Nov 17, 2015 at 4: ...
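A sketch of that conversion via /etc/hosts (these host names are invented for illustration):

  # /etc/hosts on each node
  10.10.1.1   node1
  10.10.1.2   node2

  $ mpirun -np 8 --host openmpi@node1,openmpi@node2 ompi_info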

[OMPI users] Strange problem with SSH

2015-11-17 Thread Federico Reghenzani
I'm trying to execute this command: mpirun -np 8 --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info Everything goes fine if I execute the same command with only 2 nodes (independently of which nodes). With 3 or more ...

Re: [OMPI users] strange problem with OpenMPI + rankfile + Intelcompiler 11.0.074 + centos/fedora-12

2010-03-31 Thread Jeff Squyres
On Mar 24, 2010, at 12:49 AM, Anton Starikov wrote: > Two different OSes: CentOS 5.4 (2.6.18 kernel) and Fedora 12 (2.6.32 kernel) > Two different CPUs: Opteron 248 and Opteron 8356. > > Same binary for OpenMPI. Same binary for user code (vasp compiled for older > arch) Are you sure that the co...

[OMPI users] strange problem with OpenMPI + rankfile + Intel compiler 11.0.074 + centos/fedora-12

2010-03-24 Thread Anton Starikov
Intel compiler 11.0.074, OpenMPI 1.4.1. Two different OSes: CentOS 5.4 (2.6.18 kernel) and Fedora 12 (2.6.32 kernel). Two different CPUs: Opteron 248 and Opteron 8356. Same binary for OpenMPI. Same binary for user code (vasp compiled for an older arch). When I supply a rankfile, then depending on the combo ...
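For context, an Open MPI rankfile pins each rank to an explicit host and slot; a minimal sketch (host names are placeholders):

  rank 0=node1 slot=0
  rank 1=node1 slot=1
  rank 2=node2 slot=0
  rank 3=node2 slot=1

  $ mpirun -np 4 -rf myrankfile ./vasp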

Re: [OMPI users] Strange problem

2009-03-09 Thread Gabriele Fatigati
Hi Ralph, I've tried the --nolocal flag, but it doesn't work ... :( The error is the same. 2009/2/20 Ralph Castain: > Hi Gabriele > > Could be we have a problem in our LSF support - none of us have a way of > testing it, so this is somewhat of a blind programming case for us. > > From the message, it l...
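For reference, --nolocal tells mpirun not to place any ranks on the node mpirun itself runs on; a sketch (application name is a placeholder):

  $ mpirun --nolocal -np 8 ./my_app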

Re: [OMPI users] Strange problem

2009-02-20 Thread Ralph Castain
Hi Gabriele, Could be we have a problem in our LSF support - none of us have a way of testing it, so this is somewhat of a blind programming case for us. From the message, it looks like there is some misunderstanding about how many slots were allocated vs how many were mapped to a specific ...

[OMPI users] Strange problem

2009-02-20 Thread Gabriele Fatigati
Dear OpenMPI developers, I'm running my MPI code compiled with OpenMPI 1.3 over Infiniband and the LSF scheduler, but I got the attached error. I suppose the process spawning doesn't work well. The same program works well under OpenMPI 1.2.5. Could you help me? Thanks in advance. -- Ing. Gabriele...

Re: [OMPI users] Strange problem with 1.2.6

2008-07-14 Thread Jeff Squyres
I don't believe so -- as I understand that ticket, it's a problem on the trunk only, due to changes in ob1 that have occurred since the 1.2 series. On Jul 14, 2008, at 10:04 AM, Lenny Verkhovsky wrote: maybe it's related to #1378 PML ob1 deadlock for ping/ping? On 7/14/08, Jeff Squyres ...

Re: [OMPI users] Strange problem with 1.2.6

2008-07-14 Thread Lenny Verkhovsky
Maybe it's related to #1378, PML ob1 deadlock for ping/ping? On 7/14/08, Jeff Squyres wrote: > > What application is it? The majority of the message passing engine did not > change in the 1.2 series; we did add a new option into 1.2.6 for disabling > early completion: > > > http://www.open-mpi ...

Re: [OMPI users] Strange problem with 1.2.6

2008-07-14 Thread Jeff Squyres
What application is it? The majority of the message passing engine did not change in the 1.2 series; we did add a new option into 1.2.6 for disabling early completion: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion See if that helps you out. Note that I don...

Re: [OMPI users] Strange problem with 1.2.6

2008-07-11 Thread Willem Vermin
Hello Joe, I have no solution, but I have the same problem; see http://www.open-mpi.org/community/lists/users/2008/07/6007.php There you will find a small program that demonstrates the problem. I found that the problem does not exist on all hardware; I have the impression that the problem manifests itse...

[OMPI users] Strange problem with 1.2.6

2008-07-10 Thread Joe Landman
Hi folks: I am running into a strange problem with Open MPI 1.2.6, built using gcc/g++ and Intel ifort 10.1.015, atop an OFED stack (1.1-ish). The problem appears to be that if I run using the tcp btl, disabling sm and openib, the run completes successfully (on several different platforms), ...
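A sketch of that btl selection on the mpirun command line, restricting the run to tcp plus the required self loopback component (application name is a placeholder):

  $ mpirun --mca btl tcp,self -np 8 ./my_app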