Rainer,
what if you explicitly bind tasks to cores?
mpirun -bind-to core ...
note this is v1.8 syntax ...
v1.6 is now obsolete (Debian folks are working on upgrading it...)
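for example (just a sketch, combining the binding option with the command from your original mail):
$ mpirun -np 48 --bind-to core --report-bindings liggghts < in.chute_wear
--report-bindings prints each rank's binding at startup, so you can see at a glance whether all 48 cores are covered.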
out of curiosity, did you try another distro such as RedHat, SUSE, or the like,
and do you observe the same behavior there?
Gilles,
I managed to get snapshots of all the /proc/<pid>/status entries for all
liggghts jobs, but the Cpus_allowed mask is similar no matter whether the system
was cold or warm booted.
Then I looked around in /proc/ and found sched_debug.
This at least shows that the liggghts processes are not spread over all cores.
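A quick way to cross-check this from the command line (assuming procps' ps, and that the tasks show up under the name liggghts):
$ ps -C liggghts -o pid,psr,comm
The PSR column is the processor each task last ran on; several PIDs sharing the same PSR while other CPUs stay idle would confirm the packing.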
Rainer,
a first step could be to gather /proc/pid/status for your 48 tasks.
then you can
grep Cpus_allowed_list
and see if you find something suspicious.
if your processes are idling, then the scheduler might assign them to the
same core.
in this case, your processes not being spread is a consequence of the idling.
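to gather them all in one go, something like this should do (a sketch, assuming the tasks can be found by name with pgrep):
$ for p in $(pgrep liggghts); do grep Cpus_allowed_list /proc/$p/status; done
with no explicit binding, each line normally shows the full range (e.g. 0-47), so anything shorter would be suspicious.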
Am 17.03.2016 um 10:40 schrieb Ralph Castain:
> Just some thoughts offhand:
>
> * what version of OMPI are you using?
dpkg -l openmpi-bin says 1.6.5-8 from Ubuntu 14.04.
>
> * are you saying that after the warm reboot, all 48 procs are running on a
> subset of cores?
Yes. After a cold boot all 48 cores are used.
Hi,
On 03/17/2016 10:00 AM, Rainer Koenig wrote:
I'm experiencing a strange problem with running LIGGGHTS on a 48-core
workstation running Ubuntu 14.04.4 LTS.
If I cold boot the workstation and start one of the examples from
LIGGGHTS, then everything looks fine:
$ mpirun -np 48 liggghts < in.chute_wear
Just some thoughts offhand:
* what version of OMPI are you using?
* are you saying that after the warm reboot, all 48 procs are running on a
subset of cores?
* it sounds like some of the cores have been marked as “offline” for some
reason. Make sure you have hwloc installed on the machine, and
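presumably by running hwloc's lstopo on it, e.g. (my sketch, not part of the original advice):
$ lstopo --no-io
and checking that all 48 cores actually show up in the reported topology (adding --whole-system also displays objects the OS has taken offline).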
Hi,
I'm experiencing a strange problem with running LIGGGHTS on a 48-core
workstation running Ubuntu 14.04.4 LTS.
If I cold boot the workstation and start one of the examples from
LIGGGHTS, then everything looks fine:
$ mpirun -np 48 liggghts < in.chute_wear
launches the example on all 48 cores.
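One way to see that (an illustration, assuming sysstat's mpstat is available) is to watch per-CPU utilization while the job runs:
$ mpstat -P ALL 2
where every CPU shows close to 100% in the %usr column while the example is running.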
Thank you for the fix,
I could only try it today; I confirm it works both with the patch and with
the mca option.
Cheers,
Federico Reghenzani
2015-11-18 6:15 GMT+01:00 Gilles Gouaillardet:
> Federico,
>
> i made PR #772 https://github.com/open-mpi/ompi-release/pull/772
>
> feel free to manually patch your ompi install or use the workaround i
> previously described
Federico,
i made PR #772 https://github.com/open-mpi/ompi-release/pull/772
feel free to manually patch your ompi install or use the workaround i
previously described
Cheers,
Gilles
On 11/18/2015 11:31 AM, Gilles Gouaillardet wrote:
Federico,
thanks for the report, i will push a fix shortly
Federico,
thanks for the report, i will push a fix shortly
meanwhile, and as a workaround, you can add the
--mca orte_keep_fqdn_hostnames true
to your mpirun command line when using --host user@ip
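for example, based on the command line you posted, something like:
mpirun -np 8 --mca orte_keep_fqdn_hostnames true --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info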
Cheers,
Gilles
On 11/17/2015 7:19 PM, Federico Reghenzani wrote:
I'm trying to execute this command:
I can't check it this week due to the Supercomputing project. It looks like
you are feeding us a hostfile that contains a userid and a hostname expressed
as an IP address. Can you convert the IP address to a name? I think that
might be a workaround until I can address it.
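For example (hypothetical names, just to illustrate), an /etc/hosts entry on each machine such as
10.10.1.1   node1
10.10.1.2   node2
would let you write --host openmpi@node1,openmpi@node2,... instead of the raw IP addresses.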
On Tue, Nov 17, 2015 at 4:
I'm trying to execute this command:
mpirun -np 8 --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info
Everything goes fine if I execute the same command with only 2 nodes
(regardless of which nodes).
With 3 or more nodes it fails.
On Mar 24, 2010, at 12:49 AM, Anton Starikov wrote:
> Two different OSes: centos 5.4 (2.6.18 kernel) and Fedora-12 (2.6.32 kernel)
> Two different CPUs: Opteron 248 and Opteron 8356.
>
> same binary for OpenMPI. Same binary for user code (vasp compiled for older
> arch)
Are you sure that the co
Intel compiler 11.0.074
OpenMPI 1.4.1
Two different OSes: centos 5.4 (2.6.18 kernel) and Fedora-12 (2.6.32 kernel)
Two different CPUs: Opteron 248 and Opteron 8356.
same binary for OpenMPI. Same binary for user code (vasp compiled for older
arch)
When I supply a rankfile, then depending on the combination
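For reference, a minimal rankfile of the kind involved here might look like this (hostnames and slots are placeholders, not the reporter's actual setup):
rank 0=node01 slot=0
rank 1=node01 slot=1
rank 2=node02 slot=0
rank 3=node02 slot=1
and would be passed with something like: mpirun -np 4 --rankfile myrankfile ./vasp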
Hi Ralph,
i've tried the --nolocal flag, but it doesn't work .. :(
The error is the same.
2009/2/20 Ralph Castain:
> Hi Gabriele
>
> Could be we have a problem in our LSF support - none of us have a way of
> testing it, so this is somewhat of a blind programming case for us.
>
> From the message, it l
Hi Gabriele
Could be we have a problem in our LSF support - none of us have a way
of testing it, so this is somewhat of a blind programming case for us.
From the message, it looks like there is some misunderstanding about
how many slots were allocated vs how many were mapped to a specific node.
Dear Open MPI developers,
i'm running my MPI code compiled with OpenMPI 1.3 over InfiniBand and the
LSF scheduler, but i got the attached error. I suppose that process
spawning doesn't work well. The same program works well under OpenMPI
1.2.5. Could you help me?
Thanks in advance.
--
Ing. Gabriele
I don't believe so -- as I understand that ticket, it's a problem on
the trunk only, due to changes in ob1 that have occurred since the 1.2
series.
On Jul 14, 2008, at 10:04 AM, Lenny Verkhovsky wrote:
maybe it's related to #1378 PML ob1 deadlock for ping/ping ?
On 7/14/08, Jeff Squyres wrote:
maybe it's related to #1378 PML ob1 deadlock for ping/ping ?
On 7/14/08, Jeff Squyres wrote:
>
> What application is it? The majority of the message passing engine did not
> change in the 1.2 series; we did add a new option into 1.2.6 for disabling
> early completion:
>
>
> http://www.open-mpi
What application is it? The majority of the message passing engine
did not change in the 1.2 series; we did add a new option into 1.2.6
for disabling early completion:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
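If I recall that FAQ entry correctly, the relevant parameter is pml_ob1_use_early_completion, so disabling it would look roughly like this (please verify against the FAQ page above):
$ mpirun --mca pml_ob1_use_early_completion 0 -np 16 ./your_app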
See if that helps you out.
Note that I don
Hello Joe,
I have no solution, but the same problem, see
http://www.open-mpi.org/community/lists/users/2008/07/6007.php
There you will find a small program to demonstrate the problem.
I found that the problem does not exist on all hardware; I have the
impression that the problem manifests itself
Hi folks:
I am running into a strange problem with Open MPI 1.2.6, built using
gcc/g++ and Intel ifort 10.1.015, atop an OFED stack (1.1-ish). The
problem appears to be that if I run using the tcp btl, disabling sm and
openib, the run completes successfully (on several different platforms),
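For the record, restricting a run to the tcp btl that way is usually spelled something like (my wording, not the exact command used):
$ mpirun --mca btl tcp,self -np 16 ./myapp
i.e. listing only tcp and self so that the sm and openib btls are left out.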