Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users

Hello,

In spite of the delay, I was not able to solve my problem. Thanks to 
Joseph and Prentice for their interesting suggestions.


I uninstalled AppArmor (SELinux is not installed) as suggested by 
Prentice, but there was no change: mpirun still hangs.


The result of the gdb stack trace is the following:


$ sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | 
head -n 1)

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at 
../sysdeps/unix/sysv/linux/connect.c:26
26  ../sysdeps/unix/sysv/linux/connect.c: Aucun fichier ou dossier 
de ce type.


Thread 1 (Thread 0x7f92891f4e80 (LWP 4948)):
#0  0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at 
../sysdeps/unix/sysv/linux/connect.c:26

#1  0x7f9288fff59d in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#2  0x7f9288fffc49 in xcb_connect_to_display_with_auth_info () from 
/lib/x86_64-linux-gnu/libxcb.so.1
#3  0x7f928906cb7a in _XConnectXCB () from 
/lib/x86_64-linux-gnu/libX11.so.6
#4  0x7f928905d319 in XOpenDisplay () from 
/lib/x86_64-linux-gnu/libX11.so.6
#5  0x7f92897de4fb in ?? () from 
/usr/lib/x86_64-linux-gnu/hwloc/hwloc_gl.so

#6  0x7f92893b901e in ?? () from /lib/x86_64-linux-gnu/libhwloc.so.15
#7  0x7f92893c13a0 in hwloc_topology_load () from 
/lib/x86_64-linux-gnu/libhwloc.so.15
#8  0x7f92896df564 in opal_hwloc_base_get_topology () from 
/lib/x86_64-linux-gnu/libopen-pal.so.40
#9  0x7f92891da6be in ?? () from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so
#10 0x7f92897a22fc in orte_init () from 
/lib/x86_64-linux-gnu/libopen-rte.so.40
#11 0x7f92897a6c86 in orte_submit_init () from 
/lib/x86_64-linux-gnu/libopen-rte.so.40

#12 0x55819fd0b3a3 in ?? ()
#13 0x7f92894480b3 in __libc_start_main (main=0x55819fd0b1c0, 
argc=1, argv=0x7fff5334fe48, init=<optimized out>, fini=<optimized out>, 
rtld_fini=<optimized out>, stack_end=0x7fff5334fe38) at 
../csu/libc-start.c:308

#14 0x55819fd0b1fe in ?? ()
[Inferior 1 (process 4948) detached]


So it seems to be a problem with the connection via libxcb (a socket?), but 
this is beyond my system administration skills. Is there any authorization 
needed?


As libX11 is at the origin of the call, I tried to execute in a bare 
terminal (Ctrl-Alt-F2 and via ssh), but the message is the same. I tried 
to recompile/reinstall the whole package and got the same result.


Thank you for your help.

Jorge

 On 22/10/2020 at 12:16, Joseph Schuchart via users wrote:

Hi Jorge,

Can you try to get a stack trace of mpirun using the following command 
in a separate terminal?


sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | 
head -n 1)


Maybe that will give some insight into where mpirun is hanging.

Cheers,
Joseph

On 10/21/20 9:58 PM, Jorge SILVA via users wrote:

Hello Jeff,

The program is not executed; it seems to wait for something to connect 
to (why Ctrl-C twice?)


jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo
^C^C

jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo
ls: impossible d'accéder à '/tmp/foo': Aucun fichier ou dossier de ce 
type


No file is created.

In fact, my question was whether there are differences in mpirun usage 
between these versions. The


mpirun -help

gives different output, as expected, but I tried a lot of options 
without any success.



On 21/10/2020 at 21:16, Jeff Squyres (jsquyres) wrote:
There are huge differences between Open MPI v2.1.1 and v4.0.3 (i.e., 
years of development effort); it would be very hard to categorize 
them all; sorry!


What happens if you

    mpirun -np 1 touch /tmp/foo

(Yes, you can run non-MPI apps through mpirun)

Is /tmp/foo created?  (i.e., did the job run, and mpirun is somehow 
not terminating)




On Oct 21, 2020, at 12:22 PM, Jorge SILVA via users 
<users@lists.open-mpi.org> wrote:


Hello Gus,

 Thank you for your answer. Unfortunately my problem is much more 
basic. I didn't try to run the program on both computers, but 
just to run something on one computer. I just installed the new OS 
and openmpi on two different computers, in the standard way, with 
the same result.


For example:

In Kubuntu 20.4.1 LTS with openmpi 4.0.3-0ubuntu:

jorge@gcp26:~/MPIRUN$ cat hello.f90
 print*,"Hello World!"
end
jorge@gcp26:~/MPIRUN$ mpif90 hello.f90 -o hello
jorge@gcp26:~/MPIRUN$ ./hello
 Hello World!
jorge@gcp26:~/MPIRUN$ mpirun -np 1 hello    <--- here the program 
hangs with no output

^C^Cjorge@gcp26:~/MPIRUN$

The mpirun task sleeps with no output, and only pressing Ctrl-C twice ends 
the execution:


jorge   5540  0.1  0.0 44768  8472 pts/8    S+   17:54 0:00 
mpirun -np 1 hello


In Kubuntu 18.04.5 LTS with openmpi 2.1.1, of course, the same 
program gives:


jorge@gcp30:~/MPIRUN$ cat hello.f90
 print*, "Hello World!"
 END
jorge@gcp30:~/MPIRUN$ mpif90 hello.f90 -o hello
jorge@gcp30:~/MPIRUN$ ./hello
 Hello World!
jorge@gcp30:~/MPIRUN$ mpirun -np 1 hello
 Hello World!

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users
Sorry: if I execute mpirun in a *really* bare terminal, without an X 
server running, it works! But with an error message:


Invalid MIT-MAGIC-COOKIE-1 key

So the problem is related to X, but I still have no solution.
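
A quicker way to check this from a normal desktop session (just a guess, I have 
not tried it yet) might be to run mpirun with DISPLAY removed from its 
environment:

jorge@gcp26:~/MPIRUN$ env -u DISPLAY mpirun -np 1 hello

If that also works, it would confirm that only the X connection attempt is 
blocking.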

Jorge



[OMPI users] PRRTE DVM: how to tell prun to not share nodes among prun jobs?

2020-11-14 Thread Alexei Colin via users
Hi, in the context of the PRRTE Distributed Virtual Machine (DVM), is there a way
to tell the task mapper inside prun not to share a node across separate
prun jobs?

For example, inside a resource allocation from Cobalt/ALPS: 2 nodes with
64 cores each:

prte --daemonize
prun ... &
...
prun ... &
pterm

Scenario A:

$ prun --map-by ppr:64:node -n 64 ./mpitest &
$ prun --map-by ppr:64:node -n 64 ./mpitest &

MPI World size = 64 processes
Hello World from rank 0 running on nid03834 (hostname nid03834)!
...
Hello World from rank 63 running on nid03834 (hostname nid03834)!

MPI World size = 64 processes
Hello World from rank 0 running on nid03835 (hostname nid03835)!
...
Hello World from rank 63 running on nid03835 (hostname nid03835)!

Scenario B:

$ prun --map-by ppr:64:node -n 1 ./mpitest &
$ prun --map-by ppr:64:node -n 1 ./mpitest &

MPI World size = 1 processes
Hello World from rank 0 running on nid03834 (hostname nid03834)!

MPI World size = 1 processes
Hello World from rank 0 running on nid03834 (hostname nid03834)!

The question is: in Scenario B, how do I tell prun that node nid03834
should not be used for the second prun job, because this node is already
(partially) occupied by a different prun job?

Scenario A implies that the DVM already tracks occupancy, so the
question is just how to tell the mapper to treat a free core on a free
node differently from a free core on a partially occupied node. The
--map-by :NOOVERSUBSCRIBE option does not look like the answer since there's
no oversubscription of cores, right? Would I need something like --map-by
:exclusive:node? If that is not supported, how hard would it be for me to patch it?

A potential workaround I can think of is to fill the unoccupied cores on
partially occupied nodes with dummy jobs, with --host pointing to the
partially occupied nodes and a -n count matching the number of
unoccupied cores, but is this even doable? It also requires dumping the
mapping from each prun, which I am unable to achieve with --map-by
:DISPLAY (it works with mpirun but not with prun).
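
A rough sketch of that workaround (hypothetical and untested; it assumes prun
can launch plain non-MPI executables the way mpirun can):

$ prun --host nid03834 -n 63 sleep 100000 &   # dummy job filling the 63 free cores
$ prun --map-by ppr:64:node -n 1 ./mpitest &  # should then land on the other node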

Or, run a Flux instance [1] instead of the PRRTE DVM on the resource
allocation, which seems similar but features a scheduler with a queue (a
feature proposed for the PRRTE DVM on the list earlier [2]). I am
guessing that Flux has the flexibility to do this exclusive node mapping,
but I am not sure.

The DVM is proving to be very useful for dealing with restrictions on
the minimum node count per job on some HPC clusters, by batching many small
jobs into one job. A queue would be even more useful, but even without a
queue it is still useful for batching sets of jobs that are known to
fit on an allocation in parallel (i.e. without having to wait at all).

[1] https://flux-framework.readthedocs.io/en/latest/quickstart.html
[2] https://www.mail-archive.com/users@lists.open-mpi.org/msg30692.html

OpenMPI: commit 7a922c8774b184ecb3aa1cd06720390bd9200b50
Fri Nov 6 08:48:29 2020 -0800
PRRTE: commit 37dd45c4d9fe973df1000f1a1421c2718fd80050
Fri Nov 6 12:45:38 2020 -0600

Thank you.


Re: [OMPI users] PRRTE DVM: how to tell prun to not share nodes among prun jobs?

2020-11-14 Thread Ralph Castain via users
IIRC, the correct syntax is:

prun -host +e ...

This tells PRRTE that you want empty nodes for this application. You can even 
specify how many empty nodes you want:

prun -host +e:2 ...

I haven't tested that in a bit, so please let us know if it works or not so we 
can fix it if necessary.

As for the queue - we do plan to add a queue to PRRTE in the first quarter of next year. 
I wasn't really thinking of a true scheduler - just a FIFO queue for now.


> On Nov 14, 2020, at 11:52 AM, Alexei Colin via users 
>  wrote:
> 
> Hi, in context of the PRRTE Distributed Virtual Machine, is there a way
> to tell the task mapper inside prun to not share a node across separate
> prun jobs?
> 
> For example, inside a resource allocation from Cobalt/ALPS: 2 nodes with
> 64 cores each:
> 
> prte --daemonize
> prun ... &
> ...
> prun ... &
> pterm
> 
> Scenario A:
> 
> $ prun --map-by ppr:64:node -n 64 ./mpitest &
> $ prun --map-by ppr:64:node -n 64 ./mpitest &
> 
>   MPI World size = 64 processes
>   Hello World from rank 0 running on nid03834 (hostname nid03834)!
>   ...
>   Hello World from rank 63 running on nid03834 (hostname nid03834)!
> 
>   MPI World size = 64 processes
>   Hello World from rank 0 running on nid03835 (hostname nid03835)!
>   ...
>   Hello World from rank 63 running on nid03835 (hostname nid03835)!
> 
> Scenario B:
> 
> $ prun --map-by ppr:64:node -n 1 ./mpitest &
> $ prun --map-by ppr:64:node -n 1 ./mpitest &
> 
>   MPI World size = 1 processes
>   Hello World from rank 0 running on nid03834 (hostname nid03834)!
> 
>   MPI World size = 1 processes
>   Hello World from rank 0 running on nid03834 (hostname nid03834)!
> 
> The question is: in Scneario B, how to tell prun that node nid03834
> should not be used for the second prun job, because this node is already
> (partially) occupied by a different prun instance job?
> 
> Scenario A implies that the DVM already tracks occupancy, so the
> question is just how to tell the mapper to treat a free core on a free
> node differently from a free core on a partially occupied node. The
> --map-by :NOOVERSUBSCRIBE does not look like the answer since there's
> no oversubscription of cores, right? Would need something like --map-by
> :exclusive:node? If not supported, how hard would it be for me to patch?
> 
> Potential workarounds I can think of is to fill the unoccupied cores on
> partially occupied nodes with dummy jobs with --host pointing to the
> partially occupied nodes and a -n count matching the number of
> unoccupied cores, but is this even doable? also requires dumping the
> mapping from each prun which I am unable to achive with --map-by
> :DISPLAY (works with mpirun but not with prun).
> 
> Or, run a Flux instance [1] instead of the PRRTE DVM on the resource
> allocation, which seems similar but features a scheduler with a queue (a
> feature proposed for the PRRTE DVM on the list earlier [1]). I am
> guessing that Flux has the flexibility to this exclusive node mapping,
> but not sure.
> 
> The DVM is proving to be very useful to deal with restrictions on
> minimum nodecount per job on some HPC clusters, by batching many small
> jobs into one job. A queue would be even more useful, but even without a
> queue it is still useful for batching sets of jobs which are known to
> fit on an allocation in parallel (i.e. without having to wait at all).
> 
> [1] https://flux-framework.readthedocs.io/en/latest/quickstart.html
> [2] https://www.mail-archive.com/users@lists.open-mpi.org/msg30692.html
> 
> OpenMPI: commit 7a922c8774b184ecb3aa1cd06720390bd9200b50
> Fri Nov 6 08:48:29 2020 -0800
> PRRTE: commit 37dd45c4d9fe973df1000f1a1421c2718fd80050
> Fri Nov 6 12:45:38 2020 -0600
> 
> Thank you.




Re: [OMPI users] PRRTE DVM: how to tell prun to not share nodes among prun jobs?

2020-11-14 Thread Alexei Colin via users
On Sat, Nov 14, 2020 at 08:07:47PM +, Ralph Castain via users wrote:
> IIRC, the correct syntax is:
> 
> prun -host +e ...
> 
> This tells PRRTE that you want empty nodes for this application. You can even 
> specify how many empty nodes you want:
> 
> prun -host +e:2 ...
> 
> I haven't tested that in a bit, so please let us know if it works or not so 
> we can fix it if necessary.

Works! Thank you.

$ prun --map-by ppr:64:node --host +e -n 1  ./mpitest &
$ prun --map-by ppr:64:node --host +e -n 1  ./mpitest &

MPI World size = 1 processes
Hello World from rank 0 running on nid03835 (hostname nid03835)!

MPI World size = 1 processes
Hello World from rank 0 running on nid03834 (hostname nid03834)!

Should I PR a patch to the prun manpage to change this:

   -H, -host, --host 
 List of hosts on which to invoke processes.

to something like this?:

   -H, -host, --host 
 List of hosts on which to invoke processes. Pass
 +e to allocate only onto empty nodes (i.e. none of
 whose cores have been allocated to other prun jobs) or
 +e:N to allocate to nodes at least N of which are empty.


Re: [OMPI users] PRRTE DVM: how to tell prun to not share nodes among prun jobs?

2020-11-14 Thread Ralph Castain via users
That would be very kind of you and most welcome!

> On Nov 14, 2020, at 12:38 PM, Alexei Colin  wrote:
> 
> On Sat, Nov 14, 2020 at 08:07:47PM +, Ralph Castain via users wrote:
>> IIRC, the correct syntax is:
>> 
>> prun -host +e ...
>> 
>> This tells PRRTE that you want empty nodes for this application. You can 
>> even specify how many empty nodes you want:
>> 
>> prun -host +e:2 ...
>> 
>> I haven't tested that in a bit, so please let us know if it works or not so 
>> we can fix it if necessary.
> 
> Works! Thank you.
> 
> $ prun --map-by ppr:64:node --host +e -n 1  ./mpitest &
> $ prun --map-by ppr:64:node --host +e -n 1  ./mpitest &
> 
>   MPI World size = 1 processes
>   Hello World from rank 0 running on nid03835 (hostname nid03835)!
> 
>   MPI World size = 1 processes
>   Hello World from rank 0 running on nid03834 (hostname nid03834)!
> 
> Should I PR a patch to prun manpage to change this:
> 
>   -H, -host, --host 
> List of hosts on which to invoke processes.
> 
> to something like this?:
> 
>   -H, -host, --host 
> List of hosts on which to invoke processes. Pass
>+e to allocate only onto empty nodes (i.e. none of
>whose cores have been allocated to other prun jobs) or
>+e:N to allocate to nodes at least N of which are empty.




Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Brice Goglin via users
Hello

The hwloc/X11 stuff is caused by OpenMPI using a hwloc that was built
with the GL backend enabled (in your case, because the package
libhwloc-plugins is installed). That backend is used for querying the
locality of X11 displays running on NVIDIA GPUs (using libxnvctrl). Does
running "lstopo" fail/hang too? (It basically runs hwloc without
OpenMPI.)

One workaround should be to set HWLOC_COMPONENTS=-gl in your environment
so that this backend is ignored. Recent hwloc releases have a way to
avoid some plugins at runtime through the C interface; we should likely
blacklist all plugins that are already blacklisted at compile time when
OMPI builds its own hwloc.
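
For example (assuming a bash-like shell), something along these lines should
show whether skipping the GL backend is enough:

$ HWLOC_COMPONENTS=-gl lstopo --of console > /dev/null   # hwloc alone, GL backend ignored
$ HWLOC_COMPONENTS=-gl mpirun -np 1 hello                # same workaround for mpirun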

Brice



On 14/11/2020 at 12:33, Jorge Silva via users wrote:
> Hello,
>
> In spite of the delay, I was not able to solve my problem. Thanks to
> Joseph and Prentice for their interesting suggestions.
>
> I uninstalled AppAmor (SElinux is not installed ) as suggested by
> Prentice but there were no changes, mpirun  sttill hangs.
>
> The result of gdb stack trace is the following:
>
>
> $ sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid=
> | head -n 1)
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library
> "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at
> ../sysdeps/unix/sysv/linux/connect.c:26
> 26  ../sysdeps/unix/sysv/linux/connect.c: Aucun fichier ou dossier
> de ce type.
>
> Thread 1 (Thread 0x7f92891f4e80 (LWP 4948)):
> #0  0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at
> ../sysdeps/unix/sysv/linux/connect.c:26
> #1  0x7f9288fff59d in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
> #2  0x7f9288fffc49 in xcb_connect_to_display_with_auth_info ()
> from /lib/x86_64-linux-gnu/libxcb.so.1
> #3  0x7f928906cb7a in _XConnectXCB () from
> /lib/x86_64-linux-gnu/libX11.so.6
> #4  0x7f928905d319 in XOpenDisplay () from
> /lib/x86_64-linux-gnu/libX11.so.6
> #5  0x7f92897de4fb in ?? () from
> /usr/lib/x86_64-linux-gnu/hwloc/hwloc_gl.so
> #6  0x7f92893b901e in ?? () from /lib/x86_64-linux-gnu/libhwloc.so.15
> #7  0x7f92893c13a0 in hwloc_topology_load () from
> /lib/x86_64-linux-gnu/libhwloc.so.15
> #8  0x7f92896df564 in opal_hwloc_base_get_topology () from
> /lib/x86_64-linux-gnu/libopen-pal.so.40
> #9  0x7f92891da6be in ?? () from
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so
> #10 0x7f92897a22fc in orte_init () from
> /lib/x86_64-linux-gnu/libopen-rte.so.40
> #11 0x7f92897a6c86 in orte_submit_init () from
> /lib/x86_64-linux-gnu/libopen-rte.so.40
> #12 0x55819fd0b3a3 in ?? ()
> #13 0x7f92894480b3 in __libc_start_main (main=0x55819fd0b1c0,
> argc=1, argv=0x7fff5334fe48, init=, fini= out>, rtld_fini=, stack_end=0x7fff5334fe38) at
> ../csu/libc-start.c:308
> #14 0x55819fd0b1fe in ?? ()
> [Inferior 1 (process 4948) detached]
>
>
> So it seems to be a problem in the connection  via libxcb (socket ?)
> but this is out of my system computer skills.. Is there any
> authorization needed?
>
> As is libX11 at the origin of the call I tried to execute in a bare
> terminal (ctrl-alt-f2 and via ssh) but the message is the same. I
> tried to recompile/install the hole package and have the same result.
>
> Thank you for your help.
>
> Jorge
>
>  Le 22/10/2020 à 12:16, Joseph Schuchart via users a écrit :
>> Hi Jorge,
>>
>> Can you try to get a stack trace of mpirun using the following
>> command in a separate terminal?
>>
>> sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= |
>> head -n 1)
>>
>> Maybe that will give some insight where mpirun is hanging.
>>
>> Cheers,
>> Joseph
>>
>> On 10/21/20 9:58 PM, Jorge SILVA via users wrote:
>>> Hello Jeff,
>>>
>>> The  program is not executed, seems waits for something to connect
>>> with (why twice ctrl-C ?)
>>>
>>> jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo
>>> ^C^C
>>>
>>> jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo
>>> ls: impossible d'accéder à '/tmp/foo': Aucun fichier ou dossier de
>>> ce type
>>>
>>> no file  is created..
>>>
>>> In fact, my question was if are there differences in mpirun usage 
>>> between these versions..  The
>>>
>>> mpirun -help
>>>
>>> gives a different output as expected, but I  tried a lot of options
>>> without any success.
>>>
>>>
>>> Le 21/10/2020 à 21:16, Jeff Squyres (jsquyres) a écrit :
 There's huge differences between Open MPI v2.1.1 and v4.0.3 (i.e.,
 years of development effort); it would be very hard to categorize
 them all; sorry!

 What happens if you

     mpirun -np 1 touch /tmp/foo

 (Yes, you can run non-MPI apps through mpirun)

 Is /tmp/foo created?  (i.e., did the job run, and mpirun is somehow
 not terminating)



> On Oct 21, 2020, at 12:22 PM, Jorge SILVA via users
> mailto:users@lists.open-mpi.org>> wrote:
>
> Hello Gus,
>
>  Thank you for your answer..  Unfortunatel