[OMPI users] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/show_help.c at line 501; error in device init Mesh created.

2023-05-19 Thread Rob Kudyba via users
RHEL 8 with OpenMPI 4.1.5a1 on an HPC cluster compute node, Singularity version 3.7.1. I see the error in another issue mentioned at the Git page and on SO

[OMPI users] ORTE_ERROR_LOG: Out of resource for openmpi-master-201903260242-dfbc144

2019-03-26 Thread Siegmar Gross
Hi, I've installed openmpi-master-201903260242-dfbc144 on my "SUSE Linux Enterprise Server 12.3" with gcc-7.4.0, icc-19.0.3.199, and pgcc-18.4.0. Unfortunately, I get the following error for some of my small programs for all three compilers.
loki hello_2 103 mpiexec -np 1 --host loki hello_2_mpi

[OMPI users] ORTE_ERROR_LOG: Pack data mismatch for openmpi-v4.0.x and openmpi-master

2018-11-13 Thread Siegmar Gross
Hi, I've installed openmpi-v4.0.x-20180241-725f625 and openmpi-master-201811100305-3dc1629 on my "SUSE Linux Enterprise Server 12.3 (x86_64)" with Sun C 5.15 (Oracle Developer Studio 12.6), gcc-6.4.0, icc-19.x, and pgcc-18.4. Unfortunately, I still get the following error for all compilers fo

[OMPI users] ORTE_ERROR_LOG: Pack data mismatch in openmpi-master-201810190352-8db5aaa

2018-10-22 Thread Siegmar Gross
Hi, I've installed openmpi-master-201810190352-8db5aaa on my "SUSE Linux Enterprise Server 12.3 (x86_64)" with Sun C 5.15 (Oracle Developer Studio 12.6), gcc-6.4.0, icc-19.x, and pgcc-18.4. Unfortunately, I get the following error for gcc, icc, and pgcc. I'm still unable to build a version with Su

[OMPI users] ORTE_ERROR_LOG: Data unpack would read past end of buffer with openmpi-v4.0.x-201810190241-6c18cb1

2018-10-22 Thread Siegmar Gross
Hi, I've installed openmpi-v4.0.x-201810190241-6c18cb1 on my "SUSE Linux Enterprise Server 12.3 (x86_64)" with Sun C 5.15 (Oracle Developer Studio 12.6), gcc-6.4.0, icc-19.x, and pgcc-18.4. Unfortunately, I get the following error for all four compilers.
loki config_files 160 head -7 /export2/s

Re: [OMPI users] ORTE_ERROR_LOG

2013-10-25 Thread Ralph Castain
hat is still wrong?
> From: r...@open-mpi.org
> Date: Fri, 25 Oct 2013 02:13:58 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] ORTE_ERROR_LOG
>
> I see two "mpirun" cmds on that cmd line - is that a copy/paste error or did
> yo

Re: [OMPI users] ORTE_ERROR_LOG

2013-10-25 Thread Tommi Laiho
wrong?
From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Fri, 25 Oct 2013 02:13:58 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] ORTE_ERROR_LOG
I see two "mpirun" cmds on that cmd line - is that a copy/paste error or did you really put two of them on one l

Re: [OMPI users] ORTE_ERROR_LOG

2013-10-25 Thread Ralph Castain
I see two "mpirun" cmds on that cmd line - is that a copy/paste error or did you really put two of them on one line?
On Oct 24, 2013, at 10:27 PM, Tommi Laiho wrote:
> Hi
>
> I am trying to setup a simple two machines home cluster. Later I may increase
> the number to 4 machines.
>
> I hav

[OMPI users] ORTE_ERROR_LOG

2013-10-25 Thread Tommi Laiho
Hi, I am trying to set up a simple two-machine home cluster. Later I may increase the number to 4 machines. I have a bridged modem, so each of my machines has its own IP. My target now is to calculate a simple OpenFOAM tutorial on two computers with 14 cores in total. However when I ru

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-29 Thread Hugh Dickinson
The remote node starts the following process when mpirun is executed on the local node:
25734 ? Ss 0:00 /usr/lib/openmpi/1.2.5-gcc/bin/orted -- bootproxy 1 --
I checked and it was not running before mpirun was executed. I'll look into installing a more recent version of Open MPI.

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
Best I can tell, the remote orted never got executed - it looks to me like there is something that blocks the ssh from working. Can you get into another window and ssh to the remote node? If so, can you do a ps and verify that the orted is actually running there? mpirun is using the same sh

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
As far as I can tell, both the PATH and LD_LIBRARY_PATH are set correctly. I've tried with the full path to the mpirun executable and using the --prefix command line option. Neither works. The debug output seems to contain a lot of system specific information (IPs, usernames and such), which I'm a

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
Okay, that's one small step forward. You can lock that in by setting the appropriate MCA parameter in one of two ways: 1. add the following to your default mca parameter file: btl = tcp,sm,self (I added the shared memory subsystem as this will help with performance). You can see how to do
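The first option in the message above (a line in the default MCA parameter file) could be scripted roughly as follows. This is only a sketch: the function name `set_btl_param` is made up for illustration, and it assumes the per-user parameter file is `$HOME/.openmpi/mca-params.conf`. The second option is cut off in the snippet; an `OMPI_MCA_btl` environment variable is one commonly used alternative, shown here only as a comment.

```shell
# Hypothetical helper: write "btl = <list>" into an MCA parameter file,
# replacing any earlier btl line so the setting is not duplicated.
set_btl_param() {
    param_file=$1   # e.g. $HOME/.openmpi/mca-params.conf (assumed location)
    btl_list=$2     # e.g. tcp,sm,self
    mkdir -p "$(dirname "$param_file")"
    if [ -f "$param_file" ]; then
        # Keep every line except an existing "btl = ..." entry.
        grep -v '^[[:space:]]*btl[[:space:]]*=' "$param_file" > "$param_file.tmp" || true
        mv "$param_file.tmp" "$param_file"
    fi
    printf 'btl = %s\n' "$btl_list" >> "$param_file"
}

# Usage (not executed here):
#   set_btl_param "$HOME/.openmpi/mca-params.conf" tcp,sm,self
# For a single shell session, an environment variable can be used instead:
#   export OMPI_MCA_btl=tcp,sm,self
```

BTL component names differ between Open MPI releases, so the list should be checked against what the installed version actually provides.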

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi, Yes I'm using ethernet connections. Doing as you suggest removes the errors generated by running the small test program, but still doesn't allow programs (including the small test program) to execute on any node other than the one launching mpirun. If I try to do that, the command han

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
In this instance, OMPI is complaining that you are attempting to use Infiniband, but no suitable devices are found. I assume you have Ethernet between your nodes? Can you run this with the following added to your mpirun cmd line: -mca btl tcp,self That will cause OMPI to ignore the Infiniband su
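As a concrete form of the suggestion above, a small wrapper can add `-mca btl tcp,self` to every launch. The wrapper name `run_tcp_only` and the `MPIRUN` override variable are assumptions made for this sketch; only the `-mca btl tcp,self` option comes from the message.

```shell
# Hypothetical wrapper: launch over TCP (plus "self" for loopback) so that
# Open MPI does not try to use InfiniBand.
run_tcp_only() {
    # "$@" holds the usual mpirun arguments, e.g. -np 2 --host nodeA,nodeB ./hello
    "${MPIRUN:-mpirun}" -mca btl tcp,self "$@"
}

# Usage (not executed here; host and program names are placeholders):
#   run_tcp_only -np 2 --host nodeA,nodeB ./hello
```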

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Many thanks for your help nonetheless. Hugh On 28 Apr 2009, at 17:23, jody wrote: Hi Hugh I'm sorry, but i must admit that i have never encountered these messages, and i don't know what their cause exactly is. Perhaps one of the developers can give an explanation? Jody On Tue, Apr 28, 2

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh, I'm sorry, but I must admit that I have never encountered these messages, and I don't know exactly what causes them. Perhaps one of the developers can give an explanation? Jody
On Tue, Apr 28, 2009 at 5:52 PM, Hugh Dickinson wrote:
> Hi again,
>
> I tried a simple mpi c++ program:
>

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi again, I tried a simple MPI C++ program:
--
#include <mpi.h>
#include <iostream>
using namespace MPI;
using namespace std;
int main(int argc, char* argv[])
{
  int rank, size;
  Init(argc, argv);
  rank = COMM_WORLD.Get_rank();
  size = COMM_WORLD.Get_size();
  cout << "P:" << rank << " out of " << size << endl;

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, I can ssh without a password between all nodes (to and from). Almost none of these mpirun commands work. The only working case is if nodenameX is the node from which you are running the command. I don't know if this gives you extra diagnostic information, but if I explicitly set the wron

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh, You're right, there is no initialization command (like lamboot) you have to call. I don't really know why your setup doesn't work, so I'm making some more "blind shots": Can you do passwordless ssh between any two of your nodes? Does mpirun -np 1 --host nodenameX uptime work for e
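The two checks asked for above (passwordless ssh, then a single-process `mpirun ... uptime` per node) can be looped over a node list. A rough sketch, assuming the hypothetical function name `check_nodes` and the `SSH`/`MPIRUN` override variables; `-o BatchMode=yes` makes ssh fail rather than prompt when a password would be needed.

```shell
# Hypothetical helper: for each node, test passwordless ssh first and only
# then try a one-process mpirun on that node.
check_nodes() {
    # "$@" = node names exactly as written in the host file
    for node in "$@"; do
        if "${SSH:-ssh}" -o BatchMode=yes "$node" true; then
            echo "$node: ssh ok"
        else
            echo "$node: ssh FAILED"
            continue
        fi
        if "${MPIRUN:-mpirun}" -np 1 --host "$node" uptime > /dev/null; then
            echo "$node: mpirun ok"
        else
            echo "$node: mpirun FAILED"
        fi
    done
}

# Usage (not executed here; node names are placeholders):
#   check_nodes nodename1 nodename2
```

Since the thread reports that a remote launch can hang instead of failing, it may be worth running the check under a time limit.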

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, The node names are exactly the same. I wanted to avoid updating the version because I'm not the system administrator, and it could take some time before it gets done. If it's likely to fix the problem though I'll try it. I'm assuming that I don't have to do something analogous to

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh Again, just to make sure, are the hostnames in your host file well-known? I.e. when you say you can do ssh nodename uptime do you use exactly the same nodename in your host file? (I'm trying to eliminate all non-Open-MPI error sources, because with your setup it should basically work.)

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, Indeed, all the nodes are running the same version of Open MPI. Perhaps I was incorrect to describe the cluster as heterogeneous. In fact, all the nodes run the same operating system (Scientific Linux 5.2); it's only the hardware that's different, and even then they're all i386 or i686. I'm

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh Just to make sure: You have installed Open-MPI on all your nodes? Same version everywhere? Jody
On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson wrote:
> Hi all,
>
> First of all let me make it perfectly clear that I'm a complete beginner as
> far as MPI is concerned, so this may well

[OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi all, First of all let me make it perfectly clear that I'm a complete beginner as far as MPI is concerned, so this may well be a trivial problem! I've tried to set up Open MPI to use SSH to communicate between nodes on a heterogeneous cluster. I've set up passwordless SSH and it seems

Re: [OMPI users] ORTE_ERROR_LOG

2009-01-16 Thread Jeff Squyres
Please send all the information here: http://www.open-mpi.org/community/help/ This kind of error can mean that you are inadvertently using mismatched versions of Open MPI across your nodes. On Jan 16, 2009, at 3:50 AM, Bernard Secher - SFME/LGLS wrote: Hello, I have the following err

[OMPI users] ORTE_ERROR_LOG

2009-01-16 Thread Bernard Secher - SFME/LGLS
Hello, I have the following error at the beginning of my MPI code:
[is124684:07869] [[38040,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file orted/orted_comm.c at line 448
Can anybody help me solve this problem? Bernard

Re: [OMPI users] ORTE_ERROR_LOG timeout

2008-07-08 Thread Ralph H Castain
Several things are going on here. First, this error message:
> mpirun noticed that job rank 1 with PID 9658 on node mac1 exited on signal
> 6 (Aborted).
> 2 additional processes aborted (not shown)
indicates that your application procs are aborting for some reason. The system is then attempting to

[OMPI users] ORTE_ERROR_LOG timeout

2008-07-08 Thread Alastair Basden
Hi, I've got some code that uses openmpi, and sometimes it crashes after printing something like:
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[mac1:09654] [0,0,0

Re: [OMPI users] ORTE_ERROR_LOG Timeout

2008-06-04 Thread Jeff Squyres
James -- Sorry for the delay in replying. Do you have any firewall software running on your nodes (e.g., iptables)? OMPI uses random TCP ports to connect between nodes for control messages. If they can't reach each other because TCP ports are blocked, Bad Things will happen (potentially
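One quick way to follow up on the firewall question is to scan the output of `iptables -L -n` for rules that could block traffic. The sketch below is a crude filter, not a full analysis: the function name `has_blocking_rules` is invented, and it flags any DROP/REJECT rule or DROP chain policy without checking whether that rule actually applies to traffic between the nodes.

```shell
# Hypothetical filter: reads `iptables -L -n` output on stdin and succeeds
# if any rule targets DROP/REJECT or any chain has a DROP policy.
has_blocking_rules() {
    grep -Eq '^(DROP|REJECT)[[:space:]]|\(policy DROP\)'
}

# Usage (not executed here; listing rules normally requires root):
#   if iptables -L -n | has_blocking_rules; then
#       echo "firewall rules present; inter-node TCP ports may be blocked"
#   fi
```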

[OMPI users] ORTE_ERROR_LOG Timeout

2008-05-20 Thread Rudd, James
I have been trying to compile a molecular dynamics program with the Openmpi 1.2.5 included in OFED 1.3. I am running Fedora Core 6; the output of uname -r is 2.6.18-1.2798.fc6. I've traced the problems I've been having back to openmpi because I'm unable to run the test programs such as glob on

Re: [OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361

2007-12-14 Thread Ralph H Castain
er > Moffett Field, CA 94035-1000 > > Fax: 415-604-3957 > > > If I try to use multiple nodes, I got the error messages: > ORTE_ERROR_LOG: Data unpack had inadequate space in file dss/dss_unpack.c at > line 90 > ORTE_ERROR_LOG: Data unpack had inadequate space in file > gpr_replica

Re: [OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361

2007-12-14 Thread Ralph H Castain
Hi Qiang This error message usually indicates that you have more than one Open MPI installation around, and that the backend nodes are picking up a different version than mpirun is using. Check to make sure that you have a consistent version across all the nodes. I also noted you were building wi
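The consistency check recommended above can be approximated by asking each node, over a non-interactive ssh, which version its `mpirun` reports. This is a sketch under assumptions: the names `ompi_versions` and `versions_match` and the `SSH` override are invented here, and it assumes the installed release accepts `mpirun --version`.

```shell
# Hypothetical helper: print "<node><TAB><first line of mpirun --version>"
# for every node given on the command line.
ompi_versions() {
    for node in "$@"; do
        ver=$("${SSH:-ssh}" "$node" 'mpirun --version 2>&1 | head -n 1')
        printf '%s\t%s\n' "$node" "$ver"
    done
}

# Reads ompi_versions output on stdin; succeeds only if every node
# reported the same version string.
versions_match() {
    [ "$(cut -f2 | sort -u | wc -l | tr -d ' ')" = 1 ]
}

# Usage (not executed here; node names are placeholders):
#   ompi_versions compute-0-1 compute-0-2 | versions_match || echo "version mismatch"
```

A node whose PATH finds no `mpirun` at all shows up as a differing line as well, which points to the same kind of problem.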

[OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361

2007-12-13 Thread Qiang Xu
I installed OpenMPI-1.2.4 on our cluster. Here is the compute node info:
[qiang@compute-0-1 ~]$ uname -a
Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 00:17:26 CDT 2006 i686 i686 i386 GNU/Linux
[qiang@compute-0-1 bin]$ gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/

Re: [OMPI users] Orte_error_log w/ ompi 1.1.1 and torque 2.1.2

2006-10-07 Thread Jeff Squyres
Followups on this show that this was caused by accidentally running on a one node Torque allocation and using the "-nolocal" option to mpirun. So Open MPI is doing what it should do (refusing to run), but being less than helpful about its error message. I'll file a feature enhancement to see if w
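A job script can guard against the situation described above by counting the distinct nodes in the allocation before passing `-nolocal`. A minimal sketch, assuming the hypothetical function name `nolocal_is_safe`, that Torque exposes the node list in `$PBS_NODEFILE`, and that mpirun is started on one of the allocated nodes.

```shell
# Hypothetical guard: succeed only if the node file lists more than one
# distinct host, so that something is left to run on after -nolocal
# excludes the node where mpirun itself runs.
nolocal_is_safe() {
    # $1 = node file with one line per slot (hosts may repeat)
    [ "$(sort -u "$1" | wc -l | tr -d ' ')" -gt 1 ]
}

# Usage inside a job script (not executed here; ./a.out is a placeholder):
#   if nolocal_is_safe "$PBS_NODEFILE"; then
#       mpirun -nolocal ./a.out
#   else
#       mpirun ./a.out
#   fi
```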

[OMPI users] Orte_error_log w/ ompi 1.1.1 and torque 2.1.2

2006-10-06 Thread Maestas, Christopher Daniel
Has anyone ever seen this?
---
[dn32:07156] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file base/rmaps_base_node.c at line 153
[dn32:07156] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file rmaps_rr.c at line 270
[dn32:07156] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource