Do we know if this was definitely fixed in v4.1.x?

> On Feb 4, 2021, at 7:46 AM, Gilles Gouaillardet via users 
> <users@lists.open-mpi.org> wrote:
> 
> Martin,
> 
> this is a connectivity issue reported by the btl/tcp component.
> 
> You can try restricting Open MPI to an IP interface/subnet that is known to
> work (and has no firewall in the way) between both hosts:
> 
> mpirun --mca btl_tcp_if_include 192.168.0.0/24 ...
> 
> If the error persists, you can
> 
> mpirun --mca btl_tcp_base_verbose 20 ...
> 
> and then compress and post the logs so we can have a look
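> 
> For example, combining both options with your spawn command (the subnet below
> is just an illustration; use whichever subnet actually connects master and
> worker):
> 
> mpirun --mca btl_tcp_if_include 192.168.0.0/24 \
>        --mca btl_tcp_base_verbose 20 \
>        -np 1 -hostfile ./hostfile ./spawner.exe 8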
> 
> 
> Cheers,
> 
> Gilles
> 
> On Thu, Feb 4, 2021 at 9:33 PM Martín Morales via users
> <users@lists.open-mpi.org> wrote:
>> 
>> Hi Marcos,
>> 
>> Yes, I have a problem spawning onto the “worker” host (spawning on localhost
>> works). There are just two machines, “master” and “worker”, both running
>> Windows 10 with the same Cygwin installation and packages. Some details are
>> pasted below.
>> 
>> Thanks for your help. Regards,
>> 
>> Martín
>> 
>> ----
>> 
>> Running:
>> 
>> mpirun -np 1 -hostfile ./hostfile ./spawner.exe 8
>> 
>> hostfile:
>> 
>> master slots=5
>> worker slots=5
>> 
>> Error:
>> 
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications.  This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes.  This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other.  This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>> 
>> Process 1 ([[31598,1],0]) is on host: DESKTOP-C0G4680
>> Process 2 ([[31598,2],2]) is on host: worker
>> BTLs attempted: self tcp
>> 
>> Your MPI job is now going to abort; sorry.
>> --------------------------------------------------------------------------
>> [DESKTOP-C0G4680:02828] [[31598,1],0] ORTE_ERROR_LOG: Unreachable in file /pub/devel/openmpi/v4.0/openmpi-4.0.5-1.x86_64/src/openmpi-4.0.5/ompi/dpm/dpm.c at line 493
>> [DESKTOP-C0G4680:02828] *** An error occurred in MPI_Comm_spawn
>> [DESKTOP-C0G4680:02828] *** reported by process [2070806529,0]
>> [DESKTOP-C0G4680:02828] *** on communicator MPI_COMM_SELF
>> [DESKTOP-C0G4680:02828] *** MPI_ERR_INTERN: internal error
>> [DESKTOP-C0G4680:02828] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [DESKTOP-C0G4680:02828] ***    and potentially your MPI job)
>> 
>> USER_SSH@DESKTOP-C0G4680 ~
>> $ [WinDev2012Eval:00120] [[31598,2],2] ORTE_ERROR_LOG: Unreachable in file /pub/devel/openmpi/v4.0/openmpi-4.0.5-1.x86_64/src/openmpi-4.0.5/ompi/dpm/dpm.c at line 493
>> [WinDev2012Eval:00121] [[31598,2],3] ORTE_ERROR_LOG: Unreachable in file /pub/devel/openmpi/v4.0/openmpi-4.0.5-1.x86_64/src/openmpi-4.0.5/ompi/dpm/dpm.c at line 493
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>> 
>>    ompi_dpm_dyn_init() failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> [WinDev2012Eval:00121] *** An error occurred in MPI_Init
>> [WinDev2012Eval:00121] *** reported by process [15289389101093879810,12884901891]
>> [WinDev2012Eval:00121] *** on a NULL communicator
>> [WinDev2012Eval:00121] *** Unknown error
>> [WinDev2012Eval:00121] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [WinDev2012Eval:00121] ***    and potentially your MPI job)
>> [DESKTOP-C0G4680:02831] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
>> [DESKTOP-C0G4680:02831] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>> [DESKTOP-C0G4680:02831] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
>> [DESKTOP-C0G4680:02831] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
>> 
>> Spawner source:
>> 
>> #include "mpi.h"
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> 
>> int main(int argc, char **argv){
>>     int processesToRun;
>>     MPI_Comm intercomm;
>>     MPI_Info info;
>> 
>>     if (argc < 2) {
>>         printf("Number of processes needed!\n");
>>         return 0;
>>     }
>>     processesToRun = atoi(argv[1]);
>> 
>>     MPI_Init(NULL, NULL);
>>     printf("Spawning from parent:...\n");
>>     /* Spawn processesToRun children from this single parent process */
>>     MPI_Comm_spawn("./spawned.exe", MPI_ARGV_NULL, processesToRun,
>>                    MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm,
>>                    MPI_ERRCODES_IGNORE);
>> 
>>     MPI_Finalize();
>>     return 0;
>> }
>> 
>> Spawned source:
>> 
>> #include "mpi.h"
>> #include <stdio.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char **argv){
>>     int hostName_len, rank, size;
>>     MPI_Comm parentcomm;
>>     char hostName[200];
>> 
>>     MPI_Init(NULL, NULL);
>>     MPI_Comm_get_parent(&parentcomm);
>>     MPI_Get_processor_name(hostName, &hostName_len);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> 
>>     /* Only the spawned children have a parent intercommunicator */
>>     if (parentcomm != MPI_COMM_NULL) {
>>         printf("I'm the spawned h: %s  r/s: %i/%i\n", hostName, rank, size);
>>     }
>> 
>>     MPI_Finalize();
>>     return 0;
>> }
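>> 
>> For reference, both were compiled with Open MPI's mpicc wrapper, roughly like
>> this (assuming the sources are named spawner.c and spawned.c):
>> 
>> mpicc spawner.c -o spawner.exe
>> mpicc spawned.c -o spawned.exe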
>> 
>> From: Marco Atzeri via users
>> Sent: Wednesday, February 3, 2021 17:58
>> To: users@lists.open-mpi.org
>> Cc: Marco Atzeri
>> Subject: Re: [OMPI users] OMPI 4.1 in Cygwin packages?
>> 
>> 
>> 
>> On 03.02.2021 21:35, Martín Morales via users wrote:
>>> Hello,
>>> 
>>> I would like to know if any OMPI 4.1.* is going to be available in the
>>> Cygwin packages.
>>> 
>>> Thanks and regards,
>>> 
>>> Martín
>>> 
>> 
>> Hi Martin,
>> is there anything in it that is absolutely needed short term?
>> 
>> Any problem with the current 4.0.5 package?
>> 
>> The build is usually very time consuming,
>> and I am busy with other Cygwin stuff.
>> 
>> Regards
>> Marco
>> 
>> 


-- 
Jeff Squyres
jsquy...@cisco.com
