Re: [OMPI users] CUDA-aware codes not using GPU

2019-09-06 Thread Arturo Fernandez via users
users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] Inquiry about pml layer

2020-04-14 Thread Arturo Fernandez via users
Hello, I'm using CUDA-aware OMPI v4.0.3 with UCX to run some apps. Most of them have worked seamlessly, but one breaks and returns the error: memtype_cache.c:299  UCX  ERROR failed to set UCM memtype event handler: Unsupported operation--

[OMPI users] High errorcode message

2021-01-29 Thread Arturo Fernandez via users
Hello, My system is running CentOS8 & OpenMPI v4.1.0. Most stuff is working fine but one app is aborting with: MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD with errorcode 1734831948. The other 23 MPI ranks also abort. I'm a bit confused by the high error code. Does it mean anythi

Re: [OMPI users] High errorcode message

2021-01-30 Thread Arturo Fernandez via users
Hi Jeff. Sorry for the delay. It took a while but I was finally able to track down the point where the app breaks down. The problem seems to originate in an output subroutine, not because any MPI communication is malfunctioning. My guess is that MPI_Abort needs to produce some error message. Why

Re: [OMPI users] High errorcode message

2021-02-01 Thread Arturo Fernandez via users
The app is not calling MPI_ABORT directly. I dug a little deeper into it but didn't find anything interesting. It just doesn't find the subdirectory for output purposes (the internal error variable is 0) and simply crashes when returning from the subroutine. It was just me not setting things up pr

[OMPI users] Status of v4.1.1

2021-02-21 Thread Arturo Fernandez via users
Hello, I need to decide whether to use v4.1.0 or v4.1.1 for a project and was wondering if there is any extra information about the latter version. Is there going to be an rc2 or an ETA for the stable version? Thanks, Arturo

[OMPI users] Error Signal code: Address not mapped (1)

2021-06-21 Thread Arturo Fernandez via users
Hello, I'm getting the error message (with either v4.1.0 or v4.1.1) *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: (nil) *** End of error message *** Segmentation fault (core dumped) The AWS system is running CentOS8 but I do

[OMPI users] Cannot locate PMIx

2021-11-23 Thread Arturo Fernandez via users
Hello, This is kind of an odd issue as it had not happened earlier in many builds. The configuration (./configure --with-ofi=PATH_TO_LIBFABRIC installed from https://github.com/ofiwg/libfabric) for v4.1.1 returns: ... Miscellaneous --- CUDA support: no HWLOC support: internal L