Re: [OMPI users] When is it safe to free the buffer after MPI_Isend?

2019-08-11 Thread Jeff Hammond via users
The snippets suggest you were storing a reference to an object on the
stack. Stack variables go out of scope when the function returns. Using a
reference to one after it has gone out of scope is undefined behavior, and it
often fails nondeterministically. Good compilers will issue a warning about
this under the right conditions (i.e., with the right compiler flags).
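
For example, something like this minimal sketch (not your actual code;
task_push here stands for whatever stores the pointer in your list):

void send_one(void *buf, int count, int dest, MPI_Comm comm) {
    MPI_Request req;                    /* req lives on this function's stack */
    MPI_Isend(buf, count, MPI_BYTE, dest, 0, comm, &req);
    task_push(&req);                    /* stores a pointer to a stack variable */
}                                       /* req goes out of scope here */

/* Any later MPI_Test()/MPI_Wait() on the stored pointer dereferences a
   dangling pointer, which is undefined behavior and may only fail sometimes. */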

Jeff

On Sat, Aug 10, 2019 at 10:59 AM carlos aguni via users <
users@lists.open-mpi.org> wrote:

> Hi all,
>
> Sorry for not replying sooner.
>
> I just figured out the solution.
>
> The problem was that I had a function that would MPI_Isend a message on
> every call to it. I'd then store a pointer to its request in a list.
> My MPI_Isend snippet:
> MPI_Request req;     /* req lives on the stack */
> MPI_Isend(blabla, &req);
> task_push(&req);     /* this pointer dangles once the function returns */
>
> From time to time, at the beginning of that function, I'd call another
> function that would iterate over that list, calling MPI_Test to check
> whether each message had completed, and then free the buffers used.
> The problem was that the flag returned by MPI_Test(&req, &flag, &status),
> previously initialized to 0, would come back as 1, but my guess is that the
> request itself had already gone out of scope by then (I don't know much
> about C, though..)
> Snippet of my clean function:
> 
> int flag = 0;
> MPI_Test(req, &flag, &status);
> if (flag) { // then free..
> ...
>
> My solution, which worked, was to malloc the request before the MPI_Isend
> call, like:
> MPI_Request *rr = (MPI_Request *)malloc(sizeof(MPI_Request));
> MPI_Isend(blabla, rr);
> task_push(rr);
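>
> Putting it together, the pattern that works for me looks roughly like this
> (post_send/try_cleanup are made-up names, the buffer handling is simplified,
> and task_push is my own list helper):
>
> void post_send(void *buf, int count, int dest, MPI_Comm comm) {
>     MPI_Request *rr = (MPI_Request *)malloc(sizeof(MPI_Request));
>     MPI_Isend(buf, count, MPI_BYTE, dest, 0, comm, rr);
>     task_push(rr);   /* the request lives on the heap, so the pointer stays valid */
> }
>
> void try_cleanup(MPI_Request *rr, void *buf) {
>     int flag = 0;
>     MPI_Status status;
>     MPI_Test(rr, &flag, &status);
>     if (flag) {
>         free(buf);   /* the send has completed, so the data buffer can go */
>         free(rr);    /* and the heap-allocated request handle can be freed too */
>     }
> }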
>
> All I know is that it's working now..
>
> Thanks to all.
>
> Regards,
> C.
>
> On Sun, Jul 28, 2019 at 11:53 AM Jeff Squyres (jsquyres) via users <
> users@lists.open-mpi.org> wrote:
>
>> On Jul 27, 2019, at 10:43 PM, Gilles Gouaillardet via users <
>> users@lists.open-mpi.org> wrote:
>> >
>> > MPI_Isend() does not automatically free the buffer after it sends the
>> > message.
>> > (it simply cannot, since the buffer might point to a global
>> > variable or to the stack).
>>
>> Gilles is correct: MPI_Isend does not free the buffer.  I was wondering
>> if you had somehow used that same buffer -- or some subset of that buffer
>> -- in other non-blocking MPI API calls, and freeing it triggered Bad Things
>> because MPI was still using (some of) that buffer because of other pending
>> MPI requests.
>>
>> > Can you please extract a reproducer from your program?
>>
>> Yes, please do this.
>>
>> > Out of curiosity, what if you insert a (useless) MPI_Wait() like this?
>> >
>> > MPI_Test(req, &flag, &status);
>> > if (flag){
>> >MPI_Wait(req, MPI_STATUS_IGNORE);
>> >free(buffer);
>> > }
>>
>> That should be a no-op, because "req" should have been turned into
>> MPI_REQUEST_NULL if flag==true.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] silent failure for large allgather

2019-08-11 Thread Jeff Hammond via users
On Tue, Aug 6, 2019 at 9:54 AM Emmanuel Thomé via users <
users@lists.open-mpi.org> wrote:

> Hi,
>
> In the attached program, the MPI_Allgather() call fails to communicate
> all data (the amount it communicates wraps around at 4G...).  I'm running
> on an omnipath cluster (2018 hardware), openmpi 3.1.3 or 4.0.1 (tested
> both).
>
> With the OFI mtl, the failure is silent, with no error message reported.
> This is very annoying.
>
> With the PSM2 mtl, we have at least some info printed that 4G is a limit.
>
> I have tested it with various combinations of mca parameters. It seems
> that the one config bit that makes the test pass is the selection of the
> ob1 pml. However, I have to select it explicitly, because otherwise cm is
> selected instead (priority 40 vs 20, it seems), and the program fails. I
> don't know to what extent the cm pml is the root cause, or whether I'm
> witnessing a side effect of something.
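>
> (Selecting it explicitly means adding something like "--mca pml ob1" to the
> mpiexec command lines below, e.g.
>
> node0 ~ $ mpiexec -machinefile /tmp/hosts --map-by node -n 2 --mca pml ob1 ./a.out
>
> and with that, the test passes.)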
>
> openmpi-3.1.3 (debian10 package openmpi-bin-3.1.3-11):
>
> node0 ~ $ mpiexec -machinefile /tmp/hosts --map-by node  -n 2 ./a.out
> MPI_Allgather, 2 nodes, 0x10001 chunks of 0x1 bytes, total 2 *
> 0x10001 bytes: ...
> Message size 4295032832 bigger than supported by PSM2 API. Max =
> 4294967296
> MPI error returned:
> MPI_ERR_OTHER: known error not in list
> MPI_Allgather, 2 nodes, 0x10001 chunks of 0x1 bytes, total 2 *
> 0x10001 bytes: NOK
> [node0.localdomain:14592] 1 more process has sent help message
> help-mtl-psm2.txt / message too big
> [node0.localdomain:14592] Set MCA parameter "orte_base_help_aggregate"
> to 0 to see all help / error messages
>
> node0 ~ $ mpiexec -machinefile /tmp/hosts --map-by node  -n 2 --mca
> mtl ofi ./a.out
> MPI_Allgather, 2 nodes, 0x10001 chunks of 0x1 bytes, total 2 *
> 0x10001 bytes: ...
> MPI_Allgather, 2 nodes, 0x10001 chunks of 0x1 bytes, total 2 *
> 0x10001 bytes: NOK
> node 0 failed_offset = 0x10002
> node 1 failed_offset = 0x1
>
> I attached the corresponding outputs with some mca verbose
> parameters on, plus ompi_info, as well as variations of the pml layer
> (ob1 works).
>
> openmpi-4.0.1 gives essentially the same results (similar files
> attached), but with various doubts on my part as to whether I've run this
> check correctly. Here are my doubts:
> - whether or not I should have a UCX build for an Omni Path cluster
>   (IIUC https://github.com/openucx/ucx/issues/750 is now fixed?),
>

UCX is not optimized for Omni Path.  Don't use it.


> - which btl I should use (I understand that openib is headed for
>   deprecation and it complains unless I do --mca btl openib --mca
>   btl_openib_allow_ib true; fine. But then, which non-openib, non-tcp
>   btl should I use instead?)
>

OFI->PSM2 and PSM2 are the right conduits for Omni Path.
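
For example (illustrative command lines; the psm2 and ofi mtl names are the
ones already appearing in your logs, and cm is the pml that drives the mtl
layer):

  mpiexec -machinefile /tmp/hosts --map-by node -n 2 --mca pml cm --mca mtl psm2 ./a.out
  mpiexec -machinefile /tmp/hosts --map-by node -n 2 --mca pml cm --mca mtl ofi ./a.out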


> - which layers matter and which ones matter less... I tinkered with btl,
>   pml, and mtl.  It's fine if there are multiple choices, but if some
>   combinations lead to silent data corruption, that's not really
>   cool.
>

It sounds like Open MPI doesn't properly support the maximum transfer size
of PSM2.  One way to work around this is to wrap your MPI collective calls
and do the <4G chunking yourself.
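
Something along these lines (an untested sketch, assuming the send and
receive types and counts match as they do in your test; the function name
and the chunk-size choice are mine):

#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* Perform one large allgather as several smaller ones so that each
 * per-rank contribution stays well below the 4G limit.  "count" is the
 * per-rank element count and may exceed INT_MAX. */
int allgather_chunked(const void *sendbuf, void *recvbuf, size_t count,
                      MPI_Datatype type, MPI_Comm comm)
{
    int type_size, nprocs;
    MPI_Type_size(type, &type_size);
    MPI_Comm_size(comm, &nprocs);

    /* cap each call at 2 GiB per rank and at INT_MAX elements */
    size_t cap = ((size_t)1 << 31) / (size_t)type_size;
    if (cap > (size_t)INT_MAX) cap = (size_t)INT_MAX;

    for (size_t done = 0; done < count; done += cap) {
        size_t n = count - done;
        if (n > cap) n = cap;

        /* Rank r's chunk belongs at offset r*count + done in recvbuf, so
         * gather into contiguous scratch space first, then copy each rank's
         * piece into place.  Shrink the cap above if a scratch buffer of
         * nprocs * chunk bytes is too much memory. */
        char *tmp = malloc((size_t)nprocs * n * (size_t)type_size);
        if (tmp == NULL) return MPI_ERR_NO_MEM;

        int err = MPI_Allgather((const char *)sendbuf + done * (size_t)type_size,
                                (int)n, type, tmp, (int)n, type, comm);
        if (err != MPI_SUCCESS) { free(tmp); return err; }

        for (int r = 0; r < nprocs; r++)
            memcpy((char *)recvbuf + ((size_t)r * count + done) * (size_t)type_size,
                   tmp + (size_t)r * n * (size_t)type_size,
                   n * (size_t)type_size);
        free(tmp);
    }
    return MPI_SUCCESS;
}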

Jeff


> Could the error reporting in this case somehow be improved?
>
> I'd be glad to provide more feedback if needed.
>
> E.
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users