Thanks!
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Oct 31, 2014, at 2:22 PM, Ralph Castain wrote:
>
> On Oct 30, 2014, at 3:15 PM, Brock Palen wrote:
>
> If I'm on the node hosting mpirun for a job, and run:
>
> orte-ps
>
> It finds the job and shows the pids and info for all ranks.
>
> If I use orte-top, though, there is no such default; I have to find the mpirun
> pid and pass it explicitly.
>
>
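For reference, a minimal sketch of that workflow on the node running mpirun (the option names assume the orte-ps/orte-top tools shipped with the 1.8 series; the PID is a placeholder):

    # orte-ps alone finds the local job and lists rank PIDs
    orte-ps
    # orte-top needs the mpirun PID passed explicitly
    pgrep mpirun
    orte-top -pid <mpirun-pid>
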
Does anyone have issues with jobs dying with errors:
> The InfiniBand retry count between two MPI processes has been
> exceeded. "Retry count" is defined in the InfiniBand spec 1.2
> (section 12.7.38):
We started seeing this about a year ago. If we have changes to the IB fabric,
this can happen.
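For reference, a hedged sketch of the knobs that error text usually points at (assuming the openib BTL's btl_openib_ib_timeout and btl_openib_ib_retry_count MCA parameters are present in your build; the values here are only illustrative):

    # timeout is 4.096 us * 2^N; retry count is capped at 7 by the IB spec
    mpirun --mca btl_openib_ib_timeout 30 \
           --mca btl_openib_ib_retry_count 7 \
           ./your_app

Whether raising these actually helps depends on why packets are being dropped; fabric changes or failing links often need a hardware-level fix instead.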
> "Nathan" == Nathan Hjelm writes:
Hi Nathan
Nathan> I want to close the loop on this issue. 1.8.5 will address
Nathan> it in several ways:
Nathan> - knem support in btl/sm has been fixed. A sanity check was
Nathan> disabling knem during component registration. I wrote t
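For anyone who wants to verify a given build, a quick check (a sketch assuming the ompi_info MCA-level syntax used since OMPI 1.7):

    # list the sm BTL's parameters and look for the knem-related ones
    ompi_info --param btl sm --level 9 | grep -i knem
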
Dear Open MPI developers,
There is still a hang observed in MPI_WIN_ALLOCATE_SHARED.
But first:
Thank you for your advice to set shmem_mmap_relocate_backing_file = 1.
It indeed turned out that the bad (but silent) allocations by
MPI_WIN_ALLOCATE_SHARED, which I observed in the past
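For context, a minimal C sketch of the call being discussed (not the reporter's code; just a single shared window on a node-local communicator, with error handling omitted):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* communicator of the ranks that share this node's memory */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        int rank;
        MPI_Comm_rank(node_comm, &rank);

        /* each rank contributes 1024 doubles to the shared window */
        MPI_Aint size = 1024 * sizeof(double);
        double  *base;
        MPI_Win  win;
        MPI_Win_allocate_shared(size, sizeof(double), MPI_INFO_NULL,
                                node_comm, &base, &win);

        base[0] = (double)rank;      /* touch the locally owned segment */
        MPI_Win_fence(0, win);       /* simple synchronization */

        MPI_Win_free(&win);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

Since shmem_mmap_relocate_backing_file is an MCA parameter, it can also be set per run, e.g. mpirun --mca shmem_mmap_relocate_backing_file 1 ./a.out, assuming the mmap shmem component is the one in use.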
On 31/10/2014 00:24, Gus Correa wrote:
> 2) Any recommendation for the values of the
> various vader btl parameters?
> [There are 12 of them in OMPI 1.8.3!
> That is a real challenge to get right.]
>
> Which values did you use in your benchmarks?
> Defaults?
> Other?
>
> In particular, is there an
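A quick way to see the full vader parameter list with its defaults (a sketch assuming the MCA level syntax of ompi_info in the 1.7/1.8 series):

    # show every vader BTL parameter, including the tuning knobs and defaults
    ompi_info --param btl vader --level 9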