On 11/12/2014 05:45 PM, Reuti wrote:
On 12.11.2014 at 17:27, Reuti wrote:
On 11.11.2014 at 02:25, Ralph Castain wrote:
Another thing you can do is (a) ensure you built with --enable-debug,
and then (b) run it with -mca oob_base_verbose 100
(without the tcp_if_include option) so we can watch
Could you please send the output of netstat -nr on both the head and the
compute node? No problem obfuscating the IP of the head node; I am only
interested in netmasks and routes.
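For example (ip route show is just an equivalent way of dumping the same
table on newer systems, mentioned here as an alternative, not requested above):

  netstat -nr
  ip route show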
Ralph Castain wrote:
>
>> On Nov 12, 2014, at 2:45 PM, Reuti wrote:
>>
>> On 12.11.2014 at 17:27, Reuti wrote:
>>
>>> A
> On Nov 12, 2014, at 2:45 PM, Reuti wrote:
>
> On 12.11.2014 at 17:27, Reuti wrote:
>
>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>
>>> Another thing you can do is (a) ensure you built with --enable-debug, and
>>> then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include
On 12.11.2014 at 17:27, Reuti wrote:
> On 11.11.2014 at 02:25, Ralph Castain wrote:
>
>> Another thing you can do is (a) ensure you built with --enable-debug, and
>> then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include
>> option) so we can watch the connection handshake
Yes, I confirm. Thanks for saying that this is the intended behaviour.
In the binary, the call goes to munmap@plt, which jumps into libc,
not into libopen-pal.so.
The libc is 2.13-38+deb7u1.
I'm a total noob at GOT/PLT relocations. What is the mechanism that
should make the OPAL symbol win over the libc one?
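One generic way to check which object wins the interposition (not a step
suggested in the thread; the library path is a placeholder): the dynamic
linker binds a PLT entry to the first loaded object that defines the symbol,
so libopen-pal.so only wins if it exports munmap and is searched before libc.

  # does libopen-pal.so export a munmap symbol at all?
  nm -D $OMPI_PREFIX/lib/libopen-pal.so | grep ' munmap'
  # watch the dynamic linker resolve munmap at run time
  LD_DEBUG=bindings ./your_app 2>&1 | grep munmap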
FWIW, munmap is *supposed* to be intercepted. Can you confirm that when your
application calls munmap, it doesn't make a call to libopen-pal.so?
It should be calling this (1-line) function:
-
/* intercept munmap, as the user can give back memory that way as well. */
OPAL_DECLSPEC int munmap(void *addr, size_t len)
You could just disable leave-pinned:
-mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0
This will fix the issue but may reduce performance. I'm not sure why the
munmap wrapper is failing to execute, but this will get you running.
-Nathan Hjelm
HPC-5, LANL
On Wed, Nov 12, 2014 at 05:08:06PM +0
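A complete command line with that workaround would look something like this
(process count and executable are placeholders):

  mpirun -np 4 -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 ./your_app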
On 11.11.2014 at 02:25, Ralph Castain wrote:
> Another thing you can do is (a) ensure you built with --enable-debug, and then
> (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include
> option) so we can watch the connection handshake and see what it is doing.
> The --hetero-nodes
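Spelled out as a full invocation, the suggested debug run would look roughly
like this (hostfile and executable are placeholders):

  mpirun -np 2 --hostfile hosts -mca oob_base_verbose 100 ./your_app 2>&1 | tee oob.log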
Hi folks
Those of you following the mailing lists probably know that we had hoped to
release 1.8.4 last Friday, but were unable to do so. We currently have a couple
of issues pending resolution, and our developers are badly “crunched” by final
prep for Supercomputing. We then will hit the US Thanksgiving holiday
As far as I have been able to understand while looking at the code, it
very much seems that Joshua pointed out the exact cause of the issue.
munmap'ing a virtual address space region does not evict it from
mpool_grdma->pool->lru_list. If a later mmap happens to return the
same address (a priori
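A minimal sketch of the access pattern being described, assuming leave-pinned
registration caching is active (this is not a reproducer from the thread;
buffer size, tag and loop count are made up):

  #include <mpi.h>
  #include <sys/mman.h>

  int main(int argc, char **argv)
  {
      int rank, iter;
      size_t len = 1 << 20;               /* illustrative buffer size */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      for (iter = 0; iter < 2; iter++) {
          /* error checking omitted for brevity */
          char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (rank == 0) {
              /* with leave-pinned, the registration for buf is cached in
                 the mpool LRU after this send */
              MPI_Send(buf, (int)len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          } else if (rank == 1) {
              MPI_Recv(buf, (int)len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
          }
          /* if the munmap intercept does not run, the cached registration
             is not evicted here; the next iteration's mmap can return the
             same virtual address and silently reuse the stale entry */
          munmap(buf, len);
      }

      MPI_Finalize();
      return 0;
  }

Run with 2 ranks over an RDMA-capable transport; with the intercept working
(or with leave-pinned disabled) the stale registration cannot be reused.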
I was going to send something out to the list today anyway - will do so now.
> On Nov 12, 2014, at 6:58 AM, Jeff Squyres (jsquyres)
> wrote:
>
> On Nov 12, 2014, at 9:53 AM, Ray Sheppard wrote:
>
>> Thanks, and sorry to blast my little note out to the list. I guess your
>> mail address is
> On Nov 12, 2014, at 7:15 AM, Dave Love wrote:
>
> Ralph Castain writes:
>
>> You might also add the --display-allocation flag to mpirun so we can
>> see what it thinks the allocation looks like. If there are only 16
>> slots on the node, it seems odd that OMPI would assign 32 procs to it
>> u
Hi,
I'm trying to find the correct settings for the OFED kernel parameters for the
cluster. Each node has 32 GB RAM, runs Red Hat Enterprise Linux
Server release 6.4 (Santiago) with OFED 2.1.192 and Open MPI 1.6.5, and has a
Mellanox Technologies MT27500 Family [ConnectX-3] adapter with 56G active.
lsmod showed
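If (and this is a guess, since the post is cut off here) the question is about
the mlx4 MTT parameters that cap registerable memory, the usual sizing is
max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size, and it should
cover at least twice the physical RAM. For 32 GB nodes with 4 KiB pages, one
example setting giving 128 GB would be:

  options mlx4_core log_num_mtt=24 log_mtts_per_seg=1

in /etc/modprobe.d/mlx4_core.conf, followed by reloading the driver. Treat the
values as an illustration to verify against the Mellanox/Open MPI docs, not as
a recommendation from this thread.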
"SLIM H.A." writes:
> Dear Reuti and Ralph
>
> Below is the output of the run for Open MPI 1.8.3 with this line
>
> mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe
-np is redundant with tight integration unless you're using fewer than
NSLOTS from SGE.
> ompi_info | g
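Under a tight SGE integration Open MPI picks up the slot count itself, so the
quoted command without -np would start exactly $NSLOTS processes:

  mpirun --display-map --display-allocation --cpus-per-proc 1 $exe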
Reuti writes:
>> If so, I’m wondering if that NULL he shows in there is the source of the
>> trouble. The parser doesn’t look like it would handle that very well, though
>> I’d need to test it. Is that NULL expected? Or is the NULL not really in the
>> file?
>
> I must admit here: for me the f
Ralph Castain writes:
> You might also add the --display-allocation flag to mpirun so we can
> see what it thinks the allocation looks like. If there are only 16
> slots on the node, it seems odd that OMPI would assign 32 procs to it
> unless it thinks there is only 1 node in the job, and oversubs
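For reference, the flag is simply added to the existing command line, e.g.
(executable is a placeholder):

  mpirun --display-allocation -np $NSLOTS ./your_app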
"Jeff Squyres (jsquyres)" writes:
> Yeah, we don't actually share man pages.
I suppose it wouldn't save much anyhow at this stage of the game.
> I think the main issue would be just to edit the *.3in pages here:
>
> https://github.com/open-mpi/ompi/tree/master/ompi/mpi/man/man3
>
> They're
On Nov 12, 2014, at 9:53 AM, Ray Sheppard wrote:
> Thanks, and sorry to blast my little note out to the list. I guess your mail
> address is now aliased to the mailing list in my mail client.
:-)
No worries; I'm sure this is a question on other people's minds, too.
--
Jeff Squyres
jsquy...@
Thanks, and sorry to blast my little note out to the list. I guess your
mail address is now aliased to the mailing list in my mail client.
Ray
On 11/12/2014 9:41 AM, Jeff Squyres (jsquyres) wrote:
We have 2 critical issues left that need fixing (a THREAD_MULTIPLE/locking
issue and a shmem issue).
We have 2 critical issues left that need fixing (a THREAD_MULTIPLE/locking
issue and a shmem issue). There's active work progressing on both.
I think we'd love to say it would be ready by SC, but I know that a lot of us
-- myself included -- are fighting to meet our own SC deadlines.
Ralph Cas
Hi Jeff,
Sorry to bother you directly, but do you know when y'all will release
the stable version of 1.8.4? I have users asking for it and really
would like to build it for them before I leave for SC. But, either way,
it would be great to be able to help manage their expectations. Thanks.
Do you have firewalling enabled on either server?
See this FAQ item:
http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
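A quick way to check whether a firewall is active on each node before working
through the FAQ (RHEL/CentOS 6-era commands; adjust for your distribution):

  service iptables status
  iptables -L -n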
On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali wrote:
> Dear All
>
> I need your advice. While trying to run mpirun job across nodes I get
> follo
On 11.11.2014 at 02:12, Gilles Gouaillardet wrote:
> Hi,
>
> IIRC there were some bug fixes between 1.8.1 and 1.8.2 in order to really use
> all the published interfaces.
>
> by any chance, are you running a firewall on your head node?
Yes, but only for the interface to the outside world. N
Dear All
I need your advice. While trying to run an mpirun job across nodes I get the
following error. It seems that the two nodes, i.e. compute-01-01 and
compute-01-06, are not able to communicate with each other, although the
nodes can see each other with ping.
[pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostli
Please accept our apologies if you receive multiple copies of this CfP.
***
* ALCHEMY Workshop 2015
* Architecture, Languages, Compilation and Hardware support for Emerging
ManYcore systems
*
* Held in conjunction