[OMPI users] MPI_Init() "num local peers failed" - bug?

2023-05-16 Thread Jan Florian Wagner via users
(s) to eventually come up on the same node? many thanks, Jan

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Nysal Jan K A
hi Gabriele, You can check some of the available options here - https://www.ibm.com/support/knowledgecenter/en/SSZTET_10.1.0/smpi02/smpi02_interconnect.html The "-pami_noib" option might be of help in this scenario. Alternatively, on a single node, the vader BTL can also be used. Regards --Nysal

Re: [OMPI users] [EXTERNAL] Re: Errors on POWER8 Ubuntu 14.04u2

2015-03-30 Thread Nysal Jan K A
If this is Power 8 in LE mode, it's most likely a libtool issue. You need libtool >= 2.4.3, which has the LE patches, and need to run autogen.pl again. I have an issue open for this - https://github.com/open-mpi/ompi/issues/396 Regards --Nysal On Sat, Mar 28, 2015 at 12:41 AM, Hammond, Simon David
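The version requirement above can be checked before re-running autogen.pl. The following is only a sketch: the helper names `version_ge` and `check_libtool` are made up for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils). The installed version would normally come from `libtool --version`; here it is passed in by hand.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is >= version $2.
# Assumes `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a given libtool version
# meets the 2.4.3 minimum mentioned in the message above.
check_libtool() {
    if version_ge "$1" "2.4.3"; then
        echo "libtool $1: new enough, re-run autogen.pl"
    else
        echo "libtool $1: too old, upgrade to >= 2.4.3 first"
    fi
}

check_libtool "2.4.2"
check_libtool "2.4.6"
```

Version sort is used rather than a plain string comparison so that a release such as 2.4.10 is treated as newer than 2.4.3.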

Re: [OMPI users] Open MPI vs IBM MPI performance help

2010-12-03 Thread Nysal Jan
Collecting MPI Profile information might help narrow down the issue. You could use some of the tools mentioned here - http://www.open-mpi.org/faq/?category=perftools --Nysal On Wed, Dec 1, 2010 at 11:59 PM, Price, Brian M (N-KCI) < brian.m.pr...@lmco.com> wrote: > OpenMPI version: 1.4.3 > > Pla

Re: [OMPI users] EXTERNAL: Re: Creating 64-bit objects?

2010-11-11 Thread Nysal Jan
n. > > > > Brian > > > > > > *From:* users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On > Behalf Of *Nysal Jan > *Sent:* Wednesday, November 10, 2010 12:19 AM > *To:* Open MPI Users > *Subject:* EXTERNAL: Re: [OMPI users] Creating 64-bit obj

Re: [OMPI users] Creating 64-bit objects?

2010-11-11 Thread Nysal Jan
the same issue occur in OMPI 1.5? > > Should we put in a local patch for OMPI 1.4.x and/or OMPI 1.5? (we've done > this before while waiting for upstream Libtool patches to be released, etc.) > > > > On Nov 10, 2010, at 2:19 AM, Nysal Jan wrote: > > > Hi Brian,

Re: [OMPI users] Creating 64-bit objects?

2010-11-10 Thread Nysal Jan
Hi Brian, This problem was first reported by Paul H. Hargrove in the developer mailing list. It is a bug in libtool and has been fixed in the latest release (2.2.8). More details are available here - http://www.open-mpi.org/community/lists/devel/2010/10/8606.php Regards --Nysal On Wed, Nov 10, 20

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Nysal Jan
> messages size are 10k but can be very much larger.

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-17 Thread Nysal Jan
Hi Eloi, Sorry for the delay in response. I haven't read the entire email thread, but do you have a test case which can reproduce this error? Without that it will be difficult to nail down the cause. Just to clarify, I do not work for an iwarp vendor. I can certainly try to reproduce it on an IB sy

Re: [OMPI users] Checksuming in openmpi 1.4.1

2010-09-01 Thread Nysal Jan
Hi Gilbert, Checksums are turned off by default. If you need checksums to be activated add "-mca pml csum" to the mpirun command line. Checksums are enabled only for inter-node communication. Intra-node communication is typically over shared memory and hence checksum is disabled for this case. If y

Re: [OMPI users] [openib] segfault when using openib btl

2010-08-17 Thread Nysal Jan
te an invalid > memory access allowing to understand why reg->cbfunc / hdr->tag are null. > > Do you think that a thread race condition could explain the hdr->tag value > ? > > Thanks for your help, > Eloi > > On Monday 16 August 2010 20:46:39 Nysal Jan wrote:

Re: [OMPI users] [openib] segfault when using openib btl

2010-08-16 Thread Nysal Jan
The value of hdr->tag seems wrong. In ompi/mca/pml/ob1/pml_ob1_hdr.h #define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1) #define MCA_PML_OB1_HDR_TYPE_RNDV (MCA_BTL_TAG_PML + 2) #define MCA_PML_OB1_HDR_TYPE_RGET (MCA_BTL_TAG_PML + 3) #define MCA_PML_OB1_HDR_TYPE_ACK (MCA_BT

Re: [OMPI users] Memory allocation error when linking with MPI libraries

2010-08-15 Thread Nysal Jan
ibc instead of openmpi one, but does > it have an effect on performance or something else ? > > Nicolas > > 2010/8/8 Nysal Jan > > What interconnect are you using? Infiniband? Use >> "--without-memory-manager" option while building ompi in order to disable >

Re: [OMPI users] Bug in POWERPC32.asm?

2010-08-09 Thread Nysal Jan
Thanks for reporting this Matthew. Fixed in r23576 ( https://svn.open-mpi.org/trac/ompi/changeset/23576) Regards --Nysal On Fri, Aug 6, 2010 at 10:38 PM, Matthew Clark wrote: > I was looking in my copy of openmpi-1.4.1 opal/asm/base/POWERPC32.asm > and saw the following: > > START_FUNC(opal_sys_

Re: [OMPI users] Memory allocation error when linking with MPI libraries

2010-08-08 Thread Nysal Jan
What interconnect are you using? InfiniBand? Use the "--without-memory-manager" option while building ompi in order to disable ptmalloc. Regards --Nysal On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere < nicolas.deladerri...@gmail.com> wrote: > Yes, I'm using 24G machine on 64 bit Linux OS. >

Re: [OMPI users] Implementing a new BTL module in MCA

2010-08-03 Thread Nysal Jan
You can find the template for a BTL in ompi/mca/btl/template (You will find this on the subversion trunk). Copy and rename the folder/files. Use this as a starting point. For details on creating a new component (such as a new BTL) look here - https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComp

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment variables - http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables For processor affinity see this FAQ entry - http://www.open-mpi.org/faq/?category=all#using-paffinity --Nysal On Wed, Jul 28, 2010 at 9
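As a rough illustration of the reply above, a wrapper script started by mpirun can read OMPI_COMM_WORLD_RANK directly from its environment. This is a minimal sketch: the function name `ompi_rank`, the fallback value, and the log file name are assumptions made for the example, not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: print this process's rank, or a fallback
# (first argument, default 0) when not launched by Open MPI's mpirun.
ompi_rank() {
    # OMPI_COMM_WORLD_RANK is set by Open MPI for each launched process.
    echo "${OMPI_COMM_WORLD_RANK:-${1:-0}}"
}

# Example use: one log file per rank (the file name is illustrative).
log_file="run.rank$(ompi_rank).log"
echo "rank $(ompi_rank) logging to $log_file"
```

The fallback keeps the script usable when it is run by hand outside mpirun, where the variable is not set.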

Re: [OMPI users] Problem with compilation : statically linked applications

2010-06-14 Thread Nysal Jan
__cxa_get_exception_ptr should be defined in libstdc++ shared library. --Nysal On Mon, Jun 14, 2010 at 5:51 AM, HeeJin Kim wrote: > Dear all, > > I had built openmpi-1.4.2 with: > configure CC=icc CXX=icpc F77=ifort FC=ifort > --prefix=/home/biduri/program/openmpi --enable-mpi-threads --enable-

Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Nysal Jan
This probably got fixed in https://svn.open-mpi.org/trac/ompi/ticket/2386 Can you try 1.4.2? The fix should be in there. Regards --Nysal On Thu, May 20, 2010 at 2:02 PM, Olivier Riff wrote: > Hello, > > I assume this question has been already discussed many times, but I can not > find on Intern

[OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-21 Thread jan
Thanks Jeff. We finally solved this problem. We downloaded the newest OFED-1.4.1-rc6.tgz and reinstalled the InfiniBand drivers and utilities on all nodes. Everything looks good, and I have my own coffee time now. Thanks again. Best Regards, Gloria Jan Wavelink Technology Inc I don't think th

[OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-07 Thread jan
:0 active_width: 4X (2) active_speed: 5.0 Gbps (2) phys_state: LINK_UP (5) GID[ 0]: fe80::::0018:8b90:97fe:73ce Best Regards, Gloria Jan Wavelink

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread jan
Thank you Jeff. I have passed the mail to the IB vendor, Dell (the blade was ordered from Dell Taiwan), but he told me that he didn't understand "layer 0 diagnostics". Could you help us get more information on "layer 0 diagnostics"? Thanks again. Rega

[OMPI users] Fw: users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread jan
][[3175,1],0][btl_openib_component.c:3029:poll_device] error polling HP CQ with -2 errno says Success = Is this problem unsolvable? Best Regards, Gloria Jan Wavelink Technology Inc I can confirm that I have exactly the same

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-04-30 Thread jan
nectX product? Thank you again. Best Regards, Gloria Jan Wavelink Technology Inc I can confirm that I have exactly the same problem, also on Dell system, even with latest openmpi. Our system is: Dell M905 OpenSUSE 11.1 kernel: 2.6.27.21-0.1-default ofed-1.4-21.12 from SUSE repositories. Op

Re: [OMPI users] users Digest, Vol 1212, Issue 3, Message: 2

2009-04-27 Thread jan
configuration again, but found that the problem still occurs periodically, i.e. two successes, then two failures, two successes, then two failures ... . Do you have any suggestions for this issue? Thank you again. Best Regards, Gloria Jan Wavelink Technology Inc. Per http://www.open-mpi.org/community/lists

Re: [OMPI users] users Digest, Vol 1212, Issue 3

2009-04-26 Thread jan
:21 AM, jan wrote: Dear Sir, I'm running a cluster with OpenMPI. $ mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca mpi_show_handle_leaks 1 $HOME/test/cpi I got the error message as job failed: Process 15 on node2 Process 6 on node1 Process 14 on node2 ... Process 0 on node1 Process 10 on

[OMPI users] running problem on Dell blade server, confirm 2d21ce3ce8be64d8104b3ad71b8c59e2514a72eb

2009-04-24 Thread jan
ed on Dell PowerEdge M600 Blade Server. The InfiniBand mezzanine cards are Mellanox ConnectX QDR & DDR, and the InfiniBand switch module is a Mellanox M2401G. The OS is CentOS 5.3, kernel 2.6.18-128.1.6.el5, with the PGI V7.2-5 compiler. It's running the OpenSM subnet manager. Best Regards, Gloria Jan Wavel

Re: [OMPI users] libnuma issue

2009-04-15 Thread Nysal Jan
You could try statically linking the Intel-provided libraries. Use LDFLAGS=-static-intel --Nysal On Wed, 2009-04-15 at 21:03 +0200, Francesco Pietra wrote: > On Wed, Apr 15, 2009 at 8:39 PM, Prentice Bisbal wrote: > > Francesco Pietra wrote: > >> I used --with-libnuma=/usr since Prentice Bisbal

Re: [OMPI users] XLF and 1.3.1

2009-04-14 Thread Nysal Jan
Can you try adding --disable-dlopen to the configure command line? --Nysal On Tue, 2009-04-14 at 10:19 +0200, Jean-Michel Beuken wrote: > Hello, > > I'm trying to build 1.3.1 under IBM Power5 + SLES 9.1 + XLF 9.1... > > after some searches on FAQ and Google, my configure : > > export CC="/opt/

Re: [OMPI users] selected pml cm, but peer [[2469, 1], 0] on compute-0-0 selected pml ob1

2009-03-19 Thread Nysal Jan
fs1 is selecting the "cm" PML whereas other nodes are selecting the "ob1" PML component. You can force ob1 to be used via "--mca pml ob1" What kind of hardware/NIC does fs1 have? --Nysal On Wed, 2009-03-18 at 17:17 -0400, Gary Draving wrote: > Hi all, > > anyone ever seen an error like this? Se

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Jan Lindheim
ts checked, 103 ports have errors beyond threshold I wonder if this is something that needs to be tuned in the Infiniband switch or if there is something in OpenMPI/OpenIB that can be tuned. Thanks, Jan Lindheim

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
On Wed, Mar 04, 2009 at 04:34:49PM -0500, Jeff Squyres wrote: > On Mar 4, 2009, at 4:16 PM, Jan Lindheim wrote: > > >On Wed, Mar 04, 2009 at 04:02:06PM -0500, Jeff Squyres wrote: > >> This *usually* indicates a physical / layer 0 problem in your IB > >> fabric. You

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
> fabrics and/or very congested networks. Thanks Jeff! What is considered to be very large IB fabrics? I assume that with just over 180 compute nodes, our cluster does not fall into this category. Jan > > > On Mar 4, 2009, at 3:28 PM, Jan Lindheim wrote: > > >I found sev

[OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
parameters that need to be looked at too? Thanks for any insight on this! Regards, Jan Lindheim

Re: [OMPI users] Problems in 1.3 loading shared libs when usingVampirServer

2009-02-25 Thread Nysal Jan
On Tue, 2009-02-24 at 13:30 -0500, Jeff Squyres wrote: > - Get Python to give you the possibility of opening dependent > libraries in the global scope. This may be somewhat controversial; > there are good reasons to open plugins in private scopes. But I have > to imagine that OMPI is not th

Re: [OMPI users] lammps MD code fails with Open MPI 1.3

2009-02-20 Thread Nysal Jan
mp_Stealth-OMPI < in.testbench_small > LAMMPS (22 Jan 2008) > > Interestingly, I downloaded Open MPI 1.2.8, built it with the same > configure options I had used with 1.3, and it worked. > > I'm getting by fine with 1.2.8. I just wanted to file a possible bug > rep

Re: [OMPI users] Heap profiling with OpenMPI

2008-08-07 Thread Jan Ploski
I implementations to validate the prediction model I constructed in my local cluster. Regards, Jan Ploski

Re: [OMPI users] Heap profiling with OpenMPI

2008-08-07 Thread Jan Ploski
users-boun...@open-mpi.org schrieb am 08/06/2008 07:44:03 PM: > On Aug 6, 2008, at 12:37 PM, Jan Ploski wrote: > > >> I'm using the latest of Open MPI compiled with debug turned on, and > >> valgrind 3.3.0. From your trace it looks like there is a conflict > >

Re: [OMPI users] Heap profiling with OpenMPI

2008-08-06 Thread Jan Ploski
George Bosilca wrote: Jan, I'm using the latest of Open MPI compiled with debug turned on, and valgrind 3.3.0. From your trace it looks like there is a conflict between two memory managers. I'm not having the same problem as I disable the Open MPI memory manager on my builds

Re: [OMPI users] Heap profiling with OpenMPI

2008-08-06 Thread Jan Ploski
users-boun...@open-mpi.org schrieb am 08/05/2008 05:51:51 PM: > Jan, > > I'm using valgrind with Open MPI on a [very] regular basis and I never > had any problems. I usually want to know the execution path on the MPI > applications. For this I use: > mpirun -np XX valgr

[OMPI users] Heap profiling with OpenMPI

2008-08-05 Thread Jan Ploski
tation and recompiling)? Best regards, Jan Ploski -- Dipl.-Inform. (FH) Jan Ploski OFFIS FuE Bereich Energie | R&D Division Energy Escherweg 2 - 26121 Oldenburg - Germany Phone/Fax: +49 441 9722 - 184 / 202 E-Mail: jan.plo...@offis.de URL: http://www.offis.de

Re: [OMPI users] Problem with AlltoAll routine

2008-05-17 Thread Nysal Jan
Gabriele, Can you try with Open MPI 1.2.6? It has a parameter to disable early completion; set it to zero (-mca pml_ob1_use_early_completion 0). --Nysal On Wed, May 7, 2008 at 9:29 PM, Gabriele FATIGATI wrote: > I have attached information requested about Infiniband net and OpenMPi > enviromen

Re: [OMPI users] openmpi-1.1a9r10157 Fails to build with Nag, f95Compiler // and Pathscale

2006-06-02 Thread Jan De Laet
Jeff, Ok, this solved the problem with the Pathscale compiler. Thanks -- Jan Message: 2 Date: Thu, 1 Jun 2006 17:37:36 -0400 From: "Jeff Squyres \(jsquyres\)" Subject: Re: [OMPI users] openmpi-1.1a9r10157 Fails to build with Nag f95Compiler To: "Open MPI Use

Re: [OMPI users] openmpi-1.1a9r10157 Fails to build with Nag f95

2006-06-02 Thread Jan De Laet
Hi, openmpi-1.1a9r10157's fortran bindings also fail to build with the Pathscale 2.1 pathf90 compiler. At the same spot but with different error messages (see below), which perhaps helps to clarify things. Any help greatly appreciated as well. Best regards, Jan De