Re: [OMPI users] Bug report in plm_lsf_module.c

2010-04-26 Thread Ralph Castain
Done (applied to trunk for now) - thanks again! On Apr 26, 2010, at 12:06 PM, Teng Lin wrote: > Hi, > > We recently identified a bug in our LSF cluster. > The job always hangs if all LSF related components are present. One observation we > have is that the job works fine after removing all LSF related

Re: [OMPI users] deadlock when calling MPI_gatherv

2010-04-26 Thread Teng Lin
On Apr 26, 2010, at 9:07 PM, Trent Creekmore wrote: > You are going to have to debug and trace the program to find out where it is > stopping. > You may want to try using KDbg, a graphical front end for the command line > debugger gdb, which makes it a LOT easier, or use Eclipse. As a matter of

Re: [OMPI users] deadlock when calling MPI_gatherv

2010-04-26 Thread Trent Creekmore
You are going to have to debug and trace the program to find out where it is stopping. You may want to try using KDbg, a graphical front end for the command line debugger gdb, which makes it a LOT easier, or use Eclipse.

[OMPI users] deadlock when calling MPI_gatherv

2010-04-26 Thread Teng Lin
Hi, We recently ran into a deadlock when calling MPI_gatherv with Open MPI 1.3.4. At first it seemed to have something to do with sm. However, it still hangs even after turning off the sm btl. Any idea how to track down the problem? Thanks, Teng # Stac

Re: [OMPI users] unresolved symbol mca_base_param_reg_int

2010-04-26 Thread Nev
On Mon, 2010-04-26 at 14:43 -0400, Jeff Squyres wrote: > On Apr 24, 2010, at 10:14 PM, Nev wrote: > > > void * const result = dlopen(libName, RTLD_LAZY | RTLD_LOCAL); > > This line is the problem: change RTLD_LOCAL to RTLD_GLOBAL and it'll work. > There's another option, too -- keep reading..

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Prentice Bisbal
Matt, I just found something else with your job script. You are defining the LD_LIBRARY_PATH environment variable, and *then* loading the openmpi environment module: >>> #!/bin/bash >>> export TMPDIR=$SCRATCH/abyss_tmp/ >>> LD_LIBRARY_PATH=/work/01301/mmacmane >>>

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Prentice Bisbal
On Apr 26, 2010, at 9:08 AM, Matthew MacManes wrote: >> >>> I am using SGE to submit jobs to one of the TeraGrid sites, >>> specifically TACC-RANGER. The problem, is, that I am using a >>> program that requires OpenMPI version 1.4.1, and the latest >>> instal

[OMPI users] [OMP users]: OpenMP1.4 tuning for sending large messages

2010-04-26 Thread Pooja Varshneya
Hi All, I am using OpenMPI 1.4 on a cluster of Intel quad-core processors running Linux and connected by Ethernet. In an application, I am trying to send and receive large messages of sizes ranging from 1 KB up to 500 MB. The application works fine if the message sizes are within 1 MB ran

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Ralph Castain
My question about version wasn't "why can't you use 1.3?". It was "why do you believe the problems you are seeing are caused by not finding the correct version?". It looks to me like everything is working correctly, but that communications are blocked for some reason. That doesn't sound like a

Re: [OMPI users] Bug report in plm_lsf_module.c

2010-04-26 Thread Ralph Castain
Thanks! I'll take care of this - again, appreciate the patch! On Apr 26, 2010, at 12:40 PM, Teng Lin wrote: > Ralph, > > Thanks for the prompt response. > On Apr 26, 2010, at 2:34 PM, Ralph Castain wrote: > >> Appreciate your input! None of the developers have access to an LSF machine >> any m

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Matthew MacManes
Hi Ralph, It's a no-go with the --enable-mpirun-prefix-by-default. Version issue: The program I am trying to run (RAY: http://sourceforge.net/apps/mediawiki/denovoassembler/index.php?title=Main_Page#Installation ) will not work with earlier versions of OpenMPI; this is confirmed both by the autho

Re: [OMPI users] unresolved symbol mca_base_param_reg_int

2010-04-26 Thread Jeff Squyres
On Apr 24, 2010, at 10:14 PM, Nev wrote: > void * const result = dlopen(libName, RTLD_LAZY | RTLD_LOCAL); This line is the problem: change RTLD_LOCAL to RTLD_GLOBAL and it'll work. There's another option, too -- keep reading... Before discussing why this happens, know that Open MPI plugins
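The suggested change can be sketched in a few lines of C. This is only an illustration of the RTLD_LOCAL versus RTLD_GLOBAL point made above, not code from the thread: the helper name open_library_global is hypothetical, and the library name is whatever the caller passes in. With RTLD_GLOBAL, the symbols of the opened library become available to shared objects that are loaded afterwards, which is what a library that later dlopen()s its own plugins generally needs.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a shared library so that its symbols are
 * visible to shared objects loaded later (RTLD_GLOBAL), instead of
 * keeping them private to this handle (RTLD_LOCAL).
 * Returns the dlopen() handle, or NULL on failure.
 */
void *open_library_global(const char *lib_name)
{
    assert(lib_name != NULL);

    void *handle = dlopen(lib_name, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        /* dlerror() returns a human-readable description of the failure. */
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib_name, dlerror());
    }
    return handle;
}
```

On older glibc versions this needs to be linked with -ldl. The caller is still responsible for calling dlclose() on the returned handle when it is no longer needed.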

Re: [OMPI users] Bug report in plm_lsf_module.c

2010-04-26 Thread Teng Lin
Ralph, Thanks for the prompt response. On Apr 26, 2010, at 2:34 PM, Ralph Castain wrote: > Appreciate your input! None of the developers have access to an LSF machine > any more, so we can't test it :-/ > > What version of OMPI does this patch apply to? The patch is applied to 1.3.4, which is t

Re: [OMPI users] Bug report in plm_lsf_module.c

2010-04-26 Thread Ralph Castain
Appreciate your input! None of the developers have access to an LSF machine any more, so we can't test it :-/ What version of OMPI does this patch apply to? I can go ahead and add it - just want to know if it should just go to the trunk and 1.5 series, or also the 1.4 series. Thanks again! Ral

[OMPI users] Bug report in plm_lsf_module.c

2010-04-26 Thread Teng Lin
Hi, We recently identified a bug in our LSF cluster. The job always hangs if all LSF related components are present. One observation we have is that the job works fine after removing all LSF related components. Below is the message from stdout: [:24930] mca: base: components_open: Looking for ess compone

Re: [OMPI users] Solving SVD Using Lanczos Method Implementation

2010-04-26 Thread Jed Brown
On Mon, 26 Apr 2010 22:30:15 +0700, long thai wrote: > Hi all. > > I'm trying to develop an MPI program to solve SVD using Lanczos algorithms. > However, I have no idea how to do that. Somebody suggested to take a look at > http://www.netlib.org/scalapack/ but I cannot understand exactly what to >

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Jeff Squyres
On Apr 26, 2010, at 12:03 PM, Dave Love wrote: > Sorry, but that's naïve, even if you can prove your code is well-defined > according to the language and floating-point standards. You should > listen to Ashley, and if it worries you, you really need just to debug > it. If you believe it's a prob

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Gus Correa
Dave Love wrote: Asad Ali writes: From run to run the results can only be different if you either use different input/output or use different random number seeds. Here in my case the random number seeds are the same as well. Sorry, but that's naïve, even if you can prove your code is well-d

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneouscluster (32/64 bit machines) (Timur Magomedov)

2010-04-26 Thread TRINH Minh Hieu
Hi, I downloaded the nightly trunk snapshot openmpi-1.7a1r23032. I recompiled openmpi and my test code, and set btl_tcp_eager_limit back to its default. The bug seems to be fixed. I don't have the segfault anymore, even with a big array (up to 685MB). Thanks to all the developers. There is a ticket

Re: [OMPI users] How to "guess" the incoming data type ?

2010-04-26 Thread Jed Brown
On Sun, 25 Apr 2010 20:38:54 -0700, Eugene Loh wrote: > Could you encode it into the tag? This sounds dangerous. > Or, append a data type to the front of each message? This is the idea, unfortunately this still requires multiple messages for collectives (because you can't probe for a suitable b
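The "append a data type to the front of each message" idea can be sketched as a small self-describing buffer. This is a minimal sketch under stated assumptions, not code from the thread: the type codes, the struct payload_header layout, and the pack_payload / unpack_payload helpers are all hypothetical names. It contains no MPI calls; the packed buffer is just bytes, which a sender could presumably ship as a single byte message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes; the names are illustrative only. */
enum payload_type { PAYLOAD_INT32 = 1, PAYLOAD_DOUBLE = 2 };

/* Fixed-size header placed in front of every payload. */
struct payload_header {
    uint32_t type;   /* one of enum payload_type */
    uint32_t count;  /* number of elements that follow the header */
};

/* Size in bytes of one element of the given type, or 0 if unknown. */
size_t payload_elem_size(uint32_t type)
{
    switch (type) {
    case PAYLOAD_INT32:  return sizeof(int32_t);
    case PAYLOAD_DOUBLE: return sizeof(double);
    default:             return 0;
    }
}

/* Write header + data into buf. Returns bytes written, or 0 on error. */
size_t pack_payload(void *buf, size_t buf_size, uint32_t type,
                    uint32_t count, const void *data)
{
    assert(buf != NULL && data != NULL);

    struct payload_header hdr = { type, count };
    size_t elem = payload_elem_size(type);
    size_t body = elem * count;

    if (elem == 0 || sizeof hdr + body > buf_size) {
        return 0;  /* unknown type, or the buffer is too small */
    }
    memcpy(buf, &hdr, sizeof hdr);
    memcpy((char *)buf + sizeof hdr, data, body);
    return sizeof hdr + body;
}

/* Read the header back. Returns a pointer to the first element, or NULL. */
const void *unpack_payload(const void *buf, size_t buf_size,
                           struct payload_header *hdr)
{
    assert(buf != NULL && hdr != NULL);

    if (buf_size < sizeof *hdr) {
        return NULL;
    }
    memcpy(hdr, buf, sizeof *hdr);

    size_t elem = payload_elem_size(hdr->type);
    if (elem == 0 || sizeof *hdr + elem * hdr->count > buf_size) {
        return NULL;  /* unknown type, or truncated message */
    }
    return (const char *)buf + sizeof *hdr;
}
```

The receiver reads the header first and only then decides how to interpret the remaining bytes. A raw byte layout like this assumes sender and receiver share endianness and type sizes, so it would need more care on a heterogeneous cluster.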

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Dave Love
Matthew MacManes writes: > I am using SGE to submit jobs to one of the TeraGrid sites, > specifically TACC-RANGER. It's more on-topic here than the SGE list, but you should still ask the Ranger support people. People who don't know Ranger can't say if you actually can use the TCP BTL on it, bu

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Dave Love
Asad Ali writes: > From run to run the results can only be different if you either use > different input/output or use different random number seeds. Here in my case > the random number seeds are the same as well. Sorry, but that's naïve, even if you can prove your code is well-defined according
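One general reason identical inputs and seeds do not guarantee identical results is that floating-point addition is not associative, so summing the same values in a different order can round differently. The following is a minimal sketch of that effect, not code from the thread; the function names are hypothetical, and whether this is the cause in the case discussed here is not established.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the values front to back. */
double sum_forward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);

    /* volatile forces each partial sum to be rounded to double,
       even on hardware that keeps wider intermediates in registers. */
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc = acc + v[i];
    }
    return acc;
}

/* Sum the same values back to front. */
double sum_backward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);

    volatile double acc = 0.0;
    for (size_t i = n; i > 0; --i) {
        acc = acc + v[i - 1];
    }
    return acc;
}
```

For the input {1e16, -1e16, 1.0}, sum_forward returns 1.0, while sum_backward returns 0.0, because 1.0 is lost when it is added to -1e16 first. A parallel reduction that combines partial sums in a different order on different machines can show the same kind of difference.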

Re: [OMPI users] How to "guess" the incoming data type ?

2010-04-26 Thread Sylvestre Ledru
On Monday 26 April 2010 at 16:51 +0100, Dave Love wrote: > Sylvestre Ledru writes: > > > I am currently extending an application with MPI capabilities. > > This high-level application allows users to use dynamic types. Therefore, > > on the slaves, I have no way to know what the master will send

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Dave Love
Sylvestre Ledru writes: > This code will set the precision to double: > > #include <fpu_control.h> > fpu_control_t _cw; > _FPU_GETCW(_cw); > _cw = (_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE; > _FPU_SETCW(_cw); > > You should get the same result on 32 & 64 bits CPU then. Quite off-topic, but as far as I remember f

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Addepalli, Srirangam V
Hello Matthew, what build flags did you use for building openmpi on RANGER? This info should be in the config.log file. eg: head -10 config.log | grep './configure' $ ./configure --with-openib=/usr/ -with-ft=cr --with-blcr=/usr --with-blcr-libdir=/usr/lib64/ --prefix=/lustre/work/saddepa

Re: [OMPI users] How to "guess" the incoming data type ?

2010-04-26 Thread Dave Love
Sylvestre Ledru writes: > I am currently extending an application with MPI capabilities. > This high-level application allows users to use dynamic types. Therefore, > on the slaves, I have no way to know what the master will send me. Have you looked at existing MPI work with dynamic languages li

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Ralph Castain
When configuring OMPI. Your configure should look like this: ./configure --prefix= --enable-mpirun-prefix-by-default . Just curious: what convinces you that you have a version mismatch? Connectivity failures can occur for a variety of reasons - this looks more like you have some kind of net

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Matthew MacManes
Hi Ralph, Thanks! Do you mean to pass '--enable-mpirun-prefix-by-default' when configuring OpenMPI, or when configuring the program I am trying to use? Sorry if this should be obvious! On Mon, Apr 26, 2010 at 08:13, Ralph Castain wrote: > First, is the directory where you installed OMPI 1.4.1 v

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Matthew MacManes
Hi Rangam, Thanks. Just tried. Still the 'no route to host' issue. Matt

[OMPI users] Solving SVD Using Lanczos Method Implementation

2010-04-26 Thread long thai
Hi all. I'm trying to develop an MPI program to solve SVD using Lanczos algorithms. However, I have no idea how to do that. Somebody suggested to take a look at http://www.netlib.org/scalapack/ but I cannot understand exactly what to look for. Moreover, I know that *las2* is the popular library to solve S

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Addepalli, Srirangam V
Hello Matthew, Not sure if it helps, but I am guessing that module load openmpi (in the bash script) is updating variables to the RANGER openmpi installation. Try removing this line in your script file and resubmit your job. Also, #$ -V in the submission script exports env variables from your current

Re: [OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Ralph Castain
First, is the directory where you installed OMPI 1.4.1 visible to all the nodes? If not, then this won't work. If it is, then try configuring with --enable-mpirun-prefix-by-default, and be sure you specify a prefix that points to your installation. On Apr 26, 2010, at 9:08 AM, Matthew MacManes

[OMPI users] problem using new OMPI1.4.1 vie SGE

2010-04-26 Thread Matthew MacManes
I am using SGE to submit jobs to one of the TeraGrid sites, specifically TACC-RANGER. The problem is that I am using a program that requires OpenMPI version 1.4.1, and the latest install on RANGER is 1.3.1. I was told that I could install OpenMPI in my home directory, and run jobs using my ne

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Tim Prince
On 4/26/2010 2:31 AM, Asad Ali wrote: On Mon, Apr 26, 2010 at 8:01 PM, Ashley Pittman wrote: On 25 Apr 2010, at 22:27, Asad Ali wrote: > Yes I use different machines such as > > machine 1 uses AMD Opterons. (Fedora) > > machine 2 and 3

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Asad Ali
On Mon, Apr 26, 2010 at 8:01 PM, Ashley Pittman wrote: > > On 25 Apr 2010, at 22:27, Asad Ali wrote: > > > Yes I use different machines such as > > > > machine 1 uses AMD Opterons. (Fedora) > > > > machine 2 and 3 use Intel Xeons. (CentOS) > > > > machine 4 uses slightly older Intel Xeons. (Debian

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Conboy, James
Hi - Have you checked your compiler switches? Some have options to perform IEEE arithmetic, which is supposed to give identical results - eg pgf95 -Kieee -Knoieee (default) Perform floating-point operations in strict conformance with the IEEE 754 standard. Some optimizations are

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneouscluster (32/64 bit machines)

2010-04-26 Thread Timur Magomedov
Hello, You can get the nightly trunk snapshot from here: http://www.open-mpi.org/nightly/trunk/ You can grab openmpi-1.7a1r23032 and test it. That would be great. On Mon, 26/04/2010 at 10:26 +0200, TRINH Minh Hieu wrote: > > Hello, > > I can help to test the patch if you need to. But I don't know much h

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneouscluster (32/64 bit machines)

2010-04-26 Thread TRINH Minh Hieu
Hello, I can help to test the patch if you need to. But I don't know much about how to use svn to get the latest source to test. Regards TMHieu > Message: 1 > Date: Fri, 23 Apr 2010 20:15:58 +0400 > From: Timur Magomedov > Subject: Re: [OMPI users] Segmentation fault when Send/Recv on >he

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Ashley Pittman
On 25 Apr 2010, at 22:27, Asad Ali wrote: > Yes I use different machines such as > > machine 1 uses AMD Opterons. (Fedora) > > machine 2 and 3 use Intel Xeons. (CentOS) > > machine 4 uses slightly older Intel Xeons. (Debian) > > Only machine 1 gives correct results. While CentOS and Debian

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread Sylvestre Ledru
Hello, On Monday 26 April 2010 at 14:33 +1200, Asad Ali wrote: > Hi Jodi, > > > I once got different results when running on a 64-Bit platform > instead of > > a 32 bit platform - if i remember correctly, the reason was that on > the > > 32-bit platform 80bit extended precision floats were use

Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS

2010-04-26 Thread jody
Hi Asad, I must admit I don't know how one can find out whether extended precision is being used or not. I think one has to read up on the CPU's information. I only know that most Intel 32bit-Processors use the extended precision http://en.wikipedia.org/wiki/X86 as does AMD Athlon http://www.a