Running OMPI from trunk
[node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG:
Error in file base/ess_base_std_orted.c at line 288
Thanks.
Lenny Verkhovsky
I don't think so.
It's always the 66th node, even if I swap the 65th and 66th nodes.
I also get the same error when setting np=66 while having only 65 hosts in the
hostfile.
(I am using only the tcp btl.)
Lenny Verkhovsky
Any ideas ?
Thanks.
Lenny Verkhovsky
Have you tried the IMB benchmark with Bcast?
I think the problem is in the app.
All ranks in the communicator should enter Bcast, but
since you have an
if (rank==0)
else branch, not all of them enter the same flow.
if (iRank == 0)
{
iLength = sizeof (acMessage);
MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_
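The quoted fragment is cut off above; for reference, here is a minimal standalone sketch (my own illustration, not the poster's program) of the pattern where every rank calls MPI_Bcast with the same root, so all processes follow the same collective flow:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, length = 0;
    char message[64] = "hello";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        length = (int) sizeof(message);   /* only the root fills in the value */

    /* ...but ALL ranks make the same MPI_Bcast calls */
    MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(message, length, MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("rank %d received length %d\n", rank, length);
    MPI_Finalize();
    return 0;
}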
f collective rule
>
> Still it useful to know how to do this, when this issue gets fixed in the
> future!
>
> Daniel
>
>
>
> On 2009-12-30 15:57:50, Lenny Verkhovsky wrote:
>
>
> The only workaround that I found is a file with dynamic rules.
>> Th
you know of a work around? I have not used a rule file before and seem
> to be unable to find the documentation for how to use one, unfortunately.
>
> Daniel
>
> On 2009-12-30 15:17:17, Lenny Verkhovsky wrote:
>
>
> This is a known issue,
>>https://
This is a known issue:
https://svn.open-mpi.org/trac/ompi/ticket/2087
Maybe its priority should be raised.
Lenny.
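For reference, a hedged sketch of how a dynamic-rules file is usually wired in; the MCA parameter names below are the standard coll_tuned knobs as I recall them, and the file path and process count are placeholders:

$mpirun -np 16 -mca coll_tuned_use_dynamic_rules 1 \
        -mca coll_tuned_dynamic_rules_filename /path/to/rules.conf ./app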
On Wed, Dec 30, 2009 at 12:13 PM, Daniel Spångberg wrote:
> Dear OpenMPI list,
>
> I have used the dynamic rules for collectives to be able to select one
> specific
I noticed that you also have different versions of OMPI: you have 1.3.2 on
node1 and 1.3 on node2.
Can you try putting the same version of OMPI on both nodes?
Can you also try running np 16 on node1 when you run them separately?
Lenny.
On Tue, Nov 17, 2009 at 5:45 PM, Laurin Müller wrote:
>
>
> >>
you can use the full path to mpirun, or you can set a prefix:
$mpirun -prefix path/to/mpi/home -np .
Lenny.
On Sun, Oct 18, 2009 at 12:03 PM, Oswald Benedikt wrote:
> Hi, thanks, that's what puzzled when I saw the reference to 1.3, but the
> LD_LIBRARY_PATH was set to point
> to the respective ve
Hi Eugene,
a carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
( yes, I know, it should be in a help/man page :) )
Basically it describes a map of your node and its internal interconnect.
Hopefully it will be discovered automatica
you can use a shared ( e.g. NFS ) folder for this app, or provide a full
path to it.
ex:
$mpirun -np 2 -hostfile hostfile /home/user/app
2009/9/15 Dominik Táborský
> So I have to manually copy the compiled hello world program to all of
> the nodes so that they can be executed? I really didn't
func__ is but it needs a C99 compliant compiler.
>
> --Nysal
>
> On Tue, Sep 8, 2009 at 9:06 PM, Lenny Verkhovsky <
> lenny.verkhov...@gmail.com> wrote:
>
>> fixed in r21952
>> thanks.
>>
>> On Tue, Sep 8, 2009 at 5:08 PM, Arthur Huillet
>>
have you tried running hostname
$mpirun -np 2 --mca btl openib,self --host node1,node2 hostname
if it hangs, it's not an Open MPI problem; check your setup,
especially your firewall settings, and disable the firewall.
On Wed, Sep 2, 2009 at 2:06 PM, Lee Amy wrote:
> Hi,
>
> I encountered a very very con
I changed the error message; I hope it will be clearer now.
r21919.
On Tue, Sep 1, 2009 at 2:13 PM, Lenny Verkhovsky wrote:
> please try using full ( drdb0235.en.desres.deshaw.com ) hostname
> in the hostfile/rankfile.
> It should help.
> Lenny.
>
> On Mon, Aug 31, 2009 at 7:43
please try using the full hostname ( drdb0235.en.desres.deshaw.com )
in the hostfile/rankfile.
It should help.
Lenny.
On Mon, Aug 31, 2009 at 7:43 PM, Ralph Castain wrote:
> I'm afraid the rank-file mapper in 1.3.3 has several known problems that
> have been described on the list by users. We hopefu
you need to check the release notes and compare the differences.
also check the Open MPI version in both of them.
In general it's not a good idea to run different versions of the software
for performance comparison, or at all,
since both of them are open source and backward compatibility is not alwa
Hi all,
Does OpenMPI support VMware ?
I am trying to run OpenMPI 1.3.3 on VMware and it got stuck during the OSU
benchmarks and IMB.
It looks like a random deadlock; I wonder if anyone has ever tried it?
thanks,
Lenny.
Most likely you compiled Open MPI with the --with-openib flag, but since there are
no openib devices available on
the n06 machine, you got an error.
You can "disable" this message either by recompiling Open MPI without the openib
flag, or by disabling the openib btl:
-mca btl ^openib
or
-mca btl sm,self,tcp
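For example, on a full command line (a sketch; the process count and application name are placeholders):

$mpirun -np 4 -mca btl ^openib ./app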
Lenny.
On
Sounds like environment problems.
Try running:
$mpirun -prefix /home/jean/openmpisof/ ..
Lenny.
On Wed, Aug 19, 2009 at 5:36 PM, Jean Potsam wrote:
> Hi All,
> I'm a trying to install openmpi with self. However, I am
> experiencing some problems with openmpi itself.
>
> I have succe
1 max-slots=1
>>
>> Then this works fine:
>> [jody@aim-plankton neander]$ mpirun -np 4 -hostfile th_021 -rf rf_02
>> ./HelloMPI
>>
>> Is there an explanation for this?
>>
>> Thank You
>> Jody
>>
>> Lenny.
>>> On Mon, Aug 17, 200
ork!
> Is there a reason why the rankfile option treats
> host names differently than the hostfile option?
>
> Thanks
> Jody
>
>
>
> On Mon, Aug 17, 2009 at 11:20 AM, Lenny
> Verkhovsky wrote:
> > Hi
> > This message means
> > that you are trying to use
Hi,
This message means that you are trying to use host "plankton", which was not
allocated via a hostfile or host list.
But according to the files and command line, everything seems fine.
Can you try using the "plankton.uzh.ch" hostname instead of "plankton"?
thanks
Lenny.
On Mon, Aug 17, 2009 at 10:36 AM,
Hi
http://www.open-mpi.org/faq/?category=tuning#using-paffinity
I am not familiar with this cluster, but in the FAQ ( see link above ) you
can find an example of the rankfile.
another simple example is the following:
$cat rankfile
rank 0=host1 slot=0
rank 1=host2 slot=0
rank 2=host3 slot=0
rank 3=h
Hi,
1.
Mellanox has newer firmware for those HCAs:
http://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx
I am not sure if it will help, but newer firmware usually has some bug fixes.
2.
try to disable leave_pinned during the run. It's on by default in 1.3.3
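To disable it for a single run, something like this should work (a sketch; np and the application name are placeholders):

$mpirun -np 2 -mca mpi_leave_pinned 0 ./app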
Lenny.
On Thu, Aug 13, 2009 at 5:1
By default the coll framework scans all available modules and selects the available
functions with the highest priorities.
So, to use the tuned collectives explicitly, you can raise its priority:
-mca coll_tuned_priority 100
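A quick usage sketch (the process count and application name are mine, not from this thread):

$mpirun -np 8 -mca coll_tuned_priority 100 ./app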
p.s. Collective modules can have only a partial set of available functions,
for exampl
can this be related ?
http://www.open-mpi.org/faq/?category=building#build-qs22
On Sun, Aug 9, 2009 at 12:22 PM, Attila Börcs wrote:
> Hi Everyone,
>
> What is the regular method of compiling and running MPI code on Cell Broadband
> with ppu-gcc and spu-gcc?
>
>
> Regards,
>
> Attila Borcs
>
> __
try specifying -prefix on the command line
ex: mpirun -np 4 -prefix $MPIHOME ./app
Lenny.
On Sat, Aug 8, 2009 at 5:04 PM, Kenneth Yoshimoto wrote:
>
> I don't own these nodes, so I have to use them with
> whatever path setups they came with. In particular,
> my home directory has a different p
Hi,
I am also looking for an example file of rules for dynamic collectives.
Has anybody tried it? Where can I find the proper syntax for it?
thanks.
Lenny.
On Thu, Jul 23, 2009 at 3:08 PM, Igor Kozin wrote:
> Hi Gus,
> I played with collectives a few months ago. Details are here
> http://www.cs
Hi,
you can find a lot of useful information in the FAQ section:
http://www.open-mpi.org/faq/
http://www.open-mpi.org/faq/?category=tuning#paffinity-defs
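As a quick illustration (my own sketch, not taken from the FAQ text), processor affinity can be switched on for a run with:

$mpirun -np 4 -mca mpi_paffinity_alone 1 ./app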
Lenny.
On Mon, Aug 3, 2009 at 11:55 AM, Lee Amy wrote:
> Hi,
>
> Does OpenMPI have processor binding like the "taskset" command? For
> example,
Make sure you have the Open MPI 1.3 series;
I don't think the if_include param is available in the 1.2 series.
max_btls controls fragmentation and load balancing over similar BTLs (for
example when using LMC > 0, or 2 ports connected to 1 network);
you need the if_include param.
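Assuming the openib if_include parameter is what is meant here, a hedged example (the device name mthca0:1 is a placeholder for your actual HCA and port):

$mpirun -np 2 -mca btl openib,self -mca btl_openib_if_include mthca0:1 ./app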
On Wed, Jul 15, 2009 at 4:20 PM,
001
>>
>>
>> Thanks
>> Lenny.
>>
>>
>> On Wed, Jul 15, 2009 at 2:02 PM, Ralph Castain wrote:
>>
>>> Try your "not working" example without the -H on the mpirun cmd line -
>>> i.e.,, just use "mpirun -np 2 -rf rankfil
ave to keep asking you to try things - I don't have a setup here
> where I can test this as everything is RM managed.
>
>
> On Jul 15, 2009, at 12:09 AM, Lenny Verkhovsky wrote:
>
>
> Thanks Ralph, after playing with prefixes it worked,
>
> I still have a proble
> -np 1 -H witch1 hostname
> -np 1 -H witch2 hostname
>
> That should get you what you want.
> Ralph
>
> On Jul 14, 2009, at 10:29 AM, Lenny Verkhovsky wrote:
>
> No, it's not working as I expect, unless I expect something wrong.
> ( sorry for the long PATH
On Tue, Jul 14, 2009 at 7:08 PM, Ralph Castain wrote:
> Run it without the appfile, just putting the apps on the cmd line - does it
> work right then?
>
> On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote:
>
> additional info
> I am running mpirun on hostA, and providing hos
dellix7
dellix7
Thanks
Lenny.
On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain wrote:
> Strange - let me have a look at it later today. Probably something simple
> that another pair of eyes might spot.
> On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote:
>
> Seems like connected problem
>>>> but "mpirun -np 3 ./something" will work though. It works, when you ask
>>>> for 1 CPU less. And the same behavior in any case (shared nodes, non-shared
>>>> nodes, multi-node)
>>>>
>>>> If you switch off rmaps_base_no_ove
I guess this question came up before:
https://svn.open-mpi.org/trac/ompi/ticket/1367
On Thu, Jul 9, 2009 at 10:35 AM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:
> BTW, What kind of threads Open MPI supports ?
> I found in the https://svn.open-mpi.org/trac/ompi/b
BTW, what level of thread support does Open MPI provide?
I found in https://svn.open-mpi.org/trac/ompi/browser/trunk/README that
we support MPI_THREAD_MULTIPLE,
and found a few unclear mails about MPI_THREAD_FUNNELED and
MPI_THREAD_SERIALIZED.
Also found nothing in the FAQ :(.
Thanks, Lenny.
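For anyone checking this on their own build, a minimal sketch (my own illustration, not taken from the README) that asks for MPI_THREAD_MULTIPLE and prints which level the library actually grants:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* request the highest level; 'provided' reports what the library grants */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided == MPI_THREAD_MULTIPLE)
        printf("MPI_THREAD_MULTIPLE supported\n");
    else if (provided == MPI_THREAD_SERIALIZED)
        printf("only MPI_THREAD_SERIALIZED\n");
    else if (provided == MPI_THREAD_FUNNELED)
        printf("only MPI_THREAD_FUNNELED\n");
    else
        printf("only MPI_THREAD_SINGLE\n");

    MPI_Finalize();
    return 0;
}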
On Thu, Jul 2, 2
Hi,
I am not an HPL expert, but this might help.
1. the rankfile mapper is available only from Open MPI 1.3 on; if you are
using Open MPI 1.2.8, try -mca mpi_paffinity_alone 1
2. if you are using Open MPI 1.3 you don't have to use mpi_leave_pinned 1,
since it's the default value
Lenny.
On Thu,
sounds like firewall problems to or from anfield04.
Lenny,
On Tue, May 12, 2009 at 8:18 AM, feng chen wrote:
> hi all,
>
> First of all,i'm new to openmpi. So i don't know much about mpi setting.
> That's why i'm following manual and FAQ suggestions from the beginning.
> Everything went well un
ional procs either byslot (default) or bynode (if you specify that
> option). So the rankfile doesn't need to contain an entry for every proc.
>
> Just don't want to confuse folks.
> Ralph
>
>
>
> On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <
> lenny.verk
;> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>> From: Ralph Castain
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>> To: Open MPI Users
>>> Message-ID:
>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed"
> Ralph
>
>
> On Mon, Apr 20, 2009 at 7:50 AM, Lenny Verkhovsky <
> lenny.verkhov...@gmail.com> wrote:
>
>> Me too, sorry, it definitely seems like a bug. Somewhere in the code,
>> probably an undefined variable.
>> I just never tested this code with such "b
> >
>> >
>> --
>> > > > > orterun: clean termination accomplished
>> > >
>> > >
>> > >
>> > > Message: 4
>> > > Date: Tue, 14 Apr 2009
Hi,
The first "crash" is OK, since your rankfile has ranks 0 and 1 defined,
while n=1, which means only rank 0 is present and can be allocated.
NP must be >= the largest rank in rankfile.
What exactly are you trying to do ?
I tried to recreate your seqv but all I got was
~/work/svn/ompi/trunk/
Hi,
can you try the Open MPI 1.3 version?
On 3/9/09, Prasanna Ranganathan wrote:
>
> Hi all,
>
> I have a distributed program running on 400+ nodes and using OpenMPI. I
> have run the same binary with nearly the same setup successfully previously.
> However in my last two runs the program seems t
can you try Open MPI 1.3,
Lenny.
On 3/10/09, Tee Wen Kai wrote:
>
> Hi,
>
> I am using version 1.2.8.
>
> Thank you.
>
> Regards,
> Wenkai
>
> --- On *Mon, 9/3/09, Ralph Castain * wrote:
>
>
> From: Ralph Castain
> Subject: Re: [OMPI users] Problem with MPI_Comm_spawn_multiple &
> MPI_Info_free
We saw the same problem with compilation;
the workaround for us was configuring without VT (see ./configure --help for the option).
I hope the VT guys will fix it at some point.
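If I remember the configure option correctly (worth verifying against ./configure --help), disabling the VT contrib looks like:

./configure --enable-contrib-no-build=vt <your usual configure options>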
Lenny.
On Mon, Feb 23, 2009 at 11:48 PM, Jeff Squyres wrote:
> It would be interesting to see what happens with the 1.3 build.
>
> It's hard
what kind of communication between nodes do you have - tcp, openib (
IB/IWARP ) ?
you can try
mpirun -np 4 -host node1,node2 -mca btl tcp,self random
On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain wrote:
> Could you tell us which version of OpenMPI you are using, and how it was
> configured?
>
Hi, just to make sure:
you wrote in the previous mail that you tested IMB-MPI1 and it
"reports for the last test", and the results are for
"processes=6". Since you have 4- and 8-core machines, this test could
have run on the same 8-core machine over shared memory and not over
Infiniband, as you
I didn't see any errors on 1.3rc3r20130; I am running MTT nightly
and it seems to be fine on x86-64 CentOS 5.
On Tue, Dec 16, 2008 at 10:27 AM, Gabriele Fatigati
wrote:
> Dear OpenMPI developers,
> trying to compile the 1.3 nightly version, I get the following error:
>
> ../../../orte/.libs/libopen-rte.
Hi,
1. please provide the output of #cat /proc/cpuinfo
2. see http://www.open-mpi.org/faq/?category=tuning#paffinity-defs.
Best regards
Lenny.
also see https://svn.open-mpi.org/trac/ompi/ticket/1449
On 12/9/08, Lenny Verkhovsky wrote:
>
> maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
>
> On 12/5/08, Justin wrote:
>>
>> The reason i'd like to disable these eager buffers
maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
On 12/5/08, Justin wrote:
>
> The reason i'd like to disable these eager buffers is to help detect the
> deadlock better. I would not run with this for a normal run but it would be
> useful for debugging. If the deadlock i
Hi,
Sorry for not answering sooner,
In Open MPI 1.3 we added a paffinity mapping module.
The syntax is quite simple and flexible:
rank N=hostA slot=socket:core_range
rank M=hostB slot=cpu
see the following example:
ex:
#mpirun -rf rankfile_name ./app
#cat rankfile_name
rank 0=host1 slot=0
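To round out the truncated example, a hedged sketch of a complete rankfile using the socket:core syntax above (host names and the binding layout are placeholders: rank 0 goes to socket 0, cores 0-1 on host1; rank 1 to socket 1, core 0 on host1; rank 2 to logical cpu 2 on host2):

$cat rankfile_name
rank 0=host1 slot=0:0-1
rank 1=host1 slot=1:0
rank 2=host2 slot=2
$mpirun -np 3 -rf rankfile_name ./app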
you can also press "f" while "top" is running and choose option "j";
this way you will see which CPU is chosen, under the P column.
Lenny.
On Mon, Nov 10, 2008 at 7:38 AM, Hodgess, Erin wrote:
> great!
>
> Thanks,
> Erin
>
>
> Erin M. Hodgess, PhD
> Associate Professor
> Department of Computer and Math
> Do you have idea when OpenMPI 1.3 will be available? OpenMPI 1.3 has quite
> a few features I'm looking for.
>
> Thanks,
> Mi
Hi,
If I understand you correctly, the most suitable way to do it is with the paffinity
support that we have in Open MPI 1.3 and the trunk.
However, usually the OS distributes processes evenly between sockets by
itself.
There is still no formal FAQ, due to multiple reasons, but you can read how to
use it in the
./mpi_p1_4_TRUNK
-t lt
LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
Best regards
Lenny.
On 10/6/08, Jeff Squyres wrote:
>
> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>
> you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8109
>>
>>
>
Hi,
you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8109
Lenny.
On 10/3/08, Matt Burgess wrote:
>
> Hi,
>
>
> I'm trying to get openmpi working over openib partitions. On this cluster,
> the partition number is 0x109. The ib interfaces are pingable over the
> appropriate i
Hi,
just to try - can you run with np 2?
( the PingPong test uses only 2 processes )
On 8/13/08, Daniël Mantione wrote:
>
>
>
> On Tue, 12 Aug 2008, Gus Correa wrote:
>
> > Hello Daniel and list
> >
> > Could it be a problem with memory bandwidth / contention in multi-core?
>
>
> Yes, I believe we ar
Hi,
check in /usr/lib; it's usually the folder for 32-bit libraries.
I think OFED 1.3 already comes with Open MPI, so it should be installed by
default.
BTW, OFED 1.3.1 comes with Open MPI 1.2.6.
Lenny.
On 8/12/08, Mohd Radzi Nurul Azri wrote:
>
> Hi,
>
>
> Thanks for the prompt reply. This might b
you can also provide a full path to your mpi
#/usr/lib/openmpi/1.2.5-gcc/bin/mpiexec -n 2 ./a.out
On 8/12/08, jody wrote:
>
> No.
> The PATH variable simply tells the system in which order the
> directories should be searched for executables.
>
> so in .bash_profile just add the line
> PATH=/u
Sles10sp1
On 8/1/08, Scott Beardsley wrote:
>
> we might be running different OS's. I'm running RHEL 4U4
>>
>
> CentOS 5.2 here
try using only openib;
make sure you use a nightly after r19092.
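Something along these lines (a sketch with a placeholder process count and application name):

$mpirun -np 2 -mca btl openib,self ./app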
On 7/31/08, Gabriele Fatigati wrote:
>
> Mm, i've tried to disable shared memory but the problem remains. Is it
> normal?
>
> 2008/7/31 Jeff Squyres
>
>> There is very definitely a shared memory bug on the trunk at the moment
>> that
maybe it's related to #1378 PML ob1 deadlock for ping/ping ?
On 7/14/08, Jeff Squyres wrote:
>
> What application is it? The majority of the message passing engine did not
> change in the 1.2 series; we did add a new option into 1.2.6 for disabling
> early completion:
>
>
> http://www.open-mpi