On Mon, 13 Jun 2016 19:04:59 -0400
Mehmet Belgin wrote:
> Greetings!
>
> We have not upgraded our OFED stack for a very long time, and are
> still running on an ancient version (1.5.4.1, yeah we know). We are
> now considering a big jump from this version to a tested and stable
> recent version an…
Hi,
Unless your apps link Open MPI statically, you might not need to
rebuild them. You will likely have to (and should) rebuild Open MPI
itself, though.
Cheers,
Gilles
Peter Kjellström wrote:
>On Mon, 13 Jun 2016 19:04:59 -0400
>Mehmet Belgin wrote:
>
>> Greetings!
>>
>> We have not upgraded our OFED stack for a very long time, and …
Hello,
I have the following three single-socket nodes:
node1: 4 GB RAM, 2 cores: rank 0, rank 1
node2: 4 GB RAM, 4 cores: rank 2, rank 3, rank 4, rank 5
node3: 8 GB RAM, 4 cores: rank 6, rank 7, rank 8, rank 9
I have a model that takes an input and produces an output, and I want
to run this model for N possible combinations …
Note that if your program is synchronous, it will run at the speed of
the slowest task (e.g. tasks on node2, with 1 GB per task, will wait
for the other tasks, which have 2 GB per task).
You can use MPI_Comm_split_type to create per-node communicators, and
then work out how much memory is available per task.
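Something like this shows the idea (just a sketch I have not tested,
and it assumes a Linux node where sysconf(_SC_PHYS_PAGES) reports the
physical memory):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Comm node_comm;
    int local_rank, local_size;

    MPI_Init(&argc, &argv);

    /* one communicator per node: all the ranks that share memory */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_size(node_comm, &local_size);

    /* total physical memory on this node, split across local ranks */
    double node_mem = (double)sysconf(_SC_PHYS_PAGES) *
                      sysconf(_SC_PAGE_SIZE);
    if (local_rank == 0)
        printf("%d local tasks, ~%.1f GB per task\n", local_size,
               node_mem / local_size / (1024.0 * 1024.0 * 1024.0));

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}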
Hi,
At work, I have some MPI codes that use custom datatypes to call
MPI_File_read with MPI_BOTTOM ... It mostly works, except when the
underlying filesystem is NFS, where it crashes with SIGSEGV.
The attached sample (code + data) works just fine with 1.10.1 on my
NetBSD/amd64 workstation …
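For context, the pattern is roughly the following (a simplified
sketch, not the actual attached sample; the file name and buffer
layout here are made up). The datatype carries absolute addresses, so
the read buffer is MPI_BOTTOM:

#include <mpi.h>

int main(int argc, char **argv)
{
    int          header[4];
    double       payload[100];
    int          lens[2]  = { 4, 100 };
    MPI_Aint     disps[2];
    MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
    MPI_Datatype ftype;
    MPI_File     fh;

    MPI_Init(&argc, &argv);

    /* absolute addresses: MPI_BOTTOM becomes the buffer base */
    MPI_Get_address(header,  &disps[0]);
    MPI_Get_address(payload, &disps[1]);
    MPI_Type_create_struct(2, lens, disps, types, &ftype);
    MPI_Type_commit(&ftype);

    MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_read(fh, MPI_BOTTOM, 1, ftype, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&ftype);
    MPI_Finalize();
    return 0;
}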
I dug into this a bit (with some help from others) and found that the spawn
code appears to be working correctly - it is the test in orte/test that is
wrong. The test has been correctly updated in the 2.x and master repos, but we
failed to backport it to the 1.10 series. I have done so this morning…
On 2016-06-14, 3:42 AM, "users on behalf of Peter Kjellström"
wrote:
>On Mon, 13 Jun 2016 19:04:59 -0400
>Mehmet Belgin wrote:
>
>> Greetings!
>>
>> We have not upgraded our OFED stack for a very long time, and are
>> still running on an ancient version (1.5.4.1, yeah we know). We are
>> now cons…
Hello Grigory,
I am not sure what Red Hat does exactly, but when you install the OS
there is always an InfiniBand Support module in the installation
process. We never check/install that module when we do OS
installations because it is usually several versions of OFED behind
(almost obsolete).
Hi Ralph et al.,
Great, thank you for the help. I downloaded the MPI loop spawn test
directly from what I think is the master repo on GitHub:
https://github.com/open-mpi/ompi/blob/master/orte/test/mpi/loop_spawn.c
I am still using the MPI code from 1.10.2, however.
Is that test updated with the …
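For anyone following along, the heart of that test is roughly the
following (my paraphrase, not the actual loop_spawn.c; the child
executable name and loop count are placeholders):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* spawn a one-process child job over and over, disconnecting
       from each child so it can exit independently */
    for (int i = 0; i < 100; i++) {
        MPI_Comm child;
        MPI_Comm_spawn("./loop_child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    }

    MPI_Finalize();
    return 0;
}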
Hmm…I’m unable to replicate the problem on my machines. What fabric are you
using? Does the problem go away if you add “-mca btl tcp,sm,self” to the mpirun
cmd line?
> On Jun 14, 2016, at 11:15 AM, Jason Maldonis wrote:
>
> Hi Ralph et al.,
>
> Great, thank you for the help. I downloaded the m…
That message is coming from udcm (the UD connection manager) in the openib btl. It indicates some sort of
failure in the connection mechanism. It can happen if the listening thread no
longer exists or is taking too long to process messages.
-Nathan
On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote:
Hmm…I’m unable to r…
Hi,
I am attempting to use the sm and vader BTLs between a client and
server process, but it seems impossible to use fast transports (i.e.,
not TCP) between two independent groups started with two separate
mpirun invocations. Am I correct, or is there a way to communicate
using shared memory between …
Nope - we don’t currently support cross-job shared memory operations. Nathan
has talked about doing so for vader, but not at this time.
> On Jun 14, 2016, at 12:38 PM, Louis Williams
> wrote:
>
> Hi,
>
> I am attempting to use the sm and vader BTLs between a client and server
> process, but…
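The supported way to connect two separately launched jobs is the
port-based handshake - MPI_Open_port / MPI_Comm_accept on one side,
MPI_Comm_connect on the other - it just won’t run over shared memory.
A rough, untested sketch (passing the port string by hand is only one
way to exchange it):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    if (argc < 2) {
        /* server: mpirun -np 1 ./a.out, then copy the printed port */
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm client;
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
        /* ... talk over the inter-communicator 'client' ... */
        MPI_Comm_disconnect(&client);
        MPI_Close_port(port);
    } else {
        /* client: mpirun -np 1 ./a.out "<port string>" */
        MPI_Comm server;
        MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                         &server);
        MPI_Comm_disconnect(&server);
    }

    MPI_Finalize();
    return 0;
}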
I'm pretty sure it is based on a version:
[root@perceval2 prepJobs]# modinfo mlx4_core
filename:       /lib/modules/3.10.0-229.20.1.el7.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
version:        2.2-1
license:        Dual BSD/GPL
description:    Mellanox ConnectX HCA low-level driver
Ralph, the problem *does* go away if I add "-mca btl tcp,sm,self" to
the mpiexec cmd line. (By the way, I am using mpiexec rather than
mpirun; do you recommend one over the other?) Will you tell me what
this means for me? For example, should I always append these arguments
to mpiexec for my non-tes…
You don’t want to always use those options, as your performance will
take a hit - trading InfiniBand for TCP isn’t a good deal. Sadly, this
is something we need someone like Nathan to address, as it is a bug in
the code base, and in an area I’m not familiar with.
For now, just use TCP so you can move forward …
Thanks, Ralph, for all the help. I will do that until it gets fixed.
Nathan, I am very, very interested in getting this working because we
are developing some cool new code for materials science research. This
is the last piece of the puzzle for us, I believe. I can use TCP for
now, though, of course. While…
Hi Llolsten,
We are trying to keep up with the latest Open MPI as best we can, and
we keep a few old versions around (the oldest is 1.6), on RHEL 6.5.
The OFED upgrade will complement planned OS upgrades to the latest
stable RHEL 6.x (probably 6.7; not sure if we will go with 6.8).
Did you have to rec…