Dear all,
We would like to know whether the Ethernet interfaces play any role in the
startup phase of an Open MPI job using InfiniBand.
If so, where can we find some literature on this topic?
This interest arises from some observations of a substantial time
overhead on the startup of our
Hmm. It's not immediately clear to me what's going wrong here.
I hate to ask, but could you install a debugging version of Open MPI and
capture a proper stack trace of the segv?
Also, could you try the 1.4.4 rc and see if that magically fixes the problem?
(I'm about to post a new 1.4.4 rc late
On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote:
> We would like to know whether the Ethernet interfaces play any role in the
> startup phase of an Open MPI job using InfiniBand.
> If so, where can we find some literature on this topic?
Unfortunately, there's not a lot of docs about thi
Do you know if there is another patch available so that my application can
handle the failure of one node, instead of MPI killing the whole job? This is
very important for me: I have a big cluster and I can't stop my job every time
I have some problem with just one node.
Regards
On Fri, Sep 23, 2011 at 4:34 PM, Ralph
Here is another possibly non-helpful suggestion. :) Change:
char* name[20];
int maxlen = 20;
To:
char name[256];
int maxlen = 256;
gethostname() is supposed to properly truncate the hostname it returns if the
actual name is longer than the length provided, but since you h
Hi,
we have a cluster setup with slot=1 on all nodes (although 12 cores are present).
Now we would like to alter the machinefile for a specific user.
I found this hint:
http://www.open-mpi.org/faq/?category=tm
Is this still valid?
We have Open MPI v1.4.3 running.
Trying to generate an own mac
Thanks, but my main concern is the segfault :P I made the change and, as I
expected, it still segfaults.
On 9/27/11 9:48 AM, Henderson, Brent wrote:
Here is another possibly non-helpful suggestion. :) Change:
char* name[20];
int maxlen = 20;
To:
char name[256];
int maxlen = 2
Yes, I've been copying around the source tree. That was the problem. If I am
careful to preserve the original timestamps, there are no problems.
Thanks
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Jeff Squyres
Sent: Monday, Septem
Any chance that the stacksize in the head node is too small,
compared to the compute nodes?
Small stacksize can cause segfaults.
Check /etc/security/limits.conf (and man limits.conf).
You could set it to unlimited (say, along with locked memory and
perhaps number of open files):
* - stack
Am 27.09.2011 um 01:16 schrieb Jeff Squyres:
On Sep 26, 2011, at 6:53 PM, Blosch, Edwin L wrote:
Actually I can download OpenMPI 1.5.4, 1.4.4rc3 or 1.4.3 - and ALL
of them build just fine.
Apparently what isn't working is the version of 1.4.3 that I have
downloaded and copied from place t
char* name[20]; declares an array of 20 (uninitialized) pointers to char; I guess you mean
char name[20];
So Brent's suggestion should work as well(?)
To be safe I would also add:
gethostname(name,maxlen);
name[19] = '\0';
printf("Hello, world. I am %d of %d and host %s \n", rank, ...
Cheers
On 09/27/2011 07:40 P
Hi,
Am 27.09.2011 um 16:00 schrieb Wiegers, Bert:
we have a cluster setup with slot=1 on all nodes (although 12 cores are
present).
Now we would like to alter the machinefile for a specific user.
I found this hint:
http://www.open-mpi.org/faq/?category=tm
Is this still valid?
We have openM
On Thu, Sep 22, 2011 at 11:37:10PM +0200, German Hoecht wrote:
> Hello,
>
> The MPI_File_read/write functions use an integer to specify the size of
> the buffer, for instance:
> int MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype
> datatype, MPI_Status *status)
> with:
> count Numb
On 09/27/2011 07:50 AM, Jeff Squyres wrote:
> On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote:
>
>> We would like to know whether the Ethernet interfaces play any role in the
>> startup phase of an Open MPI job using InfiniBand.
>> If so, where can we find some literature on this topic?
Thanks Josh,
Just yesterday I stumbled upon another interesting detail about this
issue. While reconfiguring things, I accidentally ran as root, and the
checkpointing all succeeded. I'm not sure though how to go about
finding what file things are hanging up on. I've compared straces as
roo
On Sep 27, 2011, at 5:03 PM, Prentice Bisbal wrote:
> To clarify, is IP/Ethernet required, or will IPoIB be used if it's
> configured on the nodes? Would this make a difference?
IPoIB is fine, although I've heard concerns about its stability at scale.
The difference that it'll make is that it's