[OMPI users] What Red Hat Enterprise/CentOS NUMA libraries are recommended/required for OpenMPI?

2015-08-05 Thread Lane, William
I'm running OpenMPI 1.8.7 tests on a mixed-bag cluster of various systems
under CentOS 6.3, and I've been intermittently getting warnings about not having
the proper NUMA libraries installed. Which NUMA libraries should be installed
for CentOS 6.3 and OpenMPI 1.8.7?

Here's what I currently have installed:

yum list installed *numa*
numactl.x86_64

Here's the list of available NUMA libraries for CentOS 6.3:

listed via: yum search numa | less
numactl-devel.i686 : Development package for building Applications that use numa
numactl-devel.x86_64 : Development package for building Applications that use numa
numad.x86_64 : NUMA user daemon
numactl.i686 : Library for tuning for Non Uniform Memory Access machines
numactl.x86_64 : Library for tuning for Non Uniform Memory Access machines

Also, since this cluster actually has working NUMA nodes, could not having
the proper NUMA libraries installed cause any issues?

-Bill L.


Re: [OMPI users] What Red Hat Enterprise/CentOS NUMA libraries are recommended/required for OpenMPI?

2015-08-05 Thread Ralph Castain
Hi Bill

You need numactl-devel on the nodes. Not having it means we cannot ensure
memory is bound locally to the procs, which will hurt performance but not
much else. There is an MCA param to turn off the warnings if you choose not
to install the libs: hwloc_base_mem_bind_failure_action=silent
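
For example, a minimal sketch (assuming CentOS's stock repositories; the
./your_app binary and the 16-process count are placeholders - adjust for your
site):

    yum install numactl-devel          # on every compute node
    mpirun --mca hwloc_base_mem_bind_failure_action silent -np 16 ./your_app

The same MCA param can also be set through the environment as
OMPI_MCA_hwloc_base_mem_bind_failure_action=silent if you prefer not to touch
the mpirun command line.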

Ralph


On Tue, Aug 4, 2015 at 10:36 PM, Lane, William wrote:

> I'm running OpenMPI 1.8.7 tests on a mixed bag cluster of various systems
> under CentOS 6.3, I've been intermittently getting warnings about not
> having
> the proper NUMA libraries installed. Which NUMA libraries should be
> installed
> for CentOS 6.3 and OpenMPI 1.8.7?
>
> Here's what I currently have installed:
>
> yum list installed *numa*
> numactl.x86_64
>
> Here's the list of available NUMA libraries for CentOS 6.3:
>
> listed via: yum search numa | less
> numactl-devel.i686 : Development package for building Applications
> that use numa
> numactl-devel.x86_64 : Development package for building Applications
> that use
>  : numa
> numad.x86_64 : NUMA user daemon
> numactl.i686 : Library for tuning for Non Uniform Memory Access
> machines
> numactl.x86_64 : Library for tuning for Non Uniform Memory Access
> machines
>
> Also, since this cluster actually has working NUMA nodes, could the lack
> of the proper NUMA libraries being installed cause any issues?
>
> -Bill L.
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27394.php
>


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-05 Thread Howard Pritchard
Hi Nate,

Sorry for the delay in getting back. Thanks for the sanity check. You may
have a point about the args string passed to MPI.init -
there's nothing Open MPI needs from it, but it is one difference
in your use case - your app has an argument.

Would you mind adding a

System.gc()

call immediately after the MPI.init call and seeing if the gc blows up with a
segfault?

Also, it may be interesting to add -verbose:jni to your command line.
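
For instance, a quick sketch of what that check might look like (the class
name is a placeholder, and MPI.Init is the capitalized form the Java bindings
export):

    import mpi.MPI;
    import mpi.MPIException;

    public class GcAfterInit {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);   // same init call your app already makes
            System.gc();      // force a collection right away to see if it segfaults here
            System.out.println("rank " + MPI.COMM_WORLD.getRank() + " survived the gc");
            MPI.Finalize();
        }
    }

and a run line with JNI tracing enabled might be:

    mpirun -np 2 java -verbose:jni GcAfterInit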

We'll do some experiments here with the init string arg.

Is your app open source where we could download it and try to reproduce the
problem locally?

thanks,

Howard


2015-08-04 18:52 GMT-06:00 Nate Chambers :

> Sanity checks pass. Both Hello and Ring.java run correctly with the
> expected program's output.
>
> Does MPI.init(args) expect anything from those command-line args?
>
>
> Nate
>
>
> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard 
> wrote:
>
>> Hello Nate,
>>
>> As a sanity check of your installation, could you try to compile the
>> examples/*.java codes using the mpijavac you've installed and see that
>> those run correctly?
>> I'd be just interested in the Hello.java and Ring.java?
>>
>> Howard
>>
>>
>>
>>
>>
>>
>>
>> 2015-08-04 14:34 GMT-06:00 Nate Chambers :
>>
>>> Sure, I reran the configure with CC=gcc and then make install. I think
>>> that's the proper way to do it. Attached is my config log. The behavior
>>> when running our code appears to be the same. The output is the same error
>>> I pasted in my email above. It occurs when calling MPI.init().
>>>
>>> I'm not great at debugging this sort of stuff, but happy to try things
>>> out if you need me to.
>>>
>>> Nate
>>>
>>>
>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
>>> wrote:
>>>
 Hello Nate,

 As a first step to addressing this, could you please try using gcc
 rather than the Intel compilers to build Open MPI?

 We've been doing a lot of work recently on the java bindings, etc. but
 have never tried using any compilers other
 than gcc when working with the java bindings.

 Thanks,

 Howard


 2015-08-03 17:36 GMT-06:00 Nate Chambers :

> We've been struggling with this error for a while, so hoping someone
> more knowledgeable can help!
>
> Our java MPI code exits with a segfault during its normal operation, *but
> the segfault occurs before our code ever uses MPI functionality like
> sending/receiving. *We've removed all message calls and any use of
> MPI.COMM_WORLD from the code. The segfault occurs if we call 
> MPI.init(args)
> in our code, and does not if we comment that line out. Further vexing us,
> the crash doesn't happen at the point of the MPI.init call, but later on 
> in
> the program. I don't have an easy-to-run example here because our non-MPI
> code is so large and complicated. We have run simpler test programs with
> MPI and the segfault does not occur.
>
> We have isolated the line where the segfault occurs. However, if we
> comment that out, the program will run longer, but then randomly (but
> deterministically) segfault later on in the code. Does anyone have tips on
> how to debug this? We have tried several flags with mpirun, but no good
> clues.
>
> We have also tried several MPI versions, including stable 1.8.7 and
> the most recent 1.8.8rc1
>
>
> ATTACHED
> - config.log from installation
> - output from `ompi_info -all`
>
>
> OUTPUT FROM RUNNING
>
> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
> ...
> some normal output from our code
> ...
>
> --
> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited
> on signal 11 (Segmentation fault).
>
> --
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27386.php
>


 ___
 users mailing list
 us...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post:
 http://www.open-mpi.org/community/lists/users/2015/08/27389.php

>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/08/27391.php
>>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users

[OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Lane, William
I read @

https://www.open-mpi.org/faq/?category=sge

that for OpenMPI Parallel Environments there's
a special consideration for Son of Grid Engine:

   '"qsort_args" is necessary with the Son of Grid Engine distribution,
   version 8.1.1 and later, and probably only applicable to it.  For
   very old versions of SGE, omit "accounting_summary" too.'

Does this requirement still hold true for OpenMPI 1.8.7? I ask because the
webpage above only refers to much older versions of OpenMPI.
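
For reference, a Son of Grid Engine PE definition along those lines might look
roughly like the following (a sketch only - the PE name and slot count are
arbitrary, and qsort_args is the Son of Grid Engine-specific attribute the FAQ
refers to):

    pe_name            orte
    slots              9999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE
    qsort_args         NONE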

I also want to thank Ralph for all his help in debugging the manifold
problems with our mixed-bag cluster.

-Bill Lane






Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Ralph Castain
You know, I honestly don't know - there is a patch in there for qsort, but
I haven't checked it against SGE. Let us know if you hit a problem and
we'll try to figure it out.

Glad to hear your cluster is working - nice to have such challenges to
shake the cobwebs out :-)

On Wed, Aug 5, 2015 at 12:43 PM, Lane, William wrote:

> I read @
>
> https://www.open-mpi.org/faq/?category=sge
>
> that for OpenMPI Parallel Environments there's
> a special consideration for Son of Grid Engine:
>
>'"qsort_args" is necessary with the Son of Grid Engine distribution,
>version 8.1.1 and later, and probably only applicable to it.  For
>very old versions of SGE, omit "accounting_summary" too.'
> Does this requirement still hold true for OpenMPI 1.8.7? Because
> the webpage above only refers to much older versions of OpenMPI.
>
> I also want to thank Ralph for all his help in debugging the manifold
> problems w/our mixed bag cluster.
>
> -Bill Lane
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27397.php
>


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-05 Thread Nate Chambers
Howard,

Thanks for looking at all this. Adding System.gc() did not cause it to
segfault. The segfault still comes much later in the processing.

I was able to reduce my code to a single test file without other
dependencies. It is attached. This code simply opens a text file and reads
its lines, one by one. Once finished, it closes and opens the same file and
reads the lines again. On my system, it does this about 4 times until the
segfault fires. Obviously this code makes no sense, but it's based on our
actual code that reads millions of lines of data and does various
processing to it.

Attached is a tweets.tgz file that you can uncompress to have an input
directory. The text file is just the same line over and over again. Run it
as:

*java MPITestBroke tweets/*
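
The pattern described is roughly the following (a hypothetical sketch, not the
attached MPITestBroke itself; the file name inside the directory is a
placeholder):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import mpi.MPI;
    import mpi.MPIException;

    public class ReadLoopSketch {
        public static void main(String[] args) throws MPIException, IOException {
            MPI.Init(args);
            String path = args[0] + "/tweets.txt";
            for (int pass = 0; pass < 10; pass++) {       // reportedly dies after ~4 passes
                BufferedReader in = new BufferedReader(new FileReader(path));
                long count = 0;
                while (in.readLine() != null) {
                    count++;                              // no MPI calls anywhere in the loop
                }
                in.close();
                System.out.println("pass " + pass + ": " + count + " lines");
            }
            MPI.Finalize();
        }
    }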


Nate





On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard wrote:

> Hi Nate,
>
> Sorry for the delay in getting back.  Thanks for the sanity check.  You
> may have a point about the args string to MPI.init -
> there's nothing the Open MPI is needing from this but that is a difference
> with your use case - your app has an argument.
>
> Would you mind adding a
>
> System.gc()
>
> call immediately after MPI.init call and see if the gc blows up with a
> segfault?
>
> Also, may be interesting to add the -verbose:jni to your command line.
>
> We'll do some experiments here with the init string arg.
>
> Is your app open source where we could download it and try to reproduce
> the problem locally?
>
> thanks,
>
> Howard
>
>
> 2015-08-04 18:52 GMT-06:00 Nate Chambers :
>
>> Sanity checks pass. Both Hello and Ring.java run correctly with the
>> expected program's output.
>>
>> Does MPI.init(args) expect anything from those command-line args?
>>
>>
>> Nate
>>
>>
>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard 
>> wrote:
>>
>>> Hello Nate,
>>>
>>> As a sanity check of your installation, could you try to compile the
>>> examples/*.java codes using the mpijavac you've installed and see that
>>> those run correctly?
>>> I'd be just interested in the Hello.java and Ring.java?
>>>
>>> Howard
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers :
>>>
 Sure, I reran the configure with CC=gcc and then make install. I think
 that's the proper way to do it. Attached is my config log. The behavior
 when running our code appears to be the same. The output is the same error
 I pasted in my email above. It occurs when calling MPI.init().

 I'm not great at debugging this sort of stuff, but happy to try things
 out if you need me to.

 Nate


 On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
 wrote:

> Hello Nate,
>
> As a first step to addressing this, could you please try using gcc
> rather than the Intel compilers to build Open MPI?
>
> We've been doing a lot of work recently on the java bindings, etc. but
> have never tried using any compilers other
> than gcc when working with the java bindings.
>
> Thanks,
>
> Howard
>
>
> 2015-08-03 17:36 GMT-06:00 Nate Chambers :
>
>> We've been struggling with this error for a while, so hoping someone
>> more knowledgeable can help!
>>
>> Our java MPI code exits with a segfault during its normal operation, *but
>> the segfault occurs before our code ever uses MPI functionality like
>> sending/receiving. *We've removed all message calls and any use of
>> MPI.COMM_WORLD from the code. The segfault occurs if we call 
>> MPI.init(args)
>> in our code, and does not if we comment that line out. Further vexing us,
>> the crash doesn't happen at the point of the MPI.init call, but later on 
>> in
>> the program. I don't have an easy-to-run example here because our non-MPI
>> code is so large and complicated. We have run simpler test programs with
>> MPI and the segfault does not occur.
>>
>> We have isolated the line where the segfault occurs. However, if we
>> comment that out, the program will run longer, but then randomly (but
>> deterministically) segfault later on in the code. Does anyone have tips 
>> on
>> how to debug this? We have tried several flags with mpirun, but no good
>> clues.
>>
>> We have also tried several MPI versions, including stable 1.8.7 and
>> the most recent 1.8.8rc1
>>
>>
>> ATTACHED
>> - config.log from installation
>> - output from `ompi_info -all`
>>
>>
>> OUTPUT FROM RUNNING
>>
>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>> ...
>> some normal output from our code
>> ...
>>
>> --
>> mpirun noticed that process rank 0 with PID 29646 on node r9n69
>> exited on signal 11 (Segmentation fault).
>>
>> --
>>
>>
>>
>>

[OMPI users] bad XRC API

2015-08-05 Thread Andy Wettstein
Hi,

I updated our OpenMPI install from 1.8.3 to 1.8.8 today and I'm getting
this error:

XRC error: bad XRC API (require XRC from OFED pre 3.12).

This happens even using the exact same node to compile and run an
example program. I saw a thread from a few weeks ago discussing this
issue as well. I changed the dlsym if statement in btl_openib_xrc.c to
this:

if (NULL != dlsym(lib, "ibv_create_xrc_recv_qp@@IBVERBS_1.1")) {

This seems to make the error message go away, so there must be something
about that check that doesn't work correctly unless the symbol version is
included.
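
One way to check what the installed libibverbs actually exports (a sketch; the
library path may differ on your system):

    objdump -T /usr/lib64/libibverbs.so.1 | grep ibv_create_xrc_recv_qp

which should show whether the symbol carries the IBVERBS_1.1 version tag that
the modified check above looks for.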

This is on a Scientific Linux 6.6 machine with MLNX_OFED 1.5.3-4.0.42. 

Andy

-- 
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104



Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-05 Thread Howard Pritchard
thanks Nate.  We will give the test a try.

--

sent from my smart phone so no good typing.

Howard
On Aug 5, 2015 2:42 PM, "Nate Chambers"  wrote:

> Howard,
>
> Thanks for looking at all this. Adding System.gc() did not cause it to
> segfault. The segfault still comes much later in the processing.
>
> I was able to reduce my code to a single test file without other
> dependencies. It is attached. This code simply opens a text file and reads
> its lines, one by one. Once finished, it closes and opens the same file and
> reads the lines again. On my system, it does this about 4 times until the
> segfault fires. Obviously this code makes no sense, but it's based on our
> actual code that reads millions of lines of data and does various
> processing to it.
>
> Attached is a tweets.tgz file that you can uncompress to have an input
> directory. The text file is just the same line over and over again. Run it
> as:
>
> *java MPITestBroke tweets/*
>
>
> Nate
>
>
>
>
>
> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard 
> wrote:
>
>> Hi Nate,
>>
>> Sorry for the delay in getting back.  Thanks for the sanity check.  You
>> may have a point about the args string to MPI.init -
>> there's nothing the Open MPI is needing from this but that is a
>> difference with your use case - your app has an argument.
>>
>> Would you mind adding a
>>
>> System.gc()
>>
>> call immediately after MPI.init call and see if the gc blows up with a
>> segfault?
>>
>> Also, may be interesting to add the -verbose:jni to your command line.
>>
>> We'll do some experiments here with the init string arg.
>>
>> Is your app open source where we could download it and try to reproduce
>> the problem locally?
>>
>> thanks,
>>
>> Howard
>>
>>
>> 2015-08-04 18:52 GMT-06:00 Nate Chambers :
>>
>>> Sanity checks pass. Both Hello and Ring.java run correctly with the
>>> expected program's output.
>>>
>>> Does MPI.init(args) expect anything from those command-line args?
>>>
>>>
>>> Nate
>>>
>>>
>>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard 
>>> wrote:
>>>
 Hello Nate,

 As a sanity check of your installation, could you try to compile the
 examples/*.java codes using the mpijavac you've installed and see that
 those run correctly?
 I'd be just interested in the Hello.java and Ring.java?

 Howard







 2015-08-04 14:34 GMT-06:00 Nate Chambers :

> Sure, I reran the configure with CC=gcc and then make install. I think
> that's the proper way to do it. Attached is my config log. The behavior
> when running our code appears to be the same. The output is the same error
> I pasted in my email above. It occurs when calling MPI.init().
>
> I'm not great at debugging this sort of stuff, but happy to try things
> out if you need me to.
>
> Nate
>
>
> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
> wrote:
>
>> Hello Nate,
>>
>> As a first step to addressing this, could you please try using gcc
>> rather than the Intel compilers to build Open MPI?
>>
>> We've been doing a lot of work recently on the java bindings, etc.
>> but have never tried using any compilers other
>> than gcc when working with the java bindings.
>>
>> Thanks,
>>
>> Howard
>>
>>
>> 2015-08-03 17:36 GMT-06:00 Nate Chambers :
>>
>>> We've been struggling with this error for a while, so hoping someone
>>> more knowledgeable can help!
>>>
>>> Our java MPI code exits with a segfault during its normal operation, 
>>> *but
>>> the segfault occurs before our code ever uses MPI functionality like
>>> sending/receiving. *We've removed all message calls and any use of
>>> MPI.COMM_WORLD from the code. The segfault occurs if we call 
>>> MPI.init(args)
>>> in our code, and does not if we comment that line out. Further vexing 
>>> us,
>>> the crash doesn't happen at the point of the MPI.init call, but later 
>>> on in
>>> the program. I don't have an easy-to-run example here because our 
>>> non-MPI
>>> code is so large and complicated. We have run simpler test programs with
>>> MPI and the segfault does not occur.
>>>
>>> We have isolated the line where the segfault occurs. However, if we
>>> comment that out, the program will run longer, but then randomly (but
>>> deterministically) segfault later on in the code. Does anyone have tips 
>>> on
>>> how to debug this? We have tried several flags with mpirun, but no good
>>> clues.
>>>
>>> We have also tried several MPI versions, including stable 1.8.7 and
>>> the most recent 1.8.8rc1
>>>
>>>
>>> ATTACHED
>>> - config.log from installation
>>> - output from `ompi_info -all`
>>>
>>>
>>> OUTPUT FROM RUNNING
>>>
>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>> ...
>>> some normal output from our code

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Lane, William
Actually, we're still having problems submitting OpenMPI 1.8.7 jobs
to the cluster through SGE (which we need to do in order to track usage
stats on the cluster). I suppose I'll make a PE with the appropriate settings
and see if that makes a difference.
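
For what it's worth, a minimal submission along those lines might look like
this (a sketch; the PE name, slot count, and script contents are placeholders):

    qsub -pe orte 16 -cwd myjob.sh

with myjob.sh containing something like:

    #!/bin/sh
    mpirun ./your_mpi_app    # no -np needed; Open MPI picks up the slot allocation from SGE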

-Bill L


From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: Wednesday, August 05, 2015 1:18 PM
To: Open MPI Users
Subject: Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 
1.8.7

You know, I honestly don't know - there is a patch in there for qsort, but I 
haven't checked it against SGE. Let us know if you hit a problem and we'll try 
to figure it out.

Glad to hear your cluster is working - nice to have such challenges to shake 
the cobwebs out :-)

On Wed, Aug 5, 2015 at 12:43 PM, Lane, William <william.l...@cshs.org> wrote:
I read @

https://www.open-mpi.org/faq/?category=sge

that for OpenMPI Parallel Environments there's
a special consideration for Son of Grid Engine:

   '"qsort_args" is necessary with the Son of Grid Engine distribution,
   version 8.1.1 and later, and probably only applicable to it.  For
   very old versions of SGE, omit "accounting_summary" too.'

Does this requirement still hold true for OpenMPI 1.8.7? Because
the webpage above only refers to much older versions of OpenMPI.

I also want to thank Ralph for all his help in debugging the manifold
problems w/our mixed bag cluster.

-Bill Lane





___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/08/27397.php



Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Ralph Castain
Well that stinks! Let me know what's going on and I'll take a look. FWIW,
the best method is generally to configure OMPI with --enable-debug, and
then run with "--leave-session-attached --mca plm_base_verbose 5". That
will tell us what the launcher thinks it is doing and what the daemons
think is wrong.
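
Concretely, that might look something like this (a sketch; the install prefix
and application name are placeholders):

    ./configure --enable-debug --prefix=/opt/openmpi-1.8.7-debug
    make all install

    mpirun --leave-session-attached --mca plm_base_verbose 5 -np 16 ./your_app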


On Wed, Aug 5, 2015 at 3:17 PM, Lane, William  wrote:

> Actually, we're still having problems submitting OpenMPI 1.8.7 jobs
> to the cluster thru SGE (which we need to do in order to track usage
> stats on the cluster). I suppose I'll make a PE w/the appropriate settings
> and see if that makes a difference.
>
> -Bill L
>
> --
> *From:* users [users-boun...@open-mpi.org] on behalf of Ralph Castain [
> r...@open-mpi.org]
> *Sent:* Wednesday, August 05, 2015 1:18 PM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Son of Grid Engine, Parallel Environments and
> OpenMPI 1.8.7
>
> You know, I honestly don't know - there is a patch in there for qsort, but
> I haven't checked it against SGE. Let us know if you hit a problem and
> we'll try to figure it out.
>
> Glad to hear your cluster is working - nice to have such challenges to
> shake the cobwebs out :-)
>
> On Wed, Aug 5, 2015 at 12:43 PM, Lane, William 
> wrote:
>
>> I read @
>>
>> https://www.open-mpi.org/faq/?category=sge
>>
>> that for OpenMPI Parallel Environments there's
>> a special consideration for Son of Grid Engine:
>>
>>'"qsort_args" is necessary with the Son of Grid Engine distribution,
>>version 8.1.1 and later, and probably only applicable to it.  For
>>very old versions of SGE, omit "accounting_summary" too.'
>> Does this requirement still hold true for OpenMPI 1.8.7? Because
>> the webpage above only refers to much older versions of OpenMPI.
>>
>> I also want to thank Ralph for all his help in debugging the manifold
>> problems w/our mixed bag cluster.
>>
>> -Bill Lane
>>
>>
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/08/27397.php
>>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27402.php
>


Re: [OMPI users] bad XRC API

2015-08-05 Thread Ralph Castain
Yeah, I recall your earlier email on the subject. Sadly, I need someone
from Mellanox to look at this as I don't have access to such equipment.

Josh? Mike? Gilles? Can someone please look at this?


On Wed, Aug 5, 2015 at 2:31 PM, Andy Wettstein wrote:

> Hi,
>
> I updated our OpenMPI install from 1.8.3 to 1.8.8 today and I'm getting
> this error:
>
> XRC error: bad XRC API (require XRC from OFED pre 3.12).
>
> This happens even using the exact same node to compile and run an
> example program. I saw a thread from a few weeks ago discussing this
> issue as well. I changed the dlsym if statement in btl_openib_xrc.c to
> this:
>
> if (NULL != dlsym(lib, "ibv_create_xrc_recv_qp@@IBVERBS_1.1")) {
>
> This seems to make the error message go away, so there must be something
> about that check that doesn't work right without the version in it.
>
> This is on a Scientific Linux 6.6 machine with MLNX_OFED 1.5.3-4.0.42.
>
> Andy
>
> --
> andy wettstein
> hpc system administrator
> research computing center
> university of chicago
> 773.702.1104
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27400.php
>