Argh, our messed-up environment with three generations of InfiniBand bit us.
Setting openib_cpc_include to rdmacm causes IB not to be used on our old DDR IB
on some of our hosts. Note that jobs will never run across both our old DDR IB
and our new QDR stuff, where rdmacm does work.
I am doing some te
FWIW, my ARM contact tells me that he uses a native ARM Linux distro explicitly
to avoid all the complexities of cross-compiling... :-\
On Apr 25, 2011, at 11:29 AM, Jeff Squyres wrote:
> There's some extra special mojo that needs to be supplied when
> cross-compiling Open MPI (e.g., a file t
Was this ever committed to the OMPI src as something not having to be
run outside of OpenMPI, but as part of the PSM setup that OpenMPI
does?
I'm having some trouble getting Slurm/OpenMPI to play nice with the
setup of this key. Namely, with slurm you cannot export variables
from the --prolog of
Hi,
I am getting an "oob-tcp: Communication retries exceeded" error
message when I run a 238 MPI slave code
/opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
--mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
/usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgr
Perhaps a firewall? All it is telling you is that mpirun couldn't establish TCP
communications with the daemon on ln10.
On Apr 27, 2011, at 11:58 AM, Sindhi, Waris PW wrote:
> Hi,
> I am getting an "oob-tcp: Communication retries exceeded" error
> message when I run a 238 MPI slave code
>
>
On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> does?
Not that I know of - I don't think the PSM developers ever looked at it.
>
> I'm having s
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>
>> Was this ever committed to the OMPI src as something not having to be
>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
>> does?
>
> Not that I know of - I don
On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>
>>> Was this ever committed to the OMPI src as something not having to be
>>> run outside of OpenMPI, but as part o
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>>
>>>> Was this ever committed to the OMPI src as someth
On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
> Actually, I understood you correctly. I'm just saying that I find no evidence
> in the code that we try three times before giving up. What I see is a single
> attempt to bind the port - if it fails, then we abort. There is no parameter
> to co
No we do not have a firewall turned on. I can run smaller 96 slave cases
on ln10 and ln13 included on the slavelist.
Could there be another reason for this to fail ?
Sincerely,
Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486
-----Original Message-----
Fr
On Apr 27, 2011, at 1:27 PM, Jeff Squyres wrote:
> On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
>
>> Actually, I understood you correctly. I'm just saying that I find no
>> evidence in the code that we try three times before giving up. What I see is
>> a single attempt to bind the port -
On Apr 27, 2011, at 3:39 PM, Ralph Castain wrote:
> Nope, nope nope...in this mode of operation, we are using -static- ports.
Er.. right. Sorry -- my bad for not reading the full context here... ignore
what I said...
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
ht
On Thu, Apr 21, 2011 at 06:35:16PM -0400, Jeff Squyres wrote:
> It's normal and expected for there to be lots of errors in config.log.
>
> There's a bunch of tests in configure that are designed to succeed on some
> systems and fail on others.
>
> So don't read anything into the failures tha
On Thu, Apr 28, 2011 at 12:46:27AM +0200, Tru Huynh wrote:
> On Thu, Apr 21, 2011 at 06:35:16PM -0400, Jeff Squyres wrote:
> > It's normal and expected for there to be lots of errors in config.log.
> >
> > There's a bunch of tests in configure that are designed to succeed on some
> > systems an
On Apr 27, 2011, at 1:31 PM, Sindhi, Waris PW wrote:
> No we do not have a firewall turned on. I can run smaller 96 slave cases
> on ln10 and ln13 included on the slavelist.
>
> Could there be another reason for this to fail ?
What is in "procgroup"? Is it a single application?
Offhand, ther
On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>>
>>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domen