Re: [OMPI users] orte has lost communication

2016-04-12 Thread Stefan Friedel
On Tue, Apr 12, 2016 at 01:30:37PM +0200, Stefan Friedel wrote: -thanks for your support!- nope, no core, just the "orte has lost"... Dear list - the problem is _not_ related to openmpi. I compiled mvapich2 and I get communication errors, too. Probably this is a hardware problem. Sor

Re: [OMPI users] orte has lost communication

2016-04-12 Thread Stefan Friedel
Even with just 100 nodes: some jobs are failing (50/50), failing jobs: _no output_, _no core dumped_...only orte has lost... Running on >=350 nodes: almost all jobs are failing, but some jobs succeeded (similar output: only "orte has lost..." for failing jobs and the expected outp

Re: [OMPI users] orte has lost communication

2016-04-12 Thread Stefan Friedel
MCA sharedfp: individual (MCA v2.0.0, API v2.0.0, Component v1.10.2) MCA topo: basic (MCA v2.0.0, API v2.1.0, Component v1.10.2) MCA vprotocol: pessimist (MCA v2.0.0, API v2.0.0, Component v1.10.2) MfG/Sincerely Stefan Friedel -- IWR * 4.317 * INF205 * 69120 Heidelber

[OMPI users] orte has lost communication

2016-04-12 Thread Stefan Friedel
at some later point. Any hint? PSM? Does some kernel parameter need to be increased? Wrong network/routing (should not happen with --mca oob_tcp_if_include eth0)? MfG/Sincerely Stefan Friedel -- IWR * 4.317 * INF205 * 69120 Heidelberg T +49 6221 5414404 * F +49 6221 5414427 stefan.frie...@iwr.uni-hei

[OMPI users] libtool *.la files with references to install dir

2013-05-03 Thread Stefan Friedel
are these references inside the *.la files? Thanks for hints- MfG/Sincerely, Stefan Friedel -- IWR * 523 * INF 368 * 69120 Heidelberg T +49 6221 548240 * F +49 6221 545224 stefan.frie...@iwr.uni-heidelberg.de

Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Stefan Friedel
://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem As I wrote: I'm aware of these FAQ entries, but you can't set the log_num_mtt parameter if you're using a Debian/vanilla kernel: the mlx4_core module does not offer this parameter. MfG/Sincerely, Stefan Friedel -- IWR

[OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Stefan Friedel
hint? MfG/Sincerely, Stefan Friedel -- IWR * 523 * INF 368 * 69120 Heidelberg T +49 6221 548240 * F +49 6221 545224 stefan.frie...@iwr.uni-heidelberg.de