Re: [OMPI users] Ssh problem

2009-02-18 Thread Marco
* Gabriele Fatigati  [2009 02 17, 17:10]:
> ssh works well. But the problem is still here..

 Seems an ssh issue anyway. Maybe your hostfile contains a host you
are not allowed to ssh to?


Greetings



Re: [OMPI users] ssh MPi and program tests

2009-04-08 Thread Marco
* Francesco Pietra  [2009 04 06, 16:51]:
> cd cytosine && ./Run.cytosine
> The authenticity of host deb64 (which is the hostname) (127.0.1.1)
> can't be established.
> RSA fingerprint .
> connecting ?

 This is a warning from ssh, not from OpenMPI; it is probably the first
time the system tries to connect to itself, and it is asking you for
confirmation to continue.

 Please note that 127.0.1.1 seems quite strange to me, since the
'standard' ip for localhost is '127.0.0.1'. You may want to check your
/etc/hosts .

> I stopped the ssh daemon, whereby tests were interrupted because deb64
> (i.e., itself) could no more be accessed. 

 I'm afraid it wasn't a great idea... the ssh daemon is required to
receive connections to localhost; and since mpi wants to do just that,
stopping sshd won't really fix the issue ;)



Re: [OMPI users] Cygwin compilation problems for openmpi-1.8

2014-04-13 Thread Marco Atzeri

On 12/04/2014 18:42, Cristian Butincu wrote:

Hello.

The latest precompiled version to date of openmpi for cygwin is 1.7.4-1.


1.7.5 will be uploaded shortly.
I don't plan to upload 1.8 until 1.8.1,
but I will build it just to check.


Because I got some runtime errors when trying to run simple MPI
programs, I have decided to upgrade to openmpi-1.8.


Which errors?
1.8 should be almost identical to 1.7.5, except for the oshmem default,
so it is unlikely to make any difference.



When trying to compile openmpi-1.8 under cygwin I get the following
error during "make all" and the build process halts:
Error: symbol `Lhwloc1' is already defined

The commands issued:
  ./configure --prefix=$HOME/Apps/openmpi-1.8
  make all


The cygwin package is currently built with

   --disable-mca-dso \
   --disable-sysv-shmem \
   --without-udapl \
   --enable-cxx-exceptions \
   --with-threads=posix \
   --without-cs-fs \
   --enable-heterogeneous \
   --with-mpi-param_check=always \
   --enable-contrib-no-build=vt,libompitrace \
   --enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv



Platform:
  Operating system: Windows 8, 32 bits
  CPU: Intel Core2 Duo
  Memory: 4 GB
  Cygwin version: 1.7.29

I have attached to this message an archive containing the output of
config and build processes.
Thank you.







Re: [OMPI users] Cygwin compilation problems for openmpi-1.8

2014-04-15 Thread Marco Atzeri



On 15/04/2014 13:31, Cristian Butincu wrote:

This is the simple MPI program (test.c) I was talking about:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
 int my_rank; /* rank of process */
 int p; /* number of processes */

 /* start up MPI */
 MPI_Init(&argc, &argv);

 /* find out process rank */
 MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

 /* find out number of processes */
 MPI_Comm_size(MPI_COMM_WORLD, &p);

 printf("Hello MPI World! Num processes: %d\n", p);

 /* shut down MPI */
 MPI_Finalize();

return 0;
}

Issued commands:
$ mpicc test.c
$ orterun -np 4 a.exe



Tested on 64-bit 1.7.5,
as Symantec Endpoint Protection just decided
that a portion of the 32-bit MPI is a Trojan...

$ mpicc test_mpi.c -o test_mpi

$ mpirun -n 4 ./test_mpi.exe
Hello MPI World! Num processes: 4
Hello MPI World! Num processes: 4
Hello MPI World! Num processes: 4
Hello MPI World! Num processes: 4

Regards
Marco



Re: [OMPI users] Cygwin compilation problems for openmpi-1.8

2014-04-15 Thread Marco Atzeri



On 15/04/2014 14:42, Jeff Squyres (jsquyres) wrote:

On Apr 15, 2014, at 8:35 AM, Marco Atzeri  wrote:


Tested on 64-bit 1.7.5,
as Symantec Endpoint Protection just decided
that a portion of the 32-bit MPI is a Trojan...



It's the infamous MPI trojan.  We take over your computer and use it to help 
cure cancer.

:p



of course ;-)

On Sunday, when I built it, there was no problem at all; today it is just messy.

I just raised a claim with Symantec, hoping for a fast verification,
but I have little hope.

Marco


Re: [OMPI users] Problem to Run Hello World on a desktop with Cygwin and OpenMPI 1.7.5

2014-05-30 Thread Marco Atzeri

On 30/05/2014 13:45, Sergii Veremieiev wrote:

Dear Sir/Madam,

I'm trying to compile and run a simple "Hello World" C++/MPI code on my
personal desktop machine (6-core Intel Core i7-3930K CPU with Windows 7
SP1 and Cygwin with the default built-in Open MPI 1.7.5 and GCC 4.8.2).
I'm beginner with this, never run parallel codes on desktops, only on a
cluster. Here is the code:

using namespace std;

#include "mpi.h"

int main(int argc, char *argv[])

{

int noprocs, nid;

MPI::Init(argc,argv);

nid = MPI::COMM_WORLD.Get_rank();

noprocs = MPI::COMM_WORLD.Get_size();

if (nid==0) cout << "Hello from processor " << nid << " of " <<
noprocs << endl;

MPI::Finalize();

}

Using “mpicxx -o Hello Hello.cpp” the code compiles without any problems
and generates an executable. However when I try to run the code using
“mpirun -np 1 Hello” or “mpiexec -n 1 Hello” the following error message
is returned:


Hi Sergii

mpirun -np 1 ./Hello
 works fine for me : "Hello from processor 0 of 1"

As the message is
"opal_os_dirpath_create: Error: Unable to create the sub-directory
/tmp/openmpi-sessions-enrsvere@byenr502b-01f_0/11302"

you need to check the location and permission of /tmp
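
The check suggested above can also be done from a small program. The following is a minimal sketch, assuming all that is needed is to know whether a directory such as /tmp exists and is writable by the current user; the helper name `dir_is_usable` is made up for illustration and this is not how Open MPI itself performs the check.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of Open MPI.
 * Returns 1 if `path` is an existing directory that the calling user may
 * write to and traverse, 0 otherwise. */
int dir_is_usable(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0)
        return 0;               /* missing or unreachable */
    if (!S_ISDIR(st.st_mode))
        return 0;               /* exists but is not a directory */
    /* W_OK: may create entries; X_OK: may traverse into the directory. */
    return access(path, W_OK | X_OK) == 0;
}
```

Calling `dir_is_usable("/tmp")` from a Cygwin shell and getting 0 would point at the location or permission problem described above.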

You can also follow the guidelines at https://cygwin.com/problems.html
and follow up on the cygwin mailing list https://cygwin.com/lists.html


Not related, but please note that your code is missing something, as

$ mpirun -np 4 ./Hello
Hello from processor 0 of 4


While the standard hello_cxx.cc

#include "mpi.h"
#include <iostream>

int main(int argc, char **argv)
{
int rank, size, len;
char version[MPI_MAX_LIBRARY_VERSION_STRING];

MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
MPI_Get_library_version(version, &len);
std::cout << "Hello, world!  I am " << rank << " of " << size
  << "(" << version << ", " << len << ")" << std::endl;
MPI::Finalize();

return 0;
}
---

produces

 $ mpirun -np 4 ./hello_cxx
Hello, world!  I am 1 of 4 ...
Hello, world!  I am 0 of 4 ...
Hello, world!  I am 2 of 4 ...
Hello, world!  I am 3 of 4 ...



Re: [OMPI users] openmpi linking problem

2014-06-09 Thread Marco Atzeri

On 09/06/2014 19:14, Sergii Veremieiev wrote:

Dear Sir/Madam,

I'm trying to link a C/FORTRAN code on Cygwin with Open MPI 1.7.5 and
GCC 4.8.2:

mpicxx ./lib/Multigrid.o ./lib/GridFE.o ./lib/Data.o ./lib/GridFD.o
./lib/Parameters.o ./lib/MtInt.o ./lib/MtPol.o ./lib/MtDob.o -o
Test_cygwin_openmpi_gcc  -L./external/MUMPS/lib
-ldmumps_cygwin_openmpi_gcc -lmumps_common_cygwin_openmpi_gcc
-lpord_cygwin_openmpi_gcc -L./external/ParMETIS
-lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc
-L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc
-L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc
-lblacsF77init-0_cygwin_openmpi_gcc -lblacsCinit-0_cygwin_openmpi_gcc
-lblacs-0_cygwin_openmpi_gcc -L./external/BLAS -lblas_cygwin_openmpi_gcc
-lmpi -lgfortran

The following error messages are returned:

./external/MUMPS/lib/libdmumps_cygwin_openmpi_gcc.a(dmumps_part3.o): In
function `dmumps_127_':
/cygdrive/d/Sergey/Research/Codes/Thinfilmsolver/external/MUMPS/src/dmumps_part3.F:6068:
undefined reference to `mpi_send_'


The Fortran Open MPI interface is in libmpi_mpifh.dll.a

so try adding "-lmpi_mpifh" before "-lmpi -lgfortran"
and be sure to have libopenmpifh2-1.7.5-1 installed


/cygdrive/d/Sergey/Research/Codes/Thinfilmsolver/external/MUMPS/src/dmumps_part3.F:6068:(.text+0x1d3):
relocation truncated to fit: R_X86_64_PC32 against undefined symbol
`mpi_send_'


This could be a side effect, but it could also be a completely
different issue.




Could you please advise me what further libraries should I include on
linking? Thank you.

Best regards,

Sergii



Regards
Marco



Re: [OMPI users] MPI_Init seems to hang, but works after a minute or two

2014-10-27 Thread Marco Atzeri

On 10/27/2014 8:32 AM, maxinator333 wrote:

Hello,


After compiling and running a MPI program, it seems to hang at
MPI_Init(), but it eventually will work after a minute or two.

While the problem occurred on my notebook, it did not on my desktop PC.


It can be a timeout on a network interface.
I see a similar issue with wireless ON but not with wireless OFF
on my notebook.

In the past I saw the same with a virtual driver that a telecom
company installed for its 3G connection.


Both run on Win 7, cygwin 64 Bit, OpenMPI version 1.8.3 r32794
(ompi_info), g++ v 4.8.3. I actually synced the cygwin installations
later on, and it still didn't work, but it did for a short time after a
restart...


Regards
Marco


Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Marco Atzeri

On 10/27/2014 8:30 AM, maxinator333 wrote:

Hello,

I noticed this weird behavior, because after a certain time of more than
one minute the transfer rates of MPI_Send and MPI_Recv dropped by a
factor of 100+. By chance I saw, that my program did allocate more and
more memory. I have the following minimal working example:

#include <mpi.h>
#include <stdint.h>
#include <stdlib.h>

const uint32_t MSG_LENGTH = 256;

int main(int argc, char* argv[]) {
    MPI_Init(NULL, NULL);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    volatile char * msg  = (char*) malloc( sizeof(char) * MSG_LENGTH );

    for (uint64_t i = 0; i < 1e9; i++) {
        if ( rank == 1 ) {
            MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
                      rank-1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
                      rank-1, 0, MPI_COMM_WORLD);
        } else if ( rank == 0 ) {
            MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
                      rank+1, 0, MPI_COMM_WORLD);
            MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
                      rank+1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Barrier( MPI_COMM_WORLD );
        for (uint32_t k = 0; k < MSG_LENGTH; k++)
            msg[k]++;
    }

    MPI_Finalize();
    return 0;
}


I run this with mpirun -n 2 ./pingpong_memleak.exe

The program does nothing more than send a message from rank 0 to rank 1,
then from rank 1 to rank 0 and so on in standard blocking mode, not even
asynchronous.

Running the program will allocate roughly 30mb/s (Windows Task Manager)
until it stops at around 1.313.180kb. This is when the transfer rates
(not being measured in above snippet) drop significantly to maybe a
second per send instead of roughly 1µs.

I use Cygwin with Windows 7 and 16Gb RAM. I haven't tested this minimal
working example on other setups.


Can someone test on other platforms and confirm whether this is a
cygwin-specific issue?

Regards
Marco


Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Marco Atzeri



On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote:

Hi,

i tested on a RedHat 6 like linux server and could not observe any
memory leak.

BTW, are you running 32 or 64 bits cygwin ? and what is your configure
command line ?

Thanks,

Gilles



the problem is present in both versions.

cygwin 1.8.3-1 packages  are built with configure:

 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin 
--libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var 
--sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share 
--docdir=/usr/share/doc/openmpi --htmldir=/usr/share/doc/openmpi/html -C 
LDFLAGS=-Wl,--export-all-symbols --disable-mca-dso --disable-sysv-shmem 
--enable-cxx-exceptions --with-threads=posix --without-cs-fs 
--with-mpi-param_check=always --enable-contrib-no-build=vt,libompitrace 
--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv


Regards
Marco



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Marco Atzeri

On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:

Marco,

here is attached a patch that fixes the issue
/* I have not yet found out why this does not occur on Linux ... */

could you please give it a try ?

Cheers,

Gilles



It solves the issue on 64 bit.
I see no growing memory usage anymore

I will build 32 bit and then upload both as 1.8.3-2

Thanks
Marco



Re: [OMPI users] OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Marco Atzeri

On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:

Thanks Marco,

pthread_mutex_init calls calloc under cygwin but does not allocate memory under 
linux, so not invoking pthread_mutex_destroy causes a memory leak only under 
cygwin.
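
The pairing described above can be sketched as follows. This is only an illustration of matching every pthread_mutex_init with a pthread_mutex_destroy; it is not the patch attached earlier in the thread, and the function name `mutex_lifecycle_rounds` is made up for this example.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical helper for illustration only.
 * Creates, uses and destroys a mutex `rounds` times and returns the
 * number of rounds in which every call succeeded. */
int mutex_lifecycle_rounds(int rounds)
{
    int ok = 0;
    int i;

    for (i = 0; i < rounds; i++) {
        pthread_mutex_t m;

        if (pthread_mutex_init(&m, NULL) != 0)
            continue;           /* nothing to clean up */

        if (pthread_mutex_lock(&m) == 0 && pthread_mutex_unlock(&m) == 0) {
            /* Destroying the mutex releases whatever the platform
             * allocated in pthread_mutex_init (reported above to be
             * heap memory on Cygwin and nothing on Linux). */
            if (pthread_mutex_destroy(&m) == 0)
                ok++;
        } else {
            pthread_mutex_destroy(&m);
        }
    }
    return ok;
}
```

Leaving out the destroy call inside such a loop is the kind of omission that would grow memory only on a platform where initialization allocates.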

Gilles


thanks for the work .

uploading 1.8.3-2 on www.cygwin.com

Regards
Marco


Re: [OMPI users] mpirun hangs without internet connection

2015-01-15 Thread Marco Atzeri



On 1/15/2015 5:39 PM, Klara Hornisova wrote:

I have installed OpenMPI 1.6.5 under cygwin. When trying test example

$mpirun hello


current cygwin package is 1.8.4-1, could you test it ?



or, e.g., more complex examples from scalapack, such as

$mpirun -np 4 xslu

everything works fine when there is an internet connection. However,
when the cable is disconnected, mpirun hangs without any error message.
With -d option the output stops before the line


Is a wireless interface on?


[my_computer...] [[3169,1],0] node[0].name my_computer... daemon0

which is included in the output when the internet is on.

The firewall is turned off.
I tried also to add options: -host localhost, -mca btl self, --mca
btl_tcp_if_include “127.0.0.1/8” and their
combinations, but nothing has changed.

Thank you in advance for advice.

Klara Hornisova



Regards
Marco


[OMPI users] OpenMPI virtualization aware

2016-05-27 Thread Marco D'Amico
Hi, I have recently been investigating virtualization in the HPC field, and I
found out that MVAPICH has a "virtualization-aware" version that makes it
possible to overcome the large latency problems of using a virtualized
environment for HPC.

My question is whether there is any similar effort in OpenMPI, since I would
eventually like to contribute to it.

Best regards,
Marco D'Amico


Re: [OMPI users] Cygwin64 mpiexec freezes

2017-09-07 Thread Marco Atzeri

On 07/09/2017 21:12, Llelan D. wrote:
Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran), GCC 
6.3.0-2 (core, gcc, g++, fortran)


I am compiling the standard "hello_c.c" example with *mpicc*:

$ mpicc -g hello_c.c -o hello_c

The showme:

gcc -g hello_c.c -o hello_c -fexceptions -L/usr/lib -lmpi -lopen-rte -lopen-pal 
-lm -lgdi32

This successfully creates hello_c.exe. When I run it directly, it 
performs as expected (The first time run brings up a Windows Firewall 
dialog and I click Accept):


$ ./hello_c
Hello World! I am 0 of 1, (Open MPI v1.10.7, package: Open MPI 
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: v1.10.6-48-g5e373bf, 
May 16, 2017, 129)

However, when I run it using mpiexec:

$ mpiexec -n 4 ./hello_c

$ ^C

Nothing is displayed and I have to ^C out. If I insert a puts("Start") 
just before the call to MPI_Init(&argc, &argv), and a puts("MPI_Init 
done.") just after, mpiexec will print "Start" for each process (4 times 
for the above example) and then freeze. It is never returning from the 
call to MPI_Init(...).


This is a freshly installed Cygwin64 and other non-mpi programs work 
fine. Can anyone give me an idea of what is going on?




same here.
I will investigate to check if it is a side effect of the
 new 6.3.0-2 compiler or of the latest cygwin

Regards
Marco







___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Errors when compiled with Cygwin MinGW gcc

2017-09-08 Thread Marco Atzeri

On 08/09/2017 02:38, Llelan D. wrote:
Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran), 
x86_64-w64-mingw32-gcc 6.3.0-1 (core, gcc, g++, fortran)


I am compiling the standard "hello_c.c" example with *mpicc* configured 
to use the Cygwin installed MinGW gcc compiler:


$ export OMPI_CC=x86_64-w64-mingw32-gcc
$ mpicc -idirafter /cygdrive/c/cygwin64/usr/include hello_c.c -o hello_c

For some unknown reason, I have to manually include the "usr/include" 
directory to pick up the "mpi.h" header, and it must be searched after 
the standard header directories to avoid "time_t" typedef conflicts. The 
showme:




You cannot mix cygwin DLLs with mingw compilations.
They follow different paradigms.




Re: [OMPI users] Cygwin64 mpiexec freezes

2017-09-08 Thread Marco Atzeri

On 07/09/2017 21:56, Marco Atzeri wrote:

On 07/09/2017 21:12, Llelan D. wrote:
Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran), 
GCC 6.3.0-2 (core, gcc, g++, fortran)



However, when I run it using mpiexec:

$ mpiexec -n 4 ./hello_c

$ ^C

Nothing is displayed and I have to ^C out. If I insert a puts("Start") 
just before the call to MPI_Init(&argc, &argv), and a puts("MPI_Init 
done.") just after, mpiexec will print "Start" for each process (4 
times for the above example) and then freeze. It is never returning 
from the call to MPI_Init(...).


This is a freshly installed Cygwin64 and other non-mpi programs work 
fine. Can anyone give me an idea of what is going on?




same here.
I will investigate to check if it is a side effect of the
  new 6.3.0-2 compiler or of the latest cygwin



I take it back. It works fine.

$ cygcheck -cd openmpi cygwin gcc-core
Cygwin Package Information
Package  Version
cygwin   2.9.0-2
gcc-core 6.3.0-2
openmpi  1.10.7-1

 $  time mpirun -n 2 ./hello_c.exe
Hello, world, I am 0 of 2, (Open MPI v1.10.7, package: Open MPI 
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: 
v1.10.6-48-g5e373bf, May 16, 2017, 129)
Hello, world, I am 1 of 2, (Open MPI v1.10.7, package: Open MPI 
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: 
v1.10.6-48-g5e373bf, May 16, 2017, 129)


real    0m3.500s
user    0m1.309s
sys     0m2.851s


The most likely cause, which also caused my first reaction, is some
network interface, usually a virtual one, that is seen as active
but is not operational.

In my case if the "PANGP Virtual Ethernet Adapter" is active
it causes mpirun/orterun to wait forever.


Look at your network interfaces in

   Control Panel\Network and Internet\Network Connections

check for possible candidates and try disabling them.

Regards
Marco






[OMPI users] openmpi-1.6 undefined reference

2012-05-23 Thread marco atzeri

I am trying to build openmpi-1.6 for cygwin with dynamic libs

-
./autogen.sh
cd build_dir
source_dir/configure \
   LDFLAGS="-Wl,--export-all-symbols -no-undefined"  \
   --disable-mca-dso \
   --without-udapl \
   --enable-cxx-exceptions \
   --enable-mpi-threads \
   --enable-progress-threads \
   --with-threads=posix \
   --without-cs-fs \
   --enable-heterogeneous \
   --with-mpi-param_check=always \
   --enable-contrib-no-build=vt \

--enable-mca-nobuild=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv
make
-

the build stops here:
  CCLD   libompitrace.la
Creating library file: .libs/libompitrace.dll.a
.libs/abort.o: In function `MPI_Abort':
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: 
undefined reference to `_o   mpi_mpi_comm_world'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: 
undefined reference to `_P   MPI_Comm_rank'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:33: 
undefined reference to `_P   MPI_Comm_get_name'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:38: 
undefined reference to `_P   MPI_Abort'


I do not find "mpi_mpi_comm_world" defined in any of the
already built objects, except

./ompi/communicator/.libs/comm_init.o
0200 C _ompi_mpi_comm_world

and on libmpi.dll.a

d002278.o:
 i .idata$4
 i .idata$5
 i .idata$6
 i .idata$7
 t .text
 U __head_cygmpi_1_dll
 I __imp__ompi_mpi_comm_world
0000 I __nm__ompi_mpi_comm_world


Any hints?

Marco



Re: [OMPI users] openmpi-1.6 undefined reference

2012-05-23 Thread marco atzeri

On 5/23/2012 3:19 PM, Jeff Squyres (jsquyres) wrote:

Just curious - are you running autogen for any particular reason?

I don't know how much Cygwin testing we've done.

Sent from my phone. No type good.



experience says that autoreconf is a good approach on cygwin,
it is almost standard on our package build procedure.

As autogen is performing the same action I see no reason to bypass it
and run a standard autoreconf.

Regards
Marco




Re: [OMPI users] openmpi-1.6 undefined reference

2012-05-23 Thread marco atzeri
---<<
>><<
raw extraction in 2000 microsec
>>----<<
>><<
raw extraction in 0 microsec
>><<
PASS: ddt_raw.exe
==
All 5 tests passed
==
-

Regards
Marco


Re: [OMPI users] openmpi-1.6 undefined reference

2012-05-23 Thread marco atzeri

On 5/23/2012 11:20 PM, Jeff Squyres wrote:

On May 23, 2012, at 9:53 AM, marco atzeri wrote:


experience says that autoreconf is a good approach on cygwin,
it is almost standard on our package build procedure.


I'm still curious: why?  (I'm *assuming* that you're building from an official 
Open MPI tarball -- is that incorrect?)

I ask because we've already run autoreconf, meaning that official Open MPI 
tarballs are fully bootstrapped and do not need to have autogen (i.e., 
ultimately autoreconf) re-run on them.

Specifically: I'm unaware of a reason why you should need to re-run autogen 
(autoreconf) on an otherwise-unaltered Open MPI that was freshly extracted from 
a tarball.  Does something happen differently if you *don't* re-run autogen 
(autoreconf)?

Re-running autogen shouldn't be causing you any problems, of course -- this is 
just my curiosity asserting itself...



Hi Jeff,
~ 90% of the time we have mismatch problems between upstream and
cygwin on autoconf/automake/libtool versions that are not cygwin
aware or updated.

As a safe approach, we prefer to apply "autoreconf -i -f" by default when
building binary packages.

see cygautoreconf on
http://cygwin-ports.svn.sourceforge.net/viewvc/cygwin-ports/cygport/trunk/README

Regards
Marco




Re: [OMPI users] openmpi-1.6 undefined reference

2012-05-24 Thread marco atzeri

On 5/24/2012 2:51 AM, Jeff Squyres wrote:

On May 23, 2012, at 6:20 PM, marco atzeri wrote:


~ 90% of the time we have mismatch problems between upstream and
cygwin on autoconf/automake/libtool versions that are not cygwin
aware or updated.


Ok, fair enough.

I'd be curious if you actually need to do this with Open MPI --

> we use very recent versions of the GNU Autotools to bootstrap our
> tarballs.




Just tested without autogen, no difference.

Cheers
Marco


Re: [OMPI users] Compiling 1.6.1 with cygwin 1.7 and gcc

2012-09-24 Thread marco atzeri

On 9/24/2012 7:02 AM, Roy Hogan wrote:

Greetings,

I’m trying to build version 1.6.1 on Cygwin (1.7), using the gcc 4.5.3
compilers.  I need to use the Cygwin  linux environment specifically so
I’m not interested in the cmake option on the windows side.   I’ve
searched the archives, but don’t find much on the Cygwin build option
over the last couple of years.

I’ve attached the logs for my “configure” and “make all” steps.   Our
email filter will not allow me to send zipped files, so I’ve attached
the two log files.   I’d appreciate any advice.

Thank you,

Roy



Hi Roy,
I built a cygwin version, some time ago
http://matzeri.altervista.org/cygwin-1.7/openmpi/

but I never really tested it for other issues on my machine.
You should be able to replicate using cygport as build packager.

$ cygport openmpi-1.6-1.cygport almostall


The configure options were:

   LDFLAGS="-Wl,--export-all-symbols -no-undefined"  \
   --disable-mca-dso \
   --without-udapl \
   --enable-cxx-exceptions \
   --enable-mpi-threads \
   --enable-progress-threads \
   --with-threads=posix \
   --without-cs-fs \
   --enable-heterogeneous \
   --with-mpi-param_check=always \
   --enable-contrib-no-build=vt,libompitrace \

--enable-mca-no-build=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv


Regards
Marco




[OMPI users] Question on ssh search path

2012-10-16 Thread marco atzeri

Hi,
I am playing with Open MPI (1.6.2) on the cygwin platform, and
while compile and check were fine,
the simple "mpirun hello_c.exe" is failing with the cryptic message:

##
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file 
/pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/plm/rsh/plm_rsh_module.c 
at line 197
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file 
/pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/ess/hnp/ess_hnp_module.c 
at line 228

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file 
/pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/runtime/orte_init.c 
at line 128

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file 
/pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/tools/orterun/orterun.c 
at line 694

#

While trying to debug, I noticed a strange pattern in the ssh search:
1)  ssh is only searched for in the PATH directories that end with "bin";
other directories are skipped.
2) //usr/bin/ssh is not on the PATH but is searched.
   Why, and where is it defined in the code?

  103  321183 [main] orterun 6304 normalize_posix_path: src 
/home/marco/bin/ssh
  100  324353 [main] orterun 6304 normalize_posix_path: src 
/usr/local/bin/ssh

   99  327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   36 1805679 [main] orterun 6304 normalize_posix_path: src 
/home/marco/bin/ssh
   34 1807010 [main] orterun 6304 normalize_posix_path: src 
/usr/local/bin/ssh

   34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh

as immediately after the "//" search mpirun crashes

 703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 
'WNetOpenEnum', id 0x15A0, stack_ptr 0x28BAD40

--- Process 8020, exception 06AB at 776BB9BC
41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume 
attributes (\??\UNC), C010


I suspect this search is the culprit.

If someone is interested I put here
http://matzeri.altervista.org/works/ompi/

all the config, check and make logs plus the ompi_info output.

Regards
Marco


Re: [OMPI users] Question on ssh search path

2012-10-17 Thread marco atzeri

On 10/17/2012 9:05 PM, Ralph Castain wrote:

I'm not entirely certain, but I don't believe we have ever supported cygwin - I 
believe we only support native Windows.


I know. I am looking at what is needed to make a port.



While trying to debug, I noticed a strange pattern in the ssh search:
1)  ssh is only searched for in the PATH directories that end with "bin";
other directories are skipped.
2) //usr/bin/ssh is not on the PATH but is searched.
   Why, and where is it defined in the code?


Any idea where this //usr/bin/ssh is coming from?



  103  321183 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
  100  324353 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   99  327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   36 1805679 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
   34 1807010 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh

as immediately after the "//" search mpirun crashes

703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 'WNetOpenEnum', 
id 0x15A0, stack_ptr 0x28BAD40
--- Process 8020, exception 06AB at 776BB9BC
41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume attributes 
(\??\UNC), C010

I suspect this search is the culprit.

If someone is interested I put here
http://matzeri.altervista.org/works/ompi/

all the config, check and make logs plus the ompi_info output.

Regards
Marco




Re: [OMPI users] Question on ssh search path

2012-10-18 Thread marco atzeri

On 10/18/2012 2:45 AM, Ralph Castain wrote:










While trying to debug, I noticed a strange pattern in the ssh search:
1)  ssh is only searched for in the PATH directories that end with "bin";
other directories are skipped.
2) //usr/bin/ssh is not on the PATH but is searched.
   Why, and where is it defined in the code?


Any idea where this //usr/bin/ssh is coming from?


/usr/bin is one of the default posix locations for system binaries, so I would 
expect it is in your path.


/usr/bin    yes
but
//usr/bin   no.

on cygwin "//" is always a network path and "//usr/bin" does not exist.
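
POSIX leaves the meaning of a path that starts with exactly two slashes to the implementation, which is why Cygwin is free to treat it as a network path. A minimal sketch of a check for that prefix follows; the helper name `has_double_slash_prefix` is hypothetical and is not taken from Open MPI or Cygwin.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Returns 1 if `path` begins with exactly two slashes (the form that is
 * implementation-defined in POSIX), 0 otherwise. Three or more leading
 * slashes are equivalent to a single slash and therefore return 0. */
int has_double_slash_prefix(const char *path)
{
    if (path == NULL)
        return 0;
    /* Short-circuit evaluation keeps the reads within the string. */
    return path[0] == '/' && path[1] == '/' && path[2] != '/';
}
```

With this helper, "//usr/bin/ssh" is flagged while "/usr/bin/ssh" is not, which is the distinction made above.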







  103  321183 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
  100  324353 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   99  327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   36 1805679 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
   34 1807010 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh


My path is much longer, but only the bin directories are searched.


   37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh


this is the anomaly



as immediately after the "//" search mpirun crashes

703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 'WNetOpenEnum', 
id 0x15A0, stack_ptr 0x28BAD40
--- Process 8020, exception 06AB at 776BB9BC
41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume attributes 
(\??\UNC), C010

I suspect this search is the culprit.

If someone is interested I put here
http://matzeri.altervista.org/works/ompi/

all the config, check and make logs plus the ompi_info output.

Regards
Marco






[OMPI users] bug (?) opal_path_access incorrect call

2012-10-31 Thread marco atzeri

Looking for a solution to
http://www.open-mpi.org/community/lists/users/2012/10/20495.php

I noticed that the issue disappears on 1.6.2 with the patch:


--- opal/util/path.c~   2012-04-03 16:29:52.0 +0200
+++ opal/util/path.c    2012-10-30 20:31:43.772749400 +0100
@@ -82,7 +82,7 @@

     /* If absolute path is given, return it without searching. */
     if( opal_path_is_absolute(fname) ) {
-        return opal_path_access(fname, "", mode);
+        return opal_path_access(fname, NULL , mode);
     }

     /* Initialize. */



From what I can see in the function body, the test on path
expects path to be a null pointer and not a
pointer to an empty string:


char *opal_path_access(char *fname, char *path, int mode)
{
    char *fullpath = NULL;
    struct stat buf;

    /* Allocate space for the full pathname. */
    if (NULL == path) {
        fullpath = opal_os_path(false, fname, NULL);
    } else {
        fullpath = opal_os_path(false, path, fname, NULL);
    }
    if (NULL == fullpath)
        return NULL;
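
A minimal sketch of why an empty string and a null pointer can behave differently here, assuming the path helper simply joins its components with a "/" separator. The `join_path` function below is hypothetical and is not the real `opal_os_path`; it only mimics the two branches of the excerpt above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a path-joining helper (NOT opal_os_path).
 * Writes "dir/fname" to `out`, or just "fname" when `dir` is NULL.
 * Returns 0 on success, -1 if the result does not fit in `outlen` bytes. */
int join_path(const char *dir, const char *fname, char *out, size_t outlen)
{
    int n;

    if (dir == NULL) {
        n = snprintf(out, outlen, "%s", fname);
    } else {
        /* An empty `dir` still contributes the separator. */
        n = snprintf(out, outlen, "%s/%s", dir, fname);
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Under that assumption, joining "" with "/usr/bin/ssh" gives "//usr/bin/ssh", while passing NULL leaves "/usr/bin/ssh" untouched, which would be consistent with the extra "//" lookup seen in the trace.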


Regards
Marco


[OMPI users] tester for cygwin openmpi-1.6.3 package

2012-10-31 Thread marco atzeri

Hi,
I built and packaged openmpi-1.6.3 for cygwin.
Before deploying it as an official package, I would
like feedback from testers.

Source and binary here:
http://matzeri.altervista.org/cygwin-1.7/openmpi/

To install using cygwin setup program
setup.exe -X -O -s http://matzeri.altervista.org

Current configuration is:

 LDFLAGS="-Wl,--export-all-symbols -no-undefined"  \
 --disable-mca-dso \
--without-udapl \
--enable-cxx-exceptions \
--with-threads=posix \
--without-cs-fs \
--enable-heterogeneous \
--with-mpi-param_check=always \
--enable-contrib-no-build=vt,libompitrace \
--enable-mca-no-build=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv


Only additional patch
https://svn.open-mpi.org/trac/ompi/changeset/27539

C, C++ and Fortran pass basic tests

$ time mpirun -n 4 ./hello_f90.exe
 Hello, world, I am 0 of 4
 Hello, world, I am 2 of 4
 Hello, world, I am 1 of 4
 Hello, world, I am 3 of 4

real    1m9.607s
user    0m1.542s
sys     0m2.135s

But I guess there is a long delay/timeout on startup.

Regards
Marco


Re: [OMPI users] tester for cygwin openmpi-1.6.3 package

2012-11-01 Thread marco atzeri

On 11/1/2012 5:08 PM, Ralph Castain wrote:

I think we'd be interested in looking at possibly adding this to the
code base. We still need to announce this (and will shortly), but our
Windows maintainer has moved on to other pastures. So support for native
Windows operations is ending with the 1.6 series, barring someone
stepping up to fill the void.

Having a cygwin alternative would let people at least continue to work
on the Windows platform, albeit in a different mode. Is this something
you are interested in pursuing?


Hi Ralph,
there is no special code in my packages, so deploying a cygwin package
was already in my plans.
It will just be an addition to the other packages for which I am already
the cygwin package maintainer
http://cygwin.com/cygwin-pkg-maint

I just want to be sure that it works as expected before deploying it,
and testing on a notebook is a bit limited.

The only peculiarity I noticed is that the stripped binaries do
not work, so the current package is based on an unstripped installation.

The next step is to check whether other options can be enabled:

  LDFLAGS="-Wl,--export-all-symbols -no-undefined"  \
  --disable-mca-dso \
  --without-udapl \
  --enable-cxx-exceptions \
  --with-threads=posix \
  --without-cs-fs \
  --enable-heterogeneous \
  --with-mpi-param_check=always \
  --enable-contrib-no-build=vt,libompitrace \

--enable-mca-no-build=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv

at least "shmem-sysv" should be possible.

Regards
Marco



Re: [OMPI users] tester for cygwin openmpi-1.6.3 package

2012-11-01 Thread marco atzeri

On 11/1/2012 7:55 PM, Ralph Castain wrote:



I see - that is even better than I had hoped! One question for you: does this 
only allow single-machine operations, or can people run across machines, 
assuming both are executing cygwin?

Guess I'm not sure how the latter would work, but my knowledge of cygwin is 
very old.



Across machines with ssh it should work, but I have no way to test it
at the moment.

Marco




[OMPI users] patch: enabling shmem-sysv on cygwin

2012-11-02 Thread marco atzeri
More as a note for other cygwin users than a general patch, but it
should work in any case.


The attached patch allows shmem_sysv to be enabled,
as SHM_R | SHM_W are not defined on cygwin (nor by POSIX) [1].

Additional advice: on cygwin, SYSV shared memory requires the
cygserver service to be running; otherwise it will fail with

$ mpirun -n 4 ./hello_c.exe
Bad system call


Regards
Marco

[1] http://cygwin.com/ml/cygwin/2007-10/msg00284.html
--- orig/openmpi-1.6.3/opal/mca/shmem/sysv/shmem_sysv_component.c   2012-04-03 16:29:49.0 +0200
+++ openmpi-1.6.3/opal/mca/shmem/sysv/shmem_sysv_component.c    2012-11-01 21:54:18.687021300 +0100
@@ -43,6 +43,7 @@
 #endif /* HAVE_SYS_IPC_H */
 #if HAVE_SYS_SHM_H
 #include <sys/shm.h>
+#include <sys/stat.h>
 #endif /* HAVE_SYS_SHM_H */

 #include "opal/constants.h"
@@ -165,7 +166,7 @@
 /* if we are here, then let the run-time test games begin */

 if (-1 == (shmid = shmget(IPC_PRIVATE, (size_t)(getpagesize()),
-  IPC_CREAT | IPC_EXCL | SHM_R | SHM_W))) {
+  IPC_CREAT | IPC_EXCL | S_IRWXU ))) {
 goto out;
 }
 else if ((void *)-1 == (addr = shmat(shmid, NULL, 0))) {
--- orig/openmpi-1.6.3/opal/mca/shmem/sysv/shmem_sysv_module.c  2012-04-03 16:29:49.0 +0200
+++ openmpi-1.6.3/opal/mca/shmem/sysv/shmem_sysv_module.c   2012-11-01 21:55:21.316603500 +0100
@@ -41,6 +41,7 @@
 #endif /* HAVE_SYS_IPC_H */
 #if HAVE_SYS_SHM_H
 #include <sys/shm.h>
+#include <sys/stat.h>
 #endif /* HAVE_SYS_SHM_H */
 #ifdef HAVE_STRING_H
 #include <string.h>
@@ -197,7 +198,7 @@
  * real_size here
  */
 if (-1 == (ds_buf->seg_id = shmget(IPC_PRIVATE, real_size,
-   IPC_CREAT | IPC_EXCL | SHM_R | SHM_W))) {
+   IPC_CREAT | IPC_EXCL | S_IRWXU ))) {
 int err = errno;
 char hn[MAXHOSTNAMELEN];
 gethostname(hn, MAXHOSTNAMELEN - 1);


[OMPI users] New package: openmpi-1.6.3-3

2012-11-16 Thread marco atzeri

First cygwin 1.6.3-3 version of packages

   libopenmpi
   libopenmpi-devel
   openmpi

are available in the Cygwin distribution:

CHANGES
Initial cygwin package.

This is based on mainstream release 1.6.3 plus
https://svn.open-mpi.org/trac/ompi/ticket/3371

Full upstream changes:
http://www.open-mpi.org/community/lists/announce/2012/10/0051.php


DESCRIPTION
Open MPI : A High Performance Message Passing Library

The Open MPI Project is an open source MPI-2 implementation that
is developed and maintained by a consortium of academic, research,
and industry partners

HOMEPAGE
http://www.open-mpi.org/


Marco Atzeri

If you have questions or comments, please send them to the
cygwin mailing list at: cygwin (at) cygwin (dot) com .

  *** CYGWIN-ANNOUNCE UNSUBSCRIBE INFO ***

If you want to unsubscribe from the cygwin-announce mailing list,
look at the "List-Unsubscribe: " tag in the email header of this
message. Send email to the address specified there. It will be in the 
format:


cygwin-announce-unsubscribe-you=yourdomain@cygwin.com

If you need more information on unsubscribing, start reading here:

http://sourceware.org/lists.html#unsubscribe-simple

Please read *all* of the information on unsubscribing that
is available starting at this URL.


Re: [OMPI users] tester for cygwin openmpi-1.6.3 package

2012-11-19 Thread marco atzeri

On 11/19/2012 7:24 PM, Jeff Squyres wrote:

Yes, thank you!  Let us know the final link, and we'll put it on our 1.6.x 
download page.



As I released an official cygwin package, the link is
http://www.cygwin.com

I am planning to update the package, adding the libesmtp support requested
by another cygwin maintainer.



On Nov 2, 2012, at 12:05 PM, Ralph Castain wrote:


Very cool. Please let us know when you publish the package and give us a link 
to it - would be nice to include that on our web page so Windows users have a 
migration path now that native support is no longer available.



Regards
Marco



[OMPI users] network timeout

2012-11-24 Thread marco atzeri

On cygwin, running on localhost on a standalone computer, I noticed
a large time discrepancy depending on whether the computer is connected
to the network or not.

Physically connected:

marco@MARCOATZERI /pub/devel/openmpi/examples
$ time mpirun -n 4 ./hello_c.exe
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

real    1m14.568s
user    0m1.496s
sys     0m2.602s

NOT connected (all interfaces down)

$ time mpirun -n 4 ./hello_c.exe
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 1 of 4
Hello, world, I am 3 of 4

real    0m3.323s
user    0m1.480s
sys     0m2.118s


I guess the 1 minute is due to some kind of timeout.
Is such a delay present on any other platform?
Is there any workaround to remove it?

Regards
Marco




Re: [OMPI users] network timeout

2012-12-13 Thread marco atzeri

On 11/24/2012 4:02 PM, Ralph Castain wrote:

Try limiting the interfaces we use to see if that's really the problem. I forget if 
cygwin has "ifconfig" or not, but use a tool to report the networks, and then 
start excluding them by adding

-mca oob_tcp_if_exclude foo,bar

to your cmd line until you find the one that is causing the hang. That will (a) 
confirm that it is a network timeout issue, and (b) which network is causing 
the problem.


Ralph,
I was unable to exclude the interface in this way, using
one of the several "strange" names Windows uses for the interfaces

  {258B6C87-9B24-477D-A5D1-97AE07FEABAB}
  NPF_{258B6C87-9B24-477D-A5D1-97AE07FEABAB}


But I found the root cause: The driver of the Vodafone USB InternetKey.

So, for the next person hitting the same or a similar issue:
in theory the interface was disabled, but it seems that, when queried,
the driver tries to contact Vodafone servers through any active interface.
Thanks to Wireshark I was able to notice the driver's polling behaviour.

After removing all versions of the driver (following [1]),
the delay disappeared.

$ time orterun -n 4 ./hello_c.exe
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 1 of 4
Hello, world, I am 3 of 4

real    0m2.552s
user    0m0.933s
sys     0m1.774s


[1] http://www.petri.co.il/removing-old-drivers-from-vista-and-windows7.htm

Regards
Marco






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users








Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-18 Thread marco atzeri
m \
--without-udapl \
--enable-cxx-exceptions \
--with-threads=posix \
--without-cs-fs \
--enable-heterogeneous \
--with-mpi-param_check=always \
--enable-contrib-no-build=vt,libompitrace \

--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv




I added the following constants to /usr/include/cygwin/shm.h before
I started to build openmpi-1.9a1r27674.

diff /usr/include/cygwin/shm.h /usr/include/cygwin/shm.h.orig
29,34d28
< /* Permission definitions   */
< #define SHM_R   0400/* read permission  */
< #define SHM_W   0200/* write permission */
<


Don't do that: the fact that it compiles does not mean that it will work.
Look at the SHM.patch used in the cygwin openmpi-1.6.3-4 package;
please note, however, that this functionality requires a running
cygserver process, and it can be avoided.
By default I disabled it.
http://cygwin.com/ml/cygwin-apps/2012-11/msg00020.html




I used the following commands to configure Open MPI.
"/usr/local/jdk1.7.0" is a link to my Java installation
on Windows 7.


Windows Java is not cygwin aware...



cd /usr/local
ln -s /cygdrive/c/Program\ Files\ \(x86\)/jdk1.7.0 jdk1.7.0


../openmpi-1.9a1r27674/configure --prefix=/usr/local/openmpi-1.9 \
   --with-jdk-bindir=/usr/local/jdk1.7.0/bin \
   --with-jdk-headers=/usr/local/jdk1.7.0/include \
   JAVA_HOME=/usr/local/jdk1.7.0 \
   LDFLAGS="-m32 -Wl,--export-all-symbols -no-undefined" \
   CC="gcc" CXX="g++" FC="gfortran" \
   CFLAGS="-m32" CXXFLAGS="-m32" FCFLAGS="-m32" \
   CPP="cpp" CXXCPP="cpp" \
   CPPFLAGS="" CXXCPPFLAGS="" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
   --enable-cxx-exceptions \
   --enable-mpi-java \
   --enable-heterogeneous \
   --enable-opal-multi-threads \
   --enable-mpi-thread-multiple \
   --with-threads=posix \
   --with-hwloc=internal \
   --without-verbs \
   --without-udapl \
   --without-sctp \
   --with-wrapper-cflags=-m32 \
   --enable-debug \
   --disable-mca-dso \
   --without-cs-fs \
   --enable-contrib-no-build=vt,libompitrace \
   --enable-mca-no-build=memory_mallopt,paffinity,installdirs-windows,timer-windows \
   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV

It would be great, if I could get a working Open MPI version with
Java interface on Cygwin. Thank you very much for any help in advance.


First you need to build jdk for cygwin




Kind regards

Siegmar



Regards
Marco




Re: [OMPI users] Windows Open MPI question

2012-12-18 Thread marco atzeri

On 12/18/2012 5:49 PM, Kumar, Sudhir wrote:

Hi
  Is struct ompi_datatype_t defined only for Linux or is there a windows 
equivalent. If so in which header file can it be found.
Thanks


 ompi/datatype/ompi_datatype.h

Regards
Marco



Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-18 Thread marco atzeri

On 12/18/2012 6:55 PM, Jeff Squyres wrote:

...but only of v1.6.x.


okay, adding development version on Christmas wishlist
;-)



On Dec 18, 2012, at 10:32 AM, Ralph Castain wrote:


Also, be aware that the Cygwin folks have already released a fully functional 
port of OMPI to that environment as a package. So if you want OMPI on Cygwin, 
you can just download and install the Cygwin package - no need to build it 
yourself.




Regards
Marco



Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-19 Thread marco atzeri

On 12/19/2012 11:04 AM, Siegmar Gross wrote:

Hi


On 12/18/2012 6:55 PM, Jeff Squyres wrote:

...but only of v1.6.x.


okay, adding development version on Christmas wishlist
;-)


Can you build the package with thread and Java support?

   --enable-mpi-java \
   --enable-opal-multi-threads \
   --enable-mpi-thread-multiple \
   --with-threads=posix \

I could build openmpi-1.6.4 with thread support without a problem
for Cygwin 1.7.17 but I failed to build openmpi-1.9 until now.



working on openmpi-1.7rc5.
It needs some cleaning and after I need to test.

Java surely not, as there is no cygwin Java.

--with-threads=posix  yes

not tested yet
--enable-opal-multi-threads \
--enable-mpi-thread-multiple \





Kind regards

Siegmar



Regards
Marco




Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-19 Thread marco atzeri

On 12/19/2012 12:28 PM, marco atzeri wrote:


working on openmpi-1.7rc5.
It needs some cleaning and after I need to test.


built and passed test
http://www.open-mpi.org/community/lists/devel/2012/12/11855.php

Regards
Marco



Re: [OMPI users] problem compiling openmpi

2013-06-27 Thread marco atzeri

On 6/27/2013 9:23 PM, rmjuberias wrote:

hi

i am trying to compile openmpi and when I make the "make all install" I have an 
error that I cant figure out. Any feedback would be appreciated.

Thanks!



openmpi-1.2.6  ?
Why not at least a 1.6.x series ?




Re: [OMPI users] Open MPI 1.6.5 "make all" fails on Win7 with "system cannot find file specified"

2013-07-11 Thread marco atzeri

On 7/11/2013 11:15 AM, Don Warren wrote:

Hello, all,

I'm attempting to install Open MPI in a cygwin environment on Windows 7
(64-bit, but I think Open MPI is treating things like a 32-bit environment).


Hi Don,
cygwin already has a 1.7.1 package

$ cygcheck -c -d |grep mpi
libopenmpi       1.7.1-2
libopenmpi-devel 1.7.1-2
libopenmpicxx1   1.7.1-2
libopenmpifh2    1.7.1-2
libopenmpiuse1   1.7.1-2
openmpi          1.7.1-2

any reason to not use it ?
Any reason to need 1.6.5 ?

Cygwin is a 32-bit environment, so of course the package is 32-bit.
However, cygwin64, still in beta, already has the same package version

$ cygcheck -c -d |grep mpi
libopenmpi       1.7.1-2
libopenmpi-devel 1.7.1-2
libopenmpicxx1   1.7.1-2
libopenmpifh2    1.7.1-2
libopenmpiuse1   1.7.1-2
openmpi          1.7.1-2


The command I used to configure Open MPI was
---
./configure --with-mpi-f90-size=medium -prefix=/home/Don/openmpi
F77=gfortran FC=gfortran
---


too little ;-)
current 1.7.1-2 package is built with:

configure  \
LDFLAGS="-Wl,--export-all-symbols -no-undefined"  \
--disable-mca-dso \
--disable-sysv-shmem \
--without-udapl \
--enable-cxx-exceptions \
--with-threads=posix \
--without-cs-fs \
--enable-heterogeneous \
--with-mpi-param_check=always \
--enable-contrib-no-build=vt,libompitrace \

--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv

and I had a similar one on previous 1.6.4-2 package

Regards
Marco



Re: [OMPI users] need help in this error

2013-10-24 Thread marco atzeri

On 10/24/2013 7:35 PM, Osman Khalid wrote:

Hi

I am for the first time installing OpenMPI on my windows XP machine,
using Cygwin.

The *./configure* command is successful.

However, when I give *make* command, i get the following error:

$ make
Making all in config
make[1]: Entering directory `/d/cygwin/home/OSMANK/openmpi-1.6.5/config'

[cut]

*make[2]: Entering directory
`/d/cygwin/home/OSMANK/openmpi-1.6.5/opal/libltdl'*
*/bin/sh /home/OSMANK/openmpi-1.6.5/opal/libltdl/config/install-sh -d .*
*/bin/sh: /home/OSMANK/openmpi-1.6.5/opal/libltdl/config/install-sh: No
such file or directory*
make[2]: *** [argz.h] Error 127
make[2]: Leaving directory
`/d/cygwin/home/OSMANK/openmpi-1.6.5/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/d/cygwin/home/OSMANK/openmpi-1.6.5/opal'
make: *** [all-recursive] Error 1


I went to the above folder, and found that the file  "install-sh" was there.

Would request help on that.

Best Regards
Osman



openmpi-1.7.2 is available on cygwin
Any specific reason to build 1.6.5 ?





Re: [OMPI users] need help in this error

2013-10-24 Thread marco atzeri

On 10/24/2013 10:02 PM, Osman Khalid wrote:

Thank you Marco for reply

I changed the version to 1.7, but it is still giving me exactly the
same error. I copy and paste the error below:



Hi Osman,
It seems I was not clear.
I meant that openmpi is already available as a package
distributed in cygwin.

See http://cygwin.com/packages/ for all available packages.

Why do you need to build it yourself?
Can't you use the one already distributed?

Regards
Marco



[OMPI users] Bug in MPI_Reduce/MPI_Comm_split?

2007-08-31 Thread Marco Sbrighi
c00, count=32768) at 
collect_noparms.c:248
#13 0x00404c8e in main (argc=1, argv=0x7fb308) at 
collect_noparms.c:187
(gdb) 
-


I think this bug is not related to my performance slowdown in collective
operations, but something seems to be wrong at a higher level in the
MCA framework.
Is someone able to reproduce a similar bug?
Is someone experiencing a performance slowdown in collective operations
with big jobs using OFED 1.1 over an InfiniBand interconnect?
Do I need some further btl or coll tuning? (I've tried with SRQ, but
that doesn't resolve my problems.)


Marco 

-- 
-
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516
/ (c) Marco Sbrighi - CINECA /



#include "mpi.h"

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#ifndef HOST_NAME_MAX
#pragma warn self defined HOST_NAME_MAX
#define HOST_NAME_MAX 255
#endif

#ifndef _POSIX_PATH_MAX
#pragma warn self defined _POSIX_PATH_MAX
#define _POSIX_PATH_MAX 2048
#endif

int ReduceExitStatus(int rank, int exitstat, FILE* out);
int exitall(int rank, int exitstat, FILE* out);
void checkAbort(MPI_Comm comm, int err);
int checkFail(MPI_Comm comm, int err);
int (*op ) (MPI_Comm,int);
int bcast(MPI_Comm,int);
int reduce(MPI_Comm,int);
int allreduce(MPI_Comm, int);


char myname[LINE_MAX];


char *wbuf, *rbuf;


int ReduceExitStatus(int rank, int exitstat, FILE* out)
{
  int commstat,retc;
  commstat=0;
  retc= MPI_Allreduce (&exitstat, &commstat,1, MPI_INT, MPI_BOR,MPI_COMM_WORLD);
  fprintf(stdout, "Reducing %d. Allreduce is exiting with status %d reporting %d to communicator.\n",exitstat,retc,commstat );

  return  (commstat);
}

int exitall(int rank, int exitstat, FILE* out) {
  int commstat;
  commstat=0;
  commstat=ReduceExitStatus(rank,exitstat,out);
  MPI_Finalize();
  return  (commstat);
}

void checkAbort(MPI_Comm comm, int err)
{
  if (err != MPI_SUCCESS) MPI_Abort(comm, err);
}

int checkFail(MPI_Comm comm, int err)
{
  return err == MPI_SUCCESS ? 1:0;
}


int myid, n_myid;


char processor_name[MPI_MAX_PROCESSOR_NAME];




int main(int argc, char *argv[])
{
int  i, namelen;


int last_opt,j;
size_t count;

//size_t bsize;

size_t minbuf, maxbuf, stepbuf;
int minc, maxc, stepc;
int err, color,key;
MPI_Comm n_comm;
double stime,etime,ttime;
double timeout;
double status;
int numprocs;
char* opname; 
//double sbuf[4];

//usec_timer_t t;

void *attr_value;
int flag, commsize;
size_t bufsize;
int rep, maxrep;

long long deltat;
//mpirun.lsf ./collect.sh -d 1 -minc 35 -minbuf 0 -maxbuf 1048576 -stepbuf 131072 -op reduce 

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Get_processor_name(processor_name,&namelen);

processor_name[namelen]=(char)0;

if ( numprocs < 2 ) {

  if (myid==0) { 
	fprintf(stderr, "--> please launch at least with 2 MPI processes\n");
  }
  return exitall(myid,0,stderr); 
}

minc=35;
maxc=numprocs;
stepc=1;
maxbuf=1048576;
minbuf=0;
stepbuf=131072;
maxrep=20;
op=reduce;
timeout= 3.0/1000.0;



if (myid==0) {
  /* sync Wtime? */
  err=MPI_Attr_get (MPI_COMM_WORLD, MPI_WTIME_IS_GLOBAL, &attr_value, &flag);
  checkAbort(MPI_COMM_WORLD,err);

  if (flag) {
	if ( *(int*)attr_value < 0 || *(int*)attr_value > 1)
	  fprintf(stdout, "The value of WTIME_IS_GLOBAL %d is not valid.\n", *(int*)attr_value );
	else 
	  fprintf(stdout, "This implementation support MPI_Wtime sync across processes! Enjoy.\n");
  }

}

fflush(stdout); 

if ( (wbuf=(char*) malloc ( maxbuf*sizeof(char))) == NULL) {
   fprintf(stderr, "%d - Unable to allocate %lld bytes of memory.\n", myid, (long long int) maxbuf);
   MPI_Abort(MPI_COMM_WORLD,2);
}

if ( (rbuf=(char*) malloc ( maxbuf*sizeof(char))) == NULL) {
   fprintf(stderr, "%d - Unable to allocate %lld bytes of memory.\n", myid, (long long int) maxbuf);
   MPI_Abort(MPI_COMM_WORLD,2);
}
/* root processor =0, for easy */

//RENAME_UTIMER(&t, "Collective");
// RESET_UTIMER(&t,NULL);  

if ( myid==0) fprintf(stdout,"RA  -  RBnp   size   msec\n"); 

for ( commsize = minc; commsize <= maxc; commsize += stepc) {

/*   err=MPI_Barrier(MPI_COMM_WORLD); */
/*   checkAbort ( MPI_COMM_WORLD, err ) ; */

  color = (myid < commsize ? 1 : 2);  key = 0;
  err=MPI_Comm_split(MPI_COMM_WORLD,color,key,&n_comm);
  checkAbort(MPI_COMM_WORLD,err);
  err=MPI_Comm_rank(n_comm, &n_myid);
  checkAbor

[OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-17 Thread Marco Sbrighi


Dear Open MPI developers,

I'm using Open MPI 1.2.2 over OFED 1.2 on a 256-node, dual-Opteron,
dual-core Linux cluster, with an InfiniBand 4x interconnect, of course.

Each cluster node is equipped with 4 (or more) Ethernet interfaces,
namely 2 gigabit ones plus 2 IPoIB. The two gigabit ones are named eth0
and eth1, while the two IPoIB ones are named ib0 and ib1.

eth0 is a management network with poor performance, and furthermore we
don't want to use the ib* interfaces to carry MPI's traffic (neither OOB
nor TCP), so we would like eth1 to be used for Open MPI OOB and TCP.

In order to drive the OOB over only eth1 I've tried various combinations
of oob_tcp_[ex|in]clude MCA statements, starting from the obvious

oob_tcp_exclude = lo,eth0,ib0,ib1 

then trying the other obvious one:

oob_tcp_include = eth1

and both at the same time.

Next I've tried the following:

oob_tcp_exclude = eth0

but after the job starts, I still have a lot of tcp connections
established using eth0 or ib0 or ib1. 
Furthermore, the following error occurs:

   [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect: connection failed: Connection timed out (110) - retrying

I've found only one way to have TCP connections bound only to
the eth1 interface: using both of the following MCA directives on the
command line:

mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 
.

This sounds like a bug to me.

Is there someone able to reproduce this behaviour? 
If this is a bug, are there fixes?

Thanks.

Marco


-- 
-----
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516



Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-18 Thread Marco Sbrighi
On Mon, 2007-12-17 at 17:19 -0500, Jeff Squyres wrote:
> On Dec 17, 2007, at 8:35 AM, Marco Sbrighi wrote:
> 
> > I'm using Open MPI 1.2.2 over OFED 1.2 on an 256 nodes, dual Opteron,
> > dual core, Linux cluster. Of course, with Infiniband 4x interconnect.
> >
> > Each cluster node is equipped with 4 (or more) ethernet interface,
> > namely 2 gigabit ones plus 2 IPoIB. The two gig are named  eth0,eth1,
> > while the two IPoIB are named ib0,ib1.
> >
> > It happens that the eth0 is a management network, with poor
> > performances, and furthermore we wouldn't use the ib* to carry MPI's
> > traffic (neither OOB or TCP), so we would like the eth1 is used for  
> > open
> > MPI OOB and TCP.
> >
> > In order to drive the OOB over only eth1 I've tried various  
> > combinations
> > of oob_tcp_[ex|in]clude MCA statements, starting from the obvious
> >
> > oob_tcp_exclude = lo,eth0,ib0,ib1
> >
> > then trying the othe obvious:
> >
> > oob_tcp_include = eth1
> 
> This one statement (_include) should be sufficient.

I agree with your interpretation, but what I'm experiencing here is that
it "should", but in fact it doesn't.

> 
> Assumedly this(these) statement(s) are in a config file that is being  
> read by Open MPI, such as $HOME/.openmpi/mca-params.conf?

I've tried many combinations: only in $HOME/.openmpi/mca-params.conf,
only on the command line, and both; but none seems to work correctly.
Nevertheless, what I'm expecting is that if something is specified in
$HOME/.openmpi/mca-params.conf and then specified differently on the
command line, the latter should take precedence, I think.
> 
> > and both at the same time.
> >
> > Next I've tried the following:
> >
> > oob_tcp_exclude = eth0
> >
> > but after the job starts, I still have a lot of tcp connections
> > established using eth0 or ib0 or ib1.
> > Furthermore It happens the following error:
> >
> >   [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect:
> > connection failed: Connection timed out (110) - retrying
> 
> This is quite odd.  :-(
> 
> > I've found only a way in order to have tcp connections binded only to
> > the eth1 interface, using both the following MCA directives in the
> > command line:
> >
> > mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include  
> > lo,eth0,ib0,ib1 .
> >
> > This sounds me as bug.
> 
> Yes, it does.  Specifying the MCA same param twice on the command line  
> results in undefined behavior -- it will only take one of them, and I  
> assume it'll take the first (but I'd have to check the code to be sure).

OK, I can obtain the same behaviour using only one statement: 
--mca oob_tcp_include eth1,lo,eth0,ib0,ib1

note that using  --mca mpi_show_mca_params what I'm seeing in the report
is the same for both statements (twice and single):

.
 [node255:30188] oob_tcp_debug=0
[node255:30188] oob_tcp_include=eth1,lo,eth0,ib0,ib1
[node255:30188] oob_tcp_exclude=
...


> 
> > Is there someone able to reproduce this behaviour?
> > If this is a bug, are there fixes?
> 
> 
> I'm unfortunately unable to reproduce this behavior.  I have a test  
> cluster with 2 IP interfaces: ib0, eth0.  I have tried several  
> combinations of MCA params with 1.2.2:
> 
> --mca oob_tcp_include ib0
> --mca oob_tcp_include ib0,bogus
> --mca oob_tcp_include eth0
> --mca oob_tcp_include eth0,bogus
> --mca oob_tcp_exclude ib0
> --mca oob_tcp_exclude ib0,bogus
> --mca oob_tcp_exclude eth0
> --mca oob_tcp_exclude eth0,bogus
> 
> All do as they are supposed to -- including or excluding ib0 or eth0.
> 
> I do note, however, that the handling of these parameters changed in  
> 1.2.3 -- as well as their names.  The names changed to  
> "oob_tcp_if_include" and "oob_tcp_if_exclude" to match other MCA  
> parameter name conventions from other components.
> 
> Could you try with 1.2.3 or 1.2.4 (1.2.4 is the most recent; 1.2.5 is  
> due out "soon" -- it *may* get out before the holiday break, but no  
> promises...)?

We have 1.2.3 on another cluster and it shows the same behaviour as
1.2.2 (BTW, the other cluster has the same eth interfaces).

> 
> If you can't upgrade, let me know and I can provide a debugging patch  
> that will give us a little more insight into what is happening on your  
> machines.  Thanks.

It is quite difficult for us to upgrade Open MPI now. We have the
official Cisco packages installed, and I know 1.2.2-1 is the only
official Cisco Open MPI distribution today.

In any case I would like to try your debug patch.

Thanks

Marco 

> 
-- 
-
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516



Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-18 Thread Marco Sbrighi
On Mon, 2007-12-17 at 20:58 -0500, Brian Dobbins wrote:
> Hi Marco and Jeff,
> 
>   My own knowledge of OpenMPI's internals is limited, but I thought
> I'd add my less-than-two-cents...
> 
> > I've found only a way in order to have tcp connections
> binded only to
> > the eth1 interface, using both the following MCA directives
> in the
> > command line:
> >
> > mpirun  --mca oob_tcp_include eth1 --mca
> oob_tcp_include 
> > lo,eth0,ib0,ib1 .
> >
> > This sounds me as bug.
> 
> 
> Yes, it does.  Specifying the MCA same param twice on the
> command line
> results in undefined behavior -- it will only take one of
> them, and I 
> assume it'll take the first (but I'd have to check the code to
> be sure).
> 
>   I think that Marco intended to write:
>   mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_exclude
> lo,eth0,ib0,ib1 ... 

no, I intended to write exactly what I wrote. The double statement is
reported by --mca mpi_show_mca_params exactly as I write one statement
only, as follows:

--mca oob_tcp_include eth1,lo,eth0,ib0,ib1

> 
>   Is this correct?  So you're not specifying include twice, you're
> specifying include and exclude, so each interface is explicitly stated
> in one list or the other.  I remember encountering this behaviour as
> well, in a slightly different format, but I can't seem to reproduce it
> now either. 

Note that the two lists never intersect.
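
For anyone following the thread, here is a minimal sketch (plain Python, not part of Open MPI) of the two checks discussed above: whether the same MCA parameter is given twice on an mpirun command line, and whether an include list and an exclude list overlap. The function names and the "./a.out" program are made up for illustration.

```python
from collections import Counter


def mca_params(argv):
    """Collect (name, value) pairs that follow -mca / --mca in an argv list."""
    pairs = []
    i = 0
    while i < len(argv):
        if argv[i] in ("-mca", "--mca") and i + 2 < len(argv):
            pairs.append((argv[i + 1], argv[i + 2]))
            i += 3
        else:
            i += 1
    return pairs


def duplicated(argv):
    """Names of MCA parameters that appear more than once."""
    counts = Counter(name for name, _ in mca_params(argv))
    return sorted(name for name, n in counts.items() if n > 1)


def overlap(include, exclude):
    """Interfaces named in both comma-separated lists."""
    return sorted(set(include.split(",")) & set(exclude.split(",")))


# "./a.out" is a hypothetical program name.
cmd = ("mpirun --mca oob_tcp_include eth1 "
       "--mca oob_tcp_include lo,eth0,ib0,ib1 ./a.out").split()
print(duplicated(cmd))
print(overlap("eth1", "lo,eth0,ib0,ib1"))
```

Run against the command line quoted above, the first check flags oob_tcp_include as repeated, while the second confirms that the two interface lists are disjoint.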

>  That said, with these options, won't the MPI traffic (as opposed to
> the OOB traffic) still use the eth1,ib0 and ib1 interfaces?  You'd
> need to add '-mca btl_tcp_include eth1' in order to say it should only
> go over that NIC, I think. 

Yes, I know; in fact -mca btl_tcp_[if]_exclude lo,eth0,ib0,ib1
seems to work fine. I have been using this MCA parameter since open-mpi 1.2.1,
so the trouble with oob_tcp_[if]_[in|ex]clude seemed quite strange to
me; after all, the code used for the parser should be more or less the
same.

> 
>   As for the 'connection errors', two bizarre things to check are,
> first, that all of your nodes using eth1 actually have
> correct /etc/hosts mappings to the other nodes.  One system I ran on
> had this problem when some nodes had an IP address for node002 as one
> thing, and another node had node002's IP address as something
> different.   This should be easy enough by trying to run on one node
> first, then two nodes that you're sure have the correct addresses. 

Yes, I've already verified that. 

> 
>   .. The second situation is if you're launching an MPMD program.
> Here, you need to use '-gmca ' instead of '-mca '.
> 

No, currently I'm using only SPMD ones, and I hope to use them for the
rest of the century :-)

>   Hope some of that is at least a tad useful.  :) 
> 

Thank you very much, Brian,

Marco 

>   Cheers,
>   - Brian
> 
-- 
-
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516



Re: [OMPI users] Occasional mpirun hang on completion

2008-01-22 Thread Marco Sbrighi



Dear Barry and Jeff,


using OpenMPI we are experiencing something like the behaviour reported
by Barry.
Let me introduce the context:
we are using RHEL4 U4 on 2-way, dual-core AMD Opteron nodes.
Each node is equipped with 16 GB of RAM, plus 4 GB of swap.
OpenMPI is 1.2.2.
Sometimes, for jobs that run for many hours (1 - 2 days), it happens
that mpirun generates a "kernel memory crisis". This is an excerpt of what
we are seeing in syslog:

Jan  7 10:14:18 node203e0 kernel: mpirun: page allocation failure. order:5, mode:0xd0
Jan  7 10:14:18 node203e0 kernel:
Jan  7 10:14:18 node203e0 kernel: Call Trace:{__alloc_pages+768} {__get_free_pages+11}
Jan  7 10:14:18 node203e0 kernel: {kmem_getpages+36} {tcp_sendmsg+0}
Jan  7 10:14:18 node203e0 kernel: {tcp_sendmsg+0} {cache_alloc_refill+609}
Jan  7 10:14:18 node203e0 kernel: {__kmalloc+123} {alloc_skb+65}
Jan  7 10:14:18 node203e0 kernel: {tcp_sendmsg+363} {sock_sendmsg+271}
Jan  7 10:14:18 node203e0 kernel: {__generic_file_aio_write_nolock+731}
Jan  7 10:14:18 node203e0 kernel: {generic_file_aio_write+126} {autoremove_wake_function+0}
Jan  7 10:14:18 node203e0 kernel: {:nfs:nfs_file_write+195} {sock_readv_writev+122}
Jan  7 10:14:18 node203e0 kernel: {sock_writev+61} {do_readv_writev+421}
Jan  7 10:14:18 node203e0 kernel: {autoremove_wake_function+0} {poll_freewait+64}
Jan  7 10:14:18 node203e0 kernel: {dnotify_parent+34} {sys_writev+69}
Jan  7 10:14:18 node203e0 kernel: {system_call+126}
Jan  7 10:14:18 node203e0 kernel: Mem-info:
Jan  7 10:14:18 node203e0 kernel: Node 1 DMA per-cpu: empty
Jan  7 10:14:18 node203e0 kernel: Node 1 Normal per-cpu:
Jan  7 10:14:18 node203e0 kernel: cpu 0 hot: low 32, high 96, batch 16
Jan  7 10:14:18 node203e0 kernel: cpu 0 cold: low 0, high 32, batch 16
Jan  7 10:14:18 node203e0 kernel: cpu 1 hot: low 32, high 96, batch 16
Jan  7 10:14:18 node203e0 kernel: cpu 1 cold: low 0, high 32, batch 16
Jan  7 10:14:19 node203e0 kernel: cpu 2 hot: low 32, high 96, batch 16
Jan  7 10:14:19 node203e0 kernel: cpu 2 cold: low 0, high 32, batch 16
Jan  7 10:14:19 node203e0 kernel: cpu 3 hot: low 32, high 96, batch 16
Jan  7 10:14:19 node203e0 kernel: cpu 3 cold: low 0, high 32, batch 16
Jan  7 10:14:19 node203e0 kernel: Node 1 HighMem per-cpu: empty
Jan  7 10:14:19 node203e0 kernel: Node 0 DMA per-cpu:
Jan  7 10:14:19 node203e0 kernel: cpu 0 hot: low 2, high 6, batch 1
Jan  7 10:14:19 node203e0 kernel: cpu 0 cold: low 0, high 2, batch 1
Jan  7 10:14:19 node203e0 kernel: cpu 1 hot: low 2, high 6, batch 1
Jan  7 10:14:19 node203e0 kernel: cpu 1 cold: low 0, high 2, batch 1
Jan  7 10:14:19 node203e0 kernel: cpu 2 hot: low 2, high 6, batch 1
: 

The "crisis" may sometimes lead to an "mpirun hang".
It seems that mpirun uses "socket calls" aggressively, but we are not
sure about the reason for such behaviour. Maybe there is a set of
synergistic causes; nevertheless, when the kernel reports this kind of
"fault", the only process involved is always mpirun.

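A side note on the first syslog line: "order:5" means the kernel was asked for 2^5 physically contiguous pages in one piece. The arithmetic is sketched below, assuming the usual 4 KiB page size on x86_64 (the page size is an assumption, not something taken from the log). A request of this size can fail on a fragmented machine even when plenty of memory is free in total.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def contiguous_bytes(order, page_size=PAGE_SIZE):
    """Size of a buddy-allocator request of the given order, in bytes."""
    return (1 << order) * page_size


pages = 1 << 5
print(pages, "pages,", contiguous_bytes(5) // 1024, "KiB")
```
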
marco


On Fri, 2008-01-18 at 22:13 -0500, Barry Rountree wrote:
> On Fri, Jan 18, 2008 at 08:33:10PM -0500, Jeff Squyres wrote:
> > Barry --
> > 
> > Could you check what apps are still running when it hangs?  I.e., I  
> > assume that all the uptime's are dead; are all the orted's dead on the  
> > remote nodes?  (orted = our helper process that is launched on the  
> > remote nodes to exert process control, funnel I/O back and forth to  
> > mpirun, etc.)
> > 
> > If any of the orted's are still running, can you connect to them with  
> > gdb and get a backtrace to see where they are hung?
> > 
> > Likewise, can you connect to mpirun with gdb and get a backtrace of  
> > where it's hung?
> > 
> > Ralph, the main ORTE developer, is pretty sure that it's stuck in the  
> > IO flushing routines that are executed at the end of time (look for  
> > function names like iof_flush or similar).  We thought we had fixed  
> > all of those on the 1.2 branch, but perhaps there's some other weird  
> > race condition happening that doesn't happen on our test machines...
> 
> I'm happy to help.  I've got a paper submission deadline on Tuesday, so
> it might not be until midweek.
> 
> Thanks for the reply,
> 
> Barry
> 
> > 
> > 
> > 
> > On Jan 13, 2008, at 10:17 AM, Barry Rountree wrote:
> > 
> > > On Sun, Jan 13, 2008 at 09:54:47AM -0500, Barry Rountree wrote:
> > > > Hello,
> > > >
> > > > The following command
> > > >
> > > > mpirun -np 2 -hostfile ~/hostfile uptime
> > > >
> > >

Re: [OMPI users] Configure failure

2015-04-27 Thread Marco Atzeri

On 4/27/2015 8:54 PM, Jeff Squyres (jsquyres) wrote:

Marco --

Have you run into this?

The m4 line in question that seems to be the problem is:

 [AS_VAR_SET(type_var, [`cat conftestval`])]

Does `cat foo` in cygwin result in a ^M in the resulting shell string?  If so, 
is there a standard way to get rid of it?
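
To make the question concrete, here is a small sketch in Python that only simulates what the shell would do; it is not the configure code itself. Command substitution drops trailing newlines but nothing else, so a file written with CRLF line endings would leave a carriage return (^M) in the value. In a shell script, the usual fix is to filter the output through `tr -d '\r'`.

```python
def command_substitution(raw):
    """Mimic `...` in a POSIX shell: trailing newlines are removed, nothing else."""
    return raw.rstrip("\n")


def drop_cr(value):
    """Remove carriage returns, like piping through tr -d '\\r'."""
    return value.replace("\r", "")


crlf_file = "1\r\n"  # hypothetical contents of conftestval with CRLF endings
value = command_substitution(crlf_file)
print(repr(value))
print(repr(drop_cr(value)))
```
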



On cygwin it works fine:

configure:33436: checking size of Fortran CHARACTER
configure:33504: gcc -std=gnu99 -DNDEBUG -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/pub/devel/openmpi/openmpi-1.8.5rc3-1.x86_64/build=/usr/src/debug/openmpi-1.8.5rc3-1 -fdebug-prefix-map=/pub/devel/openmpi/openmpi-1.8.5rc3-1.x86_64/src/openmpi-1.8.5rc3=/usr/src/debug/openmpi-1.8.5rc3-1 -finline-functions -fno-strict-aliasing -fexceptions -I. -c conftest.c
configure:33511: $? = 0
configure:33521: gfortran -ggdb -O2 -pipe -fdebug-prefix-map=/pub/devel/openmpi/openmpi-1.8.5rc3-1.x86_64/build=/usr/src/debug/openmpi-1.8.5rc3-1 -fdebug-prefix-map=/pub/devel/openmpi/openmpi-1.8.5rc3-1.x86_64/src/openmpi-1.8.5rc3=/usr/src/debug/openmpi-1.8.5rc3-1 -fexceptions  conftestf.f90 conftest.o -o conftest  -fexceptions
configure:33528: $? = 0
configure:33572: ./conftest
configure:33579: $? = 0
configure:33595: result: 1
configure:33618: checking for C type corresponding to CHARACTER


Re: [OMPI users] mpirun

2015-05-30 Thread Marco Atzeri

On 5/29/2015 9:53 PM, Walt Brainerd wrote:

It behaved this way with the Cygwin version (very recent update)
and with 1.8.5 that I built from source.

On Fri, May 29, 2015 at 12:35 PM, Ralph Castain <r...@open-mpi.org> wrote:

I assume you mean on cygwin? Or is this an older version that
supported native Windows?

 > On May 29, 2015, at 12:34 PM, Walt Brainerd
<walt.brain...@gmail.com> wrote:
 >
 > On Windows, mpirun appears to take about 5 seconds
 > to start. I can't try it on Linux. Intel takes no time to
 > start executing its version.
 >
 > Is this expected?
 >


I would say yes

$ time mpirun -n 2 ./hello_c.exe
Hello, world, I am 0 of 2, (Open MPI v1.8.5, package: Open MPI .., 
ident: 1.8.5, repo rev: v1.8.4-333-g039fb11, May 05, 2015, 127)
Hello, world, I am 1 of 2, (Open MPI v1.8.5, package: Open MPI .., 
ident: 1.8.5, repo rev: v1.8.4-333-g039fb11, May 05, 2015, 127)


real    0m2.636s
user    0m1.012s
sys     0m2.119s

I presume it is wasting some time enumerating and rejecting the
available interfaces.
On Windows they have unusual names:

$ ./interface-64.exe
Interfaces (count = 10):
{EC2ABB5C-42A8-431D-A133-8F4BE0F309AF}
{9213DBB8-80C6-4316-AA7A-EBF8AD7661E1}
{8D78D8D9-CFF0-4C4A-AFC3-72CB0E275588}
{2449A164-BE1A-4393-8168-2A3EDC9AA6F0}
{97191531-3960-4C35-8D79-1851EF7EE9E0}
{6F8DABED-A5FE-4E8D-8BA1-02763080D9DC}
{2A3E9C71-E553-44D0-ABE3-327EB89C3863}
{9F4F7FD2-5E44-4796-ABE0-0785CF76C11E}
{C4069E93-6662-44BF-B363-5175A04681D5}
{846EE342-7039-11DE-9D20-806E6F6E6963}
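
As an illustration only, the sketch below (Python, independent of Open MPI) recognises names of the brace-wrapped GUID form listed above. It shows why a name-based filter written for Linux, such as one naming eth0, would not match any entry in that list. The helper name is made up.

```python
import re

# Brace-wrapped GUID: 8-4-4-4-12 hexadecimal digits.
GUID_NAME = re.compile(
    r"^\{[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}$"
)


def is_guid_name(name):
    """True if the interface name looks like the GUID-style names above."""
    return GUID_NAME.match(name) is not None


for name in ["{EC2ABB5C-42A8-431D-A133-8F4BE0F309AF}", "eth0", "lo"]:
    print(name, is_guid_name(name))
```
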





Re: [OMPI users] CygWin compilation of OpenMPI-1.8.5

2015-06-08 Thread Marco Atzeri

On 6/7/2015 5:52 PM, Ilias Miroslav wrote:

Greetings,

CygWin is an interesting intermediate environment between Windows and Linux-like 
architectures, and the OpenMPI project is a good platform for enabling parallel 
calculations.

Here is my OpenMPI building experience with some problems encountered (with 
up-to-date CygWin & OpenMPI):


Cygwin already provides an openmpi 1.8.5 package.



1) the "default" OpenMPI configuration (no special flags) gives these linking 
errors:


the packages are built with

--disable-mca-dso \
--disable-sysv-shmem \
--enable-cxx-exceptions \
--with-threads=posix \
--without-cs-fs \
--with-mpi-param_check=always \
--enable-contrib-no-build=vt,libompitrace \
--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv



2) The OpenMPI configuration with the flags specified by
https://www.open-mpi.org/community/lists/users/2014/04/24166.php
produces working mpif90,mpicc,mpicxx... executables.

However, "make check" reports a failure in the second test (see below).


the only current failure is on x86_64
   assertion "opal_atomic_cmpset

all the other tests pass.


Any help on how to fix this test issue?


use cygwin packages
;-)


Miro


Regards
Marco




Re: [OMPI users] Slides from the Open MPI SC'15 State of the Union BOF

2015-11-19 Thread Marco Atzeri



On 19/11/2015 16:15, Lev Givon wrote:

Received from Jeff Squyres (jsquyres) on Thu, Nov 19, 2015 at 10:03:33AM EST:

Thanks to the over 100 people who came to the Open MPI State of the Union BOF
yesterday.  George Bosilca from U. Tennessee, Nathan Hjelm from Los Alamos
National Lab, and I presented where we are with Open MPI development, and
where we're going.

If you weren't able to join us, feel free to read through the slides:

 http://www.open-mpi.org/papers/sc-2015/

Thank you!


FYI, there seems to be some problem with the posted PDF file - when I tried to
view it in Firefox 42 and 3 other PDF viewers (on Linux, at least), all of the
programs claimed that the file is either corrupted or misformatted.


Same on Windows.
The file seems incomplete; the PDF closure is missing.

Regards
Marco



Re: [OMPI users] How to run OpenMPI C code under Windows 7

2015-11-22 Thread Marco Atzeri

On 22/11/2015 23:04, Philip Bitar wrote:

*How to run OpenMPI C code under Windows 7*

I'm trying to get OpenMPI C code to run under Windows 7 any way that I
can. Evidently there is no current support for running OpenMPI directly
under Windows 7, so I installed Cygwin. Is there a better way to run
OpenMPI C code under Windows 7?

Under Cygwin, I installed a GCC C compiler, which works.

I also installed an OpenMPI package. Here is a link to a list of the
files in the Cygwin OpenMPI package:

https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86%2Flibopenmpi%2Flibopenmpi-1.8.6-1&grep=openmpi

My PATH variable is as follows:

/usr/local/bin:/usr/bin

mpicc will compile, but it won't link. It can't find the following:

-lmpi
-lopen-rte
-lopen-pal


Have you installed libopenmpi-devel?



[OMPI users] MPI_INIT gets stuck

2016-03-06 Thread Marco Lubosch

Hello guys,

I am taking my first steps with Open MPI and finally got it to work on 
Cygwin64 (Windows 7 64-bit).
I am able to compile plain C code without any issues via "mpicc ...", but 
when I try to initialize MPI the program gets stuck within 
"MPI_INIT" without creating CPU load. Example from 
https://svn.open-mpi.org/source/xref/ompi_1.8/examples/:


   #include <stdio.h>
   #include "mpi.h"
   int main(int argc, char* argv[])
   {
       int rank, size, len;
       char version[MPI_MAX_LIBRARY_VERSION_STRING];
       printf("1\n");
       MPI_Init(&argc, &argv);
       printf("2\n");
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       printf("3\n");
       MPI_Comm_size(MPI_COMM_WORLD, &size);
       printf("4\n");
       MPI_Get_library_version(version, &len);
       printf("5\n");
       printf("Hello, world, I am %d of %d, (%s, %d)\n", rank, size,
              version, len);
       MPI_Finalize();
       printf("6\n");
       return 0;
   }

Compiling works perfectly fine with "mpicc -o hello_c.exe hello_c.c". 
But when I run it with "mpirun -np 4 ./hello_c" it creates 4 threads 
printing "1" but then keeps on running without doing anything. I then 
have to kill the threads manually to keep on working with Cygwin.
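
One way to avoid killing the processes by hand while investigating is to run the command under a timeout. The sketch below is a generic Python wrapper, not an Open MPI feature; the child process here is a stand-in that prints "1" and then sleeps, imitating a rank stuck in MPI_Init. In practice the argv would be the real "mpirun -np 4 ./hello_c" command line.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds):
    """Run argv; return (hung, stdout_text). hung is True if the timeout expired."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=seconds)
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; keep whatever output was captured so far.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return True, partial
    return False, done.stdout


# Stand-in for a process that prints "1" and then never makes progress.
stuck = [sys.executable, "-c",
         "import time; print(1, flush=True); time.sleep(60)"]
hung, seen = run_with_timeout(stuck, 2)
print("hung:", hung)
```

The output captured before the timeout indicates how far the program got, which serves the same purpose as the numbered printf calls in the example.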


Can you tell me what I am doing wrong?

Thanks
Marco

PS: Installed packages on Cygwin are libopenmpi, libopenmpi-devel, 
openmpi, gcc-core


Re: [OMPI users] MPI_INIT gets stuck

2016-03-07 Thread Marco Atzeri



On 06/03/2016 10:06, Marco Lubosch wrote:

Hello guys,

I try to do the first steps with Open MPI and I finally got it work on
Cygwin64(Windows 7 64bit).
I am able to compile plain C code without any issues via "mpicc ..." but
when I try to initialize MPI the program is getting stuck within
"MPI_INIT" without creating CPU load. Example from
https://svn.open-mpi.org/source/xref/ompi_1.8/examples/:

#include <stdio.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
 int rank, size, len;
 char version[MPI_MAX_LIBRARY_VERSION_STRING];
 printf("1\n");
 MPI_Init(&argc, &argv);
 printf("2\n");
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 printf("3\n");
 MPI_Comm_size(MPI_COMM_WORLD, &size);
 printf("4\n");
 MPI_Get_library_version(version, &len);
 printf("5\n");
 printf("Hello, world, I am %d of %d, (%s, %d)\n", rank, size,
version, len);
 MPI_Finalize();
 printf("6\n");
 return 0;
}

Compiling works perfectly fine with "mpicc -o hello_c.exe hello_c.c".
But when I run it with "mpirun -np 4 ./hello_c" it creates 4 threads
printing "1" but then keeps on running without doing anything. I then
have to kill the threads manually to keep on working with Cygwin.

Can you tell me what I am doing wrong?

Thanks
Marco

PS: Installed packages on Cygwin are libopenmpi, libopenmpi-devel,
openmpi, gcc-core





It seems to be a local issue. On my W7 64 bit:

$ mpirun -n 4 ./prova_mpi.exe
1
1
1
1
2
3
4
5
Hello, world, I am 0 of 4, (Open MPI v1.8.8, .., Aug 05, 2015, 126)
2
3
4
5
Hello, world, I am 2 of 4, (Open MPI v1.8.8, package: ..., Aug 05, 2015, 126)
2
3
4
5
Hello, world, I am 1 of 4, (Open MPI v1.8.8, ... , Aug 05, 2015, 126)
2
3
4
5
Hello, world, I am 3 of 4, (Open MPI v1.8.8, ... , Aug 05, 2015, 126)
6
6
6
6



Re: [OMPI users] MPI_INIT gets stuck

2016-03-07 Thread Marco Lubosch

Thanks Marco,

I reinstalled Cygwin and OMPI like 10 times. I had an issue with 
gcc (mingw) because it was preinstalled under Windows. I then had to 
remove it and reinstall gcc under Cygwin and got it working, but as I 
said, only for compiling plain C code with "mpicc". I also disabled Windows 
Firewall and tried a different router.


Do you have any suggestions what could cause that problem?

Greetings
Marco

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/03/28651.php







Re: [OMPI users] MPI_INIT gets stuck

2016-03-07 Thread Marco Atzeri

On 07/03/2016 18:58, Marco Lubosch wrote:

Thanks Marco,

I reinstalled Cygwin and OMPI like 10 times. I had an issue with
gcc (mingw) because it was preinstalled under Windows. I then had to
remove it and reinstall gcc under Cygwin and got it working, but as I
said, only for compiling plain C code with "mpicc". I also disabled Windows
Firewall and tried a different router.

Do you have any suggestions what could cause that problem?

Greetings
Marco



Does it work without networks?
In the past I have seen issues with virtual network drivers.

In addition, as mentioned on
https://cygwin.com/problems.html

"Run cygcheck -s -v -r > cygcheck.out and include that file as an 
attachment in your report. Please do not compress or otherwise encode 
the output. Just attach it as a straight text file so that it can be 
easily viewed."


Send me a copy of cygcheck.out and I will look for possible Cygwin problems.

Regards
Marco





Re: [OMPI users] Building on Windows w/o Cygwin

2016-04-03 Thread Marco Atzeri

On 03/04/2016 17:54, Walt Brainerd wrote:

Has anybody built Open MPI on Windows without
using Cygwin?

When I try with anything else (msys2, mingw), there
appears to be all sorts of stuff missing.

Thanks.

--
Walt Brainerd



Native Windows support was removed some versions ago, as it was not
supported by any developers.

Around 1.6.x, if I remember correctly.

Regards
Marco



Re: [OMPI users] OpenMPI v3.0 on Cygwin

2017-09-27 Thread Marco Atzeri

On 27/09/2017 09:30, Llelan D. wrote:

Can OpenMPI v3.0 be compiled for Cygwin64 on Windows 10?

Using:

./configure --prefix=/usr/local
  [blah, blah... Apparently successful (At least it doesn't say there's 
an error)]

make -j 12 all

I'm getting a slew of compiler errors about redefinitions between:

    /usr/include/w32api/psdk_inc/_ip_types.h
     or /usr/include/w32api/winsock2.h
and    /usr/include/netdb.h
     or /usr/include/sys/socket.h

Are there magic variables, definitions, or switches for a Cygwin build 
I'm missing?




Hi Llelan,
I assume not; I expect it needs some patches, like the ones I am slowly
preparing for 2.1.2.
There are portions of the code that bring in definitions that collide
with the Windows headers meant for non-Cygwin programs.

After I finish with 2.1.2 I will look at 3.0.

Regards
Marco



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Marco Atzeri

On 26/02/2018 18:14, Michael A. Saverino wrote:


I am running the v-1.10.7 OMPI package that is available via the Cygwin
package manager.  I have a requirement to run my OMPI application
standalone on a Windows/Cygwin system without any network connectivity.
If my OMPI system is not connected to the network, I get the following
errors when I try to run my OMPI application:
  


Michael,
do you mean without a network connected or without any network
services running ?

On my W7 it works with both the Wireless and Cable connection disabled
or disconnected.

$ mpirun -n 2 ./hello_c.exe
Hello, world, I am 1 of 2, (Open MPI v1.10.7, package: Open MPI 
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: 
v1.10.6-48-g5e373bf, May 16, 2017, 129)
Hello, world, I am 0 of 2, (Open MPI v1.10.7, package: Open MPI 
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: 
v1.10.6-48-g5e373bf, May 16, 2017, 129)



Is it possible that you have some type of virtual network driver,
such as a VPN, active?

Regards
Marco





Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Marco Atzeri

On 26/02/2018 22:10, Michael A. Saverino wrote:


Marco,

I think oob still has a problem, at least on my machine, even though we
specify --mca oob ^tcp.   The workaround I found is to install the
Microsoft loopback adapter.   That satisfies OMPI at startup even though
the ethernet or WiFi is either disabled or disconnected.  You still have
to answer Windows firewall questions (if enabled) permitting/not
permitting orterun and my application.  Do you have the Microsoft
Loopback adapter installed on your system?

Many Thanks,

Mike...



Yes it is installed.


Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Marco Atzeri

On 26/02/2018 22:57, Michael A. Saverino wrote:


Marco,

If you disable the loopback as well as the other adapters via Device
Manager, you should be able to reproduce the error.

Mike...


It also worked with the loopback disabled.
Probably the installation of the loopback adapter just enabled some
basic network functionality.

Regards
Marco




[OMPI users] openmpi over tcp

2009-01-29 Thread Daniel De Marco
Hi All,

I'm doing some tests on a small cluster with gigabit and infiniband
interconnects with openmpi and I'm running into the same problem as
described in the following thread:
http://www.open-mpi.org/community/lists/users/2007/04/3082.php

Basically even if I run my test with:
mpirun --mca btl tcp,self --prefix /share/apps/openmpi-1.3/gcc_ifort/
--machinefile machines -np 2 ./osu_latency
I seem to be getting infiniband transport:
# OSU MPI Latency Test v3.1.1
# Size        Latency (us)
0                     2.41
1                     2.66
2                     2.85
4                     2.85
8                     2.88
16                    3.52
32                    3.61
64                    3.62
128                   3.95
256                   4.19
512                   4.96
1024                  6.31
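
The reasoning behind "I seem to be getting infiniband transport" can be written as a quick check: parse the latency table and compare the smallest value with what TCP over gigabit Ethernet would plausibly give. This is only a sketch. The 20 us threshold is an illustrative assumption (small-message TCP latency on gigabit is typically in the tens of microseconds), not a number from this thread, and the sample text repeats the first three rows reported above.

```python
def parse_osu(text):
    """Turn osu_latency output into {message_size: latency_us}."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, latency = line.split()
        rows[int(size)] = float(latency)
    return rows


THRESHOLD_US = 20.0  # assumption, for illustration only


def plausible_for_gigabit_tcp(rows):
    """True if even the best latency is at or above the assumed TCP floor."""
    return min(rows.values()) >= THRESHOLD_US


sample = """# OSU MPI Latency Test v3.1.1
# Size Latency (us)
0 2.41
1 2.66
2 2.85
"""
rows = parse_osu(sample)
print(rows)
print(plausible_for_gigabit_tcp(rows))
```
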

I tried running it with --mca btl ^openib but the result is the same.
I even tried, as suggested in the thread above, to remove the *openib*
files from the lib/openmpi directory, but without any change.

I tried with 1.2.8 and with 1.3.0 with the same results.

Is there anything else I can try in order to be able to use gigabit
transport?

Thanks, Daniel.


Re: [OMPI users] openmpi over tcp

2009-01-29 Thread Daniel De Marco
Hi Ralph,

* Ralph Castain  [01/29/2009 14:27]:
> It is quite likely that you have IPoIB on your system. In that case, the 
> TCP BTL will pickup that interface and use it.
>
> If you have a specific interface you want to use, try -mca 
> btl_tcp_if_include eth0 (or whatever that interface is). This tell the TCP 
> BTL to only use the specified interface, so it will either fail (if that 
> interface isn't available or doesn't exist) or use only that one.

No, I don't have IPoIB configured. I tried your suggestion anyway and I
got the same results as before. The weird thing is that even if I use
-mca btl_tcp_if_include eth2
where eth2 doesn't exist, I get the same results...

Thanks, Daniel.


Re: [OMPI users] openmpi over tcp

2009-01-29 Thread Daniel De Marco
* Brock Palen  [01/29/2009 15:24]:
> What does your machinefile look like?  Just curious.

c0-0
c0-1

Daniel.


Re: [OMPI users] openmpi over tcp

2009-01-29 Thread Daniel De Marco
* Joe Landman  [01/29/2009 15:32]:
>   ifconfig ib0
> what does it respond with?

ib0: error fetching interface information: Device not found

Daniel.


Re: [OMPI users] openmpi over tcp

2009-01-29 Thread Daniel De Marco
Jeff,

I put most of the info at:
http://www.bartol.udel.edu/~ddm/ompi_debug.tgz
The tar file contains the config.log, the ifconfig for the two nodes and
the output of ompi_info --all.

As I said I was running with:
mpirun --mca btl tcp,self --prefix /share/apps/openmpi-1.3/gcc_ifort/
--machinefile machines -np 2 ./osu_latency

Thanks for your help.
Please let me know if you need some other info.
Daniel.


* Jeff Squyres  [01/29/2009 16:30]:
> Can you send the full output described here (including all network setup 
> stuff):
>
> http://www.open-mpi.org/community/help/
>
>
> On Jan 29, 2009, at 3:18 PM, Daniel De Marco wrote:
>
>> Hi Ralph,
>>
>> * Ralph Castain  [01/29/2009 14:27]:
>>> It is quite likely that you have IPoIB on your system. In that case, the
>>> TCP BTL will pickup that interface and use it.
>>>
>>> If you have a specific interface you want to use, try -mca
>>> btl_tcp_if_include eth0 (or whatever that interface is). This tell the 
>>> TCP
>>> BTL to only use the specified interface, so it will either fail (if that
>>> interface isn't available or doesn't exist) or use only that one.
>>
>> no, I don't have IPoIB configured. I tried anyway your suggestion and I
>> got the same results as before. The weird thing is that even if use
>>  -mca btl_tcp_if_include eth2
>> where eth2 doesn't exist I get the same results...
>>
>> Thanks, Daniel.
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] openmpi over tcp

2009-02-02 Thread Daniel De Marco
Jeff,

thanks a lot for taking the time to look at my file and sorry for not
having noticed that part of the README, it went straight past me.

Anyway with your suggestion it works perfectly.

Thanks again, Daniel.
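
For the record, Jeff's suggestion (quoted below) comes down to two MCA settings: select the ob1 PML, then restrict the BTLs to tcp,sm,self. A minimal sketch of assembling that command line in Python follows; the helper name is made up, and the program and machinefile names are simply the ones used earlier in this thread.

```python
def force_tcp_argv(program, nprocs, extra_args=()):
    """Build an mpirun argv that selects the ob1 PML and the TCP BTL."""
    return ["mpirun",
            "--mca", "pml", "ob1",          # bypass the cm PML (and PSM MTL)
            "--mca", "btl", "tcp,sm,self",  # TCP, shared memory, loopback
            *extra_args,
            "-np", str(nprocs), program]


argv = force_tcp_argv("./osu_latency", 2, ["--machinefile", "machines"])
print(" ".join(argv))
```
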

* Jeff Squyres  [02/01/2009 06:49]:
> It looks like you compiled Open MPI against the QLogic PSM libraries -- I 
> see the PSM MTL plugin available.  Here's some text from the OMPI v1.3 
> README that clarifies the situation:
>
> - There are two MPI network models available: "ob1" and "cm".  "ob1"
>   uses BTL ("Byte Transfer Layer") components for each supported
>   network.  "cm" uses MTL ("Matching Transport Layer") components for
>   each supported network.
>
>   - "ob1" supports a variety of networks that can be used in
>     combination with each other (per OS constraints; e.g., there are
>     reports that the GM and OpenFabrics kernel drivers do not operate
>     well together):
>     - OpenFabrics: InfiniBand and iWARP
>     - Loopback (send-to-self)
>     - Myrinet: GM and MX
>     - Portals
>     - Quadrics Elan
>     - Shared memory
>     - TCP
>     - SCTP
>     - uDAPL
>
>   - "cm" supports a smaller number of networks (and they cannot be
>     used together), but may provide better overall MPI
>     performance:
>     - Myrinet MX (not GM)
>     - InfiniPath PSM
>     - Portals
>
>   Open MPI will, by default, choose to use "cm" when the InfiniPath
>   PSM MTL can be used.  Otherwise, OB1 will be used and the
>   corresponding BTLs will be selected.  Users can force the use of ob1
>   or cm if desired by setting the "pml" MCA parameter at run-time:
>
>     shell$ mpirun --mca pml ob1 ...
>     or
>     shell$ mpirun --mca pml cm ...
>
> So to force TCP to be used, you need to force the use of the ob1 PML and 
> then force the use of the TCP BTL.  Perhaps something like this:
>
> mpirun --mca pml ob1 --mca btl tcp,sm,self ...
>
>
>
> On Jan 29, 2009, at 7:20 PM, Daniel De Marco wrote:
>
>> Jeff,
>>
>> I put most of the info at:
>> http://www.bartol.udel.edu/~ddm/ompi_debug.tgz
>> The tar file contains the config.log, the ifconfig for the two nodes and
>> the output of ompi_info --all.
>>
>> As I said I was running with:
>> mpirun --mca btl tcp,self --prefix /share/apps/openmpi-1.3/gcc_ifort/
>> --machinefile machines -np 2 ./osu_latency
>> and I also tried adding -mca btl_tcp_if_include eth0 to the options.
>>
>> Thanks for your help.
>> Please let me know if you need some other info.
>> Daniel.
>>
>>
>> * Jeff Squyres  [01/29/2009 16:30]:
>>> Can you send the full output described here (including all network setup
>>> stuff):
>>>
>>>http://www.open-mpi.org/community/help/
>>>
>>>
>>> On Jan 29, 2009, at 3:18 PM, Daniel De Marco wrote:
>>>
>>>> Hi Ralph,
>>>>
>>>> * Ralph Castain  [01/29/2009 14:27]:
>>>>> It is quite likely that you have IPoIB on your system. In that case, 
>>>>> the
>>>>> TCP BTL will pickup that interface and use it.
>>>>>
>>>>> If you have a specific interface you want to use, try -mca
>>>>> btl_tcp_if_include eth0 (or whatever that interface is). This tell the
>>>>> TCP
>>>>> BTL to only use the specified interface, so it will either fail (if 
>>>>> that
>>>>> interface isn't available or doesn't exist) or use only that one.
>>>>
>>>> no, I don't have IPoIB configured. I tried anyway your suggestion and I
>>>> got the same results as before. The weird thing is that even if use
>>>>-mca btl_tcp_if_include eth2
>>>> where eth2 doesn't exist I get the same results...
>>>>
>>>> Thanks, Daniel.
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> -- 
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-17 Thread Daniel De Marco
Hi,

* Reuti  [02/02/2009 03:43]:
> But despite the fact that SGE's qrsh is used automatically, more
> severe is the fact, that on the slave nodes the orted daemons will be
> pushed into daemonland and no longer be bound to the sge_shepherd:
>
>  3173 1 /usr/sge/bin/lx24-x86/sge_execd
>  3431 1 orted --daemonize -mca ess env -mca orte_ess_jobid 81199104 
> -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri 811
>  3432  3431  \_ /home/reuti/mpihello
>  3433  3431  \_ /home/reuti/mpihello

Does anyone know a workaround for this problem? Has it been fixed?

Thanks, Daniel.



Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-17 Thread Daniel De Marco
* Rolf Vandevaart  [02/17/2009 11:32]:
> There is a ticket for this.
>
> https://svn.open-mpi.org/trac/ompi/ticket/1783
>
> I am working on it.  I do not have a workaround.  I had a fix but ran into 
> some issues with getting the -notify flag to work right with a 
> non-daemonized orted.
>
> Fix will come soon to trunk, and to 1.3.2 soon thereafter.

Thanks!



Re: [OMPI users] Building on Cygwin

2020-05-04 Thread Marco Atzeri via users

On 04.05.2020 22:00, Rudy Verderber via users wrote:

I am trying to compile HDF5 on Cygwin.


HDF5 is already available for Cygwin.
It is just NOT compiled with OpenMPI.

Why do you need both?



I downloaded Cygwin-produced openmpi(40) binaries (3.1.5).

I got a message that the compiler (v ?) used to compile openmpi was 
different from the one on my Cygwin box (gcc v 9.3.0).


I then tried to recompile openmpi 4.0.3 and 3.3.6 on Cygwin.

Both times, I get the messages shown below.  It might be simple to 
resolve this first issue by editing the code.  Is there a simple 
configuration option that I am missing?  Is there a recommended 
Cygwin build process? Should I try to use the original binaries by 
getting the same GNU compiler?


The recommended way is to download the Cygwin source package
for OpenMPI and use cygport to patch+compile+install+package
a new package.

Install cygport and see
/usr/share/doc/cygport/html/manual/index.html

for the full detailed manual.







Thanks

./configure --enable-static --prefix=/home/rudy/openmpi

….

make all install

…

/usr/include/w32api/winsock2.h:1037:47: error: conflicting types for ‘getprotobynumber’
 1037 |   WINSOCK_API_LINKAGE struct protoent *WSAAPI getprotobynumber(int number);
      |   ^~~~
In file included from /home/rudy/TEMP/openmpi-3.1.6/opal/mca/event/libevent2022/libevent/include/event2/util.h:67,
                 from /home/rudy/TEMP/openmpi-3.1.6/opal/mca/event/libevent2022/libevent/evutil.h:37,
                 from ../../opal/mca/event/libevent2022/libevent/event.h:57,
                 from ../../opal/mca/event/libevent2022/libevent2022.h:58,
                 from ../../opal/mca/event/event.h:77,
                 from ../../opal/mca/pmix/pmix.h:24,
                 from proc.c:22:
/usr/include/netdb.h:237:18: note: previous declaration of ‘getprotobynumber’ was here
  237 | struct protoent *getprotobynumber (int);
      |  ^~~~
In file included from ../../opal/mca/event/libevent2022/libevent/event.h:63,
                 from ../../opal/mca/event/libevent2022/libevent2022.h:58,
                 from ../../opal/mca/event/event.h:77,
                 from ../../opal/mca/pmix/pmix.h:24,
                 from proc.c:22:





Re: [OMPI users] MPI test suite

2020-07-23 Thread Marco Atzeri via users

On 23.07.2020 20:28, Zhang, Junchao via users wrote:

Hello,
   Does OMPI have a test suite that can let me validate MPI 
implementations from other vendors?


   Thanks
--Junchao Zhang


Have you considered the OSU Micro-Benchmarks?

http://mvapich.cse.ohio-state.edu/benchmarks/


Re: [OMPI users] OMPI 4.1 in Cygwin packages?

2021-02-03 Thread Marco Atzeri via users

On 03.02.2021 21:35, Martín Morales via users wrote:

Hello,

I would like to know if any OMPI 4.1.* is going to be available in the 
Cygwin packages.


Thanks and regards,

Martín



Hi Martin,
is there anything in it that is absolutely needed in the short term?

Any problem with the current 4.0.5 package?


The build is usually very time-consuming,
and I am busy with other Cygwin work.

Regards
Marco


Re: [OMPI users] OMPI 4.1 in Cygwin packages?

2021-02-05 Thread Marco Atzeri via users

On 05.02.2021 16:18, Martín Morales via users wrote:

Hi Gilles,

I tried but it hangs indefinitely and without any output.

Regards,

Martín



Hi Martin,

can you run get-interface, available at

http://matzeri.altervista.org/works/interface/

so we can see how Cygwin identifies all your network interfaces?

Regards
Marco



Re: [OMPI users] OMPI 4.1 in Cygwin packages?

2021-02-06 Thread Marco Atzeri via users

Martin,

what is the IP address of the machine you cannot connect to?

All those VMware interfaces look suspicious, anyway.
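
As a rough aid for reading output like the one quoted below, here is a
minimal Python sketch that groups the "key: value" lines into one record
per interface and flags the VMware ones. It is only an illustration: the
function names are made up for this example, the record layout is assumed
from the pasted sample (each record starting with internal_name), and
matching "VMware" in friendly_name is a heuristic, not something
get-interface or Open MPI does.

```python
# Sketch: parse get-interface style output and flag virtual adapters.
# parse_interfaces() and is_virtual() are hypothetical helper names.

SAMPLE = """\
internal_name:  {A6301D34-A586-4439-B7A7-69FA905CA167}
flags: AF_INET6 up running multicast
address:   fe80::e5c6:c83:8653:3cd8%14
friendly_name: VMware Network Adapter VMnet1

internal_name:  {A6301D34-A586-4439-B7A7-69FA905CA167}
flags: AF_INET  up broadcast running multicast
address:   192.168.148.1
friendly_name: VMware Network Adapter VMnet1
"""


def parse_interfaces(text):
    """Group 'key: value' lines into one dict per interface record.

    Assumes every record begins with an 'internal_name' line, as in the
    sample above. Leading/trailing '/' characters (left over from mail
    formatting) are stripped before parsing.
    """
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip().strip("/").strip()
        if not line or ":" not in line:
            continue
        # Split on the first ':' only, so IPv6 addresses stay intact.
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "internal_name" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def is_virtual(record, markers=("VMware",)):
    """Heuristic (an assumption): friendly_name mentions a known marker."""
    name = record.get("friendly_name", "")
    return any(marker in name for marker in markers)


if __name__ == "__main__":
    for rec in parse_interfaces(SAMPLE):
        tag = "virtual" if is_virtual(rec) else "physical?"
        print(rec.get("address", "?"), rec.get("friendly_name", "?"), tag)
```

Feeding the full get-interface output to parse_interfaces() instead of
SAMPLE should make it easier to spot which addresses belong to real
adapters and which belong to VMware ones.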


In the meantime I uploaded 4.1.0-1 for x86_64;
you can try it to see if it solves the issue.

The i686 version is still in the build phase.


On 05.02.2021 20:46, Martín Morales wrote:

Hi Marco,

The output is pasted below.

Thank you. Regards,

Martín




internal_name:  {A6301D34-A586-4439-B7A7-69FA905CA167}
flags: AF_INET6 up running multicast
address:   fe80::e5c6:c83:8653:3cd8%14
friendly_name: VMware Network Adapter VMnet1

internal_name:  {A6301D34-A586-4439-B7A7-69FA905CA167}
flags: AF_INET  up broadcast running multicast
address:   192.168.148.1
friendly_name: VMware Network Adapter VMnet1