[OMPI users] configure/library question

2013-07-19 Thread Hodgess, Erin
Hello all!

I just downloaded the MPICH 3.0.4 tar.gz

Then I used
tar xfvz mpich-3.0.4.tar.gz
./configure
make
make install

Now I'm trying to compile someone else's program and it can't find libmpi or 
libmpich.a

I did find libmpich.a, but no libmpi.

Does this sound familiar, please?

Thanks for any help!

Sincerely,
Erin



Re: [OMPI users] configure/library question

2013-07-19 Thread Ralph Castain
Probably a lot more familiar to the folks on the MPICH mailing list - this is 
the mailing list for Open MPI :-)

> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] configure/library question

2013-07-19 Thread Hodgess, Erin
I figured out how to uninstall and am going to install open mpi
Thanks,
Erin





Re: [OMPI users] opening a file with MPI-IO

2013-07-19 Thread Rob Latham
On Fri, May 17, 2013 at 11:00:36AM +0200, Peter van Hoof wrote:
> Dear users,
> 
> I have been banging my head against the wall for some time to find a
> reliable and portable way to determine if a call to
> MPI::File::Open() was successful or not.

Sorry for the long delay in responding.

In C, we do it like this:

#include <stdio.h>
#include <mpi.h>

/* Print the MPI error text for errcode, then abort the job. */
static void handle_error(int errcode, const char *str)
{
    char msg[MPI_MAX_ERROR_STRING];
    int resultlen;
    MPI_Error_string(errcode, msg, &resultlen);
    fprintf(stderr, "%s: %s\n", str, msg);
    MPI_Abort(MPI_COMM_WORLD, 1);
}

/* ... then, at the call site: */
errcode = MPI_File_open(MPI_COMM_SELF, filename,
                        MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL,
                        &fh);
if (errcode != MPI_SUCCESS) handle_error(errcode, "MPI_FILE_OPEN");

With the C++ bindings... ugh, what a mess. I had to crack open the
yellow book to find out the answer. But on page 18 it's pretty
clear:

Quoting: C++ functions do not return error codes [...]

More Quoting: Advice to Users: C++ programmers that want to handle MPI
errors on their own should use the MPI::ERRORS_THROW_EXCEPTIONS error
handler, rather than MPI::ERRORS_RETURN, which is used for that purpose
in C.


It's important to note that MPI-IO routines *do* use ERRORS_RETURN as
the default error handler, so you will have to take the additional step of
setting MPI::ERRORS_THROW_EXCEPTIONS for file operations yourself.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


Re: [OMPI users] MPIIO max record size

2013-07-19 Thread Rob Latham
On Wed, May 22, 2013 at 12:23:36PM -0400, Eric Chamberland wrote:
> On 05/22/2013 11:33 AM, Tom Rosmond wrote:
> >Thanks for the confirmation of the MPIIO problem.  Interestingly, we
> >have the same problem when using MPIIO in INTEL MPI.  So something
> >fundamental seems to be wrong.
> >
> 
> I think but I am not sure that it is because the MPI I/O (ROMIO)
> code is the same for all distributions...
> 
> It has been written by Rob Latham.

Hello!  Rajeev wrote it when he was in grad school, then he passed the
torch to Rob Ross when he was a post-doc at Argonne, and now I've been
the caretaker for the last mumble-mumble years.  (Now if I could only
find some other sucker.)

Tom, Eric:  I have recently fixed this bug for some cases.   I don't
know when Open MPI will re-sync with ROMIO (it's getting harder and
harder to keep ROMIO as "the standalone MPI-IO implementation") but it
should be straightforward to pick up that change.

(it's this one:
http://git.mpich.org/mpich.git/blobdiff/2de997d9b3e94bad01d5f46d76f163d71e2bd7bd..7d44307f269cae96118beb19760221aff99bd74a:/src/mpi/romio/adio/common/ad_read.c)


The functional descriptions for ROMIO are indeed "integer count of
some datatype", but one can still use that to say "write a billion
doubles".

ROMIO handles this internally with as many calls to the write(2)
system call as it takes to complete.
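
As a rough illustration of the "as many calls to write(2) as it takes" idea,
here is a minimal sketch in plain C. It is not ROMIO's actual code, and the
helper name write_fully is hypothetical. It loops because a single write(2)
may accept fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of ROMIO or MPI: keep calling write(2)
 * until all `len` bytes are written or an error occurs.
 * Returns 0 on success, -1 on error. */
static int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real error; errno was set by write */
        }
        if (n == 0)
            return -1;      /* no progress; give up rather than spin */
        assert((size_t)n <= left);  /* write never reports more than asked */
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

The caller passes the whole buffer once (for a billion doubles that would be
8e9 bytes as a size_t) and the loop splits it into however many system calls
the kernel requires.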

If you try to get fancy and make a struct of three thousand
megabyte-sized MPI_CONTIG types, MPICH will blow up.  I haven't tested
against OpenMPI. 

But basic types should be ok, now.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


[OMPI users] check point restart

2013-07-19 Thread Erik Nelson
I run MPI on an NSF computer. One of the conditions of use is that jobs are
limited to 24 hours' duration to provide democratic allotment to its users.

A long program can require many restarts, so it becomes necessary to store
the state of the program in memory, print it, recompile, and read the state
to start again.

I seem to remember a simpler approach (checkpoint/restart?) in which the
state of the .exe code is saved and then simply restarted from its current
position.

Is there something like this for restarting an mpi program?

Thanks, Erik


-- 
Erik Nelson

Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050

p : 214 645 5981
f : 214 645 5948


Re: [OMPI users] check point restart

2013-07-19 Thread Lloyd Brown
I know that in the past it has been supported via toolkits like BLCR,
but I don't know the current level of support, to be honest.  I think I
heard somewhere that the checkpoint/restart support in OpenMPI was going
away in some fashion.

In any case, if you have the ability to set up application-aware,
application-specific checkpointing, it will be a much better solution
than something that's application-agnostic.  The checkpoint files will
be smaller (the application knows what in memory is important, and what
isn't), coordination will be better between processes, you have some
level of assurance that you won't have PID conflicts or problems when
the PID ends up different, etc.

I suspect someone on the list can answer your question about the
built-in checkpoint/restart code better than I can.  But in general, if
you have a choice between checkpointing external and internal to your
application, choose the application-internal checkpointing.



Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu



Re: [OMPI users] check point restart

2013-07-19 Thread Ralph Castain

On Jul 19, 2013, at 12:41 PM, Lloyd Brown wrote:

> I know that in the past it has been supported via toolkits like BLCR,
> but I don't know the current level of support, to be honest.  I think I
> heard somewhere that the checkpoint/restart support in OpenMPI was going
> away in some fashion.

It is still somewhat there thru the 1.6 series, but may have suffered some 
bitrot in the latest 1.6 release(s). The developer who maintained that 
functionality has taken on another position, so support isn't as strong as it 
was. Currently, it isn't available in the 1.7 series.

> 
> In any case, if you have the ability to set up application-aware,
> application-specific checkpointing, it will be a much better solution
> than something that's application-agnostic.  The checkpoint files will
> be smaller (the application knows what in memory is important, and what
> isn't), coordination will be better between processes, you have some
> level of assurance that you won't have PID conflicts or problems when
> the PID ends up different, etc.
> 
> I suspect someone on the list can answer your question about the
> built-in checkpoint/restart code better than I can.  But in general, if
> you have a choice between checkpointing external and internal to your
> application, choose the application-internal checkpointing.

Definitely agree - internal is much better. I don't understand the comment 
about printing and recompiling. Usually, people just have the app write its 
intermediate results to a file, and provide a cmd line option pointing to that 
file upon restart so the app knows to read and start from that point. The app 
requires a routine to read the file and set itself up to continue, but that's a 
one-time implementation thing.
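
The approach described above could look roughly like the following minimal
sketch in plain C. It makes no MPI calls, so it covers one process's state
only; having each rank of an MPI job write its own file is an assumption, not
something stated in this thread. The struct sim_state, the function names
save_checkpoint and load_checkpoint, and the file layout are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical application state: whatever is needed to resume. */
struct sim_state {
    long   step;        /* last completed iteration */
    double values[4];   /* stand-in for the real working data */
};

/* Write the state to `path`. Returns 0 on success, -1 on failure. */
static int save_checkpoint(const char *path, const struct sim_state *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    /* fclose flushes buffered data, so its result matters too. */
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the state back. Returns 0 on success, -1 if missing or short. */
static int load_checkpoint(const char *path, struct sim_state *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

A main() would look for a restart option on the command line, call
load_checkpoint if one was given, and call save_checkpoint periodically so a
recent file exists when the job hits its time limit. Writing the raw struct
keeps the sketch short, but it ties the file to one architecture and compiler
layout.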





[OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Kevin H. Hobbs
I just upgraded the OS on one of my workstations from Fedora 17 to 18
and now I can't run even the simplest MPI programs.

I filed a bug report with Fedora's bug tracker :

https://bugzilla.redhat.com/show_bug.cgi?id=986409

My simple program is attached as mpi_simple.c

mpicc works :

  mpicc -g -o mpi_simple mpi_simple.c

I can even take the generated program to another computer and it runs fine.

I can run non-MPI programs with mpirun :

  mpirun -n 4 hostname
  murron.hobbs-hancock
  murron.hobbs-hancock
  murron.hobbs-hancock
  murron.hobbs-hancock

When I run a program that calls MPI_Init I get an error which includes :

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--

The output of :

 mpirun -n 1 mpi_simple

is attached as error.txt

I suspect it matters that this is a Lenovo S10 with what /proc/cpuinfo
calls an "Intel(R) Core(TM)2 Quad CPU Q6600"

I did a bit of poking around in gdb but I don't know what I'm looking for.

Does anybody have an idea what's going on?

mpi_simple.c:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main( int argc, char * argv[] )
{

  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("my rank is %i of %i\n", rank, size );

  MPI_Finalize();

  return EXIT_SUCCESS;
}

error.txt:

[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file util/nidmap.c at line 148
[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file ess_env_module.c at line 174
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 128
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
[murron.hobbs-hancock:22465] *** An error occurred in MPI_Init
[murron.hobbs-hancock:22465] *** on a NULL communicator
[murron.hobbs-hancock:22465] *** Unknown error
[murron.hobbs-hancock:22465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason: Before MPI_INIT completed
  Local host: murron.hobbs-hancock
  PID:        22465
--
--
mpirun has exited due to process rank 0 with PID 22465 on
node murron.hobbs-hancock exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it w

Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Ralph Castain
Are you sure you're using the same version of OMPI on this new OS? They 
typically distribute one in your default path, so I'd check to ensure that you 
really are using the version you think.






Re: [OMPI users] configure/library question

2013-07-19 Thread Jeff Hammond
Designing an application to work only with LIBS="-lmpi" indicates poor
software engineering and a low-quality application.

You can install or uninstall whatever you like but it is incorrect to
think that MPICH is broken because it does not provide libmpi.{a,so}.

In the absence of a sufficient understanding of how to link against
MPI, your best bet is to use CC=mpicc (and friends for LD, CXX,
FC,...).

Jeff

-- 
Jeff Hammond
jeff.scie...@gmail.com


Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Kevin H. Hobbs
On 07/19/2013 05:11 PM, Ralph Castain wrote:
> Are you sure you're using the same version of OMPI on this new OS?

No, I'm sure I'm using a different version of Open MPI in Fedora
18 from the one I was using in Fedora 17.

I have only the Open MPI provided by the Fedora distribution.

> They typically distribute one in your default path, 

Fedora allows both Open MPI and MPICH to be installed at the same
time by using the module system.

Neither is in the default path, I have to put them in my path with :

  module load mpi/openmpi-x86_64

which is in my ~/.bashrc .

> so I'd check to ensure that you really are using the version you think.

'locate mpicc' and 'locate mpirun' only find one hit each so I'm
reasonably sure I'm running what I think I'm running.

That being said, packaging bugs have happened before. Is there any
library, config file, or executable that you would suggest I look
for that might have come from a prior version of Open MPI or its
dependencies?






Re: [OMPI users] check point restart

2013-07-19 Thread Erik Nelson
Thanks Lloyd, Ralph. Regarding Ralph's comment,

> I don't understand the comment about printing and recompiling. Usually,
> people just have the app write its intermediate results to a file, and
> provide a cmd line option ..

Right, I shouldn't have written "recompile". It probably wouldn't increase the
communications overhead that much to do this; I was just wondering if there
might be something simpler.

Erik


Re: [OMPI users] After OS Update MPI_Init fails on one host

2013-07-19 Thread Jeff Squyres (jsquyres)
Not offhand.  The error you're seeing *typically* indicates that you've got a 
mismatch of OMPI version somewhere.  Are you running on multiple machines with 
different Open MPI versions, perchance?

If you're running only on a single machine, try completely uninstalling the 
Open MPI package, re-installing it, recompiling your trivial app, and see what 
happens.

Also, you might want to check the output of "mpicc yourapp.c --showme" and see 
if it's pointing to the right libraries, etc.




-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/