[OMPI users] Performing partial calculation on a single node in an MPI job

2016-10-16 Thread Vahid Askarpour
Hello,

I am attempting to modify a relatively large code (Quantum Espresso/EPW) and 
here I will try to summarize the problem in general terms.

I am using an Open MPI-compiled Fortran 90 code in which, midway through the 
code, say 10 points x(3,10) are broadcast across, say, 4 nodes. The index 3 
refers to x, y, z. For each point, a number of calculations are done and an 
array B(3,20,n) is generated. The integer n depends on the symmetry of the 
system and so varies from node to node.

When I run this code serially, I can print all the correct B values to file, so 
I know the algorithm works. When I run it in parallel, I get numbers that are 
meaningless. Collecting the points would not help because I need to collect the 
B values. I have tried to run that section of the code on one node by setting 
the processor index "mpime" equal to "ionode" or "root" using the following IF 
statement:

IF (mpime .eq. root ) THEN
do the calculation and print B
ENDIF

Neither ionode nor root returns the correct B array.

What would be the best way to extract the B array?
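
For reference, the kind of pattern I suspect is needed here is MPI_GATHERV with 
per-rank counts, since n differs between ranks. Below is a minimal sketch with 
assumed names (comm, root, my_n, Ball and gather_B are placeholders, not the 
actual EPW variables or routines):

subroutine gather_B(B, my_n, Ball, ntot, root, comm)
  ! Minimal sketch: every rank owns B(3,20,my_n) with a rank-dependent my_n,
  ! and root collects all blocks into Ball(3,20,ntot).
  use mpi
  implicit none
  integer, intent(in)  :: my_n, root, comm
  double precision, intent(in) :: B(3,20,my_n)
  double precision, allocatable, intent(out) :: Ball(:,:,:)
  integer, intent(out) :: ntot
  integer :: nranks, myrank, ierr, i
  integer :: nelem(1)
  integer, allocatable :: counts(:), displs(:)

  call MPI_Comm_size(comm, nranks, ierr)
  call MPI_Comm_rank(comm, myrank, ierr)
  allocate(counts(nranks), displs(nranks))

  ! counts/displs are in MPI_DOUBLE_PRECISION elements (3*20 values per point)
  nelem(1) = 3*20*my_n
  call MPI_Gather(nelem, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, root, comm, ierr)

  ntot = 0
  if (myrank == root) then
     displs(1) = 0
     do i = 2, nranks
        displs(i) = displs(i-1) + counts(i-1)
     end do
     ntot = sum(counts) / (3*20)
     allocate(Ball(3, 20, ntot))
  else
     allocate(Ball(3, 20, 0))   ! not used on non-root ranks
  end if

  ! B must be an ordinary contiguous array here (no vector subscripts).
  call MPI_Gatherv(B, nelem(1), MPI_DOUBLE_PRECISION, &
                   Ball, counts, displs, MPI_DOUBLE_PRECISION, &
                   root, comm, ierr)
end subroutine gather_B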

Thank you,

Vahid



Re: [OMPI users] Performing partial calculation on a single node in an MPI job

2016-10-18 Thread Vahid Askarpour
Hi George and Jeff,

Thank you for taking the time to respond to my query. You did point me in the 
right direction. Some of the variables involved in the calculation of B were 
not broadcast. In addition, a double do-loop combined with an IF statement was 
overwriting the correct B values. Interestingly, none of the variables are 
declared contiguous, and I did not have to convert B into a 1-D array. So in 
the end it all worked out, and I get the correct B matrix out of the code.

Thank you again,

Vahid


On Oct 17, 2016, at 10:23 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

I should have been more precise: you cannot use Fortran's vector subscript with 
Open MPI.
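
Concretely, the pattern to avoid is an array section indexed by an integer 
index array used directly as an MPI buffer. A minimal sketch with made-up 
names (not the EPW variables):

program vector_subscript_demo
  use mpi
  implicit none
  integer :: idx(4), ierr
  double precision :: a(10), buf(4)

  call MPI_Init(ierr)
  a   = 0.0d0
  idx = (/ 2, 5, 7, 9 /)

  ! The problematic form would be
  !    call MPI_Bcast(a(idx), 4, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  ! i.e. a vector-subscripted section as the buffer: the compiler passes a
  ! temporary copy, so 'a' itself is not reliably updated, and with
  ! nonblocking calls the temporary can disappear before the transfer ends.

  ! Safer: stage through a contiguous buffer.
  buf = a(idx)
  call MPI_Bcast(buf, 4, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  a(idx) = buf

  call MPI_Finalize(ierr)
end program vector_subscript_demo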

George.

On Mon, Oct 17, 2016 at 2:19 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
George:

http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node422.htm

Jeff

On Sun, Oct 16, 2016 at 5:44 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
Vahid,

You cannot use Fortran's vector subscript with MPI.

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


[OMPI users] Installation of openmpi-1.10.7 fails

2018-01-05 Thread Vahid Askarpour
I am attempting to install openmpi-1.10.7 on CentOS Linux (7.4.1708) using 
GCC-6.4.0. 

When compiling, I get the following error:

make[2]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ob1'
Making all in mca/pml/ucx
make[2]: Entering directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
  CC   pml_ucx.lo
  CC   pml_ucx_request.lo
  CC   pml_ucx_datatype.lo
  CC   pml_ucx_component.lo
  CCLD mca_pml_ucx.la
libtool:   error: require no space between '-L' and '-lrt'
make[2]: *** [Makefile:1725: mca_pml_ucx.la] Error 1
make[2]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
make[1]: *** [Makefile:3261: all-recursive] Error 1
make[1]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi'
make: *** [Makefile:1777: all-recursive] Error 1

Thank you,

Vahid


Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-05 Thread Vahid Askarpour
Thank you, Jeff, for your suggestion to use the v2.1 series.

I am attempting to use openmpi with EPW. On the EPW website 
(http://epw.org.uk/Main/DownloadAndInstall), it is stated that:


Compatibility of EPW

EPW is tested and should work on the following compilers and libraries:

  *   gcc640 serial
  *   gcc640 + openmpi-1.10.7
  *   intel 12 + openmpi-1.10.7
  *   intel 17 + impi
  *   PGI 17 + mvapich2.3

EPW is known to have the following incompatibilities:

  *   openmpi 2.0.2 (but likely all 2.x.x versions): works, but with a memory 
leak. If you open and close a file many times with openmpi 2.0.2, the 
memory increases linearly with the number of times the file is opened.

So I am hoping to avoid the 2.x.x series and use the 1.10.7 version suggested 
by the EPW developers. However, it appears that this is not possible.

Vahid

On Jan 5, 2018, at 5:06 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

I forget what the underlying issue was, but this issue just came up and was 
recently fixed:

   https://github.com/open-mpi/ompi/issues/4345

However, the v1.10 series is fairly ancient -- the fix was not applied to that 
series.  The fix was applied to the v2.1.x series, and a snapshot tarball 
containing the fix is available here (generally just take the latest tarball):

   https://www.open-mpi.org/nightly/v2.x/

The fix is still pending for the v3.0.x and v3.1.x series (i.e., there are 
pending pull requests that haven't been merged yet, so the nightly snapshots 
for the v3.0.x and v3.1.x branches do not yet contain this fix).



On Jan 5, 2018, at 1:34 PM, Vahid Askarpour <vh261...@dal.ca> wrote:

I am attempting to install openmpi-1.10.7 on CentOS Linux (7.4.1708) using 
GCC-6.4.0.

When compiling, I get the following error:

make[2]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ob1'
Making all in mca/pml/ucx
make[2]: Entering directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
CC   pml_ucx.lo
CC   pml_ucx_request.lo
CC   pml_ucx_datatype.lo
CC   pml_ucx_component.lo
CCLD mca_pml_ucx.la
libtool:   error: require no space between '-L' and '-lrt'
make[2]: *** [Makefile:1725: mca_pml_ucx.la] Error 1
make[2]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
make[1]: *** [Makefile:3261: all-recursive] Error 1
make[1]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi'
make: *** [Makefile:1777: all-recursive] Error 1

Thank you,

Vahid


--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-05 Thread Vahid Askarpour
Gilles,

I will try the 3.0.1rc1 version to see how it goes.

Thanks,

Vahid

On Jan 5, 2018, at 8:40 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

 Vahid,

This looks like the description of the issue reported at 
https://github.com/open-mpi/ompi/issues/4336
The fix is currently available in 3.0.1rc1, and I will back-port the fix to the 
v2.x branch.
A workaround is to use ROMIO instead of ompio; you can achieve this with
mpirun --mca io ^ompio ...
(FWIW, the 1.10 series uses ROMIO by default, so there is no leak out of the box)

IIRC, a possible (and ugly) workaround for the compilation issue is to
configure --with-ucx=/usr ...
That being said, you should really upgrade to a supported version of Open MPI 
as previously suggested.
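
Spelled out as full command lines (the install prefix and application name are 
placeholders), the two workarounds would look something like:

# run-time: disable ompio so that the ROMIO-based component is used
mpirun --mca io ^ompio -np 64 ./my_app ...

# build-time: point configure at the system UCX install
./configure --prefix=$HOME/openmpi-1.10.7 --with-ucx=/usr
make all install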

Cheers,

Gilles

On Saturday, January 6, 2018, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
You can still give Open MPI 2.1.1 a try.  It should be source compatible with 
EPW.  Hopefully the behavior is close enough that it should work.

If not, please encourage the EPW developers to upgrade.  v3.0.x is the current 
stable series; v1.10.x is ancient.



> On Jan 5, 2018, at 5:22 PM, Vahid Askarpour 
> mailto:vh261...@dal.ca>> wrote:
>
> Thank you Jeff for your suggestion to use the v.2.1 series.
>
> I am attempting to use openmpi with EPW. On the EPW website 
> (http://epw.org.uk/Main/DownloadAndInstall), it is stated that:
>
>> Compatibility of EPW
>>
>> EPW is tested and should work on the following compilers and libraries:
>>
>>  • gcc640 serial
>>  • gcc640 + openmpi-1.10.7
>>  • intel 12 + openmpi-1.10.7
>>  • intel 17 + impi
>>  • PGI 17 + mvapich2.3
>> EPW is know to have the following incompatibilities with:
>>
>>  • openmpi 2.0.2 (but likely on all the 2.x.x version): Works but memory 
>> leak. If you open and close a file a lot of times with openmpi 2.0.2, the 
>> memory increase linearly with the number of times the file is open.
>
> So I am hoping to avoid the 2.x.x series and use the 1.10.7 version suggested 
> by the EPW developers. However, it appears that this is not possible.
>
> Vahid
>
>> On Jan 5, 2018, at 5:06 PM, Jeff Squyres (jsquyres) 
>> mailto:jsquy...@cisco.com>> wrote:
>>
>> I forget what the underlying issue was, but this issue just came up and was 
>> recently fixed:
>>
>>https://github.com/open-mpi/ompi/issues/4345
>>
>> However, the v1.10 series is fairly ancient -- the fix was not applied to 
>> that series.  The fix was applied to the v2.1.x series, and a snapshot 
>> tarball containing the fix is available here (generally just take the latest 
>> tarball):
>>
>>https://www.open-mpi.org/nightly/v2.x/
>>
>> The fix is still pending for the v3.0.x and v3.1.x series (i.e., there are 
>> pending pull requests that haven't been merged yet, so the nightly snapshots 
>> for the v3.0.x and v3.1.x branches do not yet contain this fix).
>>
>>
>>
>>> On Jan 5, 2018, at 1:34 PM, Vahid Askarpour 
>>> mailto:vh261...@dal.ca>> wrote:
>>>
>>> I am attempting to install openmpi-1.10.7 on CentOS Linux (7.4.1708) using 
>>> GCC-6.4.0.
>>>
>>> When compiling, I get the following error:
>>>
>>> make[2]: Leaving directory 
>>> '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ob1'
>>> Making all in mca/pml/ucx
>>> make[2]: Entering directory 
>>> '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
>>> CC   pml_ucx.lo
>>> CC   pml_ucx_request.lo
>>> CC   pml_ucx_datatype.lo
>>> CC   pml_ucx_component.lo
>>> CCLD mca_pml_ucx.la<http://mca_pml_ucx.la/>
>>> libtool:   error: require no space between '-L' and '-lrt'
>>> make[2]: *** [Makefile:1725: mca_pml_ucx.la<http://mca_pml_ucx.la/>] Error 1
>>> make[2]: Leaving directory 
>>> '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
>>> make[1]: *** [Makefile:3261: all-recursive] Error 1
>>> make[1]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi'
>>> make: *** [Makefile:1777: all-recursive] Error 1
>>>
>>> Thank you,
>>>
>>> Vahid
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org<mailto:users@lists.open-mpi.org>
>>> https://lists.open-mpi.org/mailman/listinfo/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com<mailto:jsquy...@cisco.com>
>>
>>
>>
>> 

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-11 Thread Vahid Askarpour
Hi Jeff,

I looked for the 3.0.1 version but I only found the 3.0.0 version available for 
download. So I thought it may take a while for the 3.0.1 to become available. 
Or did I miss something?

Thanks,

Vahid

> On Jan 11, 2018, at 12:04 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Vahid --
> 
> Were you able to give it a whirl?
> 
> Thanks.
> 
> 
>> On Jan 5, 2018, at 7:58 PM, Vahid Askarpour  wrote:
>> 
>> Gilles,
>> 
>> I will try the 3.0.1rc1 version to see how it goes.
>> 
>> Thanks,
>> 
>> Vahid
>> 
>>> On Jan 5, 2018, at 8:40 PM, Gilles Gouaillardet 
>>>  wrote:
>>> 
>>> Vahid,
>>> 
>>> This looks like the description of the issue reported at 
>>> https://github.com/open-mpi/ompi/issues/4336
>>> The fix is currently available in 3.0.1rc1, and I will back port the fix fo 
>>> the v2.x branch.
>>> A workaround is to use ROM-IO instead of ompio, you can achieve this with
>>> mpirun —mca io ^ompio ...
>>> (FWIW 1.10 series use ROM-IO by default, so there is no leak out of the box)
>>> 
>>> IIRC, a possible (and ugly) workaround for the compilation issue is to
>>> configure —with-ucx=/usr ...
>>> That being said, you should really upgrade to a supported version of Open 
>>> MPI as previously suggested
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On Saturday, January 6, 2018, Jeff Squyres (jsquyres)  
>>> wrote:
>>> You can still give Open MPI 2.1.1 a try.  It should be source compatible 
>>> with EPW.  Hopefully the behavior is close enough that it should work.
>>> 
>>> If not, please encourage the EPW developers to upgrade.  v3.0.x is the 
>>> current stable series; v1.10.x is ancient.
>>> 
>>> 
>>> 
>>>> On Jan 5, 2018, at 5:22 PM, Vahid Askarpour  wrote:
>>>> 
>>>> Thank you Jeff for your suggestion to use the v.2.1 series.
>>>> 
>>>> I am attempting to use openmpi with EPW. On the EPW website 
>>>> (http://epw.org.uk/Main/DownloadAndInstall), it is stated that:
>>>> 
>>>>> Compatibility of EPW
>>>>> 
>>>>> EPW is tested and should work on the following compilers and libraries:
>>>>> 
>>>>> • gcc640 serial
>>>>> • gcc640 + openmpi-1.10.7
>>>>> • intel 12 + openmpi-1.10.7
>>>>> • intel 17 + impi
>>>>> • PGI 17 + mvapich2.3
>>>>> EPW is know to have the following incompatibilities with:
>>>>> 
>>>>> • openmpi 2.0.2 (but likely on all the 2.x.x version): Works but 
>>>>> memory leak. If you open and close a file a lot of times with openmpi 
>>>>> 2.0.2, the memory increase linearly with the number of times the file is 
>>>>> open.
>>>> 
>>>> So I am hoping to avoid the 2.x.x series and use the 1.10.7 version 
>>>> suggested by the EPW developers. However, it appears that this is not 
>>>> possible.
>>>> 
>>>> Vahid
>>>> 
>>>>> On Jan 5, 2018, at 5:06 PM, Jeff Squyres (jsquyres)  
>>>>> wrote:
>>>>> 
>>>>> I forget what the underlying issue was, but this issue just came up and 
>>>>> was recently fixed:
>>>>> 
>>>>>   https://github.com/open-mpi/ompi/issues/4345
>>>>> 
>>>>> However, the v1.10 series is fairly ancient -- the fix was not applied to 
>>>>> that series.  The fix was applied to the v2.1.x series, and a snapshot 
>>>>> tarball containing the fix is available here (generally just take the 
>>>>> latest tarball):
>>>>> 
>>>>>   https://www.open-mpi.org/nightly/v2.x/
>>>>> 
>>>>> The fix is still pending for the v3.0.x and v3.1.x series (i.e., there 
>>>>> are pending pull requests that haven't been merged yet, so the nightly 
>>>>> snapshots for the v3.0.x and v3.1.x branches do not yet contain this fix).
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 5, 2018, at 1:34 PM, Vahid Askarpour  wrote:
>>>>>> 
>>>>>> I am attempting to install openmpi-1.10.7 on CentOS Linux (7.4.1708) 
>>>>>> using GCC-6.4.0.
>>>>>> 
>>>>>> When compiling, I get the following error:
>>>>>> 
>>>>>>

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-11 Thread Vahid Askarpour
Great. I will try the 3.0.x version to see how it goes.

On a side note, I did manage to run EPW without getting memory leaks using 
openmpi-1.8.8 and gcc-4.8.5. These are the tools that apparently worked when 
the code was developed as seen on their Test Farm 
(http://epw.org.uk/Main/TestFarm).

Thanks,

Vahid

On Jan 11, 2018, at 12:50 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

You are correct: 3.0.1 has not been released yet.

However, our nightly snapshots of the 3.0.x branch are available for download.  
These are not official releases, but they are great for getting users to test 
what will eventually become an official release (i.e., 3.0.1) to see if 
particular bugs have been fixed.  This is one of the benefits of open source.  
:-)

Here's where the 3.0.1 nightly snapshots are available for download:

   https://www.open-mpi.org/nightly/v3.0.x/

They are organized by date.


On Jan 11, 2018, at 11:34 AM, Vahid Askarpour <vh261...@dal.ca> wrote:

Hi Jeff,

I looked for the 3.0.1 version but I only found the 3.0.0 version available for 
download. So I thought it may take a while for the 3.0.1 to become available. 
Or did I miss something?

Thanks,

Vahid

On Jan 11, 2018, at 12:04 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

Vahid --

Were you able to give it a whirl?

Thanks.


On Jan 5, 2018, at 7:58 PM, Vahid Askarpour <vh261...@dal.ca> wrote:

Gilles,

I will try the 3.0.1rc1 version to see how it goes.

Thanks,

Vahid

On Jan 5, 2018, at 8:40 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

This looks like the description of the issue reported at 
https://github.com/open-mpi/ompi/issues/4336
The fix is currently available in 3.0.1rc1, and I will back port the fix fo the 
v2.x branch.
A workaround is to use ROM-IO instead of ompio, you can achieve this with
mpirun —mca io ^ompio ...
(FWIW 1.10 series use ROM-IO by default, so there is no leak out of the box)

IIRC, a possible (and ugly) workaround for the compilation issue is to
configure —with-ucx=/usr ...
That being said, you should really upgrade to a supported version of Open MPI 
as previously suggested

Cheers,

Gilles

On Saturday, January 6, 2018, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
You can still give Open MPI 2.1.1 a try.  It should be source compatible with 
EPW.  Hopefully the behavior is close enough that it should work.

If not, please encourage the EPW developers to upgrade.  v3.0.x is the current 
stable series; v1.10.x is ancient.



On Jan 5, 2018, at 5:22 PM, Vahid Askarpour <vh261...@dal.ca> wrote:

Thank you Jeff for your suggestion to use the v.2.1 series.

I am attempting to use openmpi with EPW. On the EPW website 
(http://epw.org.uk/Main/DownloadAndInstall), it is stated that:

Compatibility of EPW

EPW is tested and should work on the following compilers and libraries:

  • gcc640 serial
  • gcc640 + openmpi-1.10.7
  • intel 12 + openmpi-1.10.7
  • intel 17 + impi
  • PGI 17 + mvapich2.3
EPW is know to have the following incompatibilities with:

  • openmpi 2.0.2 (but likely on all the 2.x.x version): Works but memory leak. 
If you open and close a file a lot of times with openmpi 2.0.2, the memory 
increase linearly with the number of times the file is open.

So I am hoping to avoid the 2.x.x series and use the 1.10.7 version suggested 
by the EPW developers. However, it appears that this is not possible.

Vahid

On Jan 5, 2018, at 5:06 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

I forget what the underlying issue was, but this issue just came up and was 
recently fixed:

https://github.com/open-mpi/ompi/issues/4345

However, the v1.10 series is fairly ancient -- the fix was not applied to that 
series.  The fix was applied to the v2.1.x series, and a snapshot tarball 
containing the fix is available here (generally just take the latest tarball):

https://www.open-mpi.org/nightly/v2.x/

The fix is still pending for the v3.0.x and v3.1.x series (i.e., there are 
pending pull requests that haven't been merged yet, so the nightly snapshots 
for the v3.0.x and v3.1.x branches do not yet contain this fix).



On Jan 5, 2018, at 1:34 PM, Vahid Askarpour <vh261...@dal.ca> wrote:

I am attempting to install openmpi-1.10.7 on CentOS Linux (7.4.1708) using 
GCC-6.4.0.

When compiling, I get the following error:

make[2]: Leaving directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ob1'
Making all in mca/pml/ucx
make[2]: Entering directory '/home/vaskarpo/bin/openmpi-1.10.7/ompi/mca/pml/ucx'
CC   pml_ucx.lo
CC   pml_ucx_request.lo
CC   pml_ucx_datatype.lo
CC   pml_ucx_component.lo
CCLD mca_pml_ucx.la
libtool:   error: require no space between '-L' and '-lrt'
make[2]: *** [Makefile:1725: mca_pml_ucx.la] Error 1

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-18 Thread Vahid Askarpour
Hi Jeff,

I compiled Quantum Espresso/EPW with openmpi-3.0.x. The openmpi was compiled 
with intel14.

A preliminary run for EPW using Quantum Espresso crashed with the following 
message:

end of file while reading crystal k points

There are 1728 k points in the input file and Quantum Espresso, by default, can 
read up to 4 k points.

This error did not occur with openmpi-1.8.1.

So I will just continue to use openmpi-1.8.1 as it does not crash.

Thanks,

Vahid

> On Jan 11, 2018, at 12:50 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> You are correct: 3.0.1 has not been released yet.
> 
> However, our nightly snapshots of the 3.0.x branch are available for 
> download.  These are not official releases, but they are great for getting 
> users to test what will eventually become an official release (i.e., 3.0.1) 
> to see if particular bugs have been fixed.  This is one of the benefits of 
> open source.  :-)
> 
> Here's where the 3.0.1 nightly snapshots are available for download:
> 
>https://www.open-mpi.org/nightly/v3.0.x/
> 
> They are organized by date.
> 
> 
>> On Jan 11, 2018, at 11:34 AM, Vahid Askarpour  wrote:
>> 
>> Hi Jeff,
>> 
>> I looked for the 3.0.1 version but I only found the 3.0.0 version available 
>> for download. So I thought it may take a while for the 3.0.1 to become 
>> available. Or did I miss something?
>> 
>> Thanks,
>> 
>> Vahid
>> 
>>> On Jan 11, 2018, at 12:04 PM, Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
>>> Vahid --
>>> 
>>> Were you able to give it a whirl?
>>> 
>>> Thanks.
>>> 
>>> 
>>>> On Jan 5, 2018, at 7:58 PM, Vahid Askarpour  wrote:
>>>> 
>>>> Gilles,
>>>> 
>>>> I will try the 3.0.1rc1 version to see how it goes.
>>>> 
>>>> Thanks,
>>>> 
>>>> Vahid
>>>> 
>>>>> On Jan 5, 2018, at 8:40 PM, Gilles Gouaillardet 
>>>>>  wrote:
>>>>> 
>>>>> Vahid,
>>>>> 
>>>>> This looks like the description of the issue reported at 
>>>>> https://github.com/open-mpi/ompi/issues/4336
>>>>> The fix is currently available in 3.0.1rc1, and I will back port the fix 
>>>>> fo the v2.x branch.
>>>>> A workaround is to use ROM-IO instead of ompio, you can achieve this with
>>>>> mpirun —mca io ^ompio ...
>>>>> (FWIW 1.10 series use ROM-IO by default, so there is no leak out of the 
>>>>> box)
>>>>> 
>>>>> IIRC, a possible (and ugly) workaround for the compilation issue is to
>>>>> configure —with-ucx=/usr ...
>>>>> That being said, you should really upgrade to a supported version of Open 
>>>>> MPI as previously suggested
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> On Saturday, January 6, 2018, Jeff Squyres (jsquyres) 
>>>>>  wrote:
>>>>> You can still give Open MPI 2.1.1 a try.  It should be source compatible 
>>>>> with EPW.  Hopefully the behavior is close enough that it should work.
>>>>> 
>>>>> If not, please encourage the EPW developers to upgrade.  v3.0.x is the 
>>>>> current stable series; v1.10.x is ancient.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 5, 2018, at 5:22 PM, Vahid Askarpour  wrote:
>>>>>> 
>>>>>> Thank you Jeff for your suggestion to use the v.2.1 series.
>>>>>> 
>>>>>> I am attempting to use openmpi with EPW. On the EPW website 
>>>>>> (http://epw.org.uk/Main/DownloadAndInstall), it is stated that:
>>>>>> 
>>>>>>> Compatibility of EPW
>>>>>>> 
>>>>>>> EPW is tested and should work on the following compilers and libraries:
>>>>>>> 
>>>>>>>   • gcc640 serial
>>>>>>>   • gcc640 + openmpi-1.10.7
>>>>>>>   • intel 12 + openmpi-1.10.7
>>>>>>>   • intel 17 + impi
>>>>>>>   • PGI 17 + mvapich2.3
>>>>>>> EPW is know to have the following incompatibilities with:
>>>>>>> 
>>>>>>>   • openmpi 2.0.2 (but likely on all the 2.x.x version): Works but 
>>>>>>> memory leak. If you open and close a file a lot of times with openmpi 
>>>>>>> 2

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-18 Thread Vahid Askarpour
My openmpi3.0.x run (called nscf run) was reading data from a routine Quantum 
Espresso input file edited by hand. The preliminary run (called scf run) was 
done with openmpi3.0.x on a similar input file also edited by hand. 

Vahid



> On Jan 18, 2018, at 6:39 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> FWIW: If your Open MPI 3.0.x runs are reading data that was written by MPI IO 
> via Open MPI 1.10.x or 1.8.x runs, that data formats may not be compatible 
> (and could lead to errors like you're seeing -- premature end of file, etc.).
> 
> 
>> On Jan 18, 2018, at 5:34 PM, Vahid Askarpour  wrote:
>> 
>> Hi Jeff,
>> 
>> I compiled Quantum Espresso/EPW with openmpi-3.0.x. The openmpi was compiled 
>> with intel14.
>> 
>> A preliminary run for EPW using Quantum Espresso crashed with the following 
>> message:
>> 
>> end of file while reading crystal k points
>> 
>> There are 1728 k points in the input file and Quantum Espresso, by default, 
>> can read up to 4 k points.
>> 
>> This error did not occur with openmpi-1.8.1.
>> 
>> So I will just continue to use openmpi-1.8.1 as it does not crash.
>> 
>> Thanks,
>> 
>> Vahid
>> 
>>> On Jan 11, 2018, at 12:50 PM, Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
>>> You are correct: 3.0.1 has not been released yet.
>>> 
>>> However, our nightly snapshots of the 3.0.x branch are available for 
>>> download.  These are not official releases, but they are great for getting 
>>> users to test what will eventually become an official release (i.e., 3.0.1) 
>>> to see if particular bugs have been fixed.  This is one of the benefits of 
>>> open source.  :-)
>>> 
>>> Here's where the 3.0.1 nightly snapshots are available for download:
>>> 
>>>  https://www.open-mpi.org/nightly/v3.0.x/
>>> 
>>> They are organized by date.
>>> 
>>> 
>>>> On Jan 11, 2018, at 11:34 AM, Vahid Askarpour  wrote:
>>>> 
>>>> Hi Jeff,
>>>> 
>>>> I looked for the 3.0.1 version but I only found the 3.0.0 version 
>>>> available for download. So I thought it may take a while for the 3.0.1 to 
>>>> become available. Or did I miss something?
>>>> 
>>>> Thanks,
>>>> 
>>>> Vahid
>>>> 
>>>>> On Jan 11, 2018, at 12:04 PM, Jeff Squyres (jsquyres) 
>>>>>  wrote:
>>>>> 
>>>>> Vahid --
>>>>> 
>>>>> Were you able to give it a whirl?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>>> On Jan 5, 2018, at 7:58 PM, Vahid Askarpour  wrote:
>>>>>> 
>>>>>> Gilles,
>>>>>> 
>>>>>> I will try the 3.0.1rc1 version to see how it goes.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Vahid
>>>>>> 
>>>>>>> On Jan 5, 2018, at 8:40 PM, Gilles Gouaillardet 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> Vahid,
>>>>>>> 
>>>>>>> This looks like the description of the issue reported at 
>>>>>>> https://github.com/open-mpi/ompi/issues/4336
>>>>>>> The fix is currently available in 3.0.1rc1, and I will back port the 
>>>>>>> fix fo the v2.x branch.
>>>>>>> A workaround is to use ROM-IO instead of ompio, you can achieve this 
>>>>>>> with
>>>>>>> mpirun —mca io ^ompio ...
>>>>>>> (FWIW 1.10 series use ROM-IO by default, so there is no leak out of the 
>>>>>>> box)
>>>>>>> 
>>>>>>> IIRC, a possible (and ugly) workaround for the compilation issue is to
>>>>>>> configure —with-ucx=/usr ...
>>>>>>> That being said, you should really upgrade to a supported version of 
>>>>>>> Open MPI as previously suggested
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>> On Saturday, January 6, 2018, Jeff Squyres (jsquyres) 
>>>>>>>  wrote:
>>>>>>> You can still give Open MPI 2.1.1 a try.  It should be source 
>>>>>>> compatible with EPW.  Hopefully the behavior is close enough that it 
>>&

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Vahid Askarpour
Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let 
you know. It is sitting in Conte’s queue at Purdue.

As to Edgar’s question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem   TypeSize  Used Avail Use% Mounted on
/dev/sda1ext4435G   16G  398G   4% /
tmpfstmpfs16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home
 nfs  80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps
 nfs 8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD
 lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot
 nfs 4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin
 nfs 200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.

Cheers,

Vahid

> On Jan 18, 2018, at 10:14 PM, Gilles Gouaillardet 
>  wrote:
> 
> Vahid,
> 
> In the v1.10 series, the default MPI-IO component was ROMIO-based, and
> in the v3 series, it is now ompio.
> You can force the latest Open MPI to use the ROMIO based component with
> mpirun --mca io romio314 ...
> 
> That being said, your description (e.g. a hand edited file) suggests
> that I/O is not performed with MPI-IO,
> which makes me very puzzled on why the latest Open MPI is crashing.
> 
> Cheers,
> 
> Gilles
> 
> On Fri, Jan 19, 2018 at 10:55 AM, Edgar Gabriel  
> wrote:
>> I will try to reproduce this problem with 3.0.x, but it might take me a
>> couple of days to get to it.
>> 
>> Since it seemed to have worked with 2.0.x (except for the running out file
>> handles problem), there is the suspicion that one of the fixes that we
>> introduced since then is the problem.
>> 
>> What file system did you run it on? NFS?
>> 
>> Thanks
>> 
>> Edgar
>> 
>> 
>> On 1/18/2018 5:17 PM, Jeff Squyres (jsquyres) wrote:
>>> 
>>> On Jan 18, 2018, at 5:53 PM, Vahid Askarpour  wrote:
>>>> 
>>>> My openmpi3.0.x run (called nscf run) was reading data from a routine
>>>> Quantum Espresso input file edited by hand. The preliminary run (called scf
>>>> run) was done with openmpi3.0.x on a similar input file also edited by 
>>>> hand.
>>> 
>>> Gotcha.
>>> 
>>> Well, that's a little disappointing.
>>> 
>>> It would be good to understand why it is crashing -- is the app doing
>>> something that is accidentally not standard?  Is there a bug in (soon to be
>>> released) Open MPI 3.0.1?  ...?
>>> 
>> 

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Vahid Askarpour
To run EPW, the command for running the preliminary nscf run is 
(http://epw.org.uk/Documentation/B-dopedDiamond):

~/bin/openmpi-v3.0/bin/mpiexec -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

And it crashed like the first time.

It is interesting that the preliminary scf run works fine. The scf run requires 
Quantum Espresso to generate the k points automatically as shown below:

K_POINTS (automatic)
12 12 12 0 0 0

The nscf run which crashes includes a list of k points (1728 in this case) as 
seen below:

K_POINTS (crystal)
1728
  0.  0.  0.  5.787037e-04
  0.  0.  0.0833  5.787037e-04
  0.  0.  0.1667  5.787037e-04
  0.  0.  0.2500  5.787037e-04
  0.  0.  0.  5.787037e-04
  0.  0.  0.4167  5.787037e-04
  0.  0.  0.5000  5.787037e-04
  0.  0.  0.5833  5.787037e-04
  0.  0.  0.6667  5.787037e-04
  0.  0.  0.7500  5.787037e-04
…….
…….

To build openmpi (either 1.10.7 or 3.0.x), I loaded the Fortran compiler 
module, configured with only the "--prefix=" option, and then ran "make all install". I 
did not enable or disable any other options.

Cheers,

Vahid


On Jan 19, 2018, at 10:23 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


Thanks, that is interesting. Since /scratch is a Lustre file system, Open MPI 
should actually utilize romio314 for that anyway, not ompio. What I have seen 
happen on at least one occasion, however, is that ompio was still used because 
(I suspect) romio314 didn't pick up the configuration options correctly. It is 
a little bit of a mess from that perspective that we have to pass the romio 
arguments with different flags/options than for ompio, e.g.

--with-lustre=/path/to/lustre/ 
--with-io-romio-flags="--with-file-system=ufs+nfs+lustre 
--with-lustre=/path/to/lustre"

ompio should pick up the lustre options correctly if lustre headers/libraries 
are found at the default location, even if the user did not pass the 
--with-lustre option. I am not entirely sure what happens in romio if the user 
did not pass the --with-file-system=ufs+nfs+lustre but the lustre 
headers/libraries are found at the default location, i.e. whether the lustre 
adio component is still compiled or not.
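
Put together, a complete configure invocation with both sets of flags would 
look something like this (the Lustre path and install prefix are placeholders):

./configure --prefix=$HOME/openmpi-3.0.x \
    --with-lustre=/path/to/lustre \
    --with-io-romio-flags="--with-file-system=ufs+nfs+lustre --with-lustre=/path/to/lustre"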

Anyway, let's wait for the outcome of your run enforcing the romio314 
component, and I will still try to reproduce your problem on my system.

Thanks
Edgar

On 1/19/2018 7:15 AM, Vahid Askarpour wrote:

Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let 
you know. It is sitting in Conte’s queue at Purdue.

As to Edgar’s question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem   TypeSize  Used Avail Use% Mounted on
/dev/sda1ext4435G   16G  398G   4% /
tmpfstmpfs16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home
 nfs  80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps
 nfs 8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD
 lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot
 nfs 4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin
 nfs 200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.

Cheers,

Vahid



On Jan 18, 2018, at 10:14 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

In the v1.10 series, the default MPI-IO component was ROMIO-based, and
in the v3 series, it is now ompio.
You can force the latest Open MPI to use the ROMIO based component with
mpirun --mca io romio314 ...

That being said, your description (e.g. a hand edited file) suggests
that I/O is not performed with MPI-IO,
which makes me very puzzled on why the latest Open MPI is crashing.

Cheers,

Gilles

On Fri, Jan 19, 2018 at 10:55 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


I will try to reproduce this problem with 3.0.x, but it might take me a
couple of days to get to it.

Since it seemed to have worked with 2.0.x (except for the running out file
handles problem), there is the suspicion that one of the fixes that we
introduced since then is the problem.

W

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Vahid Askarpour
Hi Edgar,

Just to let you know that the nscf run with --mca io ompio crashed like the 
other two runs.

Thank you,

Vahid

On Jan 19, 2018, at 12:46 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, thank you for the information. Two short questions and requests. I have 
qe-6.2.1 compiled and running on my system (although it is with gcc-6.4 instead 
of the intel compiler), and I am currently running the parallel test suite. So 
far, all the tests passed, although it is still running.

My question now is: would it be possible for you to give me access to exactly 
the same data set that you are using? You could upload it to a webpage or 
similar and just send me the link.

The second question/request, could you rerun your tests one more time, this 
time forcing using ompio? e.g. --mca io ompio

Thanks

Edgar

On 1/19/2018 10:32 AM, Vahid Askarpour wrote:
To run EPW, the command for running the preliminary nscf run is 
(http://epw.org.uk/Documentation/B-dopedDiamond):

~/bin/openmpi-v3.0/bin/mpiexec -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

And it crashed like the first time.

It is interesting that the preliminary scf run works fine. The scf run requires 
Quantum Espresso to generate the k points automatically as shown below:

K_POINTS (automatic)
12 12 12 0 0 0

The nscf run which crashes includes a list of k points (1728 in this case) as 
seen below:

K_POINTS (crystal)
1728
  0.  0.  0.  5.787037e-04
  0.  0.  0.0833  5.787037e-04
  0.  0.  0.1667  5.787037e-04
  0.  0.  0.2500  5.787037e-04
  0.  0.  0.  5.787037e-04
  0.  0.  0.4167  5.787037e-04
  0.  0.  0.5000  5.787037e-04
  0.  0.  0.5833  5.787037e-04
  0.  0.  0.6667  5.787037e-04
  0.  0.  0.7500  5.787037e-04
…….
…….

To build openmpi (either 1.10.7 or 3.0.x), I loaded the fortran compiler 
module, configured with only the “--prefix="  and then “make all 
install”. I did not enable or disable any other options.

Cheers,

Vahid


On Jan 19, 2018, at 10:23 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


thanks, that is interesting. Since /scratch is a lustre file system, Open MPI 
should actually utilize romio314 for that anyway, not ompio. What I have seen 
however happen on at least one occasions is that ompio was still used since ( I 
suspect) romio314 didn't pick up correctly the configuration options. It is a 
little bit of a mess from that perspective that we have to pass the romio 
arguments with different flag/options than for ompio, e.g.

--with-lustre=/path/to/lustre/ 
--with-io-romio-flags="--with-file-system=ufs+nfs+lustre 
--with-lustre=/path/to/lustre"

ompio should pick up the lustre options correctly if lustre headers/libraries 
are found at the default location, even if the user did not pass the 
--with-lustre option. I am not entirely sure what happens in romio if the user 
did not pass the --with-file-system=ufs+nfs+lustre but the lustre 
headers/libraries are found at the default location, i.e. whether the lustre 
adio component is still compiled or not.

Anyway, lets wait for the outcome of your run enforcing using the romio314 
component, and I will still try to reproduce your problem on my system.

Thanks
Edgar

On 1/19/2018 7:15 AM, Vahid Askarpour wrote:

Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let 
you know. It is sitting in Conte’s queue at Purdue.

As to Edgar’s question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem   TypeSize  Used Avail Use% Mounted on
/dev/sda1ext4435G   16G  398G   4% /
tmpfstmpfs16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home
 nfs  80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps
 nfs 8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD
 lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot
 nfs 4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin
 nfs 200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Vahid Askarpour
Concerning the following error

 from pw_readschemafile : error # 1
 xml data file not found

The nscf run uses files generated by the scf.in run. So I first run scf.in and 
when it finishes, I run nscf.in. If you have done this and still get the above 
error, then this could be another bug. It does not happen for me with 
intel14/openmpi-1.8.8.

Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


OK, here is what I have found out so far; I will have to stop for today, however:

 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and ompio. I 
*think* the issue is that the input_tmp.in file is incomplete. In both cases 
(ompio and romio) the end of the file looks as follows (and it's exactly the 
same for both libraries):

gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files> tail -10 input_tmp.in
  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't completely 
identified the location. However, I could not find MPI file_write(_all) 
operations anywhere in the code, although there are some MPI_file_read(_all) 
operations.

 4. I can confirm that the behavior with Open MPI 1.8.x is different. 
input_tmp.in looks more complete (at least it doesn't end in the middle of the 
line). The simulation does still not finish for me, but the bug reported is 
slightly different, I might just be missing a file or something

 from pw_readschemafile : error # 1
 xml data file not found

Since I think input_tmp.in is generated from data that is provided in nscf.in, 
it might very well be something in the MPI_File_read(_all) operation that 
causes the issue, but since both ompio and romio are affected, there is good 
chance that something outside of the control of io components is causing the 
trouble (maybe a datatype issue that has changed from 1.8.x series to 3.0.x).

 5. Last but not least, I also wanted to mention that I ran all parallel tests 
that I found in the testsuite  (run-tests-cp-parallel, run-tests-pw-parallel, 
run-tests-ph-parallel, run-tests-epw-parallel ), and they all passed with ompio 
(and romio314 although I only ran a subset of the tests with romio314).


Thanks

Edgar

-



On 01/19/2018 11:44 AM, Vahid Askarpour wrote:
Hi Edgar,

Just to let you know that the nscf run with --mca io ompio crashed like the 
other two runs.

Thank you,

Vahid

On Jan 19, 2018, at 12:46 PM, Edgar Gabriel 
mailto:egabr...@central.uh.edu>> wrote:


ok, thank you for the information. Two short questions and requests. I have 
qe-6.2.1 compiled and running on my system (although it is with gcc-6.4 instead 
of the intel compiler), and I am currently running the parallel test suite. So 
far, all the tests passed, although it is still running.

My question is now, would it be possible for you to give me access to exactly 
the same data set that you are using?  You could upload to a webpage or similar 
and just send me the link.

The second question/request, could you rerun your tests one more time, this 
time forcing using ompio? e.g. --mca io ompio

Thanks

Edgar

On 1/19/2018 10:32 AM, Vahid Askarpour wrote:
To run EPW, the command for running the preliminary nscf run is 
(http://epw.org.uk/Documentation/B-dopedDiamond):

~/bin/openmpi-v3.0/bin/mpiexec -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

And it crashed like the first time.

It is interesting that the preliminary scf run works fine. The scf run requires 
Quantum Espresso to generate the k points automatically as shown below:

K_POINTS (automatic)
12 12 12 0 0 0

The nscf run which crashes includes a list of k points (1728 in this case) as 
seen below:

K_POINTS (crystal)
1728
  0.  0.  0.  5.787037e-04
  0.  0.  0.0833  5.787037e-04
  0.  0.  0.1667  5.787037e-04
  0.  0.  0.2500  5.787037e-04
  0.  0.  0.  5.787037e-04
  0.  0.  0.4167  5.787037e-04
  0.  0.  0.5000  5.787037e-04
  0.000

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-23 Thread Vahid Askarpour
This would work for Quantum Espresso input. I am waiting to see what happens to 
EPW. I don’t think EPW accepts the -i argument. I will report back once the EPW 
job is done.
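
For the plain QE step, the -i form that Edgar suggests below would be something 
like the following, reusing my earlier command (assuming pw.x accepts -i for 
the input file):

~/bin/openmpi-v3.0/bin/mpiexec -np 64 /home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 -i nscf.in > nscf.out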

Cheers,

Vahid

On Jan 22, 2018, at 6:05 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


Well, my final comment on this topic: as somebody suggested earlier in this 
email chain, if you provide the input with the -i argument instead of piping 
from standard input, things seem to work as far as I can see (disclaimer: I do 
not know what the final outcome should be; I just see that the application does 
not complain about the 'end of file while reading crystal k points'). So maybe 
that is the simplest solution.

Thanks

Edgar

On 1/22/2018 1:17 PM, Edgar Gabriel wrote:

after some further investigation, I am fairly confident that this is not an MPI 
I/O problem.

The input file input_tmp.in is generated in this sequence of instructions 
(which is in Modules/open_close_input_file.f90)

---

  IF ( TRIM(input_file_) /= ' ' ) THEn
 !
 ! copy file to be opened into input_file
 !
 input_file = input_file_
 !
  ELSE
 !
 ! if no file specified then copy from standard input
 !
 input_file="input_tmp.in"
 OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
  STATUS='unknown', IOSTAT = ierr )
 IF ( ierr > 0 ) GO TO 30
 !
 dummy=' '
 WRITE(stdout, '(5x,a)') "Waiting for input..."
 DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
READ (stdin,fmt='(A512)',END=20) dummy
WRITE (stdtmp,'(A)') trim(dummy)
 END DO
 !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is generated by 
reading from standard input. Since the application is being launched e.g. with

mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out


the data comes from nscf.in. I simply do not know enough about I/O forwarding to 
be able to tell why we do not see the entire file, but one interesting detail 
is that if I run it in the debugger, the input_tmp.in is created correctly. 
However, if I run it using mpirun as shown above, the file is cropped 
incorrectly, which leads to the error message mentioned in this email chain.

Anyway, I would probably need some help here from somebody who knows the 
runtime better than me on what could go wrong at this point.

Thanks

Edgar



On 1/19/2018 1:22 PM, Vahid Askarpour wrote:
Concerning the following error

 from pw_readschemafile : error # 1
 xml data file not found

The nscf run uses files generated by the scf.in run. So I first run scf.in and 
when it finishes, I run nscf.in. If you have done this and still get the above 
error, then this could be another bug. It does not happen for me with 
intel14/openmpi-1.8.8.

Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel 
mailto:egabr...@central.uh.edu>> wrote:


ok, here is what found out so far, will have to stop for now however for today:

 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and ompio. I 
*think* the issue is that the input_tmp.in file is incomplete. In both cases 
(ompio and romio) the end of the file looks as follows (and its exactly the 
same for both libraries):

gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files<mailto:gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files>>
 tail -10 input_tmp.in
  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't completely 
identified the location. However, I could not find MPI file_write(_all) 
operations anywhere in the code, although there are some MPI_file_read(_all) 
operations.

 4. I can confirm that the behavior with Open MPI 1.8.x is different. 
input_tmp.in looks more complete (at least it doesn't end in the middle of the 
line). The simulation does still not finish for me, but the bug reported is 
slightly different, I might just be missing a file or something

 from pw_readschemafile : error # 1
 xml data file not found

Since I think input_tmp.in is generated from data that is provided in nscf.in, 
it might very well be something in the MPI_File_read(_all) operation that 
causes the issue, but since both ompio and romio are affected, t

Re: [OMPI users] OMPI users] Installation of openmpi-1.10.7 fails

2018-01-23 Thread Vahid Askarpour
Gilles,

I have not tried compiling the latest openmpi with GCC. I am waiting to see how 
the intel version turns out before attempting GCC.

Cheers,

Vahid

On Jan 23, 2018, at 9:33 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

There used to be a bug in the IOF part, but I am pretty sure this has already 
been fixed.

Does the issue also occur with GNU compilers?
There used to be an issue with the Intel Fortran runtime (short reads/writes were 
silently ignored), and that was also fixed some time ago.

Cheers,

Gilles

Vahid Askarpour <vh261...@dal.ca> wrote:
This would work for Quantum Espresso input. I am waiting to see what happens to 
EPW. I don’t think EPW accepts the -i argument. I will report back once the EPW 
job is done.

Cheers,

Vahid

On Jan 22, 2018, at 6:05 PM, Edgar Gabriel 
mailto:egabr...@central.uh.edu>> wrote:


well, my final comment on this topic, as somebody suggested earlier in this 
email chain, if you provide the input with the -i argument instead of piping 
from standard input, things seem to work as far as I can see (disclaimer: I do 
not know what the final outcome should be. I just see that the application does 
not complain about the 'end of file while reading crystal k points'). So maybe 
that is the most simple solution.

Thanks

Edgar

On 1/22/2018 1:17 PM, Edgar Gabriel wrote:

after some further investigation, I am fairly confident that this is not an MPI 
I/O problem.

The input file input_tmp.in is generated in this sequence of instructions 
(which is in Modules/open_close_input_file.f90)

---

  IF ( TRIM(input_file_) /= ' ' ) THEn
 !
 ! copy file to be opened into input_file
 !
 input_file = input_file_
 !
  ELSE
 !
 ! if no file specified then copy from standard input
 !
 input_file="input_tmp.in"
 OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
  STATUS='unknown', IOSTAT = ierr )
 IF ( ierr > 0 ) GO TO 30
 !
 dummy=' '
 WRITE(stdout, '(5x,a)') "Waiting for input..."
 DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
READ (stdin,fmt='(A512)',END=20) dummy
WRITE (stdtmp,'(A)') trim(dummy)
 END DO
 !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is generated by 
reading from standard input. Since the application is being launched e.g. with

mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out


the data comes from nscf.in. I simply do not know enough about IO forwarding do 
be able to tell why we do not see the entire file, but one interesting detail 
is that if I run it in the debugger, the input_tmp.in is created correctly. 
However, if I run it using mpirun as shown above, the file is cropped 
incorrectly, which leads to the error message mentioned in this email chain.

Anyway, I would probably need some help here from somebody who knows the 
runtime better than me on what could go wrong at this point.

Thanks

Edgar



On 1/19/2018 1:22 PM, Vahid Askarpour wrote:
Concerning the following error

 from pw_readschemafile : error # 1
 xml data file not found

The nscf run uses files generated by the scf.in run. So I first run scf.in and 
when it finishes, I run nscf.in. If you have done this and still get the above 
error, then this could be another bug. It does not happen for me with 
intel14/openmpi-1.8.8.

Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel 
mailto:egabr...@central.uh.edu>> wrote:


ok, here is what found out so far, will have to stop for now however for today:

 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and ompio. I 
*think* the issue is that the input_tmp.in file is incomplete. In both cases 
(ompio and romio) the end of the file looks as follows (and its exactly the 
same for both libraries):

gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files<mailto:gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files>>
 tail -10 input_tmp.in
  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't completely 
identified the location. However, I could not find MPI file_write(_all) 
operations anywhere in the code, although there are some MPI_file_read(

Re: [OMPI users] OMPI users] OMPI users] Installation of openmpi-1.10.7 fails

2018-01-30 Thread Vahid Askarpour
This is just an update on how things turned out with openmpi-3.0.x.

I compiled both EPW and openmpi with intel14. In the past, EPW crashed with both 
intel16 and 17. However, with intel14 and openmpi/1.8.8, I have been getting 
results consistently.

The nscf.in run worked with the -i argument. However, when I ran EPW with 
intel14/openmpi-3.0.x, I got the following error:

mca_base_component_repository_open: unable to open mca_io_romio314: libgpfs.so: 
cannot open shared object file: No such file or directory (ignored)

What is interesting is that this error occurs in the middle of a long loop. 
Since the loop repeats over different coordinates, the error may not be coming 
from the gpfs library.

Cheers,

Vahid

On Jan 23, 2018, at 9:52 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Fair enough,

To be on the safe side, I encourage you to use the latest Intel compilers

Cheers,

Gilles

Vahid Askarpour <vh261...@dal.ca> wrote:
Gilles,

I have not tried compiling the latest openmpi with GCC. I am waiting to see how 
the intel version turns out before attempting GCC.

Cheers,

Vahid

On Jan 23, 2018, at 9:33 AM, Gilles Gouaillardet 
mailto:gilles.gouaillar...@gmail.com>> wrote:

Vahid,

There used to be a bug in the IOF part, but I am pretty sure this has already 
been fixed.

Does the issue also occur with GNU compilers ?
There used to be an issue with Intel Fortran runtime (short read/write were 
silently ignored) and that was also fixed some time ago.

Cheers,

Gilles

Vahid Askarpour mailto:vh261...@dal.ca>> wrote:
This would work for Quantum Espresso input. I am waiting to see what happens to 
EPW. I don’t think EPW accepts the -i argument. I will report back once the EPW 
job is done.

Cheers,

Vahid

On Jan 22, 2018, at 6:05 PM, Edgar Gabriel 
mailto:egabr...@central.uh.edu>> wrote:


well, my final comment on this topic, as somebody suggested earlier in this 
email chain, if you provide the input with the -i argument instead of piping 
from standard input, things seem to work as far as I can see (disclaimer: I do 
not know what the final outcome should be. I just see that the application does 
not complain about the 'end of file while reading crystal k points'). So maybe 
that is the most simple solution.

Thanks

Edgar

On 1/22/2018 1:17 PM, Edgar Gabriel wrote:

after some further investigation, I am fairly confident that this is not an MPI 
I/O problem.

The input file input_tmp.in is generated in this sequence of instructions 
(which is in Modules/open_close_input_file.f90)

---

  IF ( TRIM(input_file_) /= ' ' ) THEn
 !
 ! copy file to be opened into input_file
 !
 input_file = input_file_
 !
  ELSE
 !
 ! if no file specified then copy from standard input
 !
 input_file="input_tmp.in"
 OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
  STATUS='unknown', IOSTAT = ierr )
 IF ( ierr > 0 ) GO TO 30
 !
 dummy=' '
 WRITE(stdout, '(5x,a)') "Waiting for input..."
 DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
READ (stdin,fmt='(A512)',END=20) dummy
WRITE (stdtmp,'(A)') trim(dummy)
 END DO
 !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is generated by 
reading from standard input. Since the application is being launched e.g. with

mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out


the data comes from nscf.in. I simply do not know enough about I/O forwarding to 
be able to tell why we do not see the entire file, but one interesting detail 
is that if I run it in the debugger, input_tmp.in is created correctly. 
However, if I run it using mpirun as shown above, the file is cropped 
incorrectly, which leads to the error message mentioned in this email chain.
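
Since input_tmp.in is kept on disk after the run, one quick way to confirm the 
truncation independently of the application (just a shell check, nothing QE 
specific) is to compare the forwarded copy against the original input:

wc -l nscf.in input_tmp.in
diff nscf.in input_tmp.in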

Anyway, I would probably need some help here from somebody who knows the 
runtime better than me on what could go wrong at this point.

Thanks

Edgar



On 1/19/2018 1:22 PM, Vahid Askarpour wrote:
Concerning the following error

 from pw_readschemafile : error # 1
 xml data file not found

The nscf run uses files generated by the scf.in run. So I first run scf.in and 
when it finishes, I run nscf.in. If you have done this and still get the above 
error, then this could be another bug. It does not happen for me with 
intel14/openmpi-1.8.8.
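
For concreteness, the two-step sequence looks roughly like this (paths and 
process counts are only illustrative, following the command shape used elsewhere 
in this thread):

mpirun -np 64 ../bin/pw.x -npool 64 < scf.in > scf.out
mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out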

Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel wrote:


ok, here is what I found out so far; I will have to stop for today, however:

 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and ompio. I 
*think* the issue is that the input_tmp.in file is incomplete. In both cases 
(ompio and romio) the end of the file look

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-30 Thread Vahid Askarpour
No, I installed this version of openmpi only once, with intel14.

Vahid

> On Jan 30, 2018, at 4:41 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Did you install one version of Open MPI over another version?
> 
>https://www.open-mpi.org/faq/?category=building#install-overwrite
> 
> 
>> On Jan 30, 2018, at 2:09 PM, Vahid Askarpour  wrote:
>> 
>> This is just an update on how things turned out with openmpi-3.0.x.
>> 
>> I compiled both EPW and openmpi with intel14. In the past, EPW crashed for 
>> both intel16 and 17. However, with intel14 and openmpi/1.8.8 , I have been 
>> getting results consistently.
>> 
>> The nscf.in worked with the -i argument. However, when I ran EPW with 
>> intel14/openmpi-3.0.x, I get the following error:
>> 
>> mca_base_component_repository_open: unable to open mca_io_romio314: 
>> libgpfs.so: cannot open shared object file: No such file or directory 
>> (ignored)
>> 
>> What is interesting is that this error occurs in the middle of a long loop. 
>> Since the loop repeats over different coordinates, the error may not be 
>> coming from the gpfs library.
>> 
>> Cheers,
>> 
>> Vahid
>> 
>>> On Jan 23, 2018, at 9:52 AM, Gilles Gouaillardet 
>>>  wrote:
>>> 
>>> Fair enough,
>>> 
>>> To be on the safe side, I encourage you to use the latest Intel compilers
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> Vahid Askarpour  wrote:
>>> Gilles,
>>> 
>>> I have not tried compiling the latest openmpi with GCC. I am waiting to see 
>>> how the intel version turns out before attempting GCC.
>>> 
>>> Cheers,
>>> 
>>> Vahid
>>> 
>>>> On Jan 23, 2018, at 9:33 AM, Gilles Gouaillardet 
>>>>  wrote:
>>>> 
>>>> Vahid,
>>>> 
>>>> There used to be a bug in the IOF part, but I am pretty sure this has 
>>>> already been fixed.
>>>> 
>>>> Does the issue also occur with GNU compilers ?
>>>> There used to be an issue with Intel Fortran runtime (short read/write 
>>>> were silently ignored) and that was also fixed some time ago.
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> Vahid Askarpour  wrote:
>>>> This would work for Quantum Espresso input. I am waiting to see what 
>>>> happens to EPW. I don’t think EPW accepts the -i argument. I will report 
>>>> back once the EPW job is done.
>>>> 
>>>> Cheers,
>>>> 
>>>> Vahid
>>>> 
>>>>> On Jan 22, 2018, at 6:05 PM, Edgar Gabriel  
>>>>> wrote:
>>>>> 
>>>>> well, my final comment on this topic, as somebody suggested earlier in 
>>>>> this email chain, if you provide the input with the -i argument instead 
>>>>> of piping from standard input, things seem to work as far as I can see 
>>>>> (disclaimer: I do not know what the final outcome should be. I just see 
>>>>> that the application does not complain about the 'end of file while 
>>>>> reading crystal k points'). So maybe that is the most simple solution.
>>>>> 
>>>>> Thanks
>>>>> Edgar
>>>>> 
>>>>> On 1/22/2018 1:17 PM, Edgar Gabriel wrote:
>>>>>> after some further investigation, I am fairly confident that this is not 
>>>>>> an MPI I/O problem. 
>>>>>> The input file input_tmp.in is generated in this sequence of 
>>>>>> instructions (which is in Modules/open_close_input_file.f90)
>>>>>> ---
>>>>>>  IF ( TRIM(input_file_) /= ' ' ) THEn
>>>>>> !
>>>>>> ! copy file to be opened into input_file
>>>>>> !
>>>>>> input_file = input_file_
>>>>>> !
>>>>>>  ELSE
>>>>>> !
>>>>>> ! if no file specified then copy from standard input
>>>>>> !
>>>>>> input_file="input_tmp.in"
>>>>>> OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
>>>>>>  STATUS='unknown', IOSTAT = ierr )
>>>>>> IF ( ierr > 0 ) GO TO 30
>>>>>> !
>>>>>> dummy=' '
>>>>>> WRITE(stdout, '(5x,a)') "Waiting for input..."

[OMPI users] Printing in a fortran MPI/OpenMP environment

2023-01-31 Thread Vahid Askarpour via users
Hi,

I am running a Fortran code (Perturbo) compiled as hybrid OpenMP/Open MPI. The 
code runs on 2 nodes (128 processors) with 32 MPI processes and 4 threads per 
MPI process. I am attempting to verify that a variable involved in the 
calculations has the same value in all the MPI processes and threads. So I would 
like to print this variable, tagged with the thread number, for all 32 MPI 
processes. Would a simple print statement do the job, or would it only print 
the information for the processes on the master node and not both nodes? Is it 
possible to print this variable for all 64 threads on node 1 and all 64 threads 
on node 2 separately?

Thank you,
Vahid

Re: [OMPI users] Printing in a fortran MPI/OpenMP environment

2023-02-01 Thread Vahid Askarpour via users
I did use omp_get_thread_num() and printed it along with the variable. The 
printout does get messy, but it did verify that the variable has the same value 
in all the threads of all the processes.
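
For the archives, a minimal sketch of this kind of tagged print (illustrative 
only, not the actual Perturbo code; the program name and the variable val are 
made up):

program print_check
  ! Sketch: print a value tagged with MPI rank, OpenMP thread number and
  ! host name, so output from the two nodes can be told apart.
  use mpi
  use omp_lib
  implicit none
  integer :: ierr, rank, nlen, tid
  character(len=MPI_MAX_PROCESSOR_NAME) :: host
  real :: val

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Get_processor_name(host, nlen, ierr)
  val = 1.0   ! stand-in for the quantity being checked

!$omp parallel private(tid)
  tid = omp_get_thread_num()
  ! serialize the writes so lines from different threads do not interleave
!$omp critical
  write(*,'(a,i0,a,i0,a,a,a,es14.6)') 'rank ', rank, ' thread ', tid, &
       ' host ', trim(host(1:nlen)), ' val ', val
!$omp end critical
!$omp end parallel

  call MPI_Finalize(ierr)
end program print_check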

Thanks,
Vahid

> On Feb 1, 2023, at 2:21 AM, Benson Muite via users  
> wrote:
> 
> On 2/1/23 00:50, Vahid Askarpour via users wrote:
>> Hi,
>> 
>> I am running a fortran code (Perturbo) compiled in hybrid openmp/openmpi. 
>> The code runs on 2 nodes (128 processors) with 32 MPI processes and 4 
>> threads/MPI process. I am attempting to verify that a variable involved in 
>> the calculations in all the MPI processes and threads has the same value. So 
>> I would like to print this variable with the thread #  for all the 32 MPI 
>> processes. Would a simple print statement do the job or would such a print 
>> statement only print the information for the processes on the master node 
>> and not both nodes? Is it possible to print this variable for all 64 threads 
>> on node 1 and all 64 threads on node 2 separately?
>> 
>> Thank you,
>> Vahid
> Print statement should work, though maybe messy. You might look at
> collective operations like min/max or sum.  Perhaps do a local check
> first using OpenMP, then a collective operation using MPI.
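
For the archives, a minimal sketch of the collective check suggested above 
(illustrative only; the subroutine name is made up and the quantity is assumed 
to be a default real):

subroutine check_same_value(val, comm)
  ! Sketch: reduce the local value with MIN and MAX across all ranks; if the
  ! two agree, every MPI process holds the same value.  An OpenMP reduction
  ! can perform the same check across threads beforehand.
  use mpi
  implicit none
  real,    intent(in) :: val
  integer, intent(in) :: comm
  real    :: vmin, vmax
  integer :: ierr, rank

  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Allreduce(val, vmin, 1, MPI_REAL, MPI_MIN, comm, ierr)
  call MPI_Allreduce(val, vmax, 1, MPI_REAL, MPI_MAX, comm, ierr)
  if (rank == 0) then
     if (vmin == vmax) then
        write(*,*) 'value identical on all ranks:', vmin
     else
        write(*,*) 'value differs across ranks: min/max =', vmin, vmax
     end if
  end if
end subroutine check_same_value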