Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-23 Thread Youri LACAN-BARTLEY
Hi,

Thanks for your feedback and advice.

SELinux is currently disabled at runtime on all nodes as well as on the head 
node.
So I don't believe that is the issue here.

I have indeed compiled Open MPI myself and haven't specified anything peculiar 
other than a --prefix and --enable-mpirun-prefix-by-default.
Have I overlooked something?

The problem doesn't occur with Open MPI 1.4.
I've tried running simple jobs directly on the head node to eliminate any 
networking or IB wizardry, and mpirun consistently segfaults when run as a 
non-root user.

Here's one excerpt from an strace of mpirun that might be significant:
mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
= -1 ENOMEM (Cannot allocate memory)

For further information you can refer to the strace files attached to this 
email.
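
In case it is relevant, a quick check for a per-user memory limit (which
could explain a 4 GiB mmap failing for a regular user but not for root)
would be something like this, run as the user that sees the segfault:

  ulimit -a        # show all limits for this user
  ulimit -v        # address-space limit in KB; "unlimited" would rule this out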

Youri LACAN-BARTLEY

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Prentice Bisbal
Sent: Monday, March 21, 2011 14:56
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

On 03/20/2011 06:22 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
> 
>> It's not hard to test whether or not SELinux is the problem. You can
>> turn SELinux off on the command-line with this command:
>>
>> setenforce 0
>>
>> Of course, you need to be root in order to do this.
>>
>> After turning SELinux off, you can try reproducing the error. If it
>> still occurs, the problem is elsewhere; if it doesn't, SELinux is the
>> culprit. When you're done, you can re-enable SELinux with
>>
>> setenforce 1
>>
>> If you're running your job across multiple nodes, you should disable
>> SELinux on all of them for testing.
> 
> You are not actually disabling SELinux with setenforce 0, just
> putting it into "permissive" mode: SELinux is still active.
> 

That's correct. Thanks for catching my inaccurate choice of words.

> Running SELinux in its permissive mode, as opposed to disabling it
> at boot time, means SELinux still logs anything it would have blocked
> had it been running in "enforcing" mode.

I forgot about that. Checking those logs will make debugging even easier
for the original poster.

> 
> There's then a tool you can run over that log that will suggest
> the ACL changes you need to make to fix the issue from an SELinux
> perspective.
> 
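
If memory serves, the tool Kevin is referring to is audit2allow. Something
along these lines should summarize what SELinux logged in permissive mode
(just a sketch; the audit log path can vary by distro):

  grep mpirun /var/log/audit/audit.log | audit2allow
  audit2allow -a    # or summarize everything in the audit log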

-- 
Prentice


Attachment: mpirun-strace.tar.gz


Re: [OMPI users] Is there an mca parameter equivalent to -bind-to-core?

2011-03-23 Thread Gus Correa

Ralph Castain wrote:

On Mar 21, 2011, at 9:27 PM, Eugene Loh wrote:


Gustavo Correa wrote:


Dear OpenMPI Pros

Is there an MCA parameter that would do the same as the mpiexec switch 
'-bind-to-core'?
I.e., something that I could set up not in the mpiexec command line,
but for the whole cluster, or for a user, etc.

In the past I used '-mca mpi mpi_paffinity_alone=1'.


Must be a typo here - the correct command is '-mca mpi_paffinity_alone 1'


But that was before '-bind-to-core' came along.
However, my recollection of some recent discussions here on the list
is that the former would not do the same as '-bind-to-core',
and that the recommendation was to use '-bind-to-core' in the mpiexec command 
line.


Just to be clear: mpi_paffinity_alone=1 still works and will cause the same 
behavior as bind-to-core.



A little awkward, but how about

--bycore          rmaps_base_schedule_policy   core
--bysocket        rmaps_base_schedule_policy   socket
--bind-to-core    orte_process_binding         core
--bind-to-socket  orte_process_binding         socket
--bind-to-none    orte_process_binding         none

___


Thank you Ralph and Eugene

Ralph, forgive me the typo in the previous message, please.
Equal sign inside the openmpi-mca-params.conf file,
but no equal sign on the mpiexec command line, right?
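
Just to make sure I have the two forms right (the executable name below is
only a placeholder):

  # on the mpiexec command line: no equal sign
  mpiexec -mca mpi_paffinity_alone 1 -np 16 ./my_app

  # in openmpi-mca-params.conf: one "name = value" per line
  mpi_paffinity_alone = 1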

I am using OpenMPI 1.4.3
I inserted the line
"mpi_paffinity_alone = 1"
in my openmpi-mca-params.conf file, following Ralph's suggestion
that it is equivalent to '-bind-to-core'.

However, now when I do "ompi_info -a",
the output shows the non-default value 1 twice in a row,
then later it shows the default value 0 again!
Please see the output enclosed below.

I am confused.

1) Is this just a glitch in ompi_info,
or did mpi_paffinity_alone get reverted to zero?

2) How can I increase the verbosity level to make sure I have processor
affinity set (i.e. that the processes are bound to cores/processors)?


Many thanks,
Gus Correa

##

ompi_info -a

...

 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "1", data source: file 
[/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], 
synonym of: opal_paffinity_alone)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "1", data source: file 
[/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], 
synonym of: opal_paffinity_alone)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


...

[ ... and after 'mpi_leave_pinned_pipeline' ...]

 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "0", data source: default value)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


...


Re: [OMPI users] Is there an mca parameter equivalent to -bind-to-core?

2011-03-23 Thread Eugene Loh

Gus Correa wrote:


Ralph Castain wrote:


On Mar 21, 2011, at 9:27 PM, Eugene Loh wrote:


Gustavo Correa wrote:


Dear OpenMPI Pros

Is there an MCA parameter that would do the same as the mpiexec 
switch '-bind-to-core'?

I.e., something that I could set up not in the mpiexec command line,
but for the whole cluster, or for a user, etc.

In the past I used '-mca mpi mpi_paffinity_alone=1'.




Must be a typo here - the correct command is '-mca 
mpi_paffinity_alone 1'



But that was before '-bind-to-core' came along.
However, my recollection of some recent discussions here on the list
is that the former would not do the same as '-bind-to-core',
and that the recommendation was to use '-bind-to-core' in the 
mpiexec command line.




Just to be clear: mpi_paffinity_alone=1 still works and will cause 
the same behavior as bind-to-core.




A little awkward, but how about

--bycore          rmaps_base_schedule_policy   core
--bysocket        rmaps_base_schedule_policy   socket
--bind-to-core    orte_process_binding         core
--bind-to-socket  orte_process_binding         socket
--bind-to-none    orte_process_binding         none

___




Thank you Ralph and Eugene

Ralph, forgive me the typo in the previous message, please.
Equal sign inside the openmpi-mca-params.conf file,
but no equal sign on the mpiexec command line, right?

I am using OpenMPI 1.4.3
I inserted the line
"mpi_paffinity_alone = 1"
in my openmpi-mca-params.conf file, following Ralph's suggestion
that it is equivalent to '-bind-to-core'.

However, now when I do "ompi_info -a",
the output shows the non-default value 1 twice in a row,
then later it shows the default value 0 again!
Please see the output enclosed below.

I am confused.

1) Is this just a glitch in ompi_info,
or did mpi_paffinity_alone get reverted to zero?

2) How can I increase the verbosity level to make sure I have processor
affinity set (i.e. that the processes are bound to cores/processors)?


Just a quick answer on 2).  The FAQ 
http://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4 (or 
"man mpirun" or "mpirun --help") mentions --report-bindings.


If this is on a Linux system with numactl, you can also try "mpirun ... 
numactl --show".
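
For example (process count and executable are just placeholders):

  mpirun -np 4 --report-bindings ./my_app
  mpirun -np 4 numactl --show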



##

ompi_info -a

...

 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "1", data source: file 
[/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], 
synonym of: opal_paffinity_alone)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "1", data source: file 
[/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], 
synonym of: opal_paffinity_alone)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


...

[ ... and after 'mpi_leave_pinned_pipeline' ...]

 MCA mpi: parameter "mpi_paffinity_alone" (current 
value: "0", data source: default value)
  If nonzero, assume that this job is the only 
(set of) process(es) running on each node and bind processes to 
processors, starting with processor ID 0


...





[OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-23 Thread Gus Correa

Dear OpenMPI Pros

Why am I getting the parser error below?
It seems not to recognize comment lines (#).

This is OpenMPI 1.4.3.
The same error happens with the other compiler wrappers too.
However, the wrappers compile and produce an executable.

Thank you,
Gus Correa

Parser error:

$ mpicc hello_c.c
[myhost.mydomain:06489] keyval parser: error 1 reading file 
/my/path/to/openmpi/share/openmpi/mpicc-wrapper-data.txt at line 1:

  # There can be multiple blocks of configuration data, chosen by



Re: [OMPI users] Is there an mca parameter equivalent to -bind-to-core?

2011-03-23 Thread Ralph Castain

On Mar 23, 2011, at 2:20 PM, Gus Correa wrote:

> Ralph Castain wrote:
>> On Mar 21, 2011, at 9:27 PM, Eugene Loh wrote:
>>> Gustavo Correa wrote:
>>> 
>>>> Dear OpenMPI Pros
>>>> 
>>>> Is there an MCA parameter that would do the same as the mpiexec switch 
>>>> '-bind-to-core'?
>>>> I.e., something that I could set up not in the mpiexec command line,
>>>> but for the whole cluster, or for a user, etc.
>>>> 
>>>> In the past I used '-mca mpi mpi_paffinity_alone=1'.
>> Must be a typo here - the correct command is '-mca mpi_paffinity_alone 1'
>>>> But that was before '-bind-to-core' came along.
>>>> However, my recollection of some recent discussions here on the list
>>>> is that the former would not do the same as '-bind-to-core',
>>>> and that the recommendation was to use '-bind-to-core' in the mpiexec 
>>>> command line.
>> Just to be clear: mpi_paffinity_alone=1 still works and will cause the same 
>> behavior as bind-to-core.
>>> A little awkward, but how about
>>> 
>>> --bycore          rmaps_base_schedule_policy   core
>>> --bysocket        rmaps_base_schedule_policy   socket
>>> --bind-to-core    orte_process_binding         core
>>> --bind-to-socket  orte_process_binding         socket
>>> --bind-to-none    orte_process_binding         none
>>> 
>>> ___
> 
> Thank you Ralph and Eugene
> 
> Ralph, forgive me the typo in the previous message, please.
> Equal sign inside the openmpi-mca-params.conf file,
> but no equal sign on the mpiexec command line, right?
> 
> I am using OpenMPI 1.4.3
> I inserted the line
> "mpi_paffinity_alone = 1"
> in my openmpi-mca-params.conf file, following Ralph's suggestion
> that it is equivalent to '-bind-to-core'.
> 
> However, now when I do "ompi_info -a",
> the output shows the non-default value 1 twice in a row,
> then later it shows the default value 0 again!
> Please see the output enclosed below.
> 
> I am confused.
> 
> 1) Is this just a glitch in ompi_info,
> or did mpi_paffinity_alone get reverted to zero?
> 

Just tested it myself, and it looks like the weird output only shows up when 
you use the "-a" option. If you instead do:

ompi_info --params opal all

you'll get the correct output:

    MCA opal: parameter "opal_paffinity_alone" (current value: "1",
              data source: file [/Users/rhc/openmpi/build/etc/openmpi-mca-params.conf],
              synonyms: mpi_paffinity_alone, mpi_paffinity_alone)
              If nonzero, assume that this job is the only (set of)
              process(es) running on each node and bind processes to processors,
              starting with processor ID 0

I'll have to pass that on to Jeff - I suspect it is because mpi_paffinity_alone 
is a synonym for opal_paffinity_alone, and so it sorta gets defined twice in 
the code.

> 2) How can I increase the verbosity level to make sure I have processor
> affinity set (i.e. that the processes are bound to cores/processors)?

As Eugene said, just use --report-bindings, which is tied to "-mca 
orte_report_bindings 1".
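
So for a cluster-wide setup, something like this in your
openmpi-mca-params.conf should give you both the binding and the diagnostic
output (a sketch using only the parameters discussed above):

  mpi_paffinity_alone = 1
  orte_report_bindings = 1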


> 
> 
> Many thanks,
> Gus Correa
> 
> ##
> 
> ompi_info -a
> 
> ...
> 
> MCA mpi: parameter "mpi_paffinity_alone" (current value: "1", 
> data source: file 
> [/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], synonym of: 
> opal_paffinity_alone)
>  If nonzero, assume that this job is the only (set 
> of) process(es) running on each node and bind processes to processors, 
> starting with processor ID 0
> 
> MCA mpi: parameter "mpi_paffinity_alone" (current value: "1", 
> data source: file 
> [/home/soft/openmpi/1.4.3/gnu-intel/etc/openmpi-mca-params.conf], synonym of: 
> opal_paffinity_alone)
>  If nonzero, assume that this job is the only (set 
> of) process(es) running on each node and bind processes to processors, 
> starting with processor ID 0
> 
> ...
> 
> [ ... and after 'mpi_leave_pinned_pipeline' ...]
> 
> MCA mpi: parameter "mpi_paffinity_alone" (current value: "0", 
> data source: default value)
>  If nonzero, assume that this job is the only (set 
> of) process(es) running on each node and bind processes to processors, 
> starting with processor ID 0
> 
> ...




Re: [OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-23 Thread Ralph Castain

On Mar 23, 2011, at 3:19 PM, Gus Correa wrote:

> Dear OpenMPI Pros
> 
> Why am I getting the parser error below?
> It seems not to recognize comment lines (#).
> 
> This is OpenMPI 1.4.3.
> The same error happens with the other compiler wrappers too.
> However, the wrappers compile and produce an executable.

No idea - I just tested it and didn't get that error. Did you configure this 
for script wrapper compilers instead of binaries?

> 
> Thank you,
> Gus Correa
> 
> Parser error:
> 
> $ mpicc hello_c.c
> [myhost.mydomain:06489] keyval parser: error 1 reading file 
> /my/path/to/openmpi/share/openmpi/mpicc-wrapper-data.txt at line 1:
>  # There can be multiple blocks of configuration data, chosen by
> 




Re: [OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-23 Thread Gus Correa

Ralph Castain wrote:

On Mar 23, 2011, at 3:19 PM, Gus Correa wrote:


Dear OpenMPI Pros

Why am I getting the parser error below?
It seems not to recognize comment lines (#).

This is OpenMPI 1.4.3.
The same error happens with the other compiler wrappers too.
However, the wrappers compile and produce an executable.


No idea - I just tested it and didn't get that error. Did you configure this 
for script wrapper compilers instead of binaries?


Thank you,
Gus Correa

Parser error:

$ mpicc hello_c.c
[myhost.mydomain:06489] keyval parser: error 1 reading file 
/my/path/to/openmpi/share/openmpi/mpicc-wrapper-data.txt at line 1:
 # There can be multiple blocks of configuration data, chosen by


Thank you, Ralph.

I have two OpenMPI 1.4.3 builds on this cluster.
One with gcc/g++/gfortran,
the other with gcc/g++ and Intel ifort (12.0.0).
The GNU-only build works fine, with no parser error.
The error is restricted to the GNU+Intel combination.
Awkward.

Both were configured with these parameters:

--prefix=${MYINSTALLDIR} \
--with-libnuma=/usr \
--with-tm=/opt/torque/2.4.11 \
--with-openib=/usr \
--enable-static

The opal_wrapper is a binary in both cases.

To make things more confusing, on another cluster with an (older)
Intel compiler, 10.1.017, the GNU+Intel build of OpenMPI 1.4.3
doesn't have this parser error.
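
In case it is just stray bytes at the top of the file (a BOM, carriage
returns, etc.), I can dump the first lines of the offending wrapper data
file with something like this (path as in the error message above):

  head -2 /my/path/to/openmpi/share/openmpi/mpicc-wrapper-data.txt | od -c

and diff it against the same file from the GNU-only build.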


Thank you,
Gus Correa




Re: [OMPI users] Is there an mca parameter equivalent to -bind-to-core?

2011-03-23 Thread Jeff Squyres
On Mar 23, 2011, at 4:20 PM, Gus Correa wrote:

> However, now when I do "ompi_info -a",
> the output shows the non-default value 1 twice in a row,
> then later it shows the default value 0 again!

It's because we wanted to confuse you!

;-)

Sorry about that; this is a legitimate bug.  I've fixed it on the trunk and 
submitted CMRs for both v1.4 and v1.5.  I believe that the correct value is 
actually being used, despite what your ompi_info is saying.  

(I can explain further, if you care)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/