[OMPI users] OpenMPI 1.6.3 problem

2013-09-24 Thread Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor
Hi,

We are having problems with OpenMPI 1.6.3 - it gives the below error message 
when trying to run:

$ mpirun -np 32 ./mpi_test.x
--
WARNING: It appears that your OpenFabrics subsystem is configured to only allow 
registering part of your physical memory.  This can cause MPI jobs to run with 
erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of physical 
memory that can be registered.  You should investigate the relevant Linux 
kernel module parameters that control how much physical memory can be 
registered, and increase them to allow registering all physical memory on your 
machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:          akutilm-0006.ors.hpc.mil
  Registerable memory: 131072 MiB
  Total memory:        258542 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--
akutilm-0006.ors.hpc.mil
akutilm-0006.ors.hpc.mil
[akutilm-0006.ors.hpc.mil:10970] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[akutilm-0006.ors.hpc.mil:10970] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Openmpi 1.4.3 works fine.

Any help would be greatly appreciated.

Thanks,
Clif

Re: [OMPI users] OpenMPI 1.6.3 problem

2013-09-24 Thread Ralph Castain
Just to be clear - are you saying the job fails to run? Or just that it emits 
this warning (not error) and then runs to completion?

This is a warning we added at some point because jobs were hanging due to 
exhausting registered memory, and people didn't know why. If you check out the 
link, I believe we tell you how to turn off the warning if you are sure your 
system is correctly configured.
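
To put numbers on the warning, here is a minimal sketch that parses the two memory figures reported in the original post and computes how much of physical memory can be registered. The helper name `parse_mib` and the embedded text are only for illustration; they are not part of Open MPI.

```python
import re

# Figures copied from the warning quoted in this thread.
WARNING_TEXT = """\
  Local host:  akutilm-0006.ors.hpc.mil
  Registerable memory: 131072 MiB
  Total memory: 258542 MiB
"""


def parse_mib(label, text):
    """Return the integer MiB value that follows 'label:' in the text."""
    match = re.search(rf"{re.escape(label)}:\s*(\d+)\s*MiB", text)
    if match is None:
        raise ValueError(f"no '{label}' line found")
    return int(match.group(1))


registerable_mib = parse_mib("Registerable memory", WARNING_TEXT)
total_mib = parse_mib("Total memory", WARNING_TEXT)

# Fraction of physical memory that the OpenFabrics stack may register.
coverage = registerable_mib / total_mib
print(f"registerable: {registerable_mib} MiB of {total_mib} MiB ({coverage:.1%})")
```

On this node only about half of physical memory is registerable, which is the condition the warning is reporting.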





Re: [OMPI users] OpenMPI 1.6.3 problem

2013-09-24 Thread Jeff Squyres (jsquyres)
Have you visited the URL that is cited?  :-)

It talks all about the issue, and describes how to fix it.  Let us know if 
there's something unclear in that FAQ text.
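
For readers without the FAQ at hand, the following is a rough sketch of the calculation it describes for Mellanox `mlx4_core` hardware, where registerable memory is (2^log_num_mtt) * (2^log_mtts_per_seg) * page size. It is an assumption that this node uses that driver; the thread does not say. The values 22 and 3 below are hypothetical, picked only because they reproduce the 131072 MiB in the warning, and the 4096-byte page size is likewise assumed.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, check with `getconf PAGE_SIZE`
MIB = 1024 * 1024


def max_reg_mem_mib(log_num_mtt, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Registerable memory in MiB for the given mlx4_core parameters."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size // MIB


def min_log_num_mtt(total_mib, log_mtts_per_seg, factor=1, page_size=PAGE_SIZE):
    """Smallest log_num_mtt that lets factor * total_mib be registered."""
    target_bytes = total_mib * MIB * factor
    log_num_mtt = 0
    while (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size < target_bytes:
        log_num_mtt += 1
    return log_num_mtt


# Hypothetical current settings that would give the reported limit.
print(max_reg_mem_mib(22, 3))

# Setting needed to cover the 258542 MiB of physical memory on this node,
# and to cover twice that amount for extra headroom.
print(min_log_num_mtt(258542, 3))
print(min_log_num_mtt(258542, 3, factor=2))
```

Other OpenFabrics drivers expose different parameters, so treat this as an illustration of the arithmetic rather than a recipe for this particular system.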




-- 
Jeff Squyres
jsquy...@cisco.com