Just to be clear - are you saying the job fails to run? Or just that it emits 
this warning (not error) and then runs to completion?

We added this warning because jobs were hanging after exhausting registered 
memory, and people didn't know why. If you follow the FAQ link in the message, 
I believe it explains how to turn off the warning if you are sure your system 
is correctly configured.
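
As a rough sanity check on the numbers in your output, here is a minimal 
Python sketch. It assumes a Mellanox mlx4 driver, for which the FAQ gives 
max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. I don't 
know that this is your hardware, and the 4096-byte page size and 
log_mtts_per_seg = 3 are assumptions as well. The function names are made up 
for illustration.

```python
MIB = 1024 * 1024

# Values copied from the warning in the quoted message below.
REGISTERABLE_MIB = 131072
TOTAL_MIB = 258542


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory per the FAQ formula for mlx4 (assumed driver)."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def min_log_num_mtt(target_bytes, log_mtts_per_seg, page_size=4096):
    """Smallest log_num_mtt whose limit is at least target_bytes."""
    bytes_per_entry = page_size * (2 ** log_mtts_per_seg)
    entries = -(-target_bytes // bytes_per_entry)  # integer ceiling division
    return (entries - 1).bit_length()              # ceil(log2(entries))


# Fraction of physical memory that can currently be registered.
fraction = REGISTERABLE_MIB / TOTAL_MIB
print(f"registerable fraction: {fraction:.3f}")

# Assumed log_mtts_per_seg; check the real value on your system.
LOG_MTTS_PER_SEG = 3

needed = min_log_num_mtt(TOTAL_MIB * MIB, LOG_MTTS_PER_SEG)
limit_mib = max_registerable_bytes(needed, LOG_MTTS_PER_SEG) // MIB
print(f"log_num_mtt >= {needed} would allow {limit_mib} MiB")
```

Under those assumptions, the reported 131072 MiB corresponds to 
log_num_mtt = 22, and raising it by one would cover all 258542 MiB.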


On Sep 24, 2013, at 12:20 PM, "Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor" 
<clifton.w.br...@erdc.dren.mil> wrote:

> Hi,
>
> We are having problems with OpenMPI 1.6.3 – it gives the below error message 
> when trying to run:
>
> $ mpirun -np 32 ./mpi_test.x
> --------------------------------------------------------------------------
> WARNING: It appears that your OpenFabrics subsystem is configured to only 
> allow registering part of your physical memory.  This can cause MPI jobs to 
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of physical 
> memory that can be registered.  You should investigate the relevant Linux 
> kernel module parameters that control how much physical memory can be 
> registered, and increase them to allow registering all physical memory on 
> your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel module
> parameters:
>
>     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>   Local host:              akutilm-0006.ors.hpc.mil
>   Registerable memory:     131072 MiB
>   Total memory:            258542 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --------------------------------------------------------------------------
> akutilm-0006.ors.hpc.mil
> akutilm-0006.ors.hpc.mil
> [akutilm-0006.ors.hpc.mil:10970] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
> [akutilm-0006.ors.hpc.mil:10970] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> Openmpi 1.4.3 works fine.
>
> Any help would be greatly appreciated.
>
> Thanks,
> Clif
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
