[OMPI users] AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Prentice Bisbal via users
We just added about a dozen nodes to our cluster, which have AMD EPYC 
7281 processors. When a particular user's jobs fall on one of these
nodes, he gets these error messages:


--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  dawson205

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may 
be degraded.

--
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: NONE
   Node:    dawson205
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--
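The second message is Open MPI refusing to bind two processes to a resource it counts as having one CPU. The following is a minimal Python illustration of that kind of guard. It assumes the CPU count comes from the process's Linux affinity mask, whereas Open MPI does its own accounting through hwloc. `cpus_available` and `would_overload` are hypothetical helper names.

```python
import os


def cpus_available() -> int:
    """Number of logical CPUs this process may run on (Linux affinity mask)."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is Linux-only; fall back to the total CPU count.
        return os.cpu_count() or 1


def would_overload(nprocs: int, ncpus: int, overload_allowed: bool = False) -> bool:
    """Hypothetical mirror of the check in the message: refuse to bind more
    processes than CPUs unless overloading was explicitly allowed."""
    return nprocs > ncpus and not overload_allowed


if __name__ == "__main__":
    ncpus = cpus_available()
    for nprocs in (1, 2, ncpus + 1):
        print(f"{nprocs} process(es) on {ncpus} CPU(s): overload={would_overload(nprocs, ncpus)}")
```

If oversubscription is actually intended, `mpirun` takes the qualifier on the binding option itself, e.g. `--bind-to core:overload-allowed`.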

The OS is CentOS 6, and numactl and numactl-devel are installed. Any 
idea what the issue is and how to fix it? Is SMT enabled when it 
shouldn't be, or something along those lines?
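Both questions can be checked from sysfs without involving Open MPI. The following minimal Python sketch assumes a Linux node. It counts the hardware threads that share CPU 0's core (2 usually means SMT is enabled) and asks the dynamic linker whether a libnuma shared library is visible. The helper names are hypothetical. Finding the library does not prove that the kernel honours memory binding.

```python
from __future__ import annotations

import ctypes.util
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8" into CPU indices."""
    ids: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def threads_per_core(cpu: int = 0) -> int | None:
    """Hardware threads sharing a core with `cpu`; 2 usually means SMT is on.
    Returns None if the sysfs file is missing (e.g. not Linux)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return len(parse_cpulist(path.read_text()))
    except OSError:
        return None


def libnuma_present() -> bool:
    """True if the dynamic linker can locate a libnuma shared library."""
    return ctypes.util.find_library("numa") is not None


if __name__ == "__main__":
    print("threads per core on cpu0:", threads_per_core())
    print("libnuma found:", libnuma_present())
```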


--
Prentice



Re: [OMPI users] AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Brice Goglin via users
On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
> We just added about a dozen nodes to our cluster, which have AMD EPYC
> 7281 processors. When a particular user's jobs fall on one of these
> nodes, he gets these error messages:
>
> --
>
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
>   Node:  dawson205


I wonder if the CentOS 6 kernel properly supports these recent
processors. Does lstopo show NUMA nodes as expected?
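A quick cross-check of what lstopo reports is to read the kernel's view of NUMA nodes directly from sysfs. The following minimal Python sketch maps each node to its CPUs and assumes the standard /sys/devices/system/node layout. `numa_nodes` is a hypothetical helper. An empty result would suggest that the kernel is not exposing NUMA information at all.

```python
from __future__ import annotations

from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8" into CPU indices."""
    ids: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def numa_nodes(root: Path = NODE_ROOT) -> dict[int, list[int]]:
    """Map each NUMA node id to the CPUs the kernel assigns to it."""
    nodes: dict[int, list[int]] = {}
    if not root.is_dir():
        return nodes
    for node_dir in root.glob("node[0-9]*"):
        try:
            cpus = parse_cpulist((node_dir / "cpulist").read_text())
        except OSError:
            continue
        nodes[int(node_dir.name[len("node"):])] = cpus
    return nodes


if __name__ == "__main__":
    found = numa_nodes()
    if not found:
        print("kernel exposes no NUMA nodes under", NODE_ROOT)
    for node, cpus in sorted(found.items()):
        print(f"node{node}: {len(cpus)} CPUs {cpus}")
```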

Brice




Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Prentice Bisbal via users



On 1/8/20 3:30 PM, Brice Goglin via users wrote:
> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>> 7281 processors. When a particular user's jobs fall on one of these
>> nodes, he gets these error messages:
>>
>> --
>>
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>>    Node:  dawson205
>
> I wonder if the CentOS 6 kernel properly supports these recent
> processors. Does lstopo show NUMA nodes as expected?
>
> Brice

lstopo shows different NUMA nodes, and it appears to be correct, but I
don't use lstopo that much, so I'm not 100% confident that what it's
showing is correct. I'm at about 98%.


Prentice



Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Brice Goglin via users
On 08/01/2020 at 21:51, Prentice Bisbal via users wrote:
>
> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>> 7281 processors. When a particular user's jobs fall on one of these
>>> nodes, he gets these error messages:
>>>
>>> --
>>>
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>>    Node:  dawson205
>>
>> I wonder if the CentOS 6 kernel properly supports these recent
>> processors. Does lstopo show NUMA nodes as expected?
>>
>> Brice
>>
> lstopo shows different NUMA nodes, and it appears to be correct, but I
> don't use lstopo that much, so I'm not 100% confident that what it's
> showing is correct. I'm at about 98%.
>

Now, check memory binding in hwloc:

* Does something like "hwloc-bind node:1 -- echo foobar" fail?

* What do these lines return?

hwloc-bind --membind node:1 -- hwloc-bind --get --membind --nodeset

=> should return something like 0x0001 (bind)

hwloc-bind --membind --get --nodeset

=> should return something like 0x00ff (firsttouch)
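The values Brice expects are hwloc nodeset bitmasks. If bit i is set, the NUMA node with OS index i is in the set, so 0x0001 is node 0 alone and 0x00ff is nodes 0 to 7. The following small Python decoder makes two assumptions. First, it assumes hwloc's default printing format of comma-separated 32-bit hex blocks, most significant first. Second, it assumes that any policy text such as "(bind)" follows the mask after whitespace. `decode_nodeset` is a hypothetical name.

```python
from __future__ import annotations


def decode_nodeset(output: str) -> list[int]:
    """Turn hwloc bitmap text such as "0x000000ff" or "0x00000001,0x00000000"
    into the list of set bit indexes (NUMA node OS indexes). Text after the
    first whitespace, e.g. a "(bind)" policy annotation, is ignored.
    Infinite sets such as "0xf...f" raise ValueError."""
    mask_text = output.split()[0]
    value = 0
    for block in mask_text.split(","):
        # Each comma-separated block is assumed to hold 32 bits.
        digits = block[2:] if block.lower().startswith("0x") else block
        value = (value << 32) | int(digits or "0", 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for sample in ("0x0001 (bind)", "0x00ff (firsttouch)"):
        print(f"{sample!r} -> NUMA nodes {decode_nodeset(sample)}")
```

If the first command fails, or the bound mask comes back identical to the default one, that would suggest memory binding is not taking effect on that node.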

By the way, which Open MPI version did you use? If you told Open MPI not
to use its embedded hwloc, which hwloc do you use?

Brice




Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
We are running EPYC 7451 and 7702 nodes. I do not recall CentOS 6 being
able to support these. We moved to CentOS 7.6 at first and are now
running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier
releases did not support x2APIC and could not handle 256 threads. That
was not an issue on EPYC/Naples, but it was on dual 64-core EPYC2.
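Ray's point combines arithmetic with a CPU feature. Two 64-core EPYC2 sockets with SMT give 2 × 64 × 2 = 256 hardware threads. Legacy xAPIC IDs are only 8 bits wide, which is why x2APIC matters at that scale. The following minimal Python sketch reports the logical CPU count and the x2apic flag from /proc/cpuinfo. `cpuinfo_summary` and `threads` are hypothetical helpers. A present flag is necessary, but it is not proof that the running kernel uses x2APIC mode.

```python
from __future__ import annotations


def cpuinfo_summary(text: str) -> tuple[int, bool]:
    """Count logical CPUs and check the x2apic flag in /proc/cpuinfo text."""
    logical = 0
    x2apic = False
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "flags" and "x2apic" in value.split():
            x2apic = True
    return logical, x2apic


def threads(sockets: int, cores_per_socket: int, smt: int) -> int:
    """Hardware threads in a node: sockets x cores per socket x threads per core."""
    return sockets * cores_per_socket * smt


if __name__ == "__main__":
    print("dual 64-core EPYC2 with SMT:", threads(2, 64, 2), "threads")
    try:
        with open("/proc/cpuinfo") as f:
            print("logical CPUs, x2apic flag:", cpuinfo_summary(f.read()))
    except OSError:
        print("/proc/cpuinfo not available")
```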


Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6 for
EPYC2 (Rome).
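The following Python sketch compares /etc/redhat-release against the minimums Ray quotes. The regex and the `MINIMUM` table are assumptions, not an official check. The release file reliably carries only major.minor, so the sketch checks Rome against 7.6 rather than Ray's 7.6.6.

```python
from __future__ import annotations

import re

# Minimums as quoted in the thread, reduced to (major, minor).
MINIMUM = {"EPYC (Naples)": (7, 4), "EPYC2 (Rome)": (7, 6)}


def release_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from e.g. 'CentOS release 6.10 (Final)'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised release string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def supported(text: str, cpu: str) -> bool:
    """True if the release in `text` meets the quoted minimum for `cpu`."""
    return release_version(text) >= MINIMUM[cpu]


if __name__ == "__main__":
    try:
        with open("/etc/redhat-release") as f:
            current = f.read().strip()
        for cpu in MINIMUM:
            print(f"{current} supports {cpu}: {supported(current, cpu)}")
    except (OSError, ValueError) as err:
        print("cannot check release:", err)
```

Comparing integer tuples rather than floats keeps 6.10 correctly above 6.9.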


-Ray Muno

On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:
>
> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>> 7281 processors. When a particular user's jobs fall on one of these
>>> nodes, he gets these error messages:
>>>
>>> --
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>>    Node:  dawson205
>>
>> I wonder if the CentOS 6 kernel properly supports these recent
>> processors. Does lstopo show NUMA nodes as expected?
>>
>> Brice
>>
> lstopo shows different NUMA nodes, and it appears to be correct, but I
> don't use lstopo that much, so I'm not 100% confident that what it's
> showing is correct. I'm at about 98%.
>
>
> Prentice


--
Ray Muno
IT Manager
University of Minnesota
Aerospace Engineering and Mechanics / Mechanical Engineering




Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
AMD lists the minimum supported kernel for EPYC/Naples as RHEL/CentOS
kernel 3.10-862, which ships with RHEL/CentOS 7.5 and later. On 7.4, an
upgraded kernel can be used.
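AMD's minimum can be checked against the running kernel. The following Python sketch rests on one stated assumption: that "3.10-862" refers to RHEL-style release strings such as 3.10.0-862.el7.x86_64, compared numerically on version and build number. CentOS 6 runs 2.6.32-series kernels, so it falls short regardless of build. `kernel_key` and `MIN_KERNEL` are hypothetical names.

```python
from __future__ import annotations

import platform
import re

# Assumption: AMD's "3.10-862" means release strings like 3.10.0-862.el7.x86_64.
MIN_KERNEL = (3, 10, 0, 862)


def kernel_key(release: str) -> tuple[int, int, int, int]:
    """Turn '3.10.0-862.el7.x86_64' into (3, 10, 0, 862) for comparison."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, build = match.groups()
    return int(major), int(minor), int(patch or 0), int(build)


def meets_minimum(release: str) -> bool:
    """True if `release` is at or above the assumed Naples minimum."""
    return kernel_key(release) >= MIN_KERNEL


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "meets AMD's Naples minimum:", meets_minimum(running))
    except ValueError as err:
        print(err)
```

Integer tuples are used because a plain string comparison would sort "3.10.0-1062" before "3.10.0-862".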


http://developer.amd.com/wp-content/resources/56420.pdf

-Ray Muno

On 1/8/20 7:37 PM, Raymond Muno wrote:
> We are running EPYC 7451 and 7702 nodes. I do not recall CentOS 6 being
> able to support these. We moved to CentOS 7.6 at first and are now
> running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier
> releases did not support x2APIC and could not handle 256 threads. That
> was not an issue on EPYC/Naples, but it was on dual 64-core EPYC2.
>
>
> Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6 for
> EPYC2 (Rome).
>
>
> -Ray Muno
>
> On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:
>>
>> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>>> 7281 processors. When a particular user's jobs fall on one of these
>>>> nodes, he gets these error messages:
>>>>
>>>> --
>>>>
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>>    Node:  dawson205
>>>
>>> I wonder if the CentOS 6 kernel properly supports these recent
>>> processors. Does lstopo show NUMA nodes as expected?
>>>
>>> Brice
>>>
>> lstopo shows different NUMA nodes, and it appears to be correct, but
>> I don't use lstopo that much, so I'm not 100% confident that what
>> it's showing is correct. I'm at about 98%.
>>
>>
>> Prentice

