[OMPI users] AMD EPYC 7281: does NOT, support binding memory to the process location
We just added about a dozen nodes to our cluster, which have AMD EPYC 7281 processors. When a particular user's jobs land on one of these nodes, he gets these error messages:

--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  dawson205

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may
be degraded.
--
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     NONE
   Node:        dawson205
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

The OS is CentOS 6, and numactl and numactl-devel are installed. Any idea what the issue is and how to fix it? Is SMT enabled when it shouldn't be, or something along those lines?

--
Prentice
Re: [OMPI users] AMD EPYC 7281: does NOT, support binding memory to the process location
On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
> We just added about a dozen nodes to our cluster, which have AMD EPYC
> 7281 processors. When a particular user's jobs fall on one of these
> nodes, he gets these error messages:
>
> --
>
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
> Node: dawson205

I wonder if the CentOS 6 kernel properly supports these recent processors. Does lstopo show NUMA nodes as expected?

Brice
Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location
On 1/8/20 3:30 PM, Brice Goglin via users wrote:
> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>> 7281 processors. When a particular user's jobs fall on one of these
>> nodes, he gets these error messages:
>>
>> --
>>
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>> Node: dawson205
>
> I wonder if the CentOS 6 kernel properly supports these recent
> processors. Does lstopo show NUMA nodes as expected?
>
> Brice

lstopo shows different NUMA nodes, and the output appears to be correct, but I don't use lstopo that much, so I'm not 100% confident that what it's showing is correct. I'm at about 98%.

Prentice
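One way to cross-check what lstopo reports is to look at what the kernel itself exposes in sysfs. The following is a minimal sketch, assuming a Linux system where NUMA nodes appear as nodeN directories under /sys/devices/system/node; the function name list_numa_nodes is made up for this illustration and is not part of hwloc or Open MPI.

```python
import os
import re


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node indices found as nodeN entries in sysfs_root."""
    try:
        entries = os.listdir(sysfs_root)
    except FileNotFoundError:
        # Directory absent: not Linux, or the kernel exposes no NUMA information.
        return []
    nodes = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    print(f"kernel exposes {len(nodes)} NUMA node(s): {nodes}")
```

If the count printed here disagrees with the number of NUMA nodes lstopo draws, that would be worth mentioning in a reply.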
Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location
On 08/01/2020 at 21:51, Prentice Bisbal via users wrote:
>
> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>> 7281 processors. When a particular user's jobs fall on one of these
>>> nodes, he gets these error messages:
>>>
>>> --
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>> Node: dawson205
>>
>> I wonder if the CentOS 6 kernel properly supports these recent
>> processors. Does lstopo show NUMA nodes as expected?
>>
>> Brice
>>
> lstopo shows different numa nodes, and it appears to be correct, but I
> don't use lstopo that much, so I'm not 100% confident that what it's
> showing is correct. I'm at about 98%.
>

Now, check memory binding in hwloc:

* Does something like "hwloc-bind node:1 -- echo foobar" fail?
* What do these lines return?
    hwloc-bind --membind node:1 -- hwloc-bind --get --membind --nodeset
      => should return something like 0x0001 (bind)
    hwloc-bind --membind --get --nodeset
      => should return something like 0x00ff (firsttouch)

By the way, which OMPI did you use? If you told OMPI not to use its embedded hwloc, which hwloc do you use?

Brice
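The nodeset values returned by those commands are hexadecimal bitmasks in which bit N stands for NUMA node N. As a rough aid for reading them, here is a small sketch that decodes such a string into node indices. The helper name nodes_in_nodeset is hypothetical, and the handling of comma-separated masks assumes 32-bit words with the most significant word first; hwloc's "infinite" masks are not handled.

```python
def nodes_in_nodeset(mask):
    """Decode a hex bitmask string into the list of set bit indices.

    Assumption: wide masks are comma-separated 32-bit hex words,
    most significant word first.
    """
    value = 0
    for word in mask.strip().split(","):
        # int() with base 16 accepts an optional "0x" prefix.
        value = (value << 32) | int(word, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for sample in ("0x0001", "0x00ff"):
        print(sample, "->", nodes_in_nodeset(sample))
```

With this decoding, a mask such as 0x00ff covers nodes 0 through 7, which is what one would expect from an unbound (firsttouch) process on a machine with eight NUMA nodes.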
Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location
We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6 was able to support these. We moved on to CentOS 7.6 at first and are now running 7.7 to support the EPYC2/Rome nodes.

The kernel in earlier releases did not support x2APIC and could not handle 256 threads. Not an issue on EPYC/Naples, but it was an issue on dual 64-core EPYC2.

Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6 for EPYC2 (Rome).

-Ray Muno

On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:
> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>> 7281 processors. When a particular user's jobs fall on one of these
>>> nodes, he gets these error messages:
>>>
>>> --
>>>
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>> Node: dawson205
>>
>> I wonder if the CentOS 6 kernel properly supports these recent
>> processors. Does lstopo show NUMA nodes as expected?
>>
>> Brice
>
> lstopo shows different numa nodes, and it appears to be correct, but I
> don't use lstopo that much, so I'm not 100% confident that what it's
> showing is correct. I'm at about 98%.
>
> Prentice

--
Ray Muno
IT Manager
University of Minnesota
Aerospace Engineering and Mechanics
Mechanical Engineering
Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location
AMD lists the minimum supported kernel for EPYC/Naples as RHEL/CentOS kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels can be used in 7.4.

http://developer.amd.com/wp-content/resources/56420.pdf

-Ray Muno

On 1/8/20 7:37 PM, Raymond Muno wrote:
> We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6
> was able to support these. We moved on to CentOS 7.6 at first and are
> now running 7.7 to support the EPYC2/Rome nodes.
>
> The kernel in earlier releases did not support x2APIC and could not
> handle 256 threads. Not an issue on EPYC/Naples, but it was an issue
> on dual 64-core EPYC2.
>
> Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6
> for EPYC2 (Rome).
>
> -Ray Muno
>
> On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:
>> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>>> We just added about a dozen nodes to our cluster, which have AMD
>>>> EPYC 7281 processors. When a particular user's jobs fall on one of
>>>> these nodes, he gets these error messages:
>>>>
>>>> --
>>>>
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>> Node: dawson205
>>>
>>> I wonder if the CentOS 6 kernel properly supports these recent
>>> processors. Does lstopo show NUMA nodes as expected?
>>>
>>> Brice
>>
>> lstopo shows different numa nodes, and it appears to be correct, but
>> I don't use lstopo that much, so I'm not 100% confident that what
>> it's showing is correct. I'm at about 98%.
>>
>> Prentice

--
Ray Muno
IT Manager
University of Minnesota
Aerospace Engineering and Mechanics
Mechanical Engineering
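For anyone wanting to check a node against that minimum, here is a small sketch that compares a kernel release string with the 3.10-862 figure quoted above. It assumes RHEL/CentOS-style release strings such as 3.10.0-862.el7.x86_64, and reading "3.10-862" as 3.10.0-862 is also an assumption; the function names are invented for this example.

```python
import platform
import re

# Minimum from the message above (AMD's guide: RHEL/CentOS kernel 3.10-862).
# Spelling it as 3.10.0-862 is an assumption about the full version string.
MIN_NAPLES_KERNEL = (3, 10, 0, 862)


def parse_kernel_release(release):
    """Turn a string like '3.10.0-862.el7.x86_64' into (3, 10, 0, 862)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


def kernel_meets_minimum(release, minimum=MIN_NAPLES_KERNEL):
    """True if the release is at or above the minimum, compared field by field."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        verdict = "meets" if kernel_meets_minimum(release) else "is below"
        print(f"{release} {verdict} the assumed minimum")
    except ValueError as err:
        # Kernels that do not follow the RHEL/CentOS naming pattern land here.
        print(err)
```

A stock CentOS 6 kernel is in the 2.6.32 series, so it falls below this threshold regardless of its build number.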