There are a few issues involved here:

- Brian was pointing out that AMD machines are NUMA (and Intel may well go NUMA someday -- unless something quite unexpected happens in computer architecture, UMA simply does not scale to hundreds of cores). So each core is *not* created equal, mainly in terms of locality to resources. If MPI allocates resources local to core X and you end up pinning yourself to core Y, and X and Y are not local to each other, you've just killed your performance because of the latency hit to reach the MPI- (or other-) allocated resources.

- If you're going to use Linux's sched_setaffinity(), beware that this function has changed signatures multiple times over the history of Linux (there are at least three versions that I'm aware of). Shameless plug: try the Portable Linux Processor Affinity (PLPA) micro-library, which provides a simple, consistent interface to Linux processor affinity regardless of your versions of the Linux kernel and glibc (http://www.open-mpi.org/software/plpa/). The library has nothing to do with MPI and can be used in any application that wants to use processor affinity.

- There's also the issue that some clusters -- particularly those set up with high-core-count hosts -- may well be configured to allow multiple MPI jobs to land on the same host. In that case, how does an MPI app know which cores to bind itself to? If every MPI job binds itself starting at core 0 and counting upward, multiple MPI jobs landing on the same host becomes a disaster.

- There's also the issue that the BIOS determines how sockets and cores map to Linux virtual processor IDs. Linux virtual processor 0 is always socket 0, core 0. But what is Linux virtual processor 1? Is it socket 0, core 1, or socket 1, core 0? This is quite complicated to figure out, and it can have large implications (particularly in NUMA environments).



On Nov 29, 2006, at 1:08 AM, Durga Choudhury wrote:

Brian

But does it matter which core the process gets bound to? They are all identical, and as long as the task is parallelized in equal chunks (that's the key part), it should not matter. The last time I had to do this, the problem involved real-time processing of a very large radar image. My approach was to spawn *ONE* MPI process per blade and 12 threads (to utilize the 12 processors). Inside the entry point of each pthread, I called sched_setaffinity(). Then I set the scheduling algorithm to real-time with a very high task priority to avoid preemption. It turns out the last two steps did not buy me much, because ours was a lean, embedded architecture designed to run real-time applications anyway, but I definitely got a speedup from the task distribution.

It sure would be very nice for Open MPI to have this feature; no question about that. All I am saying is: if a user wants it today, a reasonable workaround is available, so he/she does not need to wait.

This is my $0.01's worth, since I am probably a lot less experienced.

Durga


On 11/29/06, Brian W. Barrett <bbarr...@lanl.gov> wrote:

It would be difficult to do well without some MPI help, in my
opinion.  You certainly could use the Linux processor affinity API
directly in the MPI application.  But how would the process know
which core to bind to?  It could wait until after MPI_INIT and call
MPI_COMM_RANK, but MPI implementations allocate many of their
resources during MPI_INIT, so there's a high potential for those
resources (i.e., memory) to end up associated with a different processor
than the one the process gets pinned to.  That isn't a big deal on Intel
machines, but is a major issue for AMD processors.

Just my $0.02, anyway.

Brian

On Nov 28, 2006, at 6:09 PM, Durga Choudhury wrote:

> Jeff (and everybody else)
>
> First of all, pardon me if this is a stupid comment; I am learning
> the nuts and bolts of parallel programming. My comment is as
> follows:
>
> Why can't this be done *outside* openMPI, by calling Linux's
> processor affinity APIs directly? I work with a blade-server kind
> of architecture, where each blade has 12 CPUs. I use pthreads within
> each blade and MPI to talk across blades. I use the Linux system
> calls to attach a thread to a specific CPU and it seems to work
> fine. The only drawback is that it makes the code unportable to a
> different OS. But even if you implemented paffinity within Open MPI,
> the code would become unportable to a different implementation of
> MPI -- which, as it stands, it is not.
>
> Hope this helps to the original poster.
>
> Durga
>
>
> On 11/28/06, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> There is not,
> right now.  However, this is mainly because back when I
> implemented the processor affinity stuff in OMPI (well over a year
> ago), no one had any opinions on exactly what interface to expose to
> the user.  :-)
>
> So right now there's only this lame control:
>
>       http://www.open-mpi.org/faq/?category=tuning#using-paffinity
>
> I am not opposed to implementing more flexible processor affinity
> controls, but the Big Discussion over the past few months is exactly
> how to expose it to the end user.  There have been several formats
> proposed (e.g., mpirun command line parameters, magic MPI attributes,
> MCA parameters, etc.), but nothing that has been "good" and "right".
> So here's the time to chime in -- anyone have any opinions on this?
>
>
>
> On Nov 25, 2006, at 9:31 AM, shap...@isp.nsc.ru wrote:
>
> > Hello,
> > I can't figure out whether there is a way with Open MPI to bind all
> > threads on a given node to a specified subset of CPUs.
> > For example, on a multi-socket, multi-core machine, I want to use
> > only a single core on each CPU.
> > Thank You.
> >
> > Best Regards,
> > Alexander Shaposhnikov
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>
>
>
>
> --
> Devil wanted omnipresence;
> He therefore created communists.

--
  Brian Barrett
  Open MPI Team, CCS-1
  Los Alamos National Laboratory





--
Devil wanted omnipresence;
He therefore created communists.


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
