Gilbert Grosdidier wrote:
> Good evening, Eugene,

Good morning here on my end.

> Here follows some output for a 1024-core run.

Assuming this corresponds meaningfully with your original e-mail, 1024 cores means performance of 700 vs 900. So, that looks roughly consistent with the 28% MPI time you show here. That seems to imply the slowdown is due entirely to long MPI times (rather than slow non-MPI times). Just a sanity check.

> Unfortunately, I'm still unable to produce the equivalent MPT chart.

That may be all right. If one run clearly shows a problem (which is perhaps the case here), then a "good profile" is not needed. Here, a "good profile" would perhaps be used only to confirm that near-zero MPI time is possible.

> #IPMv0.983####################################################################

No glaring evidence here of load imbalance being the sole explanation, but it's hard to tell from these numbers. (If the minimum comm time is 0%, then that process is presumably holding everyone else up.)

> #                           [time]       [calls]        <%mpi>      <%wall>

If performance at 1024 cores is 700 compared to 900, then whatever the problem is, it still hasn't dominated the entire application's performance. So, it looks like MPI_Waitall is the problem, even if it doesn't dominate overall application time. It looks like, on average, each MPI_Waitall call is completing 8+ MPI_Isend calls and 8+ MPI_Irecv calls. I think IPM gives some point-to-point messaging information. Maybe you can tell what the distribution of message sizes is, etc. Or maybe you already know the characteristic pattern. Does a stand-alone message-passing test (without the computational portion) capture the performance problem you're looking for?

> On 22/12/2010 18:50, Eugene Loh wrote:
>> Can you isolate a bit more where the time is being spent? The performance effect you're describing appears to be drastic. Have you profiled the code? Some choices of tools can be found in the FAQ: http://www.open-mpi.org/faq/?category=perftools
>> The results may be "uninteresting" (all time spent in your MPI_Waitall calls, for example), but it'd be good to rule out other possibilities (e.g., I've seen cases where it's the non-MPI time that's the culprit).
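In case it helps, here is a minimal sketch of such a stand-alone test. It reproduces only the pattern visible in the profile (8 MPI_Irecv plus 8 MPI_Isend completed by a single MPI_Waitall per step); the neighbour layout, the 64 KiB message size, and the iteration count are assumptions, since the real ones aren't shown in the numbers above.

/* Stand-alone sketch of the MPI_Isend/MPI_Irecv/MPI_Waitall pattern.
 * Each rank posts NNEIGH receives and NNEIGH sends to pseudo-neighbours
 * and completes them with one MPI_Waitall, mimicking the IPM profile.
 * NNEIGH, MSG_BYTES and NITER are assumptions, not taken from the real code.
 * Run with at least NNEIGH+1 ranks so the neighbours are distinct. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NNEIGH    8
#define MSG_BYTES (64 * 1024)
#define NITER     1000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sbuf[NNEIGH], *rbuf[NNEIGH];
    for (int n = 0; n < NNEIGH; n++) {
        sbuf[n] = malloc(MSG_BYTES);
        rbuf[n] = malloc(MSG_BYTES);
    }

    MPI_Request req[2 * NNEIGH];
    double t, tmax;

    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    for (int it = 0; it < NITER; it++) {
        /* Receive from the NNEIGH ranks "above" me ... */
        for (int n = 0; n < NNEIGH; n++) {
            int peer = (rank + n + 1) % size;
            MPI_Irecv(rbuf[n], MSG_BYTES, MPI_BYTE, peer, 0,
                      MPI_COMM_WORLD, &req[n]);
        }
        /* ... and send to the NNEIGH ranks "below" me, so every receive
         * posted above has a matching send on the peer. */
        for (int n = 0; n < NNEIGH; n++) {
            int peer = (rank - n - 1 + size) % size;
            MPI_Isend(sbuf[n], MSG_BYTES, MPI_BYTE, peer, 0,
                      MPI_COMM_WORLD, &req[NNEIGH + n]);
        }
        MPI_Waitall(2 * NNEIGH, req, MPI_STATUSES_IGNORE);
    }
    t = MPI_Wtime() - t;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("max time over ranks: %.3f s for %d iterations\n", tmax, NITER);

    for (int n = 0; n < NNEIGH; n++) { free(sbuf[n]); free(rbuf[n]); }
    MPI_Finalize();
    return 0;
}

Something along the lines of "mpicc waitall_test.c -o waitall_test" and then running it at 1024 cores on the Altix vs. a smaller count should show whether the communication pattern alone reproduces the slowdown, independent of the computation.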