Nifty Tom Mitchell wrote:
> On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote:
>> Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband
>
> <musing>
> While the previous thread on "performance reduction" went left, right, forward, and beyond the initial topic, it tickled an idea for application profiling or characterizing.
>
> What if the various transports (btl) had knobs that permitted stepwise insertion of bandwidth limits, latency limits, etc., so the application might be characterized better?

I'm unclear what you're asking about. Are you asking that a BTL would limit the performance delivered to the application? E.g., the interconnect is capable of 1 Gbyte/sec, but you deliver only 100 Mbyte/sec (or whatever the user selects) to the app, so the user can see whether bandwidth is a sensitive parameter for the app?

If so, I have a few thoughts.

1) The actual limitations of an MPI implementation may be hard to model: e.g., the amount of handshaking between processes, synchronization delays, etc.

2) For the most part, you could (actually, even should) try doing this stuff much higher up than the BTLs. E.g., how about developing a PMPI layer that does what you're talking about? (See the PMPI sketch below.)

3) I think folks have tried this sort of thing in the past by instrumenting the code and then "playing it back" or "simulating" with other performance parameters. E.g., "I run for X cycles, then I send an N-byte message, then compute another Y cycles, then post a receive, then ...", and then turn the knobs for latency, bandwidth, etc., to see at what point any of these become sensitive parameters. You might see: gosh, as long as latency is lower than about 30-70 usec, it really isn't important. Or whatever. Offhand, I think different people have tried this approach, and (without bothering to check my notes to see if my memory is any good) I think Dimemas (associated with Paraver and CEPBA in Barcelona) was one such tool. (See the replay sketch below.)

> Most micro benchmarks are designed to measure various hardware characteristics, but it is moderately hard to know what an application depends on.
>
> The value of this is that:
> * the application authors might learn something about their code that is hard to know at a well-abstracted API level;
> * the purchasing decision maker would have the ability to access a well-instrumented cluster and build a weighted value equation to help structure the decision;
> * the hardware vendor can learn what is valuable when deciding what feature and function needs the most attention/transistors.
>
> I.e., it might be as valuable to benchmark "your code" on a single well-instrumented platform as it might be to benchmark all the hardware you can get "yer hands on".
> </musing>
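To make point 2 concrete, here is a minimal sketch of the kind of PMPI interposition layer I mean. It wraps only blocking MPI_Send, spins for an artificial delay computed from a per-message latency and a bandwidth cap, and reads those two knobs from environment variables. The variable names (THROTTLE_LATENCY_US, THROTTLE_BW_MBYTES) and the overall shape are assumptions for illustration, not anything that ships with Open MPI; the MPI_Send signature is written against the MPI-2-era headers used by Open MPI 1.3.

/* throttle.c: illustrative PMPI wrapper that injects extra latency and
 * caps effective bandwidth on blocking sends.  The environment-variable
 * knobs are invented for this sketch; nothing here is an existing
 * Open MPI feature. */
#include <mpi.h>
#include <stdlib.h>

static double extra_latency_us = 0.0;   /* added once per message     */
static double bw_cap_mbytes    = 0.0;   /* 0 means "no bandwidth cap" */

/* Crude spin wait; keeps the delay out of any sleep/scheduler noise. */
static void busy_wait(double seconds)
{
    double t0 = MPI_Wtime();
    while (MPI_Wtime() - t0 < seconds)
        ;
}

int MPI_Init(int *argc, char ***argv)
{
    const char *l = getenv("THROTTLE_LATENCY_US");  /* hypothetical knob */
    const char *b = getenv("THROTTLE_BW_MBYTES");   /* hypothetical knob */
    if (l) extra_latency_us = atof(l);
    if (b) bw_cap_mbytes    = atof(b);
    return PMPI_Init(argc, argv);
}

/* MPI-2 signature (no const on buf); add const for MPI-3 headers. */
int MPI_Send(void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    MPI_Type_size(type, &type_size);

    double delay = extra_latency_us * 1.0e-6;
    if (bw_cap_mbytes > 0.0)
        delay += ((double)count * type_size) / (bw_cap_mbytes * 1.0e6);

    busy_wait(delay);   /* emulate a slower, higher-latency interconnect */
    return PMPI_Send(buf, count, type, dest, tag, comm);
}

One plausible way to use it: build it as a shared library with mpicc, link (or preload) it ahead of the application, and rerun the unmodified app with different settings of the two variables to see where the run time starts to move. A real layer would also have to cover MPI_Isend, MPI_Recv, collectives, etc.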
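And a toy version of the replay idea from point 3, assuming a per-rank trace of "compute for C seconds, then send N bytes" events has already been collected. Real tools like Dimemas model much more (contention, collectives, inter-rank dependencies); this sketch just charges latency + bytes/bandwidth per message and sweeps the latency knob to show where it starts to matter. The trace contents are made up for illustration.

/* replay.c: toy analytic replay of a per-rank trace, in the spirit of
 * trace-and-replay tools such as Dimemas (greatly simplified). */
#include <stdio.h>

struct event { double compute_s; double msg_bytes; };

/* Predicted runtime under an assumed latency and bandwidth. */
static double replay(const struct event *ev, int n,
                     double latency_s, double bw_bytes_per_s)
{
    double t = 0.0;
    for (int i = 0; i < n; i++) {
        t += ev[i].compute_s;
        if (ev[i].msg_bytes > 0.0)
            t += latency_s + ev[i].msg_bytes / bw_bytes_per_s;
    }
    return t;
}

int main(void)
{
    /* made-up trace: 1 ms of compute, then an 8 KB message, 1000 times */
    struct event ev[1000];
    for (int i = 0; i < 1000; i++) {
        ev[i].compute_s = 1.0e-3;
        ev[i].msg_bytes = 8192.0;
    }

    /* sweep latency at a fixed 1 Gbyte/sec to see where it dominates */
    for (double lat_us = 1.0; lat_us <= 128.0; lat_us *= 2.0) {
        double t = replay(ev, 1000, lat_us * 1.0e-6, 1.0e9);
        printf("latency %6.1f usec -> predicted runtime %.4f s\n", lat_us, t);
    }
    return 0;
}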