Hi Peter,

We have HP ProCurve 2848 GigE switches here (48 ports). The problem gets
more severe the more nodes (=ports) are involved. It starts to show up at
16 ports for a limited range of message sizes and gets really bad at 32
nodes. The switch has a 96 Gbit/s backplane and should therefore be able
to forward the incoming and outgoing traffic of all 48 ports simultaneously,
as long as no two nodes send to the same receiver. The ordered communication
pattern takes care of the latter (e.g. only pairs communicate at any given
time, see the sketch below). Maybe the switch runs into problems when
switching from one pair to another? I will try to get hold of another
switch for testing.
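
(To be explicit about the pattern: below is a minimal sketch of the kind of
pairwise exchange I mean, using an XOR partner schedule with MPI_Sendrecv.
It is only an illustration, not the code we actually run, and it assumes
the number of ranks is a power of two.)

/* Pairwise ordered all-to-all (illustrative sketch only):
 * in step s every rank exchanges one block with partner = rank ^ s,
 * so at any moment each switch port sends to and receives from exactly
 * one other port. Assumes the communicator size is a power of two. */
#include <mpi.h>
#include <stddef.h>

static void pairwise_alltoall(float *sendbuf, float *recvbuf,
                              int blockcount, MPI_Comm comm)
{
    int rank, size, step;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (step = 0; step < size; step++) {
        int partner = rank ^ step;   /* step 0 is the local copy */

        MPI_Sendrecv(sendbuf + (size_t)partner * blockcount, blockcount,
                     MPI_FLOAT, partner, 0,
                     recvbuf + (size_t)partner * blockcount, blockcount,
                     MPI_FLOAT, partner, 0, comm, MPI_STATUS_IGNORE);
    }
}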

Thanks!
  Carsten
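
PS: In case it makes the timings quoted below easier to interpret, the
timing loop behind them is essentially the following (a simplified sketch,
not the actual phas_mpe.x source; buffer sizes, the repetition count and
the output format are only approximated here):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nfloats = 1024;  /* floats sent to each process        */
    const int nreps   = 5;     /* repetitions counted in the summary */
    int rank, size, rep;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *sendbuf = calloc((size_t)size * nfloats, sizeof(float));
    float *recvbuf = calloc((size_t)size * nfloats, sizeof(float));

    /* one warm-up repetition (not counted), then nreps timed ones */
    for (rep = 0; rep <= nreps; rep++) {
        double t0, t1;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        MPI_Alltoall(sendbuf, nfloats, MPI_FLOAT,
                     recvbuf, nfloats, MPI_FLOAT, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("MPI: sending %d floats to %d processes took %.5f s\n",
                   nfloats, size, t1 - t0);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}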



On Wed, 4 Jan 2006, Peter Kjellström wrote:

> Hello Carsten,
>
> Have you considered the possibility that this is the effect of a non-optimal
> ethernet switch? I don't know how many nodes you need to reproduce it on or
> if you even have physical access (and opportunity) but popping in another
> decent 16-port switch for a testrun might be interesting.
>
> just my .02 euros,
>  Peter
>
> On Tuesday 03 January 2006 18:45, Carsten Kutzner wrote:
> > On Tue, 3 Jan 2006, Graham E Fagg wrote:
> > > Do you have any tools such as Vampir (or its Intel equivalent) available
> > > to get a time line graph ? (even jumpshot of one of the bad cases such as
> > > the 128/32 for 256 floats below would help).
> >
> > Hi Graham,
> >
> > I have attached an slog file of an all-to-all run for 1024 floats (ompi
> > tuned alltoall). I could not get clog files for >32 processes - is this
> > perhaps a limitation of MPE? So I decided to take the case of 32 CPUs on
> > 32 nodes, which is performance-critical as well. From the run output you
> > can see that 2 of the 5 tries yield a fast execution while the others
> > are slow (see below).
> >
> > Carsten
> >
> >
> >
> > ckutzne@node001:~/mpe> mpirun -hostfile ./bhost1 -np 32 ./phas_mpe.x
> > Alltoall Test on 32 CPUs. 5 repetitions.
> > --- New category (first test not counted) ---
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.00690 seconds
> > ---------------------------------------------
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.00320 seconds
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.26392 seconds !
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.26868 seconds !
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.26398 seconds !
> > MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) took ...    0.00339 seconds
> > Summary (5-run average, timer resolution 0.000001):
> >       1024 floats took 0.160632 (0.143644) seconds. Min: 0.003200  max: 0.268681
> > Writing logfile....
> > Finished writing logfile.
>
> --
> ------------------------------------------------------------
>   Peter Kjellström               |
>   National Supercomputer Centre  |
>   Sweden                         | http://www.nsc.liu.se
>


---------------------------------------------------
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
eMail ckut...@gwdg.de
http://www.gwdg.de/~ckutzne

