My view is that it depends entirely on the workload and the systems with which 
your compute needs to interact.  A few things I've experienced:


  1.  Modern Ethernet networks have pretty good latency these days, so MPI codes can run over them.  Whether IB is worth the money is a cost/benefit calculation for the codes you want to run.  If I remember correctly, the Ethernet network we put in at Sanger around 2016 measured, in practice, at similar latency to FDR InfiniBand; a minimal latency probe of the kind sketched after this list is enough to make that comparison on your own hardware.  So it wasn't as good as state-of-the-art IB at the time, but it wasn't bad, and it was certainly good enough for our purposes.  We also gained a lot of flexibility through software-defined networking, which is important if you have workloads that require better security boundaries than just one big shared network.
  2.  If your workload is predominantly single-node and embarrassingly parallel, you might do better to go with Ethernet and invest the saved money in more compute nodes.
  3.  If you only have Ethernet, your cluster will be simpler and will require less specialised expertise to run.
  4.  If your parallel filesystem is Lustre, IB seems to be a better-worn path than Ethernet; we encountered a few Lustre bugs early on precisely because we were on the less-trodden Ethernet path.
  5.  On the other hand, if you need to talk to Weka, Ethernet is the well-worn path.  Weka's IB implementation requires dedicating some cores on every client node, so you lose some compute capacity; you don't have to give up those cores if you're using Ethernet.
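
For what it's worth, the latency comparison in point 1 is cheap to reproduce on your own kit.  Below is a minimal sketch of a two-rank ping-pong probe using mpi4py; the message size and iteration counts are arbitrary, and how you pin it to the Ethernet fabric vs. IB depends on your MPI stack (UCX settings and so on), which I haven't shown.  Run it between two nodes on each fabric and compare the numbers.

# pingpong.py -- minimal two-rank MPI ping-pong latency probe (mpi4py).
# Run with:  mpirun -np 2 python pingpong.py
# Fabric selection (Ethernet vs IB) is left to your MPI/UCX settings.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() == 2, "run with exactly 2 ranks, one per node"

NITER = 10000          # timed iterations
WARMUP = 1000          # untimed warm-up iterations
msg = bytearray(8)     # small message, so we measure latency rather than bandwidth
buf = bytearray(8)

def pingpong(n):
    for _ in range(n):
        if rank == 0:
            comm.Send(msg, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        else:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(msg, dest=0, tag=0)

pingpong(WARMUP)
comm.Barrier()
t0 = time.perf_counter()
pingpong(NITER)
t1 = time.perf_counter()

if rank == 0:
    # Each iteration is one round trip; half of that is the one-way latency.
    print(f"one-way latency: {(t1 - t0) / NITER / 2 * 1e6:.2f} us")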

So, as any lawyer would say, "it depends".  Most of my career has been in 
genomics, where IB definitely wasn't necessary.  Now that I'm in pharma, 
there's more MPI code, so there's more of a case for it.

Ultimately, I think you need to run the real benchmarks with real code (a micro-benchmark like the ping-pong above only tells you about latency; the real test is your actual applications) and, as Jason says, work out whether the additional complexity and cost of an IB network is worth it for your particular workload.  I don't think the mantra "it's HPC, so it has to be InfiniBand" should be taken as a given.

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca


From: Jason Simms via slurm-users <slurm-users@lists.schedmd.com>
Date: Monday, 26 February 2024 at 01:13
To: Dan Healy <daniel.t.he...@gmail.com>
Cc: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Question about IB and Ethernet networks
Hello Daniel,

In my experience, if you have a high-speed interconnect such as IB, you would 
do IPoIB. You would likely still have a "regular" Ethernet connection for 
management purposes, and yes that means both an IB switch and an Ethernet 
switch, but that switch doesn't have to be anything special. Any "real" traffic 
is routed over IB, everything is mounted via IB, etc. That's how the last two 
clusters I've worked with have been configured, and the next one will be the 
same (but will use Omnipath rather than IB). We likewise use BeeGFS.
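
To make the IPoIB point concrete: the nodes simply get ordinary IP addresses on their IPoIB interfaces (typically ib0), and Slurm, filesystem mounts, and so on use those addresses like any others.  One quick sanity check is to confirm that the hostnames your cluster actually uses resolve into the IPoIB subnet rather than onto the management Ethernet.  The sketch below assumes a made-up 10.10.0.0/16 IPoIB range and hypothetical node names; substitute your own.

# check_ipoib.py -- sanity-check that cluster hostnames resolve onto the
# IPoIB subnet rather than the management Ethernet network.
# The subnet and node names below are hypothetical; substitute your own.
import socket
import ipaddress

IPOIB_SUBNET = ipaddress.ip_network("10.10.0.0/16")    # assumed IPoIB range
NODES = [f"node{n:03d}" for n in range(1, 5)]          # made-up node names

for host in NODES:
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except socket.gaierror as exc:
        print(f"{host}: cannot resolve ({exc})")
        continue
    where = "IPoIB" if addr in IPOIB_SUBNET else "NOT on IPoIB"
    print(f"{host}: {addr} ({where})")

If the compute hostnames come back on the management network instead, traffic you expected to go over IB will quietly take the Ethernet path.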

These next comments are perhaps more likely to encounter differences of 
opinion, but I would say that sufficiently fast Ethernet is often "good enough" 
for most workloads (e.g., MPI). I'd wager that for all but the most demanding 
of workloads, it's entirely acceptable. You'll also save a bit of money, of 
course. HOWEVER, I do think there is, shall we say, an expectation from many 
researchers that any cluster worth its salt will have some kind of fast 
interconnect, even if at the scale of most on-prem work, you might be 
hard-pressed in real-world conditions to notice much of a difference. If you're 
running jobs that take weeks and span hundreds of nodes, the time (and other) 
savings may add up, but if we're talking about the difference between a 5-node 
job taking 48 hours vs. slightly less, does it really matter? Your mileage may 
vary, as they say...

Warmest regards,
Jason

On Sun, Feb 25, 2024 at 3:13 PM Dan Healy via slurm-users 
<slurm-users@lists.schedmd.com> wrote:
Hi Fellow Slurm Users,

This question is not slurm-specific, but it might develop into that.

My question relates to understanding how typical HPC clusters are designed in 
terms of networking. To start, is it typical to have both a high-speed Ethernet 
network and an InfiniBand network (meaning separate switches and NICs)? I know 
you can easily set up IP over IB, but is IB usually reserved entirely for MPI 
messages? I'm tempted to spec all new HPCs with only a high-speed (200Gbps) IB 
network, and use IPoIB for all Slurm comms with compute nodes. I plan on using 
BeeGFS for the file system with RDMA.

Just looking for some feedback, please. Is this OK? Is there a better way? If 
yes, please share why it’s better.

Thanks,

Daniel Healy


--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms