First, I want to thank you all for helping out
   I am now checking everything you have all advised and will return with
   results and perhaps more questions.
   At the moment pbis-open seems to be working flawlessly.
   @Patrick, NFS is a problem for me at the moment because our nodes weren't
   bought with an HPC cluster in mind, so each server has its own RAID. I
   think I will go for a distributed file system and start, as you suggested,
   with BeeGFS or GPFS. It also seems my email may have misled you into
   thinking I have a working NFS setup; that's not the case. I was comparing
   against other faculties' impressions of NFS; perhaps they haven't tuned it
   up or their LAN isn't fast enough.

   I hope I will have answers in a few weeks.

   On 25/09/2017 21:14, Patrick Goetz wrote:


     We use NFS exclusively*, and file access times have not been one
     of our problems. But then maybe this is because we only have 48
     compute nodes? They are being used fairly intensively, though.

     * Users' home directories are NFS-mounted. I did set up 800GB of
     SSD for /tmp, but only because the nodes shipped with disks, so I
     decided to put them to use.


     On 09/25/2017 08:44 AM, John Hearns wrote:

        Nadav, p.s. How low is your NFS performance versus local files?
        I bet if you looked at the NFS networking parameters you would
        get a good performance boost.

       May we ask what network links the compute servers?

        On 25 September 2017 at 15:40, John Hearns <hear...@googlemail.com>
        wrote:

        Nadav,
        I will pick up on the points regarding distributing the files.
        The default approach is to use NFS shares. Yes, I do appreciate your
        points regarding the performance of NFS.
        However, if you have 10Gbps Ethernet then you should look at tuning
        the network parameters for 10Gbps.
        Also, NFS over RDMA is said to work, so if you have the network
        interface cards for it, that is an option.
        Also, just ask your systems guy to look at the parameters for NFS
        anyway: large rsize and wsize settings, and mounts with noatime and
        async. It is really surprising how much performance gain you get
        just by doing that.
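
        To give a rough idea of what that looks like (the server name, export
        path, network and exact sizes below are placeholders, not details from
        this thread, so treat it as a sketch to adapt rather than a recipe):

            # Client side, e.g. /etc/fstab: large rsize/wsize, no atime updates
            fileserver:/export/home  /home  nfs  rw,noatime,rsize=1048576,wsize=1048576,vers=4.2  0 0

            # Server side, /etc/exports: async lets the server reply before data is on disk
            /export/home  10.0.0.0/24(rw,async,no_subtree_check)

        Benchmark before and after each change; async in particular trades a
        little crash safety for throughput.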


        The second thing to discuss is 'data staging', i.e. automatic
        transfer of files at the start of the job to a local storage area,
        then transfer back at the end of the job.
        The local storage area on the node could be a partition on the local
        hard drive, an SSD drive or a RAMdisk area.
        I had quite an extensive thread on this topic on this list about six
        months ago. Surprisingly, to me, only Cray systems seem to actively
        support this.
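
        As a rough sketch of what staging means in practice (the paths, the
        scratch location and the job-ID variable below are made up; use
        whatever your scheduler and nodes actually provide), a batch script
        can simply do the copies itself:

            # Stage in: copy input from shared storage to node-local scratch
            SCRATCH=/local/scratch/$USER/$JOB_ID   # placeholder local area and job-ID variable
            mkdir -p "$SCRATCH"
            cp -r /home/$USER/dataset "$SCRATCH/"

            # ... run the computation, reading from and writing under $SCRATCH ...

            # Stage out: copy results back to shared storage, then clean up
            cp -r "$SCRATCH/results" /home/$USER/job_output/
            rm -rf "$SCRATCH"

        Some schedulers can also do this for you in prologue/epilogue scripts
        rather than inside the job itself.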

        Thirdly, we come onto parallel filesystems. These are quite mature
        now, and can be easily deployed.
        I am familiar with Panasas (proprietary hardware), Lustre, GPFS
        (Spectrum Scale), BeeGFS and Gluster
        (I'll count Gluster as a parallel filesystem).
        These gain their performance by scaling out over several storage
        targets, and can scale hugely.
        You can start with one storage server, though.

        My advice to you:
        a) Start by getting your existing NFS working better. Look at those
        network tuning parameters, offloading on your NICs,
        and the mount options.
        Heck, ask yourself: for the deep learning models I want to run,
        what is the ratio of computation time to data moving/reading time?
        If that ratio is huge then you're OK. If the ratio is coming
        closer to 1:1 then you need to start optimising.
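
        For example (made-up numbers, purely to illustrate the ratio): if a
        training epoch reads 50 GB over NFS in about a minute and then
        computes for half an hour, compute outweighs I/O roughly 30:1 and NFS
        is not your bottleneck; if the same data is re-read every couple of
        minutes, you are close to 1:1 and staging or a parallel filesystem
        starts to pay off.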

        b) Look at setting up a single BeeGFS server.
        I admit to rather taking a shine to GPFS recently, and I find
        it a joy to use. However, I should imagine that you are wanting to
        accomplish this without licensed software?

        On 25 September 2017 at 12:09, Diego Zuccato <diego.zucc...@unibo.it>
        wrote:


        On 24/09/2017 12:10, Marcin Stolarek wrote:

        > So do I, however, I'm using sssd with AD provider joined
        > into AD domain.
        > It's tricky and requires good sssd understanding, but it
        > works... in general.
        We are using PBIS-open to join the nodes. It is quite easy to set up;
        just "sometimes" (randomly, but usually after many months) some
        machines lose the join.
       I couldn't make sssd work with our AD (I'm not an AD admin, I
       can only
       join machines, and there's no special bind-account).

       --
       Diego Zuccato
       Servizi Informatici
       Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
       V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
        tel.: +39 051 20 95786
        mail: diego.zucc...@unibo.it




   



