> Might have a bigger impact if we can improve petsc-pkg-hash
> infrastructure to avoid rebuilds in more cases. [i.e. make it more
> tolerant to configure changes - but it's not clear to me which
> changes won't require rebuilds]

I think we currently save the hash of the config/ directory, plus other
information such as the compilers and environment, with the installed
package. We should hash only the parts of config/ that belong to the
"active" config/packages and ignore the inactive ones. For example, if the
build does not use hypre, don't include hypre.py in the hash. Doing this is
not terribly difficult.

  Barry

> On Oct 11, 2020, at 1:38 PM, Satish Balay <[email protected]> wrote:
>
> Well I don't think the download time is significant [for all the
> builds at ANL] - as compared to the build times.
>
> For ex: most of the time - petsc-pkg-hash gets reused [and this saves
> on both downloads and builds] - such builds take about 2h. But when
> packages have to be rebuilt - it can take 2:45 to 3h [so download part
> must be pretty small]
>
> But yeah - it's wasted bandwidth - and not tolerant to network
> disruptions.
>
> And the other issue: might help with CI on low-bandwidth locations
> [say run a CI instance at my house on a spare laptop]
>
> But yes - this requires infrastructure. The way I look at it is - we
> need a "local mirror" or "cache" infrastructure.
>
> i.e. keep the cache part separate from the build part [and not intertwine them]
>
> Spack does stuff in this direction [and also has a remote cache, as any one
> of the 100 remote sites from which the packages can be downloaded can be
> down - but it's not tolerant to certain changes - so I have to
> periodically clean it - to have confidence in my build].
>
>
> Note: If there is a git repo locally cached (and mirrored) - we don't
> have to deal with shallow clones.
>
> Might have a bigger impact if we can improve petsc-pkg-hash
> infrastructure to avoid rebuilds in more cases.
> [i.e. make it more
> tolerant to configure changes - but it's not clear to me which
> changes won't require rebuilds]
>
> Satish
>
>
> On Sun, 11 Oct 2020, Barry Smith wrote:
>
>>
>> Satish,
>>
>> Do you think the time to download all the external packages for each job
>> is significant?
>>
>> Would using super shallow clones on the external packages help much in
>> time? Maybe we should do them anyway to stop wasting bandwidth.
>> Currently we do full clones? But we don't need the huge histories.
>>
>> A much more elaborate way to save more time:
>>
>> On each test machine have repositories of all the external packages.
>>
>> For each job,
>>
>>   do a pull from the remote in all of these repositories that the job
>>   depends on (usually this will get nothing so take no time)
>>
>>   For each package either
>>
>>   - build in a unique build directory of the repository directory
>>     directly (for CMake and packages that support out of base directory
>>     builds)
>>
>>   - make a local shallow clone of the local copy of the repository
>>     to externalpackages for the rest and do those builds there
>>
>> The average cost of this will be just some shallow local clones instead
>> of copying over from remote machines.
>> The PETSc test directories can still be completely cleaned out for
>> each job so Satish need not worry about testing with dirty directories.
>>
>> This requires a bit of infrastructure. If it saves a minute it is not
>> worth it, but if it cuts the pipeline time from 180 minutes to 150, maybe?
>> Probably not worth it. Could also be done just for a couple of the
>> most external-package-intensive jobs.
>>
>> Barry
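
To make the suggestion at the top of this message (hash only the "active" config/packages) concrete, here is a rough sketch. It is not the existing petsc-pkg-hash code. The function name, its arguments, and the one-file-per-package layout (hypre.py and so on) are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def active_package_hash(packages_dir, active_packages, extra_info=()):
    """Hash only the config files of the packages this build uses.

    Hypothetical helper, not part of PETSc:
      packages_dir    -- directory with one <name>.py per package (assumed layout)
      active_packages -- names of the packages enabled for this build
      extra_info      -- other strings to fold in, e.g. compiler and flags
    """
    digest = hashlib.sha256()
    # Sort and de-duplicate so the result does not depend on listing order.
    for name in sorted(set(active_packages)):
        path = Path(packages_dir) / f"{name}.py"
        digest.update(name.encode())
        digest.update(b"\0")  # separator so adjacent fields cannot run together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    for item in extra_info:
        digest.update(str(item).encode())
        digest.update(b"\0")
    return digest.hexdigest()
```

With this, an edit to hypre.py leaves the hash of a build that does not use hypre unchanged, so its installed packages could be reused. Presumably the "active" set would also have to include whatever the enabled packages depend on, and any shared configure code they rely on would still need to be hashed.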

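
The per-job steps in the oldest quoted message (refresh a local copy of each repository, then make a shallow local clone into externalpackages) could look roughly like the sketch below. The helper names and arguments are made up for illustration, and it assumes the local copy has a remote named origin. The one git detail worth noting is that `--depth` is ignored when cloning from a plain local path, so the sketch uses a `file://` URL.

```python
import subprocess


def mirror_update_cmd(mirror_dir):
    # Refresh the long-lived local copy; usually this fetches nothing.
    return ["git", "-C", str(mirror_dir), "fetch", "--prune", "origin"]


def shallow_local_clone_cmd(mirror_dir, dest_dir, ref=None):
    # mirror_dir is assumed to be an absolute POSIX path.
    # A file:// URL makes git honor --depth; with a plain path it is ignored.
    cmd = ["git", "clone", "--depth", "1"]
    if ref is not None:
        # ref is assumed to be a branch or tag name; --branch does not
        # accept an arbitrary commit hash.
        cmd += ["--branch", ref]
    cmd += [f"file://{mirror_dir}", str(dest_dir)]
    return cmd


def prepare_package(mirror_dir, dest_dir, ref=None, run=subprocess.run):
    # 'run' is injectable so the commands can be inspected without running git.
    run(mirror_update_cmd(mirror_dir), check=True)
    run(shallow_local_clone_cmd(mirror_dir, dest_dir, ref), check=True)
```

A job would call `prepare_package` once per external package it depends on. The clone target can then be wiped along with the rest of the test directory, while the local copy stays on the machine.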