> It might have a bigger impact if we can improve petsc-pkg-hash
> infrastructure to avoid rebuilds in more cases. [i.e., make it more
> tolerant to configure changes - but it's not clear to me which
> changes won't require rebuilds]

I think we currently save a hash of the config/ directory, along with other information
like the compilers and environment, with the installed packages.

We should hash only the parts of config/ for the "active" config/packages
and ignore the inactive ones. For example, if the build does not use hypre, don't
include hypre.py in the hash. Doing this is not terribly difficult.
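A minimal sketch of that idea in Python, assuming the package modules sit together in one directory and each is named after the package it configures (as hypre.py is for hypre). The names config_hash, packages_dir and active_packages, and the directory layout, are hypothetical; this is not the actual petsc-pkg-hash code.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def _update_field(digest, data: bytes) -> None:
    # Length-prefix each field so file boundaries cannot be confused.
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)


def config_hash(config_dir, packages_dir, active_packages: Iterable[str]) -> str:
    """Return a SHA-256 hex digest of the configure sources under config_dir.

    Every regular file is hashed, except .py package modules that live
    directly in packages_dir and whose stem (e.g. "hypre" for hypre.py) is
    not an active package. All names and the layout here are hypothetical.
    """
    config_dir = Path(config_dir)
    packages_dir = Path(packages_dir).resolve()
    active = {name.lower() for name in active_packages}
    digest = hashlib.sha256()
    # Sort so the result does not depend on filesystem iteration order.
    for path in sorted(p for p in config_dir.rglob("*") if p.is_file()):
        if path.suffix == ".pyc":
            continue  # compiled bytecode is derived data, not a source change
        is_package_module = (
            path.suffix == ".py" and path.parent.resolve() == packages_dir
        )
        if is_package_module and path.stem.lower() not in active:
            continue  # inactive package, e.g. hypre.py when hypre is not used
        # Hash the relative path too, so renaming a file changes the digest.
        _update_field(digest, path.relative_to(config_dir).as_posix().encode())
        _update_field(digest, path.read_bytes())
    return digest.hexdigest()
```

The compiler and environment information mentioned above would still be hashed separately and combined with this digest. Matching by file name only covers each package's own module; if one package module pulls in another as a dependency, that dependency would also have to be counted as active.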

  Barry

> On Oct 11, 2020, at 1:38 PM, Satish Balay <[email protected]> wrote:
> 
> Well, I don't think the download time is significant [for all the
> builds at ANL] compared to the build times.
> 
> For example: most of the time petsc-pkg-hash gets reused [and this saves
> on both downloads and builds], and such builds take about 2h. But when
> packages have to be rebuilt, it can take 2h45m to 3h [so the download
> part must be pretty small].
> 
> But yeah - it's wasted bandwidth, and it's not tolerant to network
> disruptions.
> 
> And another point: it might help with CI in low-bandwidth locations
> [say, running a CI instance at my house on a spare laptop].
> 
> But yes - this requires infrastructure. The way I look at it, we
> need a "local mirror" or "cache" infrastructure.
> 
> i.e., keep the cache part separate from the build part [and don't intertwine them].
> 
> Spack does something in this direction [it also has a remote cache, since
> any one of the 100 remote sites the packages are downloaded from can be
> down - but it's not tolerant to certain changes, so I have to
> periodically clean it to have confidence in my build].
> 
> 
> Note: if there is a locally cached (and mirrored) git repo, we don't
> have to deal with shallow clones.
> 
> It might have a bigger impact if we can improve petsc-pkg-hash
> infrastructure to avoid rebuilds in more cases. [i.e., make it more
> tolerant to configure changes - but it's not clear to me which
> changes won't require rebuilds]
> 
> Satish
> 
> 
> On Sun, 11 Oct 2020, Barry Smith wrote:
> 
>> 
>>  Satish,
>> 
>>   Do you think the time to download all the external packages for each job 
>> is significant?
>> 
>>   Would using super shallow clones of the external packages save much
>> time? Maybe we should do them anyway to stop wasting bandwidth?
>>   Currently we do full clones? But we don't need the huge histories.
>> 
>>   A much more elaborate way to save more time:
>> 
>>     On each test machine, keep repositories of all the external packages.
>> 
>>     For each job,
>> 
>>         run a pull from the remotes in each of these repositories that the
>> job depends on (usually this will fetch nothing, so it takes no time).
>> 
>>        For each package, either
>> 
>>            - build directly in a unique build directory inside the repository
>> directory (for CMake and other packages that support out-of-source builds), or
>> 
>>            - for the rest, make a local shallow clone of the local copy of the
>> repository into externalpackages and do those builds there.
>> 
>>       The average cost of this will just be some shallow local clones instead
>> of copying over from remote machines.
>>       The PETSc test directories can still be completely cleaned out for
>> each job, so Satish need not worry about testing with dirty directories.
>> 
>>       This requires a bit of infrastructure; if it saves a minute it is not
>> worth it, but if it cuts the pipeline time from 180 minutes to 150, maybe?
>>       Probably not worth it. It could also be done just for a couple of the
>> most external-package-intensive jobs.
>> 
>>  Barry
>> 