> On Apr 24, 2018, at 8:04 PM, Allen Wittenauer <a...@effectivemachines.com> 
> wrote:
> 
> 
>> On Apr 24, 2018, at 5:01 PM, Greg Stein <gst...@gmail.com> wrote:
>> 
>> Let's go back to the start: stuff older than six months will be deleted.
>> What could possibly need to be retained?
> 
>       - Not every job runs every day.  Some are extremely situational.

The artifacts do not need to be kept in perpetuity. When every project retains 
everything, there are significant costs in both disk space and performance. Our 
policy has been 30 days or 10 builds of retention per job.
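
In practice, anything outside that window is a candidate for cleanup. As a rough 
sketch of what that means on disk (the depth values are illustrative and assume 
the standard jobs/<job>/builds/<number> layout; nothing gets deleted without a 
human looking first):

# Sketch only: list per-job build directories older than 30 days.
cd /x1/jenkins/jenkins-home/jobs
find . -mindepth 3 -maxdepth 3 -path '*/builds/*' -type d -mtime +30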



>       - Some users might have specifically marked certain data to be retained 
> for very specific reasons.
> 
>       I know in my case I marked some logs to not be deleted because I was 
> using them to debug the systemic Jenkins build node crashes. I want to keep 
> the data to see if the usage numbers, etc, go down over time.


Part of the systemic problem is the copious amount of historical data that is 
loaded into Jenkins on startup, inflating memory usage and startup times. Again, 
when every job does this, it adds up, and many of the problems we’re facing 
appear to be rooted in the very large number of artifacts we have.


> 
>       So yes, there may be some value to some of that data that will not be 
> obvious to an outside observer.
> 
>> Assume all jobs will be touched.
> 
>       … which is why giving a directory listing of just the base directory 
> would be useful to see who needs to look. If INFRA is unwilling to provide 
> that data, then keep any directories that reference:


Please dispense with the passive-aggressive “unwilling to provide” nonsense. 
This is inflammatory and anti-Infra for no valid reason. This process is meant 
to be a pragmatic approach to cleaning up and improving a service used by a 
large number of projects. The fact that I didn’t have time to post the job list 
in the 4 hours since my last reply should not be construed as reluctance on 
Infra’s part to provide it.

The top-level list of jobs is available here: https://paste.apache.org/r37e

I am happy to provide further information; however, due to the disk I/O issues 
on jenkins-master and the size of the jobs/ directory, running multiple scans 
and detailed analytics over it is difficult on any reasonable timescale.


As I previously mentioned, the list of actual artifacts currently slated for 
deletion is 590MB and took several hours to generate. I also misspoke earlier: 
that list covers artifacts over one year old. The space that would be freed up 
is over 480GB. The list of artifacts over 180 days old is going to be much 
longer, but I can look into making it available somewhere. I question its 
utility, though, as the one-year list is already over 3 million lines.
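
For the curious, a scan like that boils down to something along these lines (a 
sketch, not the literal command that was run; the output filename is just an 
example):

# Sketch only: list regular files older than one year with their sizes,
# then total up the space they would free.
cd /x1/jenkins/jenkins-home/jobs
find . -type f -mtime +365 -fprintf /tmp/artifacts-1yr.txt '%s %p\n'
awk '{sum += $1} END {printf "%.1f GB\n", sum / 1024^3}' /tmp/artifacts-1yr.txt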


> 
>       - precommit
>       - hadoop
>       - yarn
>       - hdfs
>       - mapreduce
>       - hbase
>       - yetus



We will not be cherry-picking jobs to exclude from the purge unless there is a 
compelling operational reason to do so. Jenkins is a shared resource, and all 
projects are affected equally.


Let me do some further research and compare the sizes and file counts for 
artifacts vs. build metadata (logs, etc.).
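
Roughly, that comparison amounts to something like the following (archived 
artifacts live under each build’s archive/ directory in the standard Jenkins 
layout; the exact split below is illustrative):

# Sketch only: archived build artifacts vs. other per-build files.
cd /x1/jenkins/jenkins-home/jobs
# archived build artifacts
find . -path '*/builds/*/archive/*' -type f -printf '%s\n' | \
    awk '{n++; s+=$1} END {printf "%d files, %.1f GB\n", n, s/1024^3}'
# build metadata (logs, build.xml, changelog.xml, polling.log, ...)
find . -path '*/builds/*' -not -path '*/archive/*' -type f -printf '%s\n' | \
    awk '{n++; s+=$1} END {printf "%d files, %.1f GB\n", n, s/1024^3}'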

The main things we want to purge are:

- all artifacts and metadata where the job/project no longer exists
- binary artifacts with no remaining value that are older than 180 days (see the 
sketch below)

and, to a lesser extent, builds which fall outside our general 30-day/10-build 
retention policy.
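
To illustrate the second item, a pass along these lines would surface the worst 
offenders (the extensions are illustrative; the real pass would be more careful 
about what counts as a disposable binary artifact):

# Sketch only: old binary artifact types, largest first.
cd /x1/jenkins/jenkins-home/jobs
find . -type f -mtime +180 \
    \( -name '*.jar' -o -name '*.war' -o -name '*.tar.gz' -o -name '*.zip' \) \
    -printf '%s %p\n' | sort -rn | head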


As an example of ancient binary artifacts, there are 22MB of javadocs from 2013 
in /x1/jenkins/jenkins-home/jobs/ManifoldCF-mvn.

Using the yetus jobs as a reference, yetus-java builds 480 and 481 are nearly a 
year old but only contain a few kilobytes of data. While removing them saves no 
space, they also provide no value, yet they are still loaded and parsed by 
Jenkins. Since they don’t contain valid Jenkins objects, they don’t even show up 
in the build history, but they are still part of the constant scanning of the 
jobs/ directory that Jenkins does, and they contribute to high load and disk 
I/O. Those two are the only artifacts older than 180 days for yetus, with the 
exception of a zero-byte legacyIds file for -qbt:

root@jenkins-master:/x1/jenkins/jenkins-home/jobs# find yetus-* -mtime +180 -ls
 69210803      4 drwxr-xr-x   2 jenkins  jenkins      4096 Jul 12  2017 yetus-java/builds/481
 69210815      4 -rw-r--r--   1 jenkins  jenkins       457 Jul  8  2017 yetus-java/builds/481/polling.log
 65813999      0 lrwxrwxrwx   1 jenkins  jenkins         2 May 23  2016 yetus-java/builds/lastUnstableBuild -> -1
 65814012      0 -rw-r--r--   1 jenkins  jenkins         0 May 23  2016 yetus-java/builds/legacyIds
 69210796      4 drwxr-xr-x   2 jenkins  jenkins      4096 Jul 12  2017 yetus-java/builds/480
 69210810      4 -rw-r--r--   1 jenkins  jenkins       456 Jul  7  2017 yetus-java/builds/480/polling.log
 23725477      0 lrwxrwxrwx   1 jenkins  jenkins         2 Jun 15  2017 yetus-qbt/builds/lastStableBuild -> -1
 23741645      0 lrwxrwxrwx   1 jenkins  jenkins         2 Apr 14  2016 yetus-qbt/builds/lastUnstableBuild -> -1
 23725478      0 lrwxrwxrwx   1 jenkins  jenkins         2 Jun 15  2017 yetus-qbt/builds/lastSuccessfulBuild -> -1
 23741647      0 -rw-r--r--   1 jenkins  jenkins         0 Apr 14  2016 yetus-qbt/builds/legacyIds

For mapreduce, there is an empty Mapreduce-Patch-vesta.apache.org job from 2010, 
and a bunch of builds from June 2017 for PreCommit-MAPREDUCE-Build (6999-7006). 
Again, while they take up very little space, they are still loaded into Jenkins 
and scanned by the threads which watch the jobs/ directory for changes. Multiply 
this by the 2381 top-level job configs, and you can see why we’re hoping this 
type of purge will help improve Jenkins performance and reduce the frequent 
crashes.
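
For reference, that figure is roughly the number of per-job config.xml files, 
which can be checked with something like:

find /x1/jenkins/jenkins-home/jobs -mindepth 2 -maxdepth 2 -name config.xml | wc -l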


Since we are looking to move to expensive NVMe disks (nearly 4TB worth), we also 
need to perform due diligence to ensure that we are not migrating and 
maintaining ancient data.

-Chris


