Perhaps I wasn't clear. I am not concerned about Jenkins projects being removed, just the artifacts of those projects. I am trying to make two points:
1) Old artifacts might have value as a known good build for a job that may not get run for years. So please don't run a cleanup that deletes all artifacts older than N days.

2) Somebody else pointed this out as well, but there appear to be folders listed for which there is no current Jenkins job.

Thanks,
-Alex

On 4/26/18, 11:07 AM, "Greg Stein" <gst...@gmail.com> wrote:

Note that jobs will remain. This is only about deleting build artifacts.

On Thu, Apr 26, 2018 at 12:40 PM, Alex Harui <aha...@adobe.com.invalid> wrote:
> Hi Chris,
>
> Thanks for the list.
>
> I’m going through the Flex-related jobs and have some feedback:
>
> flex-blazeds (maven)
> We’ve kept this build around even though it hasn’t run in a while in
> case we need to do another release of blazeds. I would like to keep at
> least one known good build in case we have trouble resurrecting it later
> if we need it, even though it may sit idle for years.
>
> flex-flexunit_1
> flex-sdk_1
> flex-sdk_pixelbender_1
> flex-sdk_release_1
> flex-tlf_1
> flex_sdk_version
> I cannot find a project with these names in Jenkins. So feel free to
> toss them.
>
> flex-flexunit (maven)
> This project never got to the point of building successfully, but it
> would be nice to keep it around in case we need it.
>
> FlexJS Compiler (maven)
> FlexJS Framework (maven)
> FlexJS Pipeline
> FlexJS Typedefs (maven)
> Looks like we never set the build limit, so I just did that. The project
> is disabled; we are keeping it around as an archive for Royale, so I am
> not sure it will clean itself up.
>
> flex-productdashboard
> I deleted this project.
>
> flex-tool-api (maven)
> flex-sdk-converter (maven)
> I’m not seeing old artifacts in these projects, but they may also sit
> idle for years until some bug needs fixing.
>
> Flex-Site (Maven)
> This project never took off, but again it would be nice to keep it
> around in case it gets revived.
>
> In sum, a project like Flex may have several kinds of “products” with
> varying activity levels and thus may have jobs that are idle for years,
> and it can be helpful to keep at least the last build around as a
> reference in case the next time we run the build there is a failure.
> Please notify us if we miss limiting the number of old builds. I think I
> fixed the ones that didn’t have limits. But there do seem to be folders
> left around for builds I think we deleted.
>
> Thanks,
> -Alex
>
>
> From: Chris Lambertus <c...@apache.org>
> Reply-To: <builds@apache.org>
> Date: Wednesday, April 25, 2018 at 12:04 AM
> To: <builds@apache.org>
> Subject: Re: purging of old job artifacts
>
>
> On Apr 24, 2018, at 8:04 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
> On Apr 24, 2018, at 5:01 PM, Greg Stein <gst...@gmail.com> wrote:
>
> Let's go back to the start: stuff older than six months will be deleted.
> What could possibly need to be retained?
>
> - Not every job runs every day. Some are extremely situational.
>
> The artifacts do not need to be kept in perpetuity. When every project
> does this, there are significant costs in both disk space and performance.
> Our policy has been 30 days or 10 jobs retention.
>
> - Some users might have specifically marked certain data to be retained
> for very specific reasons.
>
> I know in my case I marked some logs to not be deleted because I was
> using them to debug the systemic Jenkins build node crashes. I want to
> keep the data to see if the usage numbers, etc., go down over time.
>
> Part of the systemic problems are due to copious amounts of historical
> data which are loaded into jenkins on startup, inflating the memory usage
> and startup times.
> Again, when every job does this, it adds up, and many of the problems
> we’re facing appear to be rooted in the very large number of artifacts
> we have.
>
> So yes, there may be some value to some of that data that will not be
> obvious to an outside observer.
>
> Assume all jobs will be touched.
>
> … which is why giving a directory listing of just the base directory
> would be useful to see who needs to look. If INFRA is unwilling to
> provide that data, then keep any directories that reference:
>
> Please dispense with the passive-aggressive “unwilling to provide”
> nonsense. This is inflammatory and anti-Infra for no valid reason. This
> process is meant to be a pragmatic approach to cleaning up and improving a
> service used by a large number of projects. The fact that I didn’t have
> time to post the job list in the 4 hours since my last reply does not need
> to be construed as reticence on Infra’s part to provide it.
>
> The top-level list of jobs is available here:
> https://paste.apache.org/r37e
>
> I am happy to provide further information; however, due to the disk IO
> issues on jenkins-master and the size of the jobs/ dir, multiple scans and
> data analytics are difficult to provide due to the timescale.
>
> As I previously mentioned, the list of actual artifacts currently slated
> for deletion is 590MB and took several hours to generate. I also misspoke
> earlier: that list is for artifacts over one year old. The space which
> would be freed up is over 480GB. The list of artifacts over 180 days old is
> going to be much longer, but I can look into making it available somewhere.
> I question the utility though, as the 1 year data is over 3 million lines.
>
> - precommit
> - hadoop
> - yarn
> - hdfs
> - mapreduce
> - hbase
> - yetus
>
> We will not be cherry-picking jobs to exclude from the purge unless there
> is a compelling operational reason to do so. Jenkins is a shared resource,
> and all projects are affected equally.
>
> Let me do some further research and compare the size and file counts for
> artifacts vs. build metadata (logs, etc.).
>
> The main things we want to purge are:
>
> - all artifacts and metadata where the job/project no longer exists
> - binary artifacts with no value older than 180 days
>
> and, to a lesser extent, jobs which fall outside our general 30 day/10
> jobs retention policy.
>
> As an example of ancient binary artifacts, there are 22MB of javadocs from
> 2013 in /x1/jenkins/jenkins-home/jobs/ManifoldCF-mvn
>
> Using the yetus jobs as a reference, yetus-java builds 480 and 481 are
> nearly a year old, but only contain a few kilobytes of data. While removing
> them saves no space, they also provide no value, but are still
> loaded/parsed by jenkins. Since they don’t contain valid jenkins objects,
> they don’t even show up in the build history, but are still part of the
> constant scanning of the jobs/ directory that jenkins does, and contribute
> to high load and disk IO. Those two are the only +180 day artifacts for
> yetus with the exception of a zero-byte legacyIds file for -qbt.
>
> root@jenkins-master:/x1/jenkins/jenkins-home/jobs# find yetus-* -mtime +180 -ls
> 69210803 4 drwxr-xr-x 2 jenkins jenkins 4096 Jul 12 2017 yetus-java/builds/481
> 69210815 4 -rw-r--r-- 1 jenkins jenkins 457 Jul 8 2017 yetus-java/builds/481/polling.log
> 65813999 0 lrwxrwxrwx 1 jenkins jenkins 2 May 23 2016 yetus-java/builds/lastUnstableBuild -> -1
> 65814012 0 -rw-r--r-- 1 jenkins jenkins 0 May 23 2016 yetus-java/builds/legacyIds
> 69210796 4 drwxr-xr-x 2 jenkins jenkins 4096 Jul 12 2017 yetus-java/builds/480
> 69210810 4 -rw-r--r-- 1 jenkins jenkins 456 Jul 7 2017 yetus-java/builds/480/polling.log
> 23725477 0 lrwxrwxrwx 1 jenkins jenkins 2 Jun 15 2017 yetus-qbt/builds/lastStableBuild -> -1
> 23741645 0 lrwxrwxrwx 1 jenkins jenkins 2 Apr 14 2016 yetus-qbt/builds/lastUnstableBuild -> -1
> 23725478 0 lrwxrwxrwx 1 jenkins jenkins 2 Jun 15 2017 yetus-qbt/builds/lastSuccessfulBuild -> -1
> 23741647 0 -rw-r--r-- 1 jenkins jenkins 0 Apr 14 2016 yetus-qbt/builds/legacyIds
>
> For mapreduce, there is an empty Mapreduce-Patch-vesta.apache.org from
> 2010, and a bunch of jobs from June 2017 for PreCommit-MAPREDUCE-Build
> (6999-7006). Again, while they take up very little space, they are still
> loaded into jenkins and scanned by the threads which watch the jobs/ dir
> for changes. Multiply this by 2381 top-level job configs, and you can see
> why we’re hoping this type of purge will help improve jenkins performance
> and the frequent crashing.
>
> Since we are looking to move to expensive NVMe disks (nearly 4TB worth) we
> also need to perform due diligence to ensure that we are not migrating and
> maintaining ancient data.
>
> -Chris
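
For anyone who wants a per-job summary rather than a raw listing, something along the lines of the `find yetus-* -mtime +180 -ls` run quoted above could be sketched in Python as follows. This is only an illustration under stated assumptions, not the script Infra used: the function name `old_files_by_job` and its parameters are made up for this example, the age cutoff is a plain "mtime more than N days ago" test (which only approximates how `find -mtime +N` rounds ages), and it counts regular files only, so the directories and symlinks that `find -ls` prints are skipped.

```python
import os
import stat
import time
from pathlib import Path


def old_files_by_job(jobs_root, days=180, now=None):
    """Summarize old regular files under each top-level job directory.

    Returns {job_name: (file_count, total_bytes)} for files whose mtime is
    more than `days` days before `now` (defaults to the current time).
    Jobs with no such files are left out of the result.
    """
    reference = time.time() if now is None else now
    cutoff = reference - days * 86400  # seconds in a day
    summary = {}
    for job_dir in sorted(Path(jobs_root).iterdir()):
        # Only real directories count as jobs; skip stray files and symlinks.
        if job_dir.is_symlink() or not job_dir.is_dir():
            continue
        count = 0
        size = 0
        # os.walk does not follow symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(job_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                    count += 1
                    size += st.st_size
        if count:
            summary[job_dir.name] = (count, size)
    return summary


# Example use (the path is the jobs/ directory mentioned in the thread):
#   for job, (count, size) in old_files_by_job("/x1/jenkins/jenkins-home/jobs").items():
#       print(f"{job}\t{count} files\t{size} bytes")
```

Grouping by top-level directory keeps the output to one line per job, which is far shorter than a per-file listing, but on a jobs/ tree of the size described in the thread a full walk like this would still be slow and would add disk IO of its own.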