> On Jan 28, 2020, at 8:22 PM, Allen Wittenauer
> wrote:
[snip]
> [1] - The best on-prem solution I came up with (before I moved my $DAYJOB
> stuff to cloud) was to run each executor in a VM on the box. That VM would
> also have a regularly scheduled job that would cause it to wipe itself
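For illustration, a minimal sketch of such a self-wiping executor VM,
assuming libvirt on the host and a pre-built snapshot named "clean" (the
VM name and snapshot name are hypothetical):

    # Nightly cron entry on the hypervisor: force-stop the executor VM,
    # revert it to the known-good snapshot, and bring it back up.
    0 4 * * * virsh destroy build-exec-1 && virsh snapshot-revert build-exec-1 clean && virsh start build-exec-1

Scheduling the wipe between builds rather than at a fixed hour would
avoid killing an in-flight job, but the idea is the same.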
> On Jan 28, 2020, at 8:02 PM, Chris Lambertus wrote:
>
>
> Allen, can you elaborate on what a “proper” implementation is? As far as I
> know, this is baked into Jenkins. We could raise process limits for the
> jenkins user, but these situations only tend to arise when a build has gone
>
> On Jan 27, 2020, at 10:52 PM, Allen Wittenauer
> wrote:
>
>
>
>> On Jan 27, 2020, at 6:37 PM, Andriy Redko wrote:
>>
>> Thanks a lot for looking into it. From the CXF perspective, I have seen that
>> many CXF builds have been aborted
>> because the connection with the master is lost (do
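On the process-limits idea above: raising them would be a pam_limits
change; a sketch, assuming the executors run as the "jenkins" user (the
values are illustrative only):

    # /etc/security/limits.d/jenkins.conf
    jenkins  soft  nproc   8192
    jenkins  hard  nproc   16384
    jenkins  soft  nofile  65536
    jenkins  hard  nofile  131072

Note these apply to PAM sessions; an agent launched as a systemd service
would need the equivalent LimitNPROC/LimitNOFILE settings instead.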
FYI, Apache Kafka builds currently take 3 to 4 hours to run (build, unit,
and integration tests).
Thanks,
Ismael
On Wed, Jan 22, 2020 at 4:55 PM Chris Lambertus wrote:
> Folks,
>
> Over the last week or so we have received many reports of broken builds
> due to nodes out of resources. As noted i
> On Jan 27, 2020, at 10:52 PM, Allen Wittenauer
> wrote:
>
> This is almost always because whatever is running on the two executors
> has suffocated the system resources.
... and before I forget, a reminder: each Java thread takes up a file descriptor.
Hadoop's unit tests were firing u
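A quick way to spot that class of leak is to compare descriptor and
thread counts per JVM; a sketch, assuming the builds run as the
"jenkins" user and the check runs as root or that same user:

    # Count open file descriptors and live threads for every java
    # process owned by the jenkins user.
    for pid in $(pgrep -u jenkins java); do
      printf 'pid=%s fds=%s threads=%s\n' "$pid" \
        "$(ls /proc/$pid/fd 2>/dev/null | wc -l)" \
        "$(ls /proc/$pid/task 2>/dev/null | wc -l)"
    done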
> On Jan 27, 2020, at 6:37 PM, Andriy Redko wrote:
>
> Thanks a lot for looking into it. From the CXF perspective, I have seen that
> many CXF builds have been aborted
> because the connection with the master is lost (I don't have exact builds to
> point to since we keep only the last 3),
> that could
Hi Chris,
Thanks a lot for looking into it. From the CXF perspective, I have seen that
many CXF builds have been aborted
because the connection with the master is lost (I don't have exact builds to
point to since we keep only the last 3),
which could probably explain the hanging builds.
Best Regards,
Hello,
I checked PR [1] and found many build failures on H41 during 'git'
command execution [2]. The last failure was an OOM [3]. Finally, the build
succeeded on H29 [4].
1. https://github.com/apache/cxf/pull/631
2. https://builds.apache.org/job/CXF-Trunk-PR/1389/console
3. https://builds.apache.
Another incident of CXF build junk sticking around on H40, although in that
case, the machine appears to be hosed because of broken container jobs from
2019, with over 1100(!) processes identical to:
containerd-shim -namespace moby -workdir
/var/lib/containerd/io.containerd.runtime.v1.linux/mob
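Triage on a node in that state usually starts with counting the stuck
shims and force-removing the dead containers behind them; a sketch (the
commands are standard docker/procps, the filter choice is an assumption):

    # How many containerd-shim processes are lingering.
    pgrep -fc containerd-shim
    # Force-remove containers that are no longer running.
    docker ps -aq --filter status=exited --filter status=dead | xargs -r docker rm -f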
Here is some data from H24, which also contains many broken CXF jobs (not
Karaf) from Jan 22. The builds on H41 use Karaf artifacts, but they were CXF
builds, not Karaf builds as previously noted. Copying dev@CXF since this build
seems to be related to the ongoing node problems.
Additionally, ther
Hi,
On Sun, Jan 26, 2020 at 8:48 AM Mike Jumper wrote:
> It would be nice if Jenkins could be configured to recognize when a node is
> unusable due to lack of resources and automatically take it offline.
>
That would be a feature request for the people who write the Jenkins
software.
Gav...
It would be nice if Jenkins could be configured to recognize when a node is
unusable due to lack of resources and automatically take it offline.
- Mike
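Short of native support, one workaround is a health-check script on each
node that uses the Jenkins CLI's offline-node command; a sketch with
hypothetical thresholds and paths (authentication flags omitted):

    # Take this node offline if the jenkins partition has < 2 GB free.
    FREE_KB=$(df --output=avail /home/jenkins | tail -n 1)
    if [ "$FREE_KB" -lt 2097152 ]; then
      java -jar jenkins-cli.jar -s https://builds.apache.org/ \
        offline-node "$(hostname)" -m "low disk space"
    fi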
On Fri, Jan 24, 2020, 13:57 Chris Thistlethwaite wrote:
> Here is some data from H41, which was rebooted last night and ran out of
> threads to
Here is some data from H41, which was rebooted last night and ran out of
threads today. https://paste.apache.org/lkmpq
In this case it looks like Karaf was still stuck/broken even though
there were no builds running on H41 at the time I investigated.
-Chris T.
#asfinfra
On 1/24/20 4:26 AM,
> Is there some way we can improve the visibility into disk usage on the
> build nodes? How full they are? And what projects are taking up space?
> Does jenkins provide this info? Or could infra dump a `du …` report
> somewhere?
There are two Jenkins plugins that help with this situation
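Short of a plugin, a du report is easy to dump from cron; a sketch
assuming the usual per-job workspace layout (the path is an assumption):

    # Nightly: the 25 largest workspaces, biggest first.
    du -sh /home/jenkins/jenkins-slave/workspace/* 2>/dev/null | sort -rh | head -n 25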
On 2020-01-23 4:50, Chesnay Schepler wrote:
On 23/01/2020 10:19, Thomas Bouron wrote:
On Thu, 23 Jan 2020 at 08:56, Robert Munteanu wrote:
On Wed, 2020-01-22 at 17:53 -0800, Chris Lambertus wrote:
Additionally, orphaned docker jobs are causing major resource
contention. I will be adding a we
Hi Chris,
On Thu, Jan 23, 2020 at 1:55 AM Chris Lambertus wrote:
> I will be implementing a system to kill jenkins processes based on duration
> of run. My initial feeling is to kill any single process which has been
> running for longer than one hour real-time.
Can you provide some details on
On 23/01/2020 10:19, Thomas Bouron wrote:
On Thu, 23 Jan 2020 at 08:56, Robert Munteanu wrote:
On Wed, 2020-01-22 at 17:53 -0800, Chris Lambertus wrote:
Additionally, orphaned docker jobs are causing major resource
contention. I will be adding a weekly job to docker system prune --all
&& servi
On Thu, 23 Jan 2020 at 08:56, Robert Munteanu wrote:
> On Wed, 2020-01-22 at 17:53 -0800, Chris Lambertus wrote:
> > Additionally, orphaned docker jobs are causing major resource
> > contention. I will be adding a weekly job to docker system prune --all
> > && service docker restart.
>
> +1, it's
On Wed, 2020-01-22 at 17:53 -0800, Chris Lambertus wrote:
> Additionally, orphaned docker jobs are causing major resource
> contention. I will be adding a weekly job to docker system prune --all
> && service docker restart.
+1, it's easy to get this wrong. It would be great if you could also
docume
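For the record, the proposed weekly job could be as simple as one cron
entry; a sketch (the schedule is an assumption, and --volumes was left
out on purpose since it would also delete named volumes):

    # Sundays at 03:00: drop unused containers, images, and networks,
    # then bounce the docker daemon.
    0 3 * * 0 docker system prune --all --force && service docker restart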
The Cassandra dtest builds take ~12 hours; the unit tests take over an hour.
We are looking into parallelising these, but work hasn't started on that yet.
We recently parallelised a number of the unit test builds and added pipeline
builds, and subsequently builds have been crashing with full disks. Ye
Hi,
our average build time for the main Archiva build job is about 1 hour on
the Apache build servers.
We have a timeout of 2h configured in our pipeline.
So one hour is too short for us, and we would appreciate it if you would
consider increasing your kill timeout to a higher value.
Regards
M
Hi,
The Heron project has a build that lasts about 2 hours and 40
minutes on average. It is a single Jenkins job that spins up two
different Docker containers consecutively. We only run this job to
generate artifacts for a release. You can see the job here:
https://builds.apache.org/job
> On Jan 22, 2020, at 4:55 PM, Chris Lambertus wrote:
>
> Folks,
>
> Over the last week or so we have received many reports of broken builds due
> to nodes out of resources. As noted in INFRA-19751, builds appear to fail yet
> continue to run, using up all available resources on a build nod
Folks,
Over the last week or so we have received many reports of broken builds due to
nodes out of resources. As noted in INFRA-19751, builds appear to fail yet
continue to run, using up all available resources on a build node.
I will be implementing a system to kill jenkins processes based on
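For reference, that duration-based killer could be a small cron'd loop
over elapsed times; a minimal sketch, assuming the build processes run
as the "jenkins" user and a 1-hour wall-clock cutoff:

    # Kill any jenkins-owned process older than 3600 seconds.
    # A real version would exclude the Jenkins agent process itself.
    for pid in $(pgrep -u jenkins); do
      etimes=$(ps -o etimes= -p "$pid" | tr -d ' ')
      if [ -n "$etimes" ] && [ "$etimes" -gt 3600 ]; then
        kill -9 "$pid"
      fi
    done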