Better stability with docker authenticated jenkins agents

2021-04-06 Thread Mick Semb Wever
tl;dr
Can and should all jenkins agents be (automatically) docker authenticated,
for improved stability around docker commands?


This past week the ci-cassandra.apache.org CI fell over because a fair
percentage of docker pulls failed. Our pipeline runs a lot of docker
containers. In the past week the number of containers run went from ~180 to
~270 and that pushed something over the edge. All of a sudden every docker
command had a significant chance of failing, high enough that every
pipeline was all but guaranteed to fail. These failures came in a few
different forms; the list can be read in INFRA-21666. Googling them shows
that this is a known problem, sometimes around firewalls, networks, dns,
etc. Our jenkins agents are donated by a handful of different companies and
are located in various different places, so a network-local cause doesn't
make much sense. The other typical fix reported was to just run docker
authenticated, i.e. `docker login`. Trying this immediately solved all
problems on ci-cassandra.apache.org. This was done with a temporary (and
empty) dockerhub account that each agent was manually logged in to. Based on
all this, the request has been made for an official apache CI
dockerhub account and to have jenkins agents automatically logged in, with
credentials stored in an appropriate manner.
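
For illustration, the automatic login requested above could look roughly
like the sketch below. It is a minimal sketch, not what Infra or
ci-cassandra actually runs: the helper names and the account name used in
the usage note are hypothetical. It relies only on the `--username` and
`--password-stdin` options of `docker login`, so the token is passed on
stdin rather than on the command line.

```python
import subprocess
from typing import List


def docker_login_argv(username: str, registry: str = "") -> List[str]:
    """Build a non-interactive `docker login` command line.

    The token is deliberately not part of argv; it is fed on stdin so it
    does not show up in process listings or shell history.
    """
    argv = ["docker", "login", "--username", username, "--password-stdin"]
    if registry:
        # With no registry argument, docker logs in to Docker Hub.
        argv.append(registry)
    return argv


def docker_login(username: str, token: str, registry: str = "") -> bool:
    """Run `docker login` on this agent and return True on success.

    Assumes the docker CLI is installed and on PATH.
    """
    result = subprocess.run(
        docker_login_argv(username, registry),
        input=token.encode(),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

An agent bootstrap step might call `docker_login("example-ci-account",
token)` with a token read from wherever the credentials end up being
stored; that storage question is exactly what is being asked here.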

Has anyone experienced such issues before?
Is this a sound and reasonable request to ask of Infra?

regards,
Mick


Re: Better stability with docker authenticated jenkins agents

2021-04-06 Thread Joan Touzet
Hi Mick,

On 06/04/2021 06:34, Mick Semb Wever wrote:
> tl;dr
> Can and should all jenkins agents be (automatically) docker authenticated,
> for improved stability around docker commands?
> 
> 
> This past week the ci-cassandra.apache.org CI fell over because a fair
> percentage of docker pulls failed. Our pipeline runs a lot of docker
> containers. In the past week the number of containers run went from ~180 to
> ~270 and that pushed something over the edge. All of a sudden all docker
> commands had a significant chance of failing, that was high enough to
> ensure every pipeline was guaranteed to fail. These failures came in a few
> different forms, the list can be read in INFRA-21666. Googling them shows
> that this is a known problem, sometimes around firewalls, networks, dns,
> etc. Our jenkins agents are donated by a handful of different companies and
> are located in various different places, so such issues don't make much
> sense. The other typical fix reported was to just run docker authenticated,
> i.e. `docker login`. Trying this immediately solved all problems on
> ci-cassandra.apache.org. This was done with a temporary (and empty)
> dockerhub account, that each agent has manually logged in with. Based on
> all this, the request has been made for an official apache CI
> dockerhub account and to have jenkins agents automatically logged in, with
> credentials stored in an appropriate manner.
> 
> Has anyone experience with such issues before?

Yes. See https://issues.apache.org/jira/browse/INFRA-20795 for detail.

In short, once Infra agrees to create the images for us, we'll move all
our CI dependencies into those containers, and should no longer have issues.

> Is this a sound and reasonable request to ask of Infra?

This could work too but *only* when Infra manages the agents. For most
of our agents, Infra does not have ssh access (due to NAT-style
proxying) so there's nothing they can do... _other_ than give us those
dockerhub creds. That sounds more unwieldy than the approach outlined in
ticket 20795.

-Joan


Re: Better stability with docker authenticated jenkins agents

2021-04-06 Thread Mick Semb Wever
>
> > Has anyone experience with such issues before?
>
> Yes. See https://issues.apache.org/jira/browse/INFRA-20795 for detail.
>
> In short, once Infra agrees to create the images for us, we'll move all
> our CI dependencies into those containers, and should no longer have
> issues.
>


Thanks Joan. We do have our testing images already in the apache docker
account. And the unauthenticated docker commands were falling over even
pulling just official ubuntu images.



> > Is this a sound and reasonable request to ask of Infra?
>
> This could work too but *only* when Infra manages the agents.
>


I was wondering if secrets could be stored in Jenkins, but yeah, unmanaged
agents 😒

Docker accounts also come with api tokens, so different tokens might be
handed out to the different jenkins clusters. But I really don't know
what's best here, especially with little protection over
`.docker/config.json` … 🤷🏻‍♀️
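
To make the `.docker/config.json` concern concrete: when no credential
helper or store is configured, `docker login` writes each registry's
credentials under `auths` as a base64-encoded `username:secret` string,
which is an encoding and not encryption. The sketch below decodes such
entries; the function name, account name, and token are made up for the
example.

```python
import base64
import json
from typing import Dict, Tuple


def decode_docker_auths(config_text: str) -> Dict[str, Tuple[str, str]]:
    """Return {registry: (username, secret)} for each inline `auth` entry."""
    config = json.loads(config_text)
    decoded = {}
    for registry, entry in config.get("auths", {}).items():
        auth = entry.get("auth")
        if not auth:
            # No inline secret: credentials are held by a helper/store instead.
            continue
        username, _, secret = base64.b64decode(auth).decode().partition(":")
        decoded[registry] = (username, secret)
    return decoded


# Hypothetical file content, shaped like what `docker login` writes.
sample = json.dumps({
    "auths": {
        "https://index.docker.io/v1/": {
            "auth": base64.b64encode(b"example-ci-user:example-token").decode()
        }
    }
})

print(decode_docker_auths(sample))
```

Anyone who can read the file on an agent can recover the token this way,
which is one argument for per-cluster tokens that can be revoked
individually.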


Re: Better stability with docker authenticated jenkins agents

2021-04-06 Thread Joan Touzet
On 06/04/2021 18:09, Mick Semb Wever wrote:
>>
>>> Has anyone experience with such issues before?
>>
>> Yes. See https://issues.apache.org/jira/browse/INFRA-20795 for detail.
>>
>> In short, once Infra agrees to create the images for us, we'll move all
>> our CI dependencies into those containers, and should no longer have
>> issues.
>>
> 
> 
> Thanks Joan. We do have our testing images already in the apache docker
> account. And the unauthenticated docker commands were falling over even
> pulling just official ubuntu images.

My understanding is that pulls of all images from the apache/* namespace
are not subject to rate limiting. Thus, the recommendation to move
everything you need inside of it.

If that's not practical, or you're building images that require other
assets... this won't work for you.
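
As a rough way to audit which images a pipeline pulls from outside the
apache/* namespace, something like the following sketch could be used. It
takes the rate-limit exemption described above as an assumption rather
than a verified fact; the function names and the example image names are
hypothetical. It follows Docker's usual convention that the first path
component is a registry host only if it contains a `.` or `:` or is
`localhost`.

```python
from typing import Tuple


def split_image_reference(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, repository), dropping tag/digest."""
    name = image.split("@", 1)[0]  # drop a digest such as @sha256:...
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repo = first, rest
    else:
        registry, repo = "docker.io", name  # no registry host: Docker Hub
    if registry == "index.docker.io":
        registry = "docker.io"
    # A tag is a colon that appears after the last slash.
    if repo.rfind(":") > repo.rfind("/"):
        repo = repo[: repo.rfind(":")]
    if registry == "docker.io" and "/" not in repo:
        repo = "library/" + repo  # official images live under library/
    return registry, repo


def in_apache_namespace(image: str) -> bool:
    """True if the image resolves to the apache/ namespace on Docker Hub."""
    registry, repo = split_image_reference(image)
    return registry == "docker.io" and repo.startswith("apache/")


for ref in ["ubuntu:20.04", "apache/example-testing-image:latest"]:
    print(ref, in_apache_namespace(ref))
```

Images for which this returns False, such as the official ubuntu image,
would still be pulled under whatever limits apply to the agent.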

-Joan


Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-06 Thread Hyukjin Kwon
Hi all,

I am an Apache Spark PMC member, and would like to know the future plan for
GitHub Actions in ASF.
Please also see the INFRA ticket I filed:
https://issues.apache.org/jira/browse/INFRA-21646.

I am aware that the limited GitHub Actions resources are shared
across all projects in ASF, and that many projects suffer from it. This
issue significantly slows down the development cycle of other projects,
Apache Spark at least.

How do we plan to increase the resources in GitHub Actions, and what are
the blockers? I would appreciate any input and thoughts on this.

Thank you so much.

CC'ing Spark @dev for more visibility. Please take
it out if considered inappropriate.