Re: ephemeral builds via AWS ECS and/or EKS? GPU Nodes?

Chris Lambertus Sat, 29 Jan 2022 11:27:54 -0800

There is no timeline and certainly no design doc. We have funding, but little 
in-house Infra experience with such an endeavor. We are looking for a community 
champion with experience in this area to help us design a solution.


Our funding is in AWS, so yes, we could provide IAM access to specific services 
once we get a general idea of the type of solution we want to provide.

Short term initiative:

- develop a process for deploying 'on demand' build resources within Jenkins 
via EC2
- allow for the use of GPU nodes
- figure out how to track usage and constrain spending within the funding limit
- figure out how to securely handle push credentials (nexus, nightlies, 
dockerhub, etc.)
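
As a rough illustration of the usage-tracking item above, here is a minimal 
sketch in Python of a spending guard that could be consulted before an 
on-demand node is launched. Everything in it is an assumption made for 
illustration: the node type names, the hourly prices, the UsageRecord shape, 
and the can_launch helper are hypothetical and do not come from Jenkins, the 
EC2 plugin, or AWS billing.

```python
from dataclasses import dataclass

# Hypothetical hourly prices in USD, for illustration only.
# Real EC2 prices vary by instance type, region, and purchase option.
HOURLY_RATE_USD = {
    "cpu-node": 0.25,
    "gpu-node": 1.0,
}


@dataclass
class UsageRecord:
    """One completed allocation of a build node (hypothetical shape)."""
    project: str
    node_type: str
    hours: float


def spend_by_project(records):
    """Sum the estimated cost of all recorded usage, grouped by project."""
    totals = {}
    for record in records:
        cost = HOURLY_RATE_USD[record.node_type] * record.hours
        totals[record.project] = totals.get(record.project, 0.0) + cost
    return totals


def can_launch(records, node_type, estimated_hours, budget_usd):
    """Return True if launching one more node keeps total spend within budget."""
    spent = sum(spend_by_project(records).values())
    projected = spent + HOURLY_RATE_USD[node_type] * estimated_hours
    return projected <= budget_usd


if __name__ == "__main__":
    history = [
        UsageRecord("SystemDS", "gpu-node", 10),
        UsageRecord("Airflow", "cpu-node", 20),
    ]
    print(spend_by_project(history))
    print(can_launch(history, "gpu-node", 4, budget_usd=20.0))
```

A real implementation would more likely read actual cost data from AWS than 
estimate it from a static price table; the sketch only shows the shape of the 
check.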

Longer-term

- provide EKS/ECS integration where appropriate

The simplest case here would be for builds which are already containerized 
(i.e. don't require Infra-deployed dependencies), as we could deploy a "bare 
metal" AMI. If we needed to add the large number of tools Infra maintains, 
creating and updating the AMI would be quite cumbersome. This is something that 
will need to be sorted out if we are to roll out general-purpose build nodes 
'on-demand'.

Here are some points of note from the thread so far:

- Amazon EC2 Plugin for Jenkins can help
- GPU nodes desired by some projects
- Use of auto-scaling groups rather than containers

Projects interested in contributing to setup/design:

- SystemDS
- Airflow
- Heron
 



> On Jan 22, 2022, at 4:29 AM, Janardhan Pulivarthi 
> <janardhan.pulivar...@gmail.com> wrote:
> 
> Hi Chris,
> 
> At present we would want to use AWS for GPU instances for testing and
> for building docker (gpu) images.
> 
> Is there any timeline or design doc?
> 
> How does the quota work for projects?
> Would you provide IAM accounts with access to the specific services a
> project needs?
> 
> Thanks and Regards,
> Janardhan
> 
> On Sat, Jan 1, 2022 at 12:19 AM Allen Wittenauer
> <a...@effectivemachines.com.invalid> wrote:
>> 
>> 
>> 
>>> On Dec 30, 2021, at 10:58 AM, Chris Lambertus <c...@apache.org> wrote:
>>> 
>>> Hi folks,
>>> 
>>> We have some funding to explore providing ephemeral builds via ECS or EKS 
>>> in the Amazon ecosystem, but Infra does not have expertise in this area. We 
>>> would like to integrate such a service with Jenkins.
>>> 
>>> Does anyone have experience with using these services for CI, and would you 
>>> be interested in assisting Infra in developing a prototype?
>>> 
>>> Additionally, we may be able to provide some build nodes with GPUs. Do we 
>>> have projects which could/would make use of GPUs for integration testing?
>> 
>> 
>> At $DAYJOB, I configured the Amazon EC2 plug-in ( 
>> https://plugins.jenkins.io/ec2 ) to do this type of thing using spot 
>> instances with labels tied to the particular EC2 node type that our jobs 
>> use. I avoided using the EC2 Fleet plug-in ( 
>> https://plugins.jenkins.io/ec2-fleet ) mainly because it always seemed to 
>> keep at least one node running, which is not really what you want if you 
>> are trying to get the most bang for your buck. In other words, startup time 
>> matters less to me than avoiding a node sitting idle all weekend.
>> 
>> Biggest issues we’ve hit with this setup are:
>> 
>> a) Depending upon your spot price, you may get outbid and the node gets 
>> killed out from underneath you (rarely happens but it does happen with our 
>> bid)
>> 
>> b) You need to know ahead of time what types of nodes you want to allocate 
>> and then set a label to match. For the ASF, that might be tricky given a lot 
>> of people have no idea what the actual requirements for their jobs are.
>> 
>> c) On rare occasions during a Jenkins restart, the plug-in will ‘lose 
>> track’ of allocated nodes. We have limits on how long our allocations last, 
>> based on # of runs and idle time, so we can generally spot a ‘stuck’ node 
>> after a day or so.
>> 
>> I haven’t tried configuring it to use EKS because none of our stuff needs 
>> k8s yet.
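
Point (c) in the quoted message above mentions limiting allocations by number 
of runs and idle time. A minimal sketch of that kind of check, in Python, 
might look like the following. The Node record, the threshold values, and the 
function names are hypothetical and chosen for illustration; this is not how 
the Amazon EC2 plugin implements its own limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Node:
    """A build node as seen by a hypothetical cleanup job."""
    node_id: str
    runs_completed: int
    last_busy: datetime


def should_retire(node, now, max_runs, max_idle):
    """True when the node has hit its run limit or idled for too long."""
    over_runs = node.runs_completed >= max_runs
    over_idle = (now - node.last_busy) >= max_idle
    return over_runs or over_idle


def find_retirable(nodes, now, max_runs, max_idle):
    """Return the ids of nodes that should be terminated."""
    return [n.node_id for n in nodes if should_retire(n, now, max_runs, max_idle)]


if __name__ == "__main__":
    now = datetime(2022, 1, 29, 12, 0)
    nodes = [
        Node("fresh", runs_completed=1, last_busy=now - timedelta(minutes=5)),
        Node("worn-out", runs_completed=50, last_busy=now - timedelta(minutes=1)),
        Node("stuck", runs_completed=2, last_busy=now - timedelta(days=1)),
    ]
    print(find_retirable(nodes, now, max_runs=20, max_idle=timedelta(hours=2)))
```

Running a check like this outside Jenkins, against the list of instances that 
actually exist in the account, would also catch nodes the plug-in has lost 
track of after a restart.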
