Hi Chris,

At present we would like to use AWS GPU instances for testing and
for building Docker (GPU) images.

Is there a timeline or design doc?

How would quotas work for projects?
Would you provide IAM accounts scoped to the specific services a
project needs?

Thanks and Regards,
Janardhan

On Sat, Jan 1, 2022 at 12:19 AM Allen Wittenauer
<a...@effectivemachines.com.invalid> wrote:
>
>
>
> > On Dec 30, 2021, at 10:58 AM, Chris Lambertus <c...@apache.org> wrote:
> >
> > Hi folks,
> >
> > We have some funding to explore providing ephemeral builds via ECS or EKS 
> > in the Amazon ecosystem, but Infra does not have expertise in this area. We 
> > would like to integrate such a service with Jenkins.
> >
> > Does anyone have experience with using these services for CI, and would you 
> > be interested in assisting Infra in developing a prototype?
> >
> > Additionally, we may be able to provide some build nodes with GPUs. Do we 
> > have projects which could/would make use of GPUs for integration testing?
>
>
> At $DAYJOB, I configured the Amazon EC2 plug-in ( 
> https://plugins.jenkins.io/ec2 ) to do this type of thing using spot 
> instances with labels tied to the particular EC2 node type that our jobs use. 
>  I avoided using the EC2 Fleet plug-in  ( 
> https://plugins.jenkins.io/ec2-fleet ) mainly because it always seemed to 
> keep at least one node running, which is not really what you want to get the 
> most bang for your buck. In other words, startup time is less important to me 
> than avoiding a node running idle all weekend.
>
> Biggest issues we’ve hit with this setup are:
>
> a) Depending upon your spot price, you may get outbid and the node gets 
> killed out from underneath you (rarely happens but it does happen with our 
> bid)
>
> b) You need to know ahead of time what types of nodes you want to allocate 
> and then set a label to match. For the ASF, that might be tricky given a lot 
> of people have no idea what the actual requirements for their jobs are.
>
> c) During a Jenkins restart on rare occasions, the plug-in will ‘lose track’ 
> of allocated nodes. We have limits for how long our allocations will last 
> based on # of runs and idle time so we can generally spot a ‘stuck’ node after a 
> day or so.
>
> I haven’t tried configuring it to use EKS because none of our stuff needs k8s 
> yet.
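
As a rough illustration of the limits mentioned in (c), a check for 'stuck' nodes based on idle time and number of runs might look like the sketch below. This is not the EC2 plug-in's API or Allen's actual setup. The Node record, its field names, and the default limits (30 minutes idle, 10 runs) are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Node:
    # Hypothetical record of one allocated build node; not a plug-in type.
    name: str
    last_busy: datetime  # when the node last ran a job
    runs: int            # number of jobs the node has run so far


def is_stuck(node, now, max_idle=timedelta(minutes=30), max_runs=10):
    """Return True if the node is past its idle-time or run-count limit.

    The default limits are assumptions for illustration only.
    """
    return (now - node.last_busy) > max_idle or node.runs > max_runs


def find_stuck(nodes, now):
    """Return the names of nodes that should already have been released."""
    return [n.name for n in nodes if is_stuck(n, now)]
```

Running something like find_stuck() on a schedule against the list of allocated nodes would flag leftovers sooner than noticing them by hand after a day or so.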
