Re: ephemeral builds via AWS ECS and/or EKS? GPU Nodes?

Janardhan Pulivarthi Sat, 29 Jan 2022 19:35:06 -0800

> Is there any chance we could work in/learn about
> build caching in this process?  Full builds for Heron take several hours,
> it'd be nice to speed them up.


Fortunately, this is possible with S3 [1]. We upload build caches to S3
via Travis [2]. We can choose the nearest region, and using it from
Travis CI seems straightforward [3].

[1] https://docs.aws.amazon.com/general/latest/gr/s3.html
[2] https://docs.travis-ci.com/user/caching/#how-does-caching-work
[3] https://docs.travis-ci.com/user/deployment/codedeploy/#s3-deployment-or-github-deployment
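
For a Bazel build like Heron's, here is a rough sketch of how a CI job
could be pointed at a shared cache. It assumes an HTTP cache endpoint
backed by an S3 bucket (for example a caching proxy in front of it); the
URL and the helper names cas_path and bazel_cache_flags are made up for
illustration and are not from any existing setup. The /cas/ path layout
and the two flags come from the Bazel remote caching docs.

```python
import hashlib
from typing import List


def cas_path(blob: bytes) -> str:
    """Return the HTTP cache path for an output blob.

    Bazel's HTTP caching protocol stores output files under /cas/,
    addressed by the SHA-256 digest of their content.
    """
    return "/cas/" + hashlib.sha256(blob).hexdigest()


def bazel_cache_flags(cache_url: str, can_upload: bool) -> List[str]:
    """Build the Bazel flags that point a build at a remote cache.

    cache_url is a hypothetical endpoint (an assumption, not a real host).
    """
    upload = "true" if can_upload else "false"
    return [
        f"--remote_cache={cache_url}",
        # Assumed policy: only trusted CI jobs write to the cache,
        # everyone else reads from it.
        f"--remote_upload_local_results={upload}",
    ]


if __name__ == "__main__":
    flags = bazel_cache_flags("http://cache.example.invalid:8080", can_upload=True)
    print(" ".join(["bazel", "build", "//..."] + flags))
    print(cas_path(b"example output file"))
```

Keeping uploads limited to CI jobs is only one possible policy; it keeps
developer machines from writing unreviewed results into the shared cache.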

Regards,
Janardhan


On Sun, Jan 30, 2022 at 2:29 AM Josh Fischer <[email protected]> wrote:
>
> +1 from the Heron team.  Is there any chance we could work in/learn about
> build caching in this process?  Full builds for Heron take several hours,
> it'd be nice to speed them up.
>
> We use Bazel to build, here are some details:
> https://docs.bazel.build/versions/main/remote-caching.html
>
> On Sat, Jan 29, 2022 at 1:27 PM Chris Lambertus <[email protected]> wrote:
>
> > There is no timeline and certainly no design doc. We have funding, but
> > little in-house Infra experience with such an endeavor. We are looking for
> > a community champion with experience in this area to help us design a
> > solution.
> >
> > Our funding is in AWS, so yes, we could provide IAM access to specific
> > services once we get a general idea of the type of solution we want to
> > provide.
> >
> > Short term initiative:
> >
> > - develop a process for deploying 'on demand' build resources within
> > Jenkins via EC2
> > - allow for the use of GPU nodes
> > - figure out how to track usage and constrain spending within the funding
> > limit
> > - figure out how to deal with security push credentials (nexus, nightlies,
> > dockerhub, etc.)
> >
> > Longer-term
> >
> > - provide EKS/ECS integration where appropriate
> >
> > The simplest case here would be for builds which are already containerized
> > (e.g. don't require Infra-deployed dependencies), as we could deploy a
> > "bare metal" AMI. If we needed to add the large number of tools Infra
> > maintains, creating and updating the AMI would be quite cumbersome. This is
> > something that will need to be sorted out if we are to roll out
> > general-purpose build nodes 'on-demand'.
> >
> > Here are some points of note from the thread so far:
> >
> > - Amazon EC2 Plugin for Jenkins can help
> > - GPU nodes desired by some projects
> > - Use of auto-scaling groups rather than containers
> >
> > Projects interested in contributing to setup/design:
> >
> > - SystemDS
> > - Airflow
> > - Heron
> >
> >
> >
> >
> > > On Jan 22, 2022, at 4:29 AM, Janardhan Pulivarthi <
> > [email protected]> wrote:
> > >
> > > Hi Chris,
> > >
> > > At present we would want to use AWS for GPU instances for testing and
> > > for building docker (gpu) images.
> > >
> > > Is there any timeline or design doc?
> > >
> > > How does the quota work for projects?
> > > Would you like to provide iam accounts with specific services in need
> > > for a project?
> > >
> > > Thanks and Regards,
> > > Janardhan
> > >
> > > On Sat, Jan 1, 2022 at 12:19 AM Allen Wittenauer
> > > <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >>> On Dec 30, 2021, at 10:58 AM, Chris Lambertus <[email protected]> wrote:
> > >>>
> > >>> Hi folks,
> > >>>
> > >>> We have some funding to explore providing ephemeral builds via ECS or
> > EKS in the Amazon ecosystem, but Infra does not have expertise in this
> > area. We would like to integrate such a service with Jenkins.
> > >>>
> > >>> Does anyone have experience with using these services for CI, and
> > would you be interested in assisting Infra in developing a prototype?
> > >>>
> > >>> Additionally, we may be able to provide some build nodes with GPUs. Do
> > we have projects which could/would make use of GPUs for integration testing?
> > >>
> > >>
> > >> At $DAYJOB, I configured the Amazon EC2 plug-in (
> > https://plugins.jenkins.io/ec2 ) to do this type of thing using spot
> > instances with labels tied to the particular EC2 node type that our jobs
> > use.  I avoided using the EC2 Fleet plug-in  (
> > https://plugins.jenkins.io/ec2-fleet ) mainly because it always seemed to
> > keep at least one node running which is not really what you want to get the
> > most bang for your buck. In other words, startup time is less important to
> > me than having a node run idle all weekend.
> > >>
> > >> Biggest issues we’ve hit with this setup are:
> > >>
> > >> a) Depending upon your spot price, you may get outbid and the node gets
> > killed out from underneath you (rarely happens but it does happen with our
> > bid)
> > >>
> > >> b) You need to know ahead of time what types of nodes you want to
> > allocate and then set a label to match. For the ASF, that might be tricky
> > given a lot of people have no idea what the actual requirements for their
> > jobs are.
> > >>
> > >> c) During a Jenkins restart on rare occasions, the plug-in will ‘lose
> > track’ of allocated nodes. We have limits for how long our allocations will
> > last  based on # of runs and idle time so generally can spot a ‘stuck’ node
> > after a day or so.
> > >>
> > >> I haven’t tried configuring it to use EKS because none of our stuff needs
> > k8s yet.
> >
> >
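
The allocation limits Allen describes above (number of runs and idle
time) could be checked with something like the following. This is only a
minimal sketch of the idea, not how the EC2 plug-in implements it; the
Node record, the thresholds, and nodes_to_retire are hypothetical names
chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Node:
    name: str
    runs_completed: int   # builds this node has finished so far
    last_busy: datetime   # when the node last ran a build


def nodes_to_retire(
    nodes: List[Node],
    now: datetime,
    max_runs: int = 20,                            # assumed limit
    max_idle: timedelta = timedelta(minutes=30),   # assumed limit
) -> List[str]:
    """Return names of nodes that hit the run limit or sat idle too long."""
    return [
        n.name
        for n in nodes
        if n.runs_completed >= max_runs or now - n.last_busy >= max_idle
    ]


if __name__ == "__main__":
    now = datetime(2022, 1, 31, 12, 0)
    fleet = [
        Node("gpu-1", 3, now - timedelta(minutes=5)),
        Node("gpu-2", 25, now - timedelta(minutes=1)),
        Node("cpu-1", 1, now - timedelta(hours=2)),
    ]
    print(nodes_to_retire(fleet, now))
```

Running such a check outside Jenkins (for example from a scheduled job)
would also catch nodes the plug-in lost track of after a restart.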
