Re: External CI Service Limitations

2019-07-22 Thread Jarek Potiuk
Hello Everyone (especially the Infrastructure team),

Can we increase the number of workers/jobs we have per project now?
Decreasing it to 5 (which I believe is the case) is terrible for us now.
We are nearing the 1.10.4 release of Airflow, and if we have more than one PR
in the queue it waits for several hours to run!

Can we increase the limit to 15 or so (3 parallel builds for Airflow, as
we are running 5 jobs per stage)?
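
For context, our per-stage layout looks roughly like this (a simplified
sketch rather than our actual .travis.yml; job and script names are
illustrative):

    jobs:
      include:
        - stage: test
          name: "Tests - Postgres"
          script: ./scripts/ci/run_tests.sh   # illustrative script name
        - stage: test
          name: "Tests - MySQL"
          script: ./scripts/ci/run_tests.sh
        - stage: test
          name: "Tests - Sqlite"
          script: ./scripts/ci/run_tests.sh
        # ...plus two more jobs in the same stage, so a single PR build
        # occupies 5 executors at once; 3 parallel builds would need 15.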

J.


On Fri, Jul 19, 2019 at 12:25 PM Chesnay Schepler 
wrote:

> For reference, the Travis CI usage was also discussed in INFRA-18533.
>
> Since July 8th the Flink project is no longer running PR builds on ASF
> resources. Instead we mirror pull requests into an external repository
> and run these builds in a sponsored Travis account. If anyone is
> interested, check out the above JIRA.
>
> The reduction in ASF resource usage is less than expected because we're
> currently in the process of creating a new release, which usually
> doubles our CI usage temporarily (since many fixes are merged to master
> and a release branch).
>
> Nevertheless, looking at the resource usage since then (via
> https://kibble.dev), we are now in 4th place, behind incubator-druid.
>
> In the future we also plan to migrate our push/cron builds to the
> external repository, but I can't provide an ETA for this.
>
> On 2019/07/19 03:51:08, David Nalley  wrote:
>  > On Tue, Jul 16, 2019 at 5:37 PM Greg Stein  wrote:
>  > >
>  > > Ray,
>  > >
>  > > Thanks for the offer of 50k minutes/project. That will definitely
> work for
>  > > most projects.
>  > >
>  > > While we don't have precise measurements, some projects used *way*
> more
>  > > than that within Travis last month:
>  > >
>  > > flink: 350k minutes
>  > > arrow: 260k minutes
>  > > cloudstack: 190k minutes
>  > > incubator-druid: 96k
>  > > airflow: 77k
>  > > ... others: less than 50k
>  > >
>  > > I don't know what would be needed from Infra, to enable the use of
> Gitlab
>  > > CI for our projects. ??
>  > >
>  > > Thanks,
>  > > Greg Stein
>  > > Infrastructure Administrator, ASF
>  > >
>  >
>  > It's also worth noting that those numbers are likely constrained by
>  > our capacity. IIRC Humbedooh said we'd need something close to double
>  > our current capacity to satiate the backlog of build requests that
>  > we had.
>  >
>  > Our current travis load should be something like 1.72MM minutes per
> month.
>  >
>  > --David
>  >
>


-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129


Re: External CI Service Limitations

2019-07-22 Thread Vladimir Sitnikov
Jarek>Decreasing it to 5 (which I believe is the case) is terrible for us
now
Jarek>We are nearing  1.10.4 release with Airflow and if we have more than
one PR
Jarek>in the queue it waits for several hours to run!

I see the 5 "pre-test" Travis jobs taking 5-8 minutes each, and all the time
is basically spent in "building airflow image".
Have you considered merging all the jobs into one?

For instance:

https://travis-ci.org/apache/airflow/jobs/561927828
The job took 2m20sec, and the payload was as follows:
> Finished the script ci_check_license.sh
> It took 2 seconds

It is not like it would dramatically reduce Airflow's impact on Travis;
however, it would free up executors with little to no drawbacks.
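
Roughly, something along these lines (just a sketch - apart from
ci_check_license.sh the script names are made up, and I have not checked
the actual .travis.yml):

    jobs:
      include:
        - stage: pre-test
          name: "All static checks"
          script:
            - ./scripts/ci/ci_check_license.sh
            - ./scripts/ci/ci_static_checks.sh   # made-up name
            - ./scripts/ci/ci_build_docs.sh      # made-up name
          # the image is built once and reused by every check, instead of
          # being rebuilt in 5 separate jobs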

Vladimir


JDK 13 enters Rampdown Phase Two

2019-07-22 Thread Rory O'Donnell


Hi Mark & Gavin,

Any issues to report on JDK 13? We would like to hear the status as we are
now in Rampdown Phase Two.


OpenJDK builds - JDK 13 Early Access build 30 is now available at:
jdk.java.net/13/


 * Per the JDK 13 schedule [1], we are now in Rampdown Phase Two.
 o For more details, see Mark Reinhold's email to the jdk-dev mailing
   list [2].
 o The overall feature set is frozen; no further JEPs will be
   targeted to this release.
 o Per the JDK Release Process [3], we now turn our focus to P1 and
   P2 bugs.

 * I want to draw your attention to some notable changes in previous
   builds of JDK 13. These changes are important for those who
   develop/maintain their own socket implementation
   (java.net.SocketImpl) or use the setSocketImplFactory or
   setSocketFactory APIs to change the system-wide socket implementation:

 o http://jdk.java.net/13/release-notes#JDK-8224477 - delivered in
   build 23
 o http://jdk.java.net/13/release-notes#JDK-8216978 - delivered in
   build 20
 o http://jdk.java.net/13/release-notes#JDK-8220493 - delivered in
   build 13

OpenJDK builds - JDK 14 Early Access build 6 is now available at:
jdk.java.net/14/


 * These early-access, open-source builds are provided under the
 o GNU General Public License, version 2, with the Classpath
   Exception.
 * Changes of interest since the last email:
 o 8225239: Refactor NetworkInterface lookups
 o 8226409: Enable argument profiling for sun.misc.Unsafe.put*/get*
 * JEP targeted to JDK 14:
 o JEP 352: Non-Volatile Mapped Byte Buffers
 * Bug fixes reported by Open Source Projects:
 o JDK-8227080 - fixed in b5 - reported by Eclipse Jetty

The Java Crypto Roadmap has been updated:


 * Released 16-July-2019 - affected release: JDK 7u231 - disabled
   Kerberos DES encryption by default
 * Targeted date: 2020 - targeted release: JDK 8 - Transport Layer
   Security (TLS) 1.3

Rgds, Rory

[1] http://openjdk.java.net/projects/jdk/13/#Schedule
[2] https://mail.openjdk.java.net/pipermail/jdk-dev/2019-July/003170.html


--
Rgds, Rory O'Donnell
Quality Engineering Manager
Oracle EMEA, Dublin, Ireland



Re: External CI Service Limitations

2019-07-22 Thread Jarek Potiuk
Thanks for taking a look and for the analysis :). Very correct.
However, this is actually a special case. It only builds for that long
because we have a change in this branch that requires rebuilding more
layers of the image. In this case we add the mysql dependencies in a
different way, so we rebuild the whole stage of the Docker image.

Once it gets merged to master and the DockerHub image is built, this step
will be skipped: we will pull the pre-built image from DockerHub (via Docker
caching), and rebuilding only the source layers will be way faster.

A typical case is more like this one:
https://travis-ci.org/apache/airflow/jobs/561963482 where the docker
pull & build step takes 1 minute 40 seconds rather than 5 minutes, and the
whole job takes less than 3 minutes rather than 5-8 minutes.
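
In .travis.yml terms the caching trick looks roughly like this (a sketch
with a hypothetical image tag, not our actual configuration):

    before_install:
      # hypothetical tag; "|| true" so the very first build does not fail
      - docker pull apache/airflow:latest-ci || true
    script:
      # unchanged layers are reused from the pulled image, so only the
      # source layers are rebuilt
      - docker build --cache-from apache/airflow:latest-ci -t airflow-ci .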

But indeed, one of the changes we are going to implement soon is moving to
pre-commit hooks and moving (most of) the static checks to a single job; POC
here:
https://travis-ci.org/PolideaInternal/airflow/builds/555443375?utm_source=github_status&utm_medium=notification.
In this case most of the static checks take all of less than 5 minutes in
one job. In the POC we still have a separate pylint job and a licence-check
job, but after some recent changes we will be able to put the static checks
into two separate jobs.
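
For those who have not seen the framework, a .pre-commit-config.yaml looks
roughly like this (a generic sketch, not our actual configuration; the
local hook and its path are made up):

    repos:
      - repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v2.2.3
        hooks:
          - id: trailing-whitespace
          - id: end-of-file-fixer
      - repo: local
        hooks:
          - id: check-licences                       # made-up local hook
            name: Check licences
            entry: ./scripts/ci/ci_check_license.sh  # path is illustrative
            language: system
            pass_filenames: false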

So that optimisation is also in progress (but it needs a bit more work :D ).

J.

On Mon, Jul 22, 2019 at 11:45 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Jarek>Decreasing it to 5 (which I believe is the case) is terrible for us
> now
> Jarek>We are nearing  1.10.4 release with Airflow and if we have more than
> one PR
> Jarek>in the queue it waits for several hours to run!
>
> I see 5 "pre-test" Travis jobs take 5-8 minutes each, and all the time is
> basically spent in "building airflow image".
> Have you considered merging all the jobs to one?
>
> For instance:
>
> https://travis-ci.org/apache/airflow/jobs/561927828
> The job took 2m20sec, and the payload was as follows:
> > Finished the script ci_check_license.sh
> > It took 2 seconds
>
> It is not like it would dramatically reduce Airflow's impact on Travis,
> however it would free up executors with little to no drawbacks.
>
> Vladimir
>


-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129


Re: External CI Service Limitations

2019-07-22 Thread Greg Stein
On Mon, Jul 22, 2019 at 2:20 AM Jarek Potiuk 
wrote:

> Hello Everyone (especially the infrastructure),
>
> Can we increase a number of workers/jobs we have per project now?
> Decreasing it to 5 (which I believe is the case) is terrible for us now
> We are nearing  1.10.4 release with Airflow and if we have more than one PR
> in the queue it waits for several hours to run!
>
> Can we increase the limits to 15 or so (3 parallell builds for Airflow as
> we are running 5 jobs per stage).
>

Sorry to say, but "no". Travis is a *shared* resource.

As noted elsethread, before we applied limits, Airflow used about 77,000
minutes in a month. That is tying up two executors full-time for the entire
month. We have a hundred projects using Travis, and Airflow consumed a 20th
of our entire capacity.
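
(Rough arithmetic: 77,000 minutes divided by the ~43,200 minutes in a 30-day
month is about 1.8 executors busy around the clock, and against the ~1.72MM
minutes/month figure David quoted it is roughly 4.5%, i.e. about a
twentieth.)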

The limit for all projects shall remain at five (5).

You can always run your tests locally, to prepare for your upcoming
release. The Foundation's paid resources need to remain shared/available
for all of our projects.

Regards,
Greg Stein
Infrastructure Administrator, ASF


Re: External CI Service Limitations

2019-07-22 Thread Jarek Potiuk
Harsh "no" indeed - I understand where it comes from, even though I am
looking at it from my project's perspective.

But maybe instead of a binary yes/no answer we can think about some
compromise/temporary solution (for a week or so, while we run the release
and try to migrate out to another CI)?
Increasing just our project's capacity (if that is possible selectively) to
10 or 15 for a week might help us through the coming release and the
migration to another CI.

I feel really sad and angry that, in this case, Apache Infra is not being a
bit more helpful.

It seems that we are suddenly - without a real warning - being punished for
having very well tested code and good engineering practices (small commits,
one commit per PR), a strong expectation that every commit goes through
extensive full testing on all environments, and a very active community
with a lot of contributors. Our workflow really depends on those tests
working, and such a sudden limitation is not a nice signal to the community.

We do whatever we can to limit the pressure on Travis:

   - We already decreased the build matrix to a very limited set of tests.
   - I've already asked all contributors to run most of the testing -
   static analysis especially - locally (and we even merged a big change
   last week to make it super-easy).
   - We are adding a pre-commit hook framework to make it fully automated
   and even easier for the contributors to run those checks before they hit
   Travis.
   - We are going to merge some of those jobs into one so that we limit the
   number of jobs.
   - We merged a change that makes it easy to migrate out of Travis.
   - We are already in the process of moving to GKE-provided
   infrastructure: we initiated discussions with GitLab CI and we managed
   to secure resources from Google in GCP to run our workflows. I hope I
   can have a working POC this week and be able to migrate soon after.

The problem is that with the current limitation even our effort to move out
is hampered: we have a number of changes pending that we need in order to
finally migrate, but it takes hours for those changes to even go through CI,
not to mention merging.
Is there anything Apache Infra can do to help with this?


J.

On Mon, Jul 22, 2019 at 2:31 PM Greg Stein  wrote:

> On Mon, Jul 22, 2019 at 2:20 AM Jarek Potiuk 
> wrote:
>
> > Hello Everyone (especially the infrastructure),
> >
> > Can we increase a number of workers/jobs we have per project now?
> > Decreasing it to 5 (which I believe is the case) is terrible for us now
> > We are nearing  1.10.4 release with Airflow and if we have more than one
> PR
> > in the queue it waits for several hours to run!
> >
> > Can we increase the limits to 15 or so (3 parallell builds for Airflow as
> > we are running 5 jobs per stage).
> >
>
> Sorry to say, but "no". Travis is a *shared* resource.
>
> As noted elsethread, before we applied limits, Airflow used about 77,000
> minutes in a month. That is tying up two executors full-time for the entire
> month. We have a hundred projects using Travis, and Airflow consumed a 20th
> of our entire capacity.
>
> The limit for all projects shall remain at five (5).
>
> You can always run your tests locally, to prepare for your upcoming
> release. The Foundation's paid resources need to remain shared/available
> for all of our projects.
>
> Regards,
> Greg Stein
> Infrastructure Administrator, ASF
>


-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129