New Windows Jenkins Nodes

2019-07-02 Thread Chris Thistlethwaite

Greetings all!

This might seem like a very quick update/change, but we needed to get 
some new Windows 2016 boxes into the Jenkins rotation. I've swapped out 
windows-2012-2, -3 and windows-2016-2, -3 for new nodes. There are currently 
builds running on 2 of the other nodes, which I'll let finish before moving 
those out of rotation. The new nodes are jenkins-win-he-de-1 through 6. 
They are better, faster, stronger, so the upgrade should bring improved 
build speeds. If you have any issues with builds on those nodes, please 
create a JIRA ticket or ping us/me on Slack.


Thanks!

-Chris T.
#asfinfra


Re: Many Maven builds hanging on Jenkins

2019-07-02 Thread Robert Scholte
I was offline for a week; is it still an issue?
On 22-6-2019 14:12:29, Stefan Seelmann  wrote:
Seems it happened again. Ctrl-F "Maven TLP" shows 335 builds running or
waiting in the queue.

On 6/17/19 7:31 PM, Tibor Digana wrote:
> Who can rework the Jenkins plugin we use, so that the build won't be
> triggered after Groovy libs have changed?
> Somebody changes [1] and [2] and then all 100 Maven projects run a whole
> bunch of branches.
> The queue is huge in Jenkins, and this is a blocker for the entire
> organization, not only for us in Maven!
>
> [1]: https://gitbox.apache.org/repos/asf?p=maven-jenkins-env.git
> [2]: https://gitbox.apache.org/repos/asf?p=maven-jenkins-lib.git
>
>
> On Mon, Jun 17, 2019 at 7:16 PM Robert Scholte wrote:
>
>> Sure, will investigate.
>>
>> thanks,
>> Robert
>>
>> On Mon, 17 Jun 2019 15:21:30 +0200, Robert Munteanu
>> wrote:
>>
>>> Hi,
>>>
>>> I noticed today that Jenkins is taking more time to start building
>>> various jobs. Looking at the executors, I think there is a
>>> misconfiguration/problem with some Maven-related jobs:
>>>
>>> -
>>> https://builds.apache.org/job/maven-box/job/maven/job/MNG-6672/6/console
>>>
>>> This was started 3d17h ago.
>>>
>>> -
>>>
>> https://builds.apache.org/job/maven-box/job/maven-resolver/job/MRESOLVER-12/6/console
>>>
>>> This was started 4d4h ago
>>>
>>> -
>>> https://builds.apache.org/job/maven-box/job/maven/job/master/226/console
>>>
>>> This was started 1d16h ago.
>>>
>>> And quite a few more, but I guess you get the point :-) It would be
>>> great if someone from the Maven project could look into this.
>>>
>>> Thanks!
>>>
>>> Robert
>>
>>
>>
>



External CI Service Limitations

2019-07-02 Thread Daniel Gruno
Hi folks,

As this seems to be a hot topic as of late, I'll provide some
information about our usage of external CI services.

Travis CI: The foundation has an agreement with Travis CI to provide our
projects with external CI services through them. We currently have
approximately 40 executors there, half of which (measured by build time)
are occupied by three projects: Flink (21%), Arrow (18%) and Airflow (13%).
The foundation is not currently looking at increasing the number of
executors there while we assess the long-term costs and benefits. We advise
projects with higher immediate CI needs to either use our Jenkins CI system
or work out a budget/plan (whether internal or external to the foundation)
for other options, and also to assess whether the number and duration of
their builds fit the overall goal of their CI and the resources at our
disposal. As of this week, all projects have been capped at 5 concurrent
builds on Travis.

AppVeyor and CircleCI: The foundation makes use of the free tier of
these services. We have not received any requests for an increase, nor
have we assessed whether one would be beneficial, and there are, as things
stand right now, permission issues that go against our standard policies
for repository access.

With regards, Daniel on behalf of ASF Infra.


Re: External CI Service Limitations

2019-07-02 Thread Jarek Potiuk
We also experience huge delays for Airflow (it seems we are the third "whale"
according to
https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E).

We are evaluating other options for funding as well (including getting some
credits from Google for Google Cloud Build / GCP), but it will take time to
get the resources and to switch.

In the meantime, maybe INFRA can help coordinate some effort between
Flink/Arrow/Airflow to decrease the pressure on Travis? We have considered a
few options (and are going to implement some of them shortly, I think). Some
of them are not direct changes to the Travis CI builds but rather other
workflow/infrastructure changes that will decrease the pressure on Travis:

* We are going to decrease the matrix of builds we run. Currently we have
several combinations of Airflow builds (postgres/mysql/sqlite) x (Python 3.5/
Python 3.6), but we will only run a subset of those rather than the full
matrix (a rough sketch of what this could look like appears after this list).

* We are going to combine several of our jobs into one using parallel
processing. This is mainly for static code analysis: currently we have one
job for each analysis, which makes them run in parallel. After the change,
once you include machine boot times and use all processors, the overall
build time might even be faster than today, AND there will be far fewer VMs
to start for the builds (see the parallel-checks sketch after this list).

* We have a separate Kubernetes-related job. It currently runs only one
suite of tests specific to Kubernetes, as it requires a special environment
setup, but we are looking into the possibility of merging the Kubernetes
tests into the main tests (with a faster environment setup via
docker-compose) and saving 1 job (25% of our test jobs). The main jobs will
run a bit longer, but the whole overhead of starting an extra job will be
gone.

* We are introducing (the PR is in the final stages of review) an easy way
for contributors to run static code analysis in their own environment. A
lot of our builds are PRs failing because of the static code analysis run on
Travis. Until now it has been convoluted and not easily reproducible to run
the full analysis locally, but we are moving to a fully dockerised setup for
builds that will allow contributors to easily run such checks on their
machines, and we will encourage people to run them locally rather than
submit PRs just to check whether the code is right.

* Even more: we are introducing and encouraging an easy-to-use "pre-commit"
framework in our developer workflow, where the analysis will run at commit
time on only the changes being committed. This might further decrease the
number of builds submitted by contributors (a sketch of such a hook appears
after this list).

* Lastly, we are introducing an easy-to-use "simplified development
environment" in which developers will be able to run all or a subset of the
test suites easily on their machines. Currently our setup is fairly
convoluted as well, but we have a PR in progress to address it and to
provide a very easy (again, fully dockerised) way to reproduce the test
environment.
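
To make the first point above a bit more concrete: a minimal, purely
illustrative Python sketch (not the actual Airflow change; the backends and
Python versions are just the ones mentioned above) of pruning the full
(backend x Python version) matrix to a representative subset could look
like this:

from itertools import product

BACKENDS = ["sqlite", "mysql", "postgres"]
PYTHON_VERSIONS = ["3.5", "3.6"]

# The full cross-product: 3 x 2 = 6 CI jobs per commit.
FULL_MATRIX = list(product(BACKENDS, PYTHON_VERSIONS))

# A hand-picked subset that still covers every backend and every Python
# version at least once: only 3 jobs instead of 6.
REDUCED_MATRIX = [
    ("sqlite", "3.5"),
    ("mysql", "3.6"),
    ("postgres", "3.6"),
]

if __name__ == "__main__":
    print("full matrix:   ", FULL_MATRIX)
    print("reduced matrix:", REDUCED_MATRIX)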
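
For the parallel-checks idea, here is a rough sketch (flake8, pylint and
mypy are placeholder checks, and the "airflow" target directory is just an
assumption; the real list would be project-specific) of folding several
static-analysis jobs into one job that uses all available processors:

import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Placeholder checks; each one runs as its own subprocess.
CHECKS = [
    ["flake8", "."],
    ["pylint", "airflow"],
    ["mypy", "airflow"],
]

def run_check(cmd):
    """Run one static check and return (command, exit code, output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return cmd, proc.returncode, proc.stdout + proc.stderr

def main():
    # Skip tools that are not installed so the sketch stays runnable.
    checks = [c for c in CHECKS if shutil.which(c[0])]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_check, checks))
    failed = False
    for cmd, code, output in results:
        print("==> {} exited with {}".format(" ".join(cmd), code))
        if code != 0:
            print(output)
            failed = True
    sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()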
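
And for the pre-commit idea, a minimal git pre-commit hook (a sketch only;
the actual Airflow setup uses the pre-commit framework, and flake8 is just
an example check) that lints only the staged Python files might look like
this:

#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit script: check only the files being
# committed instead of the whole tree.
import subprocess
import sys

def staged_python_files():
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f.endswith(".py")]

def main():
    files = staged_python_files()
    if not files:
        return 0
    # Run the example check on just the staged files.
    return subprocess.run(["flake8", *files]).returncode

if __name__ == "__main__":
    sys.exit(main())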

Maybe the committers from Flink and Arrow can also take a look at
non-obvious ways their projects can decrease the pressure on Travis (at
least for the time being). Maybe there are some quick wins we can apply in
a short time, in a coordinated way, to buy more time for switching the
infrastructure?




Re: External CI Service Limitations

2019-07-02 Thread Greg Stein
On Tue, Jul 2, 2019 at 11:56 PM Jarek Potiuk  wrote:
>...

> In the meantime maybe INFRA can help to coordinate some effort between
> Flink/Arrow/Airflow to decrease pressure on Travis? We considered few
> options (and are going to implement some of them shortly I think). Some of
> them are not direct changes in Travis CI builds but some other
> workflow/infrastructure changes that will decrease pressure on Travis:
>
>...

> Maybe the committers from Flink and Arrow can also take a look at
> non-obvious ways how their projects can decrease pressure on Travis (at
> least for the time being). Maybe there are some quick wins we can apply in
> short time in coordinated way and buy more time for switching the
> infrastructure ?
>

The above is fabulous. Please continue trading thoughts and working to
reduce your Travis loads, for your own benefit and for your fellow projects
at the Foundation.

This list is the best space to trade such ideas. I'm not sure what Infra
can do to reduce the load, as our skillset is quite a bit different from
what your projects need.

We'll keep this list apprised of anything we find. If anybody knows of,
and/or can recommend a similar type of outsourced build service ... we
*absolutely* would welcome pointers.

We're gonna keep Jenkins and buildbot around for the foreseeable future,
and are interested in outsourced solutions.

Cheers,
Greg Stein
Infrastructure Administrator, ASF


Re: External CI Service Limitations

2019-07-02 Thread Allen Wittenauer


> On Jul 2, 2019, at 10:21 PM, Greg Stein  wrote:
> 
> We'll keep this list apprised of anything we find. If anybody knows of,
> and/or can recommend a similar type of outsourced build service ... we
> *absolutely* would welcome pointers.

FWIW, we’ve been collecting them bit by bit into Apache Yetus ( 
http://yetus.apache.org/documentation/in-progress/precommit-robots/ ):

* Azure Pipelines
* Circle CI
* Cirrus CI
* Gitlab CI
* Semaphore CI
* Travis CI

They all have some pros and cons.  I’m not going to rank them or 
anything.

I will say, however, it really feels like Gitlab CI is the best bet to 
pursue since one can add their own runners to the Gitlab CI infrastructure 
dedicated to their own projects.  That ultimately means that replacing Jenkins 
slaves is a very real possibility.

(Also, I’ve requested access to the Github Actions beta, but haven’t 
received anything yet.  I have a hunch that the reworking of the OAuth 
permission model is related, which may make some of these more viable for the 
ASF.)

Re: External CI Service Limitations

2019-07-02 Thread Jeff MAURY
Azure Pipelines has the big plus of supporting Linux, Windows and macOS nodes.
And I think you can add your own nodes to the pools.

Jeff

Le mer. 3 juil. 2019 à 08:04, Allen Wittenauer
 a écrit :

>
> > On Jul 2, 2019, at 10:21 PM, Greg Stein  wrote:
> >
> > We'll keep this list apprised of anything we find. If anybody knows of,
> > and/or can recommend a similar type of outsourced build service ... we
> > *absolutely* would welcome pointers.
>
> FWIW, we’ve been collecting them bit by bit into Apache Yetus (
> http://yetus.apache.org/documentation/in-progress/precommit-robots/ ):
>
> * Azure Pipelines
> * Circle CI
> * Cirrus CI
> * Gitlab CI
> * Semaphore CI
> * Travis CI
>
> They all have some pros and cons.  I’m not going to rank them or
> anything.
>
> I will say, however, it really feels like Gitlab CI is the best
> bet to pursue since one can add their own runners to the Gitlab CI
> infrastructure dedicated to their own projects.  That ultimately means that
> replacing Jenkins slaves is a very real possibility.
>
> (Also, I’ve requested access to the Github Actions beta, but
> haven’t received anything yet.  I have a hunch that the reworking of the
> OAuth permission model is related, which may make some of these more viable
> for the ASF.)


Re: External CI Service Limitations

2019-07-02 Thread Allen Wittenauer



> On Jul 2, 2019, at 11:12 PM, Jeff MAURY  wrote:
> 
> Azure pipeline vas the big plus of supporting Linux Windows and macos nodes

There are a few that support various combinations of non-Linux.  Gitlab 
CI has been there for a while.  Circle CI has had OS X and is in beta with 
Windows.  Cirrus CI has all of those plus FreeBSD, etc., etc.  It's quickly 
becoming a requirement that cloud-based CI systems do more than just throw 
up a Linux box. 

> And i think you can add you nodes to the pools

I think they are limited to being on Azure, though, IIRC, but I may be 
misremembering.  I pretty much gave up on doing anything serious with it.

I really wanted to like Pipelines.  The UI is nice.  But in the end, 
Pipelines was one of the more frustrating ones to work with in my 
experience, and that was with some help from the MS folks. It suffers death 
by a thousand cuts (lack of complex, real-world examples, a custom docker 
binary, pre-populated bits here and there, a ton of env vars, an artifact 
system that is a total disaster, etc., etc.).  Lots of small problems that 
add up to it just not being worth the effort.  

Hopefully it’s improved since I last looked at it months and months ago 
though.