On 6 Jan 2019, at 18:32, Allen Wittenauer <a...@effectivemachines.com.INVALID> wrote:


a) The ASF has been running untrusted code since before GitHub existed.  From 
my casual watching of Jenkins, most of the changes we test don't come from 
GitHub PRs.  Any solution absolutely needs to consider what happens in a 
JIRA-based patch-file world. [footnotes 1, 2]

b) Making everything get reviewed by a committer before executing is a 
non-starter.  For large communities, precommit testing acts as a way for 
contributors to get feedback prior to a committer even getting involved.  This 
allows for change iteration prior to another human spending time on it.  But 
the secondary effect is that it acts as a funnel: if a project gets thousands 
of change requests a year [footnote 3], it’s now trivial for committers to 
focus their energy on the ones that are closest to commit.

c) We've needed disposable environments (what Stephen Connolly called throwaway 
hardware, and similar to what Dominik Psenner talked about regarding GitLab 
runners) for a while.  When INFRA enabled multiple executors per node (which 
they did for good reasons), it triggered an avalanche of problems: Maven's 
lack of local-repository locking, noisy neighbors, Jenkins problems galore 
(security and DoS issues which still exist today!), systemd's cgroup 
limitations, and a whole lot more.  Improved security is really just a bonus 
at this point.
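For the repo-locking problem specifically, one common mitigation (a sketch, not necessarily what INFRA runs; `maven.repo.local` and `WORKSPACE` are standard Maven/Jenkins names, but their use here is my assumption) is to point each build at its own local repository:

```shell
# Hypothetical Jenkins build step: give every build its own Maven local
# repository so concurrent executors on the same node never share (and
# silently corrupt) an unlocked ~/.m2/repository.
mvn -Dmaven.repo.local="${WORKSPACE}/.m2/repository" clean verify
```

The trade-off is disk usage and cold-cache download time per workspace, which is part of why disposable, isolated build environments are the more complete answer.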

====

1 - With the forced move to gitbox, this may change, but time will tell.

2 - FWIW: Gavin and I have been playing with Jenkins' JIRA Trigger Plugin and 
finding that it has some significant weaknesses and needs a lot of support 
code to make it viable. This means we'll likely be sticking with some form of 
Yetus' precommit-admin for a while longer. :(  The bright side here is that 
at least the ASF owns the code to make it happen.

3 - Some perspective: Hadoop generated ~6500 JIRAs with patch files attached 
last year alone, for the roughly 15 active committers to review.  If every 
issue averaged the initial patch plus a single iteration, that's 13,000 
patches that got tested on Jenkins.


As one of those people, be assured: I don't really look at things until there's 
been at least one iteration of a test run; I feel overloaded enough as it is. 
It's the initial due diligence: does this patch compile, does it break things, 
what breaks, etc. And I expect the same courtesy from others: nobody will look 
at my code until Yetus stops vetoing it. Oh, and for any object store: you must 
declare the specific endpoint (e.g. "S3 Ireland") you tested against. The ASF 
infra doesn't test those (no credentials), and asking for the specific endpoint 
has proven to be the best way to get an honest statement from submitters. (*)

One thing which Spark does is to have:

* a list of people who are trusted enough to have their PRs auto-tested on some 
UCB infra; I think you get on that list once you have at least one PR actually 
accepted
* a list of people who have the right to kick off a test by asking for it on 
the PR: https://github.com/apache/spark/pull/21286#issuecomment-388096882

I quite like that approach. It puts a barrier in front of an unknown, possibly 
malicious entity submitting work, yet allows an intermediate trust level of 
"they seem OK so far". And let's be honest: the real malware we have to worry 
about is more subtle. It's the PR adding a Maven plugin with a transitive 
dependency on some non-ASF artifact pre-planted into Maven Central, with the 
malicious payload hiding deep inside; a payload only kicked off once a change 
to a GitHub file tells it to:
https://blog.trendmicro.com/trendlabs-security-intelligence/winnti-abuses-github/
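To make the shape of that attack concrete, here is a hypothetical pom.xml fragment; every coordinate in it is invented for illustration, and nothing like it appears in any real project:

```xml
<!-- Hypothetical example: all groupIds/artifactIds below are invented. -->
<!-- The diff a reviewer sees is just an innocuous-looking build plugin. -->
<build>
  <plugins>
    <plugin>
      <groupId>com.example.innocuous</groupId>
      <artifactId>report-helper-maven-plugin</artifactId>
      <version>1.0.2</version>
      <!-- The plugin's own POM (hosted outside ASF control) declares a
           transitive dependency carrying the dormant payload. Precommit
           infra resolves and executes it before any committer review. -->
    </plugin>
  </plugins>
</build>
```

The point being: the malicious code never appears in the PR diff at all, only in an artifact the build resolves at run time.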


(*) I do tend to do a final test run before the final commit, but everything 
has been reviewed by then.
