On 6 Jan 2019, at 18:32, Allen Wittenauer <a...@effectivemachines.com.INVALID> wrote:
a) The ASF has been running untrusted code since before GitHub existed. From my casual watching of Jenkins, most of the change code we run doesn't come from GitHub PRs. Any solution absolutely needs to consider what happens in a JIRA-based patch file world. [footnotes 1, 2]

b) Making everything get reviewed by a committer before executing is a non-starter. For large communities, precommit testing is how contributors get feedback before a committer even gets involved. This allows for change iteration before another human spends time on it. But the secondary effect is that it acts as a funnel: if a project gets thousands of change requests a year [footnote 3], it becomes trivial for committers to focus their energy on the ones that are closest to commit.

c) We've needed disposable environments (what Stephen Connolly called throwaway hardware, and similar to what Dominik Psenner talked about wrt GitLab runners) for a while. When INFRA enabled multiple executors per node (which they did for good reasons), it triggered an avalanche of problems: Maven's lack of repo locking, noisy neighbors, Jenkins problems galore (security and DoS issues which still exist today!), systemd's cgroup limitations, and a whole lot more. Getting security out of them is really just a bonus at this point.

====

1 - With the forced move to gitbox, this may change, but time will tell.

2 - FWIW: Gavin and I have been playing with Jenkins' JIRA Trigger Plugin and are finding that it has some significant weaknesses and needs a lot of support code to be viable. This means we'll likely be sticking with some form of Yetus' precommit-admin for a while longer. :( The bright side is that at least the ASF owns the code to make it happen.

3 - Some perspective: Hadoop generated ~6500 JIRAs with patch files attached last year alone, for the roughly 15 active committers to review.
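One common workaround for the Maven repo-locking problem in (c) is to give each executor slot its own local repository. A minimal sketch of such a Jenkins build step, assuming Jenkins' standard WORKSPACE and EXECUTOR_NUMBER environment variables (the path layout is illustrative, not ASF's actual setup):

```shell
# Give each executor slot its own local repo so concurrent builds on one
# node don't race on a shared ~/.m2 (Maven does not lock its repository).
MVN_REPO="${WORKSPACE:-/tmp}/.m2-executor-${EXECUTOR_NUMBER:-0}/repository"
mkdir -p "$MVN_REPO"

# -Dmaven.repo.local points Maven at the per-executor repository.
mvn -Dmaven.repo.local="$MVN_REPO" clean verify
```

The cost is re-downloading dependencies once per executor, which is usually cheaper than debugging a shared repository corrupted by concurrent writes.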
If each issue averaged the initial patch plus a single iteration, that's 13,000 patches that got tested on Jenkins. As one of those people, be assured: I don't really look at things until there's been one iteration of a test run; I feel overloaded enough as it is. That first run is the initial due diligence: does this patch compile, does it break things, what breaks, etc. And I expect the same courtesy from others: nobody will look at my code until Yetus stops vetoing it.

Oh, and for any object store: you must declare the specific endpoint, e.g. "S3 Ireland", you tested against. The ASF infra doesn't test those (no credentials), and asking for the specific endpoint has proven to be the best way to get an honest statement from submitters. (*)

One thing which Spark does is to have:

* a list of people who are trusted enough to have their PRs auto-tested on some UCB infra. I think you get on that list once you have 1+ PR actually accepted.
* a list of people who have the right to kick off a test by asking for it on the PR: https://github.com/apache/spark/pull/21286#issuecomment-388096882

I quite like that approach. It puts a barrier in front of an unknown, possibly malicious entity submitting work, yet allows an intermediate trust level of "they seem OK so far".

And let's be honest: the real malware we have to worry about is more subtle. It's the PR with the Maven plugin with a transitive dependency on some non-ASF artifact pre-planted into Maven Central, the malicious payload hiding deep inside and only kicked off once a change to a GitHub file tells it to: https://blog.trendmicro.com/trendlabs-security-intelligence/winnti-abuses-github/

(*) I do tend to do a final test run before the final commit, but everything has been reviewed by then.
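The Spark-style two-list gate described above can be sketched roughly as follows. This is an illustration of the policy, not Spark's actual bot code; the function and list names, and the "ok to test" trigger phrase, are assumptions.

```python
# Hypothetical sketch of a Spark-style PR test gate: trusted authors get
# auto-tested, everyone else needs an admin to ask for a run on the PR.

TRUSTED_AUTHORS = {"alice", "bob"}   # e.g. people with 1+ PR already accepted
TEST_ADMINS = {"carol"}              # people who may request a run on any PR

def should_run_tests(author: str, comments: list) -> bool:
    """Return True if this PR should be tested on project infra.

    comments is a list of (commenter, text) pairs from the PR discussion.
    """
    # Known contributors are trusted enough for automatic testing.
    if author in TRUSTED_AUTHORS:
        return True
    # Otherwise an admin must explicitly ask, e.g. by commenting "ok to test".
    return any(
        commenter in TEST_ADMINS and "ok to test" in text.lower()
        for commenter, text in comments
    )
```

For example, `should_run_tests("mallory", [])` stays False until someone in `TEST_ADMINS` comments "ok to test"; a stranger commenting the phrase on their own PR changes nothing.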