On 16 Jan 2017, at 10:35, assaf.mendelson <assaf.mendel...@rsa.com> wrote:

> Hi,
> In the documentation it says spark is supported on windows.
> The problem, however, is that the documentation description on windows is lacking. There are sources (such as https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html and many more) which explain how to make spark run on windows, however, they all involve downloading a third party winutil.exe file.
> Since this file is downloaded from a repository belonging to a private person,

A repository belonging to me, ste...@apache.org

> this can be an issue (e.g. getting approval to install on a company computer can be an issue).

I'm a committer on the Hadoop PMC; those signed artifacts are no less trustworthy than anything you get from the ASF itself. They are a clean build off a windows VM that is only ever used for build/test of Hadoop code, no other use at all; the VM is powered off most of its life. This actually makes it less of a security risk than the main desktop. And you can check the GPG signature of the artifacts to see they've not been tampered with.

> There are tons of jira tickets on the subject (most are marked as duplicate or not a problem), however, I believe that if we say spark is supported on windows there should be a clear explanation on how to run it and one shouldn’t have to use executable from a private person.

While I recognise your concerns, if I wanted to run code on your machines, rest assured, I wouldn't do it in such an obvious way. I'd do it via transitive maven artifacts with a harmless name like "org.example.xml-unit-diags" which would do something useful except in the special case that it's running in your subnet, get a patch to a pom.xml to pull it into org.apache.hadoop somewhere, release a version of hadoop with that dependency, then wait for it to propagate downstream into everything, including all those server farms running linux only.

Writing a malicious windows native executable would require me to write C/C++ windows code, and I don't want to go there.

Of course, if I did any of these I'd be in trouble when caught, lose my job, never be trusted to submit a line of code to any OSS project, lose all my friends, etc, etc. I have nothing to gain by doing so.

If you really don't trust me, the instructions for building it are up online; build a windows system for compiling hadoop, check out the branch and then go

    mvn -T 1C package -Pdist -Dmaven.javadoc.skip=true -DskipTests

Or go to hortonworks.com, download the windows version and lift the windows binaries. Same thing, built by a colleague-managed release VM.

> If indeed using winutil.exe is the correct solution, I believe it should be bundled to the spark binary distribution along with clear instructions on how to add it.

I recognise that it is good to question the provenance of every line of code executed on machines you care about. I am reasonably confident as to the quality of this code; given that it was a checkout & build of the ASF tagged release, then signed by me, a compromise would need my VM corrupted, my VM's feed from the ASF HTTPS repo subverted by a fake SSL cert, or someone getting hold of my GPG key and github keys and uploading something malicious in my name. Interestingly, that is a vulnerability, one I covered last year in my "Household infosec in a post-sony era" talk: https://www.youtube.com/watch?v=tcRjG1CCrPs

You'll be pleased to know that the relevant keys now live on a yubikey, so even malicious code executed on my desktop cannot get the secrets off the (encrypted) local drive. It'd need physical access to the key, and I'd notice it was missing, revoke everything, etc, etc, making the risk of my keys being stolen low.

That leaves the general problem of "our entire build process is based on the assumption that we trust the maven repositories and the people who wrote the JARs". That's a far more serious problem than the provenance of a single exe file on github.

-Steve
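
For the GPG signature check mentioned earlier in this message, here is a minimal sketch of what it could look like when scripted. It assumes the `gpg` command-line tool is installed, that the signer's public key has already been imported into the local keyring, and that the artifact ships with a detached signature file. The file names `winutils.exe` and `winutils.exe.asc` and both function names are hypothetical and only for illustration; they are not taken from the repository discussed above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def gpg_verify_command(artifact: Path, signature: Path) -> List[str]:
    """Build the `gpg --verify` command line for a detached signature.

    gpg expects the signature file first and the signed data second.
    """
    return ["gpg", "--verify", str(signature), str(artifact)]


def verify_artifact(artifact: Path, signature: Path) -> bool:
    """Return True if gpg exits with status 0 for this artifact/signature pair.

    Assumption: the signer's public key is already in the local keyring.
    A zero exit status means the signature matches that key; it does not
    by itself say whether you have decided to trust the key.
    """
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg is not installed or not on PATH")
    result = subprocess.run(
        gpg_verify_command(artifact, signature),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Hypothetical usage, with illustrative file names:
#   ok = verify_artifact(Path("winutils.exe"), Path("winutils.exe.asc"))
```

This only shows that the file matches a signature made with a particular key. Deciding whether that key really belongs to the person you think it does is a separate step, for example by comparing the key fingerprint against one published through a channel you already trust.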