+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform-independent language such as Python (or Maven in certain cases) will greatly help in reducing build breaks and will improve build script maintainability.

Run-time scripts: Most run-time scripts are end-user visible. They are run either by an admin, for example to start/stop a Hadoop cluster (hadoop-daemons), or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:

- Scripts intended for a cluster admin or an IT admin:
  - It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment IT admins won't like it if they have to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell interface wrapper that accepts the right parameters and passes them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-through. This way the Python scripts are like any other Java code hidden behind a well-known API surface. IT admins can't debug or modify them easily, but this is fine: for scripts like the ones mentioned above, there is no requirement that IT admins be able to easily view or modify the underlying code.
  - For Windows-specific things not supported by Python natively, such as setting ACLs or starting/stopping Windows services, it should be possible to refactor the code appropriately. But a little bit of PowerShell/cmd for these call-outs would be unavoidable.
- Scripts intended for developers/cluster users:
  - Most of these scripts (e.g. hadoop.cmd) would be behind another API surface such as WebHDFS, ODBC, JDBC, Templeton, etc. So the advantage of having a common script across platforms outweighs the benefit of using cmd/PowerShell as a native Windows feature. Again, it should also be possible to provide simple PowerShell wrappers for a Windows environment.

Thanks,
Mahadevan.
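As a rough illustration of the layering described in the comments above (one shared Python script, with the platform-specific call-out isolated in a single place), a minimal sketch might look like the following. Everything here is an assumption made for illustration: `native_callout`, `parse_args`, `main` and `example-daemon.sh` are hypothetical names, not existing Hadoop scripts. The only real tool referenced is the Windows `sc` service-control command. A PowerShell wrapper would simply forward its parameters to this script.

```python
import argparse
import platform


def native_callout(action, service, system=None):
    """Return the OS-specific command for one start/stop call-out.

    Only this function knows about the platform; the rest is shared.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows services are controlled with the built-in `sc` tool.
        return ["sc", action, service]
    # Hypothetical Unix-style daemon script; the name is a placeholder.
    return ["./example-daemon.sh", action, service]


def parse_args(argv):
    """Parse the parameters a thin wrapper would pass straight through."""
    parser = argparse.ArgumentParser(description="Start or stop a service (sketch).")
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("service")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    command = native_callout(args.action, args.service)
    # A real script would hand `command` to subprocess; this sketch only prints it.
    print(" ".join(command))
    return command
```

Calling `main(["start", "namenode"])` builds the command for whichever platform the script runs on, so the wrapper layer never needs platform logic of its own.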
-----Original Message-----
From: Ivan Mitic [mailto:iva...@microsoft.com]
Sent: Thursday, November 29, 2012 3:41 PM
To: common-dev@hadoop.apache.org; ma...@apache.org
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: mfo...@hortonworks.com [mailto:mfo...@hortonworks.com] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: common-dev@hadoop.apache.org
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1 & 2 in combination make total sense. I ported a few scripts to
>>> Python, and thus far it has proved to be up to the task and satisfies the
>>> cross-platform requirements. In my opinion, it is also important to agree on
>>> the version, as I've run into some breaking changes in version 3+.

3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be
>>> a real challenge.

Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.

---------

Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt