I like Alejandro's idea about Maven for a few reasons:
  - bringing in a scripting environment which is known for its inter-version
    idiosyncrasies just because Windows can't handle trivial shell scripting
    looks like overkill to me
  - related to the above, there's a chance that Python's prerequisites used in
    Hadoop might conflict with some other components in the stack. This
    will be a nightmare for integrator projects, e.g. Bigtop
  - Maven is the de facto standard for Java stacks
  - Maven has a built-in scripting language (Groovy) if some plugins aren't
    sufficient for achieving whatever goals

Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it uses
Maven stuff such as deploy/install via custom ant tasks. The same approach
would work for saveVersion.sh and others, I am sure.

Cos

On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> Hey Matt,
>
> We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> its way out with the move of docs to APT)
>
> Why not do a maven-plugin to do that?
>
> Colin already has something to simplify all the cmake calls from the builds
> using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887)
>
> We could do the same with protoc, thus simplifying the POMs.
>
> The saveVersion.sh seems like another prime candidate for a maven plugin,
> and in this case it would not require external tools.
>
> Does this make sense?
>
> Thx
>
> On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <ma...@apache.org> wrote:
>
> > This discussion started in
> > HADOOP-8924 <https://issues.apache.org/jira/browse/HADOOP-8924>,
> > where it was proposed to replace the build-time utility "saveVersion.sh"
> > with a python script. This would require Python as a build-time
> > dependency. Here's the background:
> >
> > Those of us involved in the branch-1-win port of Hadoop to Windows without
> > use of Cygwin, have faced the issue of frequent use of shell scripts
> > throughout the system, both in build time (eg, the utility
> > "saveVersion.sh"), and run time (config files like "hadoop-env.sh" and
> > the start/stop scripts in "bin/*" ). Similar usages exist throughout the
> > Hadoop stack, in all projects.
> >
> > The vast majority of these shell scripts do not do anything platform
> > specific; they can be expressed in a posix-conforming way. Therefore, it
> > seems to us that it makes sense to start using a cross-platform scripting
> > language, such as python, in place of shell for these purposes.
> > For those rare occasions where platform-specific functionality really is
> > needed, python also supports quite a lot of platform-specific
> > functionality on both Linux and Windows; but where that is inadequate,
> > one could still conditionally invoke a platform-specific module written
> > in shell (for Linux/*nix) or powershell or bat (for Windows).
> >
> > The primary motive for moving to a cross-platform scripting language is
> > maintainability. The alternative would be to maintain two complete suites
> > of scripts, one for Linux and one for Windows (and perhaps others in the
> > future). We want to avoid the need to update dual modules in two different
> > languages when functionality changes, especially given that many Linux
> > developers are not familiar with powershell or bat, and many Windows
> > developers are not familiar with shell or bash.
> >
> > Regarding the choice of python:
> >
> > - There are already a few instances of python usage in Hadoop, such as
> >   the utility (currently broken) "relnotes.py", and massive usage of
> >   python in the examples/ and contrib/ directories.
> > - Python is also used in Bigtop build-time.
> > - The Python language is available for free on essentially all
> >   platforms, under an Apache-compatible license
> >   <http://www.apache.org/legal/resolved.html>.
> > - It is supported in Eclipse and similar IDEs.
> > - Most importantly, it is widely accepted as a reasonably good OO
> >   scripting language, and it is easily learned by anyone who already knows
> >   shell or perl, or other common scripting languages.
> > - On the Tiobe index of programming language popularity
> >   <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
> >   which seeks to measure the relative number of software engineers who
> >   know and use each language, Python far exceeds Perl and Ruby.
> >   The only more well-known scripting languages are PHP and Visual Basic,
> >   neither of which seems a prime candidate for this use.
> >
> > For build-time usage, I think we should immediately approve python as a
> > build-time dependency, and allow people who are motivated to do so, to open
> > jiras for migrating existing build-time shell scripts to python.
> >
> > For run-time, there is likely to be a lot more discussion. Lots of folks,
> > including me, aren't real happy with use of active scripts for
> > configuration, and various others, including I believe some of the Bigtop
> > folks, have issues with the way the start/stop scripts work. Nevertheless,
> > all those scripts exist today and are widely used. And they present an
> > impediment to porting to Windows-without-cygwin.
> >
> > Nothing about run-time use of scripts has changed significantly over the
> > past three years, and I don't think we should hold up the Windows port
> > while we have a huge discussion about issues that veer dangerously into
> > religious/aesthetic domains. It would be fun to have that discussion, but I
> > don't want this decision to be dependent on it!
> >
> > So I propose that we go ahead and also approve python as a run-time
> > dependency, and allow the inclusion of python scripts in place of current
> > shell-based functionality. The unpleasant alternative is to spawn a bunch
> > of powershell scripts in parallel to the current shell scripts, with a very
> > negative impact on maintainability. The Windows port must, after all, be
> > allowed to proceed.
> >
> > Let's have a discussion, and then I'll put both issues, separately, to a
> > vote (unless we miraculously achieve consensus without a vote :-)
> >
> > I also encourage members of the other Hadoop-related projects, to carry
> > this discussion into those forums. It would be very cool to agree on a
> > whole-stack solution for the scripting problem.
> >
> > Best regards,
> > --Matt
> >
>
> --
> Alejandro