[ https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shanyu zhao reassigned HIVE-7288:
---------------------------------

    Assignee: shanyu zhao

> Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-7288
>                 URL: https://issues.apache.org/jira/browse/HIVE-7288
>             Project: Hive
>          Issue Type: New Feature
>          Components: WebHCat
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
>         Environment: HDInsight deploying HDP 2.1; Also HDP 2.1 on Windows
>            Reporter: Azim Uddin
>            Assignee: shanyu zhao
>
> Issue:
> ======
> Because the WebHCat REST API has no parameters equivalent to '-libjars' and
> '-archives', we cannot use external Java JARs or archive files with a
> Streaming MapReduce job when the job is submitted via WebHCat/Templeton.
> I am citing two use cases here, but there can be plenty of scenarios like
> this:
>
> #1 (for -archives):
> In order to use R with a Hadoop distribution like HDInsight or HDP on
> Windows, we could package the R directory up in a zip file, rename it to
> r.jar, and put it into HDFS or WASB. We can then do something like this from
> the hadoop command line (ignore the wasb syntax; the same command can be run
> with hdfs):
>
>     hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives wasb:///example/jars/r.jar -files "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" -mapper "./r.jar/bin/Rscript.exe mapper.r" -reducer "./r.jar/bin/Rscript.exe reducer.r" -input /example/data/gutenberg -output /probe/r/wordcount
>
> This works from the hadoop command line, but because WebHCat does not support
> the '-archives' parameter, we can't submit the same Streaming MR job via
> WebHCat.
>
> #2 (for -libjars):
> Consider a scenario where a user would like to use a custom InputFormat with
> a Streaming MapReduce job and has written a custom InputFormat JAR. From the
> hadoop command line we can do something like this:
>
>     hadoop jar /path/to/hadoop-streaming.jar \
>         -libjars /path/to/custom-formats.jar \
>         -D map.output.key.field.separator=, \
>         -D mapred.text.key.partitioner.options=-k1,1 \
>         -input my_data/ \
>         -output my_output/ \
>         -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \
>         -mapper my_mapper.py \
>         -reducer my_reducer.py
>
> But because WebHCat does not support the '-libjars' parameter for streaming
> MapReduce jobs, we can't submit the above streaming MR job (which uses a
> custom Java JAR) via WebHCat.
>
> Impact:
> ========
> We think that being able to submit jobs remotely is a vital feature for
> Hadoop to be enterprise-ready, and WebHCat plays an important role there.
> Streaming MapReduce jobs are also very important for interoperability. So it
> would be very useful to keep WebHCat on par with the hadoop command line in
> terms of streaming MR job submission capability.
>
> Ask:
> ====
> Enable parameter support for 'libjars' and 'archives' for Hadoop streaming
> jobs in WebHCat.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
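
For illustration only, the request in the "Ask" could look roughly like the sketch below from a client's point of view. It builds the URL-encoded form body for a WebHCat streaming job submission. The `input`, `output`, `mapper`, `reducer`, `file` and `define` fields are existing WebHCat streaming parameters; the `libjars` and `archives` fields are hypothetical, since they are exactly what this issue asks for and may end up with different names or semantics. The function name `build_streaming_form` is also made up for this example, and authentication and the HTTP call itself are left out.

```python
from urllib.parse import urlencode


def build_streaming_form(input_path, output_path, mapper, reducer,
                         files=(), defines=(), libjars=(), archives=()):
    """Return a URL-encoded form body for a WebHCat streaming job request."""
    fields = [
        ("input", input_path),
        ("output", output_path),
        ("mapper", mapper),
        ("reducer", reducer),
    ]
    # Existing parameters: one "file" / "define" field per value.
    fields += [("file", f) for f in files]
    fields += [("define", d) for d in defines]
    # HYPOTHETICAL parameters requested in this issue; not part of the
    # WebHCat streaming API in the affected versions. Comma-joined here to
    # mirror the hadoop command line, which is an assumption.
    if libjars:
        fields.append(("libjars", ",".join(libjars)))
    if archives:
        fields.append(("archives", ",".join(archives)))
    return urlencode(fields)


if __name__ == "__main__":
    # Use case #1 from the issue, expressed as form fields.
    print(build_streaming_form(
        input_path="/example/data/gutenberg",
        output_path="/probe/r/wordcount",
        mapper="./r.jar/bin/Rscript.exe mapper.r",
        reducer="./r.jar/bin/Rscript.exe reducer.r",
        files=["wasb:///example/apps/mapper.r",
               "wasb:///example/apps/reducer.r"],
        archives=["wasb:///example/jars/r.jar"],
    ))
```

The resulting string would be sent as the POST body to the streaming endpoint; on a server without the requested feature, the two hypothetical fields would not have the intended effect.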