> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > 1. I think webhcat-default.xml should be modified to include the jars that
> > are now required in templeton.libjars to minimize out-of-the-box config for
> > end users.
> > 2. Is there any test (e2e) that can be added for this? (with reasonable
> > amount of effort)
> > 3. When you tested that Pig/Hive jobs get properly tagged, you mean you
> > tested that MR jobs that are generated by Pig/Hive are tagged, correct?
>
> Eugene Koifman wrote:
>     4. Actually, instead of doing 1, could WebHCat dynamically figure out
>     which hadoop version it's talking to and add only the necessary shim jar,
>     rather than shipping all of them? It reduces the amount of config needed.
>     It would also be better if we can only ship the minimal set of jars.

1. I like your proposal from #4. I actually started down this route but ran into some issues when I tried to add libjars programmatically. Let me try harder and I'll reply back.
2. Will have to check out what we have currently.
3. Correct, I validated that MR jobs generated by Pig/Hive are tagged properly.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java, line 44
> > <https://reviews.apache.org/r/22329/diff/1/?file=604984#file604984line44>
> >
> >     I think it would be useful to add a more detailed description of these
> >     props. Something like what is in the JIRA ticket. I would have added the
> >     ticket number to the comment, but Hive prohibits that.

Will fix this, thanks.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java, line 126
> > <https://reviews.apache.org/r/22329/diff/1/?file=604985#file604985line126>
> >
> >     Which user will this use? Is it the user running WebHCat or the value
> >     of 'doAs' parameter?

This runs in the context of the task itself. On non-secure Hadoop, that is the same context as the nodemanager/tasktracker. On secure Hadoop, I believe it is the context of the user who submitted the job.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 157
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line157>
> >
> >     Is LOG.info() the right log level? Seems like it will pollute the log
> >     file.

I think this is totally fine; it's just a single entry in the task syslog. This is super useful info (IMO a must-have) for users to understand what the templeton launcher job does.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 160
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line160>
> >
> >     Is LOG.info() the right level?

I think this is ok.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 189
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line189>
> >
> >     log level

Same as above, I think this is ok.


- Ivan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review44992
-----------------------------------------------------------


On June 6, 2014, 10:02 p.m., Ivan Mitic wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22329/
> -----------------------------------------------------------
>
> (Updated June 6, 2014, 10:02 p.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Approach in the patch is similar to what Oozie does to handle this situation.
> Specifically, all child map jobs get tagged with the launcher MR job id. On
> launcher task restart, launcher queries RM for the list of jobs that have the
> tag and kills them. After that it moves on to start the same child job again.
> Again, similarly to what Oozie does, a new templeton.job.launch.time property
> is introduced that captures the launcher job submit timestamp and later used
> to reduce the search window when RM is queried.
>
> To validate the patch, you will need to add webhcat shim jars to
> templeton.libjars as now webhcat launcher also has a dependency on hadoop
> shims.
>
> I have noticed that in case of the SqoopDelegator webhcat currently does not
> set the MR delegation token when optionsFile flag is used. This also creates
> the problem in this scenario. This looks like something that should be
> handled via a separate Jira.
>
>
> Diffs
> -----
>
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62
>   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1
>   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918
>
> Diff: https://reviews.apache.org/r/22329/diff/
>
>
> Testing
> -------
>
> I have validated that MR, Pig and Hive jobs do get tagged appropriately. I
> have also validated that previous child jobs do get killed on RM
> failover/task failure.
>
>
> Thanks,
>
> Ivan Mitic
>
>
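
For readers following the thread, the tag-and-kill approach from the Description can be sketched roughly as below. This is only an in-memory illustration, not the code in the patch and not the Hadoop/YARN API: the `Job` class, the `killTaggedChildren` method, and the job ids are all hypothetical stand-ins. It assumes each child job carries the launcher job id as a tag, and that the launch timestamp (the role templeton.job.launch.time plays in the patch) is used only to narrow the set of jobs examined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory sketch; none of these types come from Hive or Hadoop. */
public class ChildJobCleanupSketch {

    /** Stand-in for a job as the ResourceManager might report it. */
    static final class Job {
        final String id;
        final Set<String> tags;
        final long startTimeMs;
        boolean killed;

        Job(String id, Set<String> tags, long startTimeMs) {
            this.id = id;
            this.tags = tags;
            this.startTimeMs = startTimeMs;
        }
    }

    /**
     * Marks as killed every live job that started at or after the launcher's
     * submit time and carries the launcher job id as a tag.
     * Returns the ids of the jobs killed by this call.
     */
    static List<String> killTaggedChildren(List<Job> rmJobs,
                                           String launcherJobId,
                                           long launchTimeMs) {
        List<String> killedIds = new ArrayList<>();
        for (Job job : rmJobs) {
            // Jobs older than the launcher cannot be its children: skip them.
            if (job.startTimeMs < launchTimeMs) {
                continue;
            }
            if (!job.killed && job.tags.contains(launcherJobId)) {
                job.killed = true; // a real launcher would ask the RM to kill it
                killedIds.add(job.id);
            }
        }
        return killedIds;
    }

    public static void main(String[] args) {
        List<Job> rmJobs = Arrays.asList(
            new Job("job_0001", Collections.singleton("job_other"), 900L),
            new Job("job_0002", Collections.singleton("job_launcher"), 1500L),
            new Job("job_0003", Collections.singleton("job_other"), 1600L));

        // Simulates a restarted launcher task cleaning up before resubmitting.
        System.out.println(killTaggedChildren(rmJobs, "job_launcher", 1000L));
        // prints [job_0002]
    }
}
```

After the cleanup call returns, a restarted launcher would go on to submit the same child job again, as the Description states.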