> On July 11, 2017, 11:14 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java > > Lines 305-308 (patched) > > <https://reviews.apache.org/r/60432/diff/2/?file=1763797#file1763797line305> > > > > Is the if statement necessary? Is there a problem with always doing > > this? > > > > I think it would make more sense to always do this. Right now the code > > assumes `mvTask.getWork() == mrAnvMvTask.getChildTasks().get(0).getWork()` > > which doesn't seem like a smart idea. If `mvAndMvTask` is the tasks that is > > going to be added to `resTask` then it would make more sense to just use > > the `MoveWork` from `mvAndMvTask.getChildTasks.get(0)` > > Marta Kuczora wrote: > Thanks a lot for the review Sahil! > I absolutely agree with you that it would be better to get the move task > from mergeAndMoveMoveTask always. The reason why I added the if is that I > wasn't 100% sure if there cannot be any use-case when this approach is not > working. So I wanted to be cautious and only change the behavior for the > blobstore optimization use-case and leave it as it was for the other > use-cases. But if you think that this caution is not necessary, I an happy to > remove the if condition. > Do you think it would make sense to upload a patch where I always use the > child task and see if all tests are successful in the pre-commit run? > > Marta Kuczora wrote: > Updated the patch accordingly and the pre-commit tests were successful > except for 5 tests which are failing for a long time.
Yeah, I think it worth creating a patch where we always use the child task and see if all tests are successful in pre-commit. - Sahil ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60432/#review180254 ----------------------------------------------------------- On July 13, 2017, 3:05 p.m., Marta Kuczora wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/60432/ > ----------------------------------------------------------- > > (Updated July 13, 2017, 3:05 p.m.) > > > Review request for hive and Sergio Pena. > > > Bugs: HIVE-16845 > https://issues.apache.org/jira/browse/HIVE-16845 > > > Repository: hive-git > > > Description > ------- > > The following steps lead to the NPE in the > ConditionalResolverMergeFiles.generateActualTasks method: > > In the GenMapRedUtils.createCondTask method, the tasks for the merge, move > and "merge and move" use cases are created and set as task list to the > ConditionalWork. Originally the moveOnlyMoveTask and the mergeAndMoveMoveTask > was created from the same moveWork, which was the dummyWork created like this > in the createMRWorkForMergingFiles method: > > MoveWork dummyMv = new MoveWork(null, null, null, > new LoadFileDesc(fsInputDesc.getFinalDirName(), finalName, true, > null, null), false); > > > Then in the ConditionalResolverMergeFiles.generateActualTasks method we get > these tasks and use them to create result "resTsks" list. > > For the "merge and move" use case, the code looks like this: > > if (toMove.size() > 0) { > resTsks.add(mrAndMvTask); > > MoveWork mvWork = (MoveWork) mvTask.getWork(); > LoadFileDesc lfd = mvWork.getLoadFileWork(); > > ... > > LoadMultiFilesDesc lmfd = new LoadMultiFilesDesc(toMove, > targetDirs, lfd.getIsDfsDir(), lfd.getColumns(), > lfd.getColumnTypes()); > mvWork.setLoadFileWork(null); > mvWork.setLoadTableWork(null); > mvWork.setMultiFilesDesc(lmfd); > } > > It adds the mrAndMvTask task to the resTsks list and modifies the move work > to move all necessary files in one-step. The mrAndMvTask contains a move task > as child task, which is the same as the mvWork work. > > With the blobstore optimization on, the moveOnlyMoveTask task is created from > a different move work, not from the dummyMoveWork as before: > > MoveWork workForMoveOnlyTask; > if (shouldMergeMovePaths) { > workForMoveOnlyTask = mergeMovePaths(condInputPath, > moveTaskToLink.getWork()); > } else { > workForMoveOnlyTask = dummyMoveWork; > } > > ... > > Task<? extends Serializable> mergeOnlyMergeTask = > TaskFactory.get(mergeWork, conf); > Task<? extends Serializable> moveOnlyMoveTask = > TaskFactory.get(workForMoveOnlyTask, conf); > Task<? extends Serializable> mergeAndMoveMergeTask = > TaskFactory.get(mergeWork, conf); > Task<? extends Serializable> mergeAndMoveMoveTask = > TaskFactory.get(dummyMoveWork, conf); > > Because of this the mvWork in the > ConditionalResolverMergeFiles.generateActualTasks method will also be > different. It has the LoadTableDesc variable set and not the LoadFileDesc, > that causes the NPE. > > When the blobstore optimization is on and the move work is changed, we should > use the child move task of the mrAndMvTask in the generateActualTasks method, > instead of the mvTask. Not just to avoid the NPE, but because this is the > correct move task for the "merge and move" use case. > > > Diffs > ----- > > > itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q > PRE-CREATION > > itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q > PRE-CREATION > > itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions_move_only.q > PRE-CREATION > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_move.q.out > PRE-CREATION > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_merge_only.q.out > PRE-CREATION > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions_move_only.q.out > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java > 4266569 > > > Diff: https://reviews.apache.org/r/60432/diff/4/ > > > Testing > ------- > > Created q tests for the move-only, merge-only and move-and-merge use cases. > The tests do an "insert overwiew" with blobstore optimizations on and off. > > > Thanks, > > Marta Kuczora > >