[ https://issues.apache.org/jira/browse/HIVE-19703?focusedWorklogId=444963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444963 ]

ASF GitHub Bot logged work on HIVE-19703:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Jun/20 15:28
            Start Date: 12/Jun/20 15:28
    Worklog Time Spent: 10m
      Work Description: belugabehr commented on a change in pull request #377:
URL: https://github.com/apache/hive/pull/377#discussion_r439487202


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -201,10 +214,16 @@ public HiveSplitGenerator(InputInitializerContext initializerContext) throws IOE
         LOG.info("The preferred split size is " + preferredSplitSize);
       }

+      float waves;
       // Create the un-grouped splits
-      float waves =
-          conf.getFloat(TezMapReduceSplitsGrouper.TEZ_GROUPING_SPLIT_WAVES,
-              TezMapReduceSplitsGrouper.TEZ_GROUPING_SPLIT_WAVES_DEFAULT);
+      if (numSplits.isPresent()) {
+        waves = (float)numSplits.getAsInt() / availableSlots;

Review comment:
       How about `numSplits.get().floatValue()`?


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -91,9 +92,16 @@
   private final MapWork work;
   private final SplitGrouper splitGrouper = new SplitGrouper();
   private final SplitLocationProvider splitLocationProvider;
+  private final OptionalInt numSplits;

Review comment:
       Since the constructor accepts an `Integer` type anyway, can you please make this `Optional<Integer>`?


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -117,6 +125,11 @@ public HiveSplitGenerator(Configuration conf, MapWork work, final boolean genera
     // initialized, which may cause it to drop events.
     // No dynamic partition pruning
     pruner = null;
+    if (numSplits == null) {

Review comment:
       If `Optional<Integer>` is used, this just becomes `Optional.ofNullable()`


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -232,7 +251,7 @@ public HiveSplitGenerator(InputInitializerContext initializerContext) throws IOE
         splits[0] = new HiveInputFormat.HiveInputSplit(fileSplit, partIF);
       } else {
         // Raw splits
-        splits = inputFormat.getSplits(jobConf, (int) (availableSlots * waves));
+        splits = inputFormat.getSplits(jobConf, numSplits.orElse((int) (availableSlots * waves)));

Review comment:
       Take a look at using `Math.multiplyExact(availableSlots, waves)` here


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 444963)
    Time Spent: 20m  (was: 10m)

> GenericUDTFGetSplits never uses num splits argument
> ---------------------------------------------------
>
>                 Key: HIVE-19703
>                 URL: https://issues.apache.org/jira/browse/HIVE-19703
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Eric Wohlstadter
>            Assignee: Jaume M
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-19703.1.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The description for GenericUDTFGetSplits says
> {code}
> Returns an array of length int serialized splits for the referenced tables string.
> {code}
> but the argument to control the number of splits is DOA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
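
For readers following the review comments above, here is a minimal standalone sketch of what the `Optional<Integer>` suggestions could look like when put together. It is not the code from the pull request: the class `SplitCountSketch` and its methods `wrapRequested`, `resolveWaves` and `resolveSplitCount` are hypothetical names invented for illustration, and the configured waves value is passed in as a plain `float` instead of being read from the Tez configuration. One caveat on the last review comment: `Math.multiplyExact` only has `int` and `long` overloads, so it would not compile with a `float` `waves` as written; the sketch therefore keeps the `(int)` cast from the diff.

```java
import java.util.Optional;

// Hypothetical illustration only; names are not taken from HiveSplitGenerator.
final class SplitCountSketch {

    // A null Integer (caller did not request a split count) becomes Optional.empty().
    static Optional<Integer> wrapRequested(Integer requested) {
        return Optional.ofNullable(requested);
    }

    // If a split count was requested, derive waves from it; otherwise use the
    // configured default. floatValue() avoids the explicit (float) cast.
    static float resolveWaves(Optional<Integer> numSplits, int availableSlots, float defaultWaves) {
        return numSplits
            .map(n -> n.floatValue() / availableSlots)
            .orElse(defaultWaves);
    }

    // The requested count wins; otherwise fall back to slots * waves, truncated to int.
    static int resolveSplitCount(Optional<Integer> numSplits, int availableSlots, float waves) {
        return numSplits.orElseGet(() -> (int) (availableSlots * waves));
    }

    public static void main(String[] args) {
        Optional<Integer> requested = wrapRequested(8);
        Optional<Integer> unset = wrapRequested(null);

        System.out.println(resolveWaves(requested, 4, 1.5f));
        System.out.println(resolveSplitCount(requested, 4, 2.0f));
        System.out.println(resolveSplitCount(unset, 10, 1.5f));
    }
}
```

With these helpers the `if (numSplits == null)` branch from the diff is no longer needed, since `Optional.ofNullable` absorbs the null check at construction time.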