[ https://issues.apache.org/jira/browse/HIVE-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayush Saxena updated HIVE-24819:
--------------------------------
    Priority: Major  (was: Critical)

> CombineHiveInputFormat format seems to be returning row count in the multiple of Maps
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-24819
>                 URL: https://issues.apache.org/jira/browse/HIVE-24819
>             Project: Hive
>          Issue Type: Bug
>         Environment: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
>            Reporter: Jitender Kumar
>            Priority: Major
>
> Hi Team,
> This is the first time I am filing a bug in Apache Jira, so pardon me if I am unintentionally breaking any protocols.
> I am facing the following issue (on a multi-node cluster) when I set hive.tez.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.
> For demonstration purposes, I run the following query under several settings:
> _select count(1) from dbname.personal_data_rc tablesample(1000 rows);_
> *Case 1*
> mapred.map.tasks=2
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
> *Output*
> 1000
> *Case 2*
> mapred.map.tasks=2
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> *Output*
> 2000
> *Case 3*
> mapred.map.tasks=3
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> *Output*
> 3000
> Once 3 maps is set as the default, the output remains the same, i.e. a multiple of 3.
> Can you help me understand why, with TABLESAMPLE set to 1000 rows, the query reports more than 1000 rows? Is there another property that must be used with CombineHiveInputFormat, or is this an issue with CombineHiveInputFormat itself?
> I have looked for a solution, but in the end I had to come here. Please share your input as soon as possible, as one of our clients is waiting for a solution or an explanation.
> For now, as a workaround, we have changed it to the following:
> *hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat*
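A sketch of the reported cases replayed as a single Beeline session, assuming the Tez execution engine (implied by the `hive.tez.*` property) and the reporter's table `dbname.personal_data_rc`; substitute any table with at least 1000 rows. The counts in the comments are the ones reported in the issue, not independently verified results.

```sql
-- Assumption: Tez is the execution engine, since hive.tez.input.format only applies to Tez.
SET hive.execution.engine=tez;

-- Case 1: HiveInputFormat, 2 map tasks (reported count: 1000)
SET mapred.map.tasks=2;
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SELECT count(1) FROM dbname.personal_data_rc TABLESAMPLE(1000 ROWS);

-- Case 2: CombineHiveInputFormat, 2 map tasks (reported count: 2000)
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT count(1) FROM dbname.personal_data_rc TABLESAMPLE(1000 ROWS);

-- Case 3: CombineHiveInputFormat, 3 map tasks (reported count: 3000)
SET mapred.map.tasks=3;
SELECT count(1) FROM dbname.personal_data_rc TABLESAMPLE(1000 ROWS);
```

Because `SET` values persist for the rest of the session, each case only changes what differs from the previous one. Case 1 uses the same input format as the workaround, which is the only configuration reported to return 1000.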