[jira] [Updated] (HIVE-24819) CombineHiveInputFormat format seems to be returning row count in the multiple of Maps

Ayush Saxena (Jira) Thu, 21 Dec 2023 11:50:00 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ayush Saxena updated HIVE-24819:
--------------------------------
    Priority: Major  (was: Critical)

> CombineHiveInputFormat format seems to be returning row count in the multiple 
> of Maps 
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-24819
>                 URL: https://issues.apache.org/jira/browse/HIVE-24819
>             Project: Hive
>          Issue Type: Bug
>         Environment: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
>            Reporter: Jitender Kumar
>            Priority: Major
>
> Hi Team,
> This is the first time I am filing a bug in Apache Jira, so pardon me if 
> I am unintentionally breaking any protocols. 
> I am facing the following issue (on a multi-node cluster) when I set 
> hive.tez.input.format to  
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. 
> For demonstration purposes, I run the following query in each of the cases 
> below. 
> _select count(1) from dbname.personal_data_rc tablesample(1000 rows);_
> *Case 1*
> mapred.map.tasks=2
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
> *Output*
> 1000
> *Case 2*
> mapred.map.tasks=2
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> *Output*
> 2000
> *Case 3*
> mapred.map.tasks=3
> hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> *Output*
> 3000
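> A minimal sketch of Case 2 as a Beeline session, assuming the two properties 
> are set per session with SET (the report does not say how they were set): 
> ```sql
> -- Assumption: session-level SET; the values are those listed in Case 2.
> set mapred.map.tasks=2;
> set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> -- Reported result for this case: 2000 (Case 1 with HiveInputFormat gave 1000).
> select count(1) from dbname.personal_data_rc tablesample(1000 rows);
> ```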
> After 3 maps set as default, the output remains the same, i.e. a multiple of 3. 
> Can you help me understand why, with TABLESAMPLE set to 1000 rows, the query 
> returns more than 1000 rows? Is there another property that must be set 
> together with CombineHiveInputFormat, or is this an issue in 
> CombineHiveInputFormat itself? 
> I have tried to look for a solution, but in the end I had to come here. Please 
> share your inputs ASAP, as one of our clients is looking for a solution or 
> an explanation for this. 
> For now, as a workaround, we have changed it to the following: 
> *hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
