[ https://issues.apache.org/jira/browse/HIVE-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449148#comment-15449148 ]
Abhishek Somani commented on HIVE-14633:
----------------------------------------

I think the number of mappers can be controlled by other means, such as split-size configurations or CombineHiveInputFormat. Is this a Tez use case? Tez does split grouping as well, which should lead to fewer mappers.

> #.of Files in a partition != #.Of buckets in a partitioned,bucketed table
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14633
>                 URL: https://issues.apache.org/jira/browse/HIVE-14633
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>         Environment: HDP 2.3.2
>            Reporter: Hanu
>
> Ideally, the number of files should equal the number of buckets declared in
> the table DDL. This works fine for the initial insert and for every insert
> overwrite. However, an insert into a Hive bucketed table creates extra files.
> ex:
> # of Buckets = 4
> No. of files after initial insert --> 4
> No. of files after 2nd insert --> 8
> No. of files after 3rd insert --> 12
> No. of files after nth insert --> n * # of Buckets.
> First insert list:
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx 3 hvallur hdfs 308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
>
> 2nd insert:
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000000_0_copy_1
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000001_0_copy_1
> -rwxrwxrwx 3 hvallur hdfs 308 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0
> -rwxrwxrwx 3 hvallur hdfs 302 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000002_0_copy_1
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:42 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0
> -rwxrwxrwx 3 hvallur hdfs 49 2016-08-25 12:47 hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/000003_0_copy_1
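The file growth reported in the issue and the mapper-count point in the comment can be illustrated with a small Python model. This is a sketch under stated assumptions, not Hive code. It assumes each INSERT INTO writes one file per bucket and that a name clash gets a `_copy_<k>` suffix; both assumptions are inferred from the listing above. `bucket_files_after_inserts` and `estimate_mappers` are hypothetical helpers. `estimate_mappers` packs files greedily into splits of at most `max_split_bytes`. It is a simplified stand-in for input-split combining or Tez split grouping, not the actual algorithm of CombineHiveInputFormat or Tez.

```python
"""Hypothetical model of the file growth reported in HIVE-14633.

Assumptions (inferred from the listing, not from Hive's source code):
- each INSERT INTO writes one file per bucket into the partition directory;
- a name clash with an existing file is resolved by appending _copy_<k>;
- mappers are estimated by greedily packing files into splits of at most
  max_split_bytes, a simplified stand-in for combined/grouped splits.
"""


def bucket_files_after_inserts(num_buckets: int, num_inserts: int) -> list[str]:
    """Return the file names expected after num_inserts INSERT INTO runs."""
    names = []
    for bucket in range(num_buckets):
        base = f"{bucket:06d}_0"
        names.append(base)  # written by the first insert
        for k in range(1, num_inserts):
            names.append(f"{base}_copy_{k}")  # written by each later insert
    return names


def estimate_mappers(file_sizes: list[int], max_split_bytes: int) -> int:
    """Count splits when files are packed in order into splits <= max_split_bytes.

    A file larger than max_split_bytes gets a split of its own here; real
    input formats may divide such a file further.
    """
    splits = 0
    current = 0
    for size in file_sizes:
        if splits == 0 or current + size > max_split_bytes:
            splits += 1  # start a new split with this file
            current = size
        else:
            current += size  # add this file to the current split
    return splits


if __name__ == "__main__":
    files = bucket_files_after_inserts(num_buckets=4, num_inserts=2)
    print(len(files))  # 8
    print(files[:2])  # ['000000_0', '000000_0_copy_1']

    # Sizes in bytes, in listing order, from the "2nd insert" listing above.
    sizes = [49, 49, 49, 49, 308, 302, 49, 49]
    print(estimate_mappers(sizes, max_split_bytes=1))  # 8: one split per file
    print(estimate_mappers(sizes, max_split_bytes=256 * 1024 * 1024))  # 1
```

Under this simplified model, the eight small files from the second insert still fit in one 256 MB split. This matches the comment's point: more files per partition need not mean more mappers when splits are combined or grouped by size. It does not address the issue's separate concern, that the file count no longer matches the declared bucket count.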