Rajesh Balamohan created HIVE-26975:
---------------------------------------
Summary: MERGE: Wrong reducer estimate causing smaller files to be
created
Key: HIVE-26975
URL: https://issues.apache.org/jira/browse/HIVE-26975
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Rajesh Balamohan
* "Merge into" estimates wrong number of reducers causing more number of small
files to be created.* e.g 400+ files in 3+ MB file each.*
* This can be reproduced by writing data into "store_sales" table in iceberg
format via another source table (using merge-into).
** e.g Running this few times will create wrong number of reduce tasks
causing lot of small files to be created in iceberg table.
{noformat}
MERGE INTO store_sales_t t
using ssv s
ON ( t.ss_item_sk = s.ss_item_sk
AND t.ss_customer_sk = s.ss_customer_sk
AND t.ss_sold_date_sk = "2451181"
AND ( ( Floor(( s.ss_item_sk ) / 1000) * 1000 ) BETWEEN 1000 AND 2000 )
AND s.ss_ext_discount_amt < 0.0 )
WHEN matched AND t.ss_ext_discount_amt IS NULL THEN
UPDATE SET ss_ext_discount_amt = 0.0
WHEN NOT matched THEN
INSERT ( ss_sold_time_sk,
ss_item_sk,
ss_customer_sk,
ss_cdemo_sk,
ss_hdemo_sk,
ss_addr_sk,
ss_store_sk,
ss_promo_sk,
ss_ticket_number,
ss_quantity,
ss_wholesale_cost,
ss_list_price,
ss_sales_price,
ss_ext_discount_amt,
ss_ext_sales_price,
ss_ext_wholesale_cost,
ss_ext_list_price,
ss_ext_tax,
ss_coupon_amt,
ss_net_paid,
ss_net_paid_inc_tax,
ss_net_profit,
ss_sold_date_sk )
VALUES ( s.ss_sold_time_sk,
s.ss_item_sk,
s.ss_customer_sk,
s.ss_cdemo_sk,
s.ss_hdemo_sk,
s.ss_addr_sk,
s.ss_store_sk,
s.ss_promo_sk,
s.ss_ticket_number,
s.ss_quantity,
s.ss_wholesale_cost,
s.ss_list_price,
s.ss_sales_price,
s.ss_ext_discount_amt,
s.ss_ext_sales_price,
s.ss_ext_wholesale_cost,
s.ss_ext_list_price,
s.ss_ext_tax,
s.ss_coupon_amt,
s.ss_net_paid,
s.ss_net_paid_inc_tax,
s.ss_net_profit,
"2451181")
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)