Hello Hive, I'm a developer using Hive to process TB-scale data, and I'm having some difficulty loading data into a table. I have 2 tables now:
-- table_1:
CREATE EXTERNAL TABLE `table_1` (
  `keyword` string,
  `domain` string,
  `url` string
)
PARTITIONED BY (yearmonth INT, partition1 STRING)
STORED AS RCFILE;

-- table_2:
CREATE EXTERNAL TABLE `table_2` (
  `keyword` string,
  `domain` string,
  `url` string
)
PARTITIONED BY (yearmonth INT, partition2 STRING)
STORED AS PARQUET;

I'm doing an INSERT OVERWRITE into table_2 from a SELECT on table_1 with dynamic partitioning, and the number of partitions grows dramatically from 1,500 to 40k (because I want to partition on something else). The MapReduce job itself finished fine, but the process has been stuck at "Loading data to table default.table_2 (yearmonth=null, domain_prefix=null)" and I've been waiting for hours. Is this expected when we have 40k partitions? (A sketch of the insert is in the P.S. below.)

--------------------------------------------------------------

Refs - here are the parameters that I used:

export HADOOP_HEAPSIZE=16384

set PARQUET_FILE_SIZE=268435456;
set parquet.block.size=268435456;
set dfs.blocksize=268435456;
set parquet.compression=SNAPPY;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=500000;
SET hive.exec.max.dynamic.partitions.pernode=50000;
SET hive.exec.max.created.files=1000000;

Thank you very much!
Tianqi Tong
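
P.S. For reference, a minimal sketch of the kind of dynamic-partition insert described above. The partition column name follows the table_2 DDL and the SUBSTR derivation is only a placeholder, not the actual production query:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns must come last in the SELECT, in partition order.
INSERT OVERWRITE TABLE table_2 PARTITION (yearmonth, partition2)
SELECT
  keyword,
  domain,
  url,
  yearmonth,                             -- dynamic partition column 1
  SUBSTR(domain, 1, 3) AS partition2     -- dynamic partition column 2 (placeholder derivation)
FROM table_1;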