I created a Hive table and used an INSERT ... SELECT to load existing Impala data into it.
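The DDL and load statements looked roughly like this (database, table, and column names here are placeholders for my real ones):

CREATE TABLE my_db.new_table (id BIGINT, payload STRING)
PARTITIONED BY (ds STRING)
STORED AS PARQUET;

INSERT OVERWRITE TABLE my_db.new_table PARTITION (ds)
SELECT id, payload, ds FROM my_db.old_impala_table;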
I noticed two things:

1. The new data is more than twice the size of the old data, which Impala had written with compression.
2. No matter how large I set the Parquet block size, Hive always generates Parquet files of roughly the same size.

These are the settings I applied before the insert:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;

Can anyone please point out what I did wrong? I'm using Hive 1.1.0. Thank you!
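For context, this is roughly how I'm checking the output file sizes, run from the Hive CLI (the warehouse path is a placeholder for my actual table location):

dfs -du -h /user/hive/warehouse/my_db.db/new_table;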