Re: Optimal approach for changing file format of a partitioned table

2018-08-06 Thread Gopal Vijayaraghavan

A Hive version would help to preface this, because the answer depends on it (for example, TEZ-3709 doesn't apply to hive-1.2).

> I’m trying to simply change the format of a very large partitioned table from
> Json to ORC. I’m finding that it is unexpectedly resource intensive,
> primarily due to a shu

2018-08-06 Thread Furcy Pin

Hi Elliot,

From your description of the problem, I'm assuming that you are doing an INSERT OVERWRITE table PARTITION(p1, p2) SELECT * FROM table, or something close to it, such as a CREATE TABLE AS .... If this is the case, I suspect that your shuffle phase comes from dynamic partitioning, and in pa
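For reference, the dynamic-partition rewrite described above looks roughly like this in HiveQL. This is a minimal sketch, not the poster's actual query: the table names events_json and events_orc and the data columns id and payload are hypothetical placeholders; only the partition columns p1 and p2 come from the message.

```sql
-- Hypothetical names: events_json is the existing JSON-backed table,
-- events_orc is the new ORC table. id/payload are placeholder data columns.

-- Required for an INSERT where every partition column is dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Target table with the same partitioning, stored as ORC.
CREATE TABLE events_orc (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (p1 STRING, p2 STRING)
STORED AS ORC;

-- Dynamic partitioning: Hive assigns each row to a partition using the
-- trailing columns of the SELECT list, which must be p1, p2 in that order.
INSERT OVERWRITE TABLE events_orc PARTITION (p1, p2)
SELECT id, payload, p1, p2
FROM events_json;
```

When hive.optimize.sort.dynamic.partition is enabled, Hive sorts rows on the partition columns so each reducer keeps only one ORC writer open at a time; that sort is one place an extra shuffle can come from. Its default differs between Hive versions, which is another reason the version matters here.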