It would help to preface this with the Hive version, because the answer
depends on it (for example, TEZ-3709 doesn't apply to hive-1.2).
> I’m trying to simply change the format of a very large partitioned table from
> Json to ORC. I’m finding that it is unexpectedly resource intensive,
> primarily due to a shu
Hi Elliot,
From your description of the problem, I'm assuming that you are doing an
INSERT OVERWRITE table PARTITION(p1, p2) SELECT * FROM table
or something close, like a CREATE TABLE AS ... maybe.
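
To make that assumption concrete, here is a minimal sketch of the kind of
statement I mean. The table names (events_json, events_orc) and the data
columns (col_a, col_b) are hypothetical placeholders, not taken from your
setup; p1 and p2 stand for the partition columns. With every partition column
left dynamic, Hive needs the nonstrict mode, and the partition columns have to
come last in the SELECT list:

```sql
-- Hypothetical names throughout; adjust to the real schema.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The target table is assumed to already exist, stored as ORC and
-- partitioned by (p1, p2).
INSERT OVERWRITE TABLE events_orc PARTITION (p1, p2)
SELECT col_a, col_b, p1, p2   -- partition columns last, in PARTITION order
FROM events_json;
```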
If this is the case, I suspect that your shuffle phase comes from dynamic
partitioning, and in pa