Re: Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-25, 何宝宁
Thank you, Gopal, for pointing out the root cause. After running alter table xxx compact 'major' to request a forced major compaction, the total length is correct!
Is there any way to run a compaction immediately after inserting values?

Thanks,
Bob He

On 25 Jul 2018, at 1:45 PM, Gopal Vijayaraghavan wrote:
> Sea

Re: Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-24, Gopal Vijayaraghavan
> Searching for 'Total length' in the sys_dag_xxx log, it is 2147483648.

This is the INT_MAX "placeholder" value for uncompacted ACID tables. This is because with ACIDv1 there is no way to generate splits against uncompacted files, so this gets "an empty bucket + unknown number of inserts + updates" plac
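The logged value is exactly 2^31, which is one more than Java's Integer.MAX_VALUE (2^31 - 1 = 2147483647). The short Java check below only confirms that arithmetic; the class and method names are hypothetical and nothing here is taken from Hive's code. It also shows why the value has to be held in a long: the int expression 1 << 31 wraps around to Integer.MIN_VALUE.

```java
public class PlaceholderLength {
    // 2^31 computed as a long; the int shift (1 << 31) would overflow.
    static long twoToThe31() {
        return 1L << 31;
    }

    public static void main(String[] args) {
        long logged = 2147483648L; // "Total length" value reported in the DAG log

        System.out.println(logged == twoToThe31());                 // true
        System.out.println(logged == (long) Integer.MAX_VALUE + 1); // true
        System.out.println(1 << 31);                                // -2147483648 (int overflow)
    }
}
```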

Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-24, 何宝宁
Hi,
While tuning the initial number of mappers with Hive + Tez, I found that if an ORC table is clustered, the total length returned by the estimator is always 2^31.

Hive: 2.3.3
Tez: 0.8.4 (TezSplitGrouper.java:197)

How to replicate:
create table test (f1 string, f2 string) clustered by (f1) into 1 buckets stored as
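To see why an inflated total length matters when tuning the initial mapper count, here is a simplified, hypothetical model of the min/max size clamping that split grouping applies. It is not the actual TezSplitGrouper source. The class and method names are made up, the 50 MB / 1 GB limits are assumed to match the defaults of tez.grouping.min-size and tez.grouping.max-size, and the sketch ignores locality, any bound from the number of original splits, and the other inputs the real grouper uses.

```java
public class GroupingSketch {
    // Assumed defaults for tez.grouping.min-size and tez.grouping.max-size.
    static final long MIN_SIZE = 50L * 1024 * 1024;   // 50 MB
    static final long MAX_SIZE = 1024L * 1024 * 1024; // 1 GB

    // Hypothetical model: start from a desired group count, then recompute
    // the count if the bytes per group fall outside [MIN_SIZE, MAX_SIZE].
    static int groupCount(long totalLength, int desiredNumSplits) {
        long lengthPerGroup = totalLength / desiredNumSplits;
        if (lengthPerGroup > MAX_SIZE) {
            return (int) (totalLength / MAX_SIZE) + 1;
        }
        if (lengthPerGroup < MIN_SIZE) {
            return (int) (totalLength / MIN_SIZE) + 1;
        }
        return desiredNumSplits;
    }

    public static void main(String[] args) {
        long placeholder = 1L << 31; // reported length of the uncompacted table
        long tiny = 10L * 1024;      // hypothetical real size of a small table (10 KB)
        int desired = 100;           // hypothetical starting group count

        System.out.println(groupCount(placeholder, desired));
        System.out.println(groupCount(tiny, desired));
    }
}
```

Under these assumptions, the 2^31-byte placeholder yields 41 groups for a starting count of 100, while a 10 KB table collapses to 1. Treat the numbers as illustrative only; they show the direction of the skew, not what Tez will actually schedule.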