> I kept hearing about vectorization, but later found out it was going to work
> if I used ORC.
Yes, it's a tautology - if you cared about performance, you'd use ORC, because
ORC is the fastest format.
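For reference, a minimal sketch of that setup, assuming a Hive version where
vectorized execution is available. The table names events_text and events_orc
are hypothetical and stand in for whatever table you actually have:

```sql
-- Turn on vectorized execution for this session.
SET hive.vectorized.execution.enabled=true;

-- Hypothetical tables: copy an existing text-format table into ORC.
CREATE TABLE events_orc STORED AS ORC AS
SELECT * FROM events_text;

-- Check the plan. Stages that run vectorized should be reported as
-- "Execution mode: vectorized" in the EXPLAIN output.
EXPLAIN
SELECT COUNT(*) FROM events_orc;
```

If the plan does not show vectorized stages, the usual suspects are the file
format of the table or an unsupported expression in the query.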
And doing performance work to support folks who don't quite care about it is
not exactly
> 1711647 -1032220119
Ok, so this is the hashCode skew issue, probably the one we already know about.
https://github.com/apache/hive/commit/fcc737f729e60bba5a241cf0f607d44f7eac7ca4
String hashcode distribution is much better in master after that. Hopefully
that fixes the distinct speed issue h
The admin user of my Hive is named appuser. I have created a database named
wuchang_test and a table named abtestmsg. When I describe the database, the
OWNER NAME of this database is appuser and the OWNER TYPE is USER, as shown below:
0: jdbc:hive2://hive.data.ms.netease.com:1000> describe database