date:20170622

Re: Format dillema

2017-06-22 Thread Gopal Vijayaraghavan

> I kept hearing about vectorization, but later found out it was going to work
> if i used ORC.

Yes, it's a tautology - if you cared about performance, you'd use ORC, because ORC is the fastest format. And doing performance work to support folks who don't quite care about it, is not exactly

Re: Hive query on ORC table is really slow compared to Presto

2017-06-22 Thread Gopal Vijayaraghavan

> 1711647 -1032220119

Ok, so this is the hashCode skew issue, probably the one we already know about.

https://github.com/apache/hive/commit/fcc737f729e60bba5a241cf0f607d44f7eac7ca4

String hashcode distribution is much better in master after that. Hopefully that fixes the distinct speed issue h
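
To make the skew idea concrete, here is a minimal sketch of how one might check whether many distinct keys pile onto a single hash code. It assumes a Java-style 31-polynomial string hash purely for illustration; whether that matches the hash Hive used in this code path is not stated in the thread. The helper names `java_string_hash` and `heaviest_hash` are hypothetical, and reading the quoted pair above as "row count, hash code" is likewise an assumption.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() for BMP text.

    h = 31 * h + code_unit, wrapped to a signed 32-bit integer.
    (Characters outside the BMP would need UTF-16 surrogate handling.)
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def heaviest_hash(keys):
    """Return (count, hash_code) for the hash code shared by the most keys."""
    counts = Counter(java_string_hash(k) for k in keys)
    code, n = counts.most_common(1)[0]
    return n, code


if __name__ == "__main__":
    # "Aa" and "BB" are a well-known collision under this hash.
    n, code = heaviest_hash(["Aa", "BB", "C"])
    print(n, code)
```

If one hash code accounts for a large share of the keys, any operator that partitions or probes by that hash (such as a distinct or group-by) will do most of its work in one place, which is the kind of slowdown the reply describes.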

User already granted INSERT privilege, but hdfs permission denied

2017-06-22 Thread wuchang

The admin user of my Hive is named appuser. I have created a database named wuchang_test and a table named abtestmsg. When I describe the database, the OWNER NAME is appuser and the OWNER TYPE is USER, as shown below:

0: jdbc:hive2://hive.data.ms.netease.com:1000> describe database