Upgrading Metastore schema 2.0.0->2.1.0

2016-06-29 Thread Jose Rozanec
Hi all, Upgrading the DB schema from 2.0.0 to 2.1.0 is causing an error. Has anyone experienced similar issues? The command and stack trace are below. Thanks, ./schematool -dbType mysql -upgradeSchemaFrom 2.0.0 Starting upgrade metastore schema from version 2.0.0 to 2.1.0 Upgrade script upgrad

Re: Hash table in map join - Hive

2016-06-29 Thread Ross Guth
Hi Gopal, I saw the log files and the hash table information in them. Thanks. Also, I enforced a shuffle hash join. I have a couple of questions about it: 1. The query plan still says Map Join Operator (I would have expected it to be named a reduce-side operator). 2. The edges in this query pl

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread @Sanjiv Singh
Hi Dudu, I tried the same on the same table, which has 6,357,592,675 rows. See the results of all three. I tried the first one; it's giving duplicate rows.
> CREATE TEMPORARY TABLE INTER_ETL_T AS
> select *
> ,cast (floor(r*100) + 1 as bigint) + (100 * (row_number () over
> (partition by cast (

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread Markovitz, Dudu
1. This is strange. The negative numbers are due to overflow of the ‘int’ type, but that is exactly why I cast the expressions in my code to ‘bigint’. I tested this code before sending it to you and it worked fine, returning results beyond the range of the ‘int’ type. Please
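The overflow mentioned above can be modeled outside Hive. Hive's `int` is a 32-bit signed integer and `bigint` is 64-bit signed; the sketch below is a hypothetical Python model, not Hive code, that wraps values to those widths. It shows how a product silently turns negative in 32 bits, and that widening only helps if it happens before the arithmetic that overflows. The helper names and the `row_number` value are illustrative assumptions, not taken from the thread.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap(value, bits):
    """Reduce an arbitrary Python int to a two's-complement value of the given width."""
    half = 1 << (bits - 1)
    return ((value + half) % (1 << bits)) - half

def as_int(value):
    """Hypothetical model of a Hive 'int' (32-bit signed) result."""
    return wrap(value, 32)

def as_bigint(value):
    """Hypothetical model of a Hive 'bigint' (64-bit signed) result."""
    return wrap(value, 64)

row_number = 30_000_000  # hypothetical row_number() value within one partition

# Multiplying in 32 bits overflows; casting the result afterwards cannot undo it.
late_cast = as_bigint(as_int(100 * row_number))
# Widening the operand first keeps the product in range.
early_cast = as_bigint(100 * as_bigint(row_number))

print(late_cast)   # -1294967296
print(early_cast)  # 3000000000
```

Whether this explains the duplicates reported in the thread depends on how Hive types each sub-expression of the actual query; the sketch only illustrates the general int-versus-bigint behavior.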

RE: Implementing a custom StorageHandler

2016-06-29 Thread Lavelle, Shawn
I don’t have answers for you, except for #1: the mapreduce classes are the newer API in Hadoop, from my understanding. They’ve been out for a while, but the Hive storage handler API hasn’t been updated to make use of them. Which leads me to my very related question: When might Hive provide a storage ha

Re: Aggregated table larger than expected

2016-06-29 Thread Matt Olson
In case someone else encounters this issue, it looks like this was due to encoding differences between the hourly and daily tables. The hourly table often had the same values stored consecutively for certain columns, but the group by on multiple columns caused them to be shuffled around to different
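The explanation above hinges on run-length style encoding: a column compresses well when equal values sit next to each other. The sketch below is a simplified Python model, not how any Hive file format actually lays out data; it counts runs for a hypothetical column before its rows are shuffled, after a shuffle (as a multi-column group by might redistribute them), and after re-sorting. The column values and the `rle` helper are illustrative assumptions.

```python
import random

def rle(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# Hypothetical column from an hourly table: equal values stored consecutively.
clustered = ["us"] * 500 + ["eu"] * 300 + ["ap"] * 200

# Simulate rows being redistributed in arbitrary order.
shuffled = clustered[:]
random.Random(42).shuffle(shuffled)

# Re-sorting brings equal values back together.
resorted = sorted(shuffled)

print(len(rle(clustered)))  # 3
print(len(rle(shuffled)))   # far more runs, so run-length encoding saves much less
print(len(rle(resorted)))   # 3
```

In HiveQL, clustering rows before they are written (for example with `DISTRIBUTE BY` and `SORT BY`) is one way to restore such runs, though whether that fits this particular table depends on details not shown here.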