Re: Hive on TEZ + LLAP

2016-07-15 Thread Long, Andrew
Amazon AWS has recently released EMR with Hive + Tez as well. Cheers Andrew From: Jörn Franke Reply-To: "user@hive.apache.org" Date: Friday, July 15, 2016 at 8:36 AM To: "user@hive.apache.org" Subject: Re: Hive on TEZ + LLAP I would recommend a distribution such as Hortonworks where everything

Re: Implementing a custom StorageHandler

2016-07-06 Thread Long, Andrew
number of splits by specifying the lower and upper bounds on the size of splits. Heads up #2. Most FileInputFormat implementations will give you 1 or more splits for each file in the input set. Hive will try to use a Combine input format, which combines small files/splits into larger splits. hth Ga
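
The reply above is truncated, so which settings its author meant is not shown here. As a rough illustration of how lower and upper bounds on split size drive the split count, the sketch below reproduces the clamp that Hadoop's FileInputFormat is known to apply, max(minSize, min(maxSize, blockSize)). The class name SplitSizeSketch and the helper estimateSplitCount are hypothetical, and the count is only an approximation: it ignores Hadoop's slop factor for the last split and any combining of small splits that Hive or Tez may do afterwards.

```java
public class SplitSizeSketch {

    // Clamp the block size between the configured lower and upper bounds.
    // Same shape as the formula used by Hadoop's FileInputFormat; the
    // parameter names here are illustrative.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: rough number of splits for one file, using
    // ceiling division. Real Hadoop code also applies a slop factor to
    // the final split, so actual counts can differ slightly.
    static long estimateSplitCount(long fileLength, long splitSize) {
        if (fileLength <= 0) {
            return 0;
        }
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // assumed 128 MiB HDFS block
        long minSize = 1L;                      // assumed lower bound
        long maxSize = 64L * 1024 * 1024;       // assumed 64 MiB upper bound
        long fileLength = 1024L * 1024 * 1024;  // assumed 1 GiB input file

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size (bytes): " + splitSize);
        System.out.println("estimated splits:   "
                + estimateSplitCount(fileLength, splitSize));
    }
}
```

Raising the lower bound above the block size yields fewer, larger splits; lowering the upper bound below the block size yields more, smaller ones.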

Possible Bug: to_date("2015-01-15") returns a string

2016-06-30 Thread Long, Andrew
Hello Everyone, I ran into this unusual behavior while converting a date string into a date. I was surprised to find out that to_date will occasionally return a string. Does this make sense? Cheers Andrew hive> CREATE TEMPORARY TABLE datebug > AS SELECT to_date("2015-01-10"); Query ID =

Implementing a custom StorageHandler

2016-06-27 Thread Long, Andrew
Hello everyone, I’m in the process of implementing a custom StorageHandler and I had some questions. 1) What is the difference between org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat? 2) How is numSplits calculated in org.apache.hadoop.mapred.Input

Re: How does tez calculate the number of Mappers/Reducers?

2016-06-25 Thread Long, Andrew
Thanks once again! “But everything starts off by calling InputFormat::getSplits()” Correct me if I’m wrong but at this point isn’t the number of splits calculated? @Override public InputSplit[] getSplits(JobConf conf, int numSplits) throws IOException { …. After this the splits are then g

Re: How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Long, Andrew
Ah, that makes sense. Thanks again for all the help. Do you know how the number of splits is calculated? I also noticed a couple of unusual things in our splits (as seen below). Primarily, getLength() always returns 0L, which I’m guessing may be causing other problems as well. Also our getSpli

How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Long, Andrew
Hello everyone, How does Tez calculate the number of mappers and reducers? We have a custom StorageHandler that, when used with Tez, miscalculates the number of mappers when doing a join. I’ve included an EXPLAIN EXTENDED of a sample query below. One thing I have noticed is that under propert