Amazon AWS has recently released EMR with Hive + Tez as well.
Cheers Andrew
From: Jörn Franke
Reply-To: "user@hive.apache.org"
Date: Friday, July 15, 2016 at 8:36 AM
To: "user@hive.apache.org"
Subject: Re: Hive on TEZ + LLAP
I would recommend a distribution such as Hortonworks where everything
number of
splits by specifying the lower and upper bounds on the size of splits.
Heads up #2. Most FileInputFormat implementations will give you 1 or more
splits for each file in the input set. Hive will try to use a Combine input
format, which combines small files/splits into larger splits.
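To make the lower/upper bound point concrete, here is a minimal sketch of the split-size rule as I understand it for the newer org.apache.hadoop.mapreduce FileInputFormat: the split size is the block size clamped between the configured minimum and maximum (mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize). The class and method names below are made up for illustration and have no Hadoop dependency, the sizes are invented, and a custom or combining input format may behave differently.

```java
// Illustration only: a stand-alone mirror of the clamp rule, not Hadoop's own class.
final class SplitSizeSketch {

    // Split size is the block size clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough split count for one splittable file (ceiling division).
    // Ignores the slop allowance and empty-file handling of the real code.
    static long estimateSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;   // made-up block size
        long fileLength = 1024 * mb; // made-up 1 GB file

        long defaults = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long raisedMin = computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE);
        long loweredMax = computeSplitSize(blockSize, 1L, 64 * mb);

        System.out.println(estimateSplits(fileLength, defaults));
        System.out.println(estimateSplits(fileLength, raisedMin));
        System.out.println(estimateSplits(fileLength, loweredMax));
    }
}
```

With these made-up numbers, raising the minimum above the block size gives fewer, larger splits (4 instead of 8), and lowering the maximum below the block size gives more, smaller ones (16).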
hth
Ga
Hello Everyone,
I ran into this unusual behavior while converting a date string into a date. I
was surprised to find out that to_date will occasionally return a string.
Does this make sense?
Cheers Andrew
hive> CREATE TEMPORARY TABLE datebug
> AS SELECT to_date("2015-01-10");
Query ID =
Hello everyone,
I’m in the process of implementing a custom StorageHandler and I had some
questions.
1) What is the difference between org.apache.hadoop.mapred.InputFormat
and org.apache.hadoop.mapreduce.InputFormat?
2) How is numSplits calculated in
org.apache.hadoop.mapred.Input
Thanks once again!
“But everything starts off by calling InputFormat::getSplits()”
Correct me if I’m wrong but at this point isn’t the number of splits calculated?
@Override
public InputSplit[] getSplits(JobConf conf, int numSplits) throws
IOException {
….
After this the splits are then g
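For reference, a rough sketch of how the stock org.apache.hadoop.mapred.FileInputFormat treats the numSplits argument, as far as I know: it is only a hint, used to derive a goal size that is then clamped by the minimum split size and the block size. A custom InputFormat is free to ignore the hint entirely. The class name and the numbers below are made up for illustration, and the code does not depend on Hadoop.

```java
// Illustration only: mirrors the arithmetic, not the actual Hadoop class.
final class NumSplitsHintSketch {

    // The hint is turned into a target size per split.
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits == 0 ? 1 : numSplits);
    }

    // The target is capped at the block size and floored at the minimum size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long totalSize = 1000L; // made-up total input size in bytes
        long blockSize = 128L;  // made-up block size in bytes
        long minSize = 1L;

        // A small hint cannot push the split size above the block size.
        System.out.println(computeSplitSize(goalSize(totalSize, 4), minSize, blockSize));
        // A large hint shrinks the split size below the block size.
        System.out.println(computeSplitSize(goalSize(totalSize, 20), minSize, blockSize));
    }
}
```

So the number of splits actually returned can differ from numSplits: the hint only influences the size used to carve up each file.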
Ah that makes sense. Thanks again for all the help.
Do you know how the number of splits is calculated?
I also noticed a couple of unusual things in our splits (as seen below). Primarily,
getLength() always returns 0L, which I’m guessing is possibly causing other
problems as well. Also our getSpli
Hello everyone,
How does Tez calculate the number of mappers and reducers? We have a custom
StorageHandler that, when used with Tez, miscalculates the number of mappers
when doing a join. I’ve included an EXPLAIN EXTENDED of a sample query below.
One thing I have noticed is that under propert