Re: Hive and Impala

2015-04-27 Thread Fabio C.
If the comparison mentions only MR, then it is probably outdated. Hive can now run on Tez, with a great improvement in performance. However, I don't know how Hive+Tez compares with Impala. On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar wrote: > What use case are you trying to solve? > > On Mon, Apr 27, 2015 at

Re: Dataset for hive

2015-04-03 Thread Fabio C.
Thanks Gopal, but since it was a while ago and I didn't have to generate too much data, I just ran the tpc-ds generator binaries in parallel and uploaded the data manually. Anyway, if you want to have a look at the error: http://hortonworks.com/community/forums/topic/hive-testbench-error/ Maybe it's trivia

Re: Dataset for hive

2015-04-02 Thread Fabio C.
https://github.com/hortonworks/hive-testbench The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host. The good thing is you already have several queries and

Re: rename a database

2015-03-27 Thread Fabio C.
Maybe they just typed time_shit instead of time_shift and found it out after 3 hours of table compression... I don't think it's too important, but what is the workaround? I'm also interested in this. Maybe it's just a matter of the metastore, and one could try to explore the metastore db to change how

Parallel queries/dags running in same AM?

2015-03-09 Thread Fabio C.
Hi all, I've been using Hive on Tez, and I overheard a conversation that contradicts my current understanding. Can anyone confirm the following statement? (1) For every Tez AM it is possible to launch just a single query/DAG at a time. So within a given AM several DAGs can be executed o

Re: running hive on windows 7

2015-03-08 Thread Fabio C.
Maybe it's a stupid question, but did you compile Hive from source? I'm not an expert either, but that way I would expect to get the exe files somewhere... On Sun, Mar 8, 2015 at 9:44 AM, 北极星 <150201...@qq.com> wrote: > Hi > > I'm a freshman in hadoop world. After some struggling, i've successful

Launch hive scripts in PyHS2

2015-03-06 Thread Fabio C.
Hi everyone, does anybody know if it's possible to run a Hive script with pyhs2? Typically I will need to set the queue name (for Tez) and run a query. I see, in the example, that execute() doesn't ask for a ";" at the end of the query, so I wonder if this is possible, since the script will have it
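One way to approach the question above is to split the script client-side and send each statement on its own, without the trailing ";". The following is only a minimal sketch under stated assumptions: `split_hive_script` and `run_script` are hypothetical helper names, the queue name `research` and table `t` are placeholders, and `cursor` stands for any object with an `execute(statement)` method (such as the cursor used in the pyhs2 example the message refers to). It also assumes the server accepts SET statements through `execute()`, and the naive split will break on a ";" inside a string literal.

```python
def split_hive_script(script):
    """Split a Hive script into statements without their trailing ';'.

    Naive sketch: full-line '--' comments are dropped first, then the
    text is split on ';'. A ';' inside a string literal is NOT handled.
    """
    kept = [line for line in script.splitlines()
            if not line.strip().startswith("--")]
    statements = []
    for chunk in "\n".join(kept).split(";"):
        stmt = chunk.strip()
        if stmt:  # skip the empty piece after the last ';'
            statements.append(stmt)
    return statements


def run_script(cursor, script):
    """Send each statement to cursor.execute(), in script order.

    'cursor' is an assumption: any object exposing execute(statement).
    """
    for stmt in split_hive_script(script):
        cursor.execute(stmt)


# Hypothetical script: queue name and table name are placeholders.
EXAMPLE_SCRIPT = """-- pick the queue; then run the query
SET tez.queue.name=research;
SELECT COUNT(*) FROM t;
"""
```

With a real connection, `run_script(cur, EXAMPLE_SCRIPT)` would issue the SET first and the query second on the same cursor, so the queue setting applies to the session that runs the query.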

Re: Hive on tez - fix number of tasks

2015-02-19 Thread Fabio C.
n/max-size to the same value should produce the > desired results; there can be some variances in the groups generated though > - based on the order in which HDFS gives back its block locations. > > > On Thu, Feb 19, 2015 at 1:47 AM, Fabio C. wrote: > >> Hi everyone, >

Hive on tez - fix number of tasks

2015-02-19 Thread Fabio C.
Hi everyone, I see that Hive on Tez dynamically chooses the number of tasks to launch for each vertex in the generated DAG according to cluster load (in addition to data size). For research purposes I'd like to disable this behaviour, since I need every query (running on the same datasets) to be executed wi