Re: hive plan generation

2014-07-11 Thread Abirami V
Hi Anil, If you use Derby as the metastore, Hive creates a metastore in whatever directory you start the Hive CLI from. First option: use another RDBMS such as PostgreSQL, MySQL, Oracle ... Second option: conf.set("javax.jdo.option.ConnectionURL", "jdbc:derby:;databaseName=/tmp/metastore_db;create
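The second option in the message above can be sketched without Hive on the classpath. This is only an illustration: the class `MetastoreUrlSketch` and the helpers `derbyUrl` and `metastoreProps` are hypothetical names, and `java.util.Properties` stands in for `HiveConf`. The property key is the one quoted in the message; the trailing `create=true` is the usual Derby attribute for creating the database on first use and is an assumption here, since the quoted line is cut off before that point. Pinning the database to an absolute path is what keeps a new `metastore_db` from appearing in every directory the CLI is started from.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class MetastoreUrlSketch {
    // Hypothetical helper: builds an embedded-Derby JDBC URL rooted at an absolute path.
    // "create=true" is assumed; the original message is truncated before this attribute.
    static String derbyUrl(String dbDir) {
        Path abs = Paths.get(dbDir).toAbsolutePath().normalize();
        return "jdbc:derby:;databaseName=" + abs + ";create=true";
    }

    // Stand-in for HiveConf: in a real program the same key and value
    // would be passed to conf.set(...) before the Driver is created.
    static Properties metastoreProps(String dbDir) {
        Properties props = new Properties();
        props.setProperty("javax.jdo.option.ConnectionURL", derbyUrl(dbDir));
        return props;
    }

    public static void main(String[] args) {
        Properties props = metastoreProps("/tmp/metastore_db");
        System.out.println(props.getProperty("javax.jdo.option.ConnectionURL"));
    }
}
```

Every process that should see the same tables needs to point at the same path, which is also why a shared RDBMS (the first option) tends to be the sturdier choice.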

inline query in the select clause

2014-07-11 Thread N. Ramasubramanian
Hi, I have the 2 tables below… 1) create table dim (rank string,grade string) row format delimited fields terminated by ',' stored as textfile; Data: 1,1 2,1 3,1 4,2 5,2 6,2 7,3 2) create table fact (rollno string,name string,sub1 string,rank1 string,sub2 string,rank2 string,sub3 string,rank3 str

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Eric Chu
Thanks for the responses. I understand DISTRIBUTE BY and SORT BY in the normal case (as described in the Hive doc); I just don't understand their behavior in the OVER clause with RANK, which apparently you can do. See ql/src/test/queries/clientpositive/windowing.q for example. Yes I saw Edward's B

Re: Issue while running Hive 0.13

2014-07-11 Thread Jason Dere
Looking at that error online, I see http://slf4j.org/faq.html#compatibility Maybe try to find what version of the slf4j libraries you have installed (in hadoop? hive?), and try updating to a later version. On Jul 10, 2014, at 9:57 PM, Sarath Chandra wrote: > I'm using Hadoop 1.0.4. Suspecting so

JOIN query results not printing to cli - HELP please.

2014-07-11 Thread Sarfraz Ramay
Hi, A very strange thing is happening. I am running the TPC-H benchmark. I have loaded the tables on HDFS running in pseudo-distributed mode. When I query one table at a time, select * from customer LIMIT 2; OR select * from NATION LIMIT 2; results are printed to the cli, but as soon as I try somet

Re: Hive job scheduling

2014-07-11 Thread Jerome Banks
Cheng, We are working on an exciting new project called Satisfaction, to handle next generation scheduling and workflow for Hive and other Hadoop/BigData technologies. We plan to open source sometime in the near future. Stay tuned !!! --- jerome On Fri, Jul 11, 2014 at 7:02 AM, Xuefu Zhang wr

hive plan generation

2014-07-11 Thread AnilKumar B
Hi, I am trying to generate a Hive plan as below. But even after creating the "src" table, I am getting a Table not found exception caused by a metastore issue. Can anyone help me resolve this? private Driver createDriver() { HiveConf conf = new HiveConf(Driver.class); conf.set("hive.metastore.war

Re: Hive job scheduling

2014-07-11 Thread Xuefu Zhang
Or you can just run CRON tasks in your OS. On Thu, Jul 10, 2014 at 4:55 PM, moon soo Lee wrote: > for simpler use, Zeppelin (http://zeppelin-project.org) runs hive query > with web based editor, and it's got cron tab style scheduler. > > Best, > moon > > > On Fri, Jul 11, 2014 at 8:52 AM, Marti

Re: beeline client

2014-07-11 Thread Xuefu Zhang
Chaudra, The difference you saw between Hive CLI and Beeline might indicate a bug. However, before drawing that conclusion, could you give an example of your queries? Are the jobs you expect to run in parallel part of a single query? Please note that your script file is executed line by line in either c

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Joshi, Rekha
Hi, The reducer-level nuances of ORDER BY and SORT BY with respect to total ordering of the final output are well known. One could simulate rank() over() functionality by using DISTRIBUTE BY / SORT BY on the dataset (or CLUSTER BY if the key is the same), as in Edward's blog

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Nitin Pawar
As a general principle, distribute by ensures each of N reducers gets non-overlapping ranges of X, but doesn't sort the output of each reducer. You end up with N or more unsorted files with non-overlapping ranges. So this is more of a horizontal distribution of data. In my view, Partition by is more ba

difference between partition by and distribute by in rank()

2014-07-11 Thread Eric Chu
Does anyone know what *rank() over(distribute by p_mfgr sort by p_name) * does exactly and how it's different from *rank() over(partition by p_mfgr order by p_name)*? Thanks, Eric
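For readers following this thread, here is a small plain-Java sketch of what `rank() over(partition by p_mfgr order by p_name)` computes: rows are grouped by the partition key, ordered within each group, tied values share a rank, and the next distinct value skips ahead. It does not call Hive; the class `RankSketch`, its `Row` type, and the sample values are made up for illustration. Whether the `distribute by ... sort by` spelling inside OVER behaves identically is not settled in the snippets shown here; my understanding is that Hive accepts it as an alternative spelling, but treat that as an assumption to check against windowing.q.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RankSketch {
    // One input row: partition key (p_mfgr) and ordering key (p_name).
    static final class Row {
        final String mfgr;
        final String name;
        int rank;
        Row(String mfgr, String name) { this.mfgr = mfgr; this.name = name; }
    }

    // Emulates rank() over (partition by mfgr order by name) on an in-memory list.
    static List<Row> rank(List<Row> input) {
        List<Row> rows = new ArrayList<>(input);
        // Sorting by (mfgr, name) groups each partition and orders rows inside it.
        rows.sort(Comparator.comparing((Row r) -> r.mfgr)
                            .thenComparing((Row r) -> r.name));
        int position = 0;
        Row prev = null;
        for (Row r : rows) {
            boolean newPartition = prev == null || !prev.mfgr.equals(r.mfgr);
            position = newPartition ? 1 : position + 1;
            // Ties on the ordering key reuse the previous rank; otherwise use the position.
            r.rank = (!newPartition && prev.name.equals(r.name)) ? prev.rank : position;
            prev = r;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("m2", "a"));
        rows.add(new Row("m1", "b"));
        rows.add(new Row("m1", "a"));
        rows.add(new Row("m1", "a"));
        for (Row r : rank(rows)) {
            System.out.println(r.mfgr + " " + r.name + " " + r.rank);
        }
    }
}
```

The ranking restarts at 1 for each `p_mfgr` value, which is the part that a plain query-level DISTRIBUTE BY / SORT BY (outside OVER) does not give you by itself.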