Re: How to apply data mining on Hive?

2012-06-07 Thread Sreenath Menon
Kindly check out Apache Mahout and whether it satisfies your needs.

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Jan Dolinár
Thank you very much, Mark, for your investigation and explanations. I'm well aware that hadoop 0.7.1 is quite old code and that a newer version might perform better; that is the main reason I discussed it here instead of reporting it as a bug. For now it doesn't bother me, as I have th

Re: How to apply data mining on Hive?

2012-06-07 Thread jason Yang
Hi, Mark. Thank you for your reply. I have read the User Guide, but I'm still wondering what I can do for the following scenario: 1. Suppose I have a table t_customer_info in Hive, which includes lots of information about our customers. 2. Now I would like to cluster those customers into dif

Hive and thrift 0.8.0

2012-06-07 Thread kulkarni.swar...@gmail.com
Is the latest hive release 0.9.0 compatible with thrift 0.8 or do we need to recompile and rebuild the package ourselves to make it compatible? Currently it seems to depend on libthrift-0.7. Thanks for the help. Swarnim

Re: How to apply data mining on Hive?

2012-06-07 Thread Mark Grover
Hi Jason, Hive is a data warehouse system that sits on top of Hadoop. The key selling point here is that it allows users to write SQL-like queries to query their large-scale data. These queries get compiled into MapReduce jobs, which are then run on the Hadoop cluster just like any other MapReduce job.
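To make the "compiled into Map Reduce" point concrete, here is a minimal conceptual sketch in plain Python of how a query such as a COUNT(*) with GROUP BY breaks down into map, shuffle and reduce steps. It is not what Hive actually generates; the rows, the column names (customer_id, city) and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table with columns (customer_id, city).
rows = [
    (1, "Pune"),
    (2, "Oslo"),
    (3, "Pune"),
    (4, "Lima"),
    (5, "Oslo"),
    (6, "Pune"),
]


def map_phase(records):
    """Emit a (city, 1) pair per row, as a mapper might for COUNT(*) ... GROUP BY city."""
    for _customer_id, city in records:
        yield city, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, mimicking the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel over splits of the data, which is where the scalability comes from; the sketch above runs them sequentially in one process.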

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Mark Grover
Hi Jan, I did some testing for this on Apache Hive 0.9 and I have boiled it down to the following: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries NOT using LATERAL VIEW. However, it doesn't work for multi-insert queries usi

How to apply data mining on Hive?

2012-06-07 Thread jason Yang
Hi, dear friends. I was wondering what the popular way to do data mining on Hive is. Since the data in Hive is distributed over the cluster, is there any tool or solution that could parallelize the data mining? Any suggestions would be appreciated. -- YANG, Lin

Re: Unable to create sample tables in Hive

2012-06-07 Thread Bejoy Ks
Hi Soham, the error looks like your metastore doesn't have the required tables. Try enabling database auto-creation in your connection URL. For a Derby metastore it'll look like   javax.jdo.option.ConnectionURL   jdbc:derby:;databaseName=metastore_db;create=true   JDBC connect string for a JDBC me

Re: Unable to create sample tables in Hive

2012-06-07 Thread Nanda Vijaydev
There is nothing wrong with your SQL statement. It works fine on the CLI; I tried the following. Your issue seems to be related to the underlying metadata store. CREATE TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Jan Dolinár
On 6/7/12, Mark Grover wrote: > Can you please check if predicate push down enabled changes the explain > plan on a simple inner join query like: > > select a.* from a inner join b on(a.key=b.key) where a.some_col=blah; No problem, I ran the following as you suggested (INNER JOIN didn't work for me,

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Mark Grover
Hi Jan, Thanks for the analysis. Yes, it's true that optimize ppd will push predicates to be evaluated earlier. The only catch there is that predicates cannot be pushed across constructs that change the data in the query. An example of this is having a predicate (say of the form 'where Col is not N
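
The idea of evaluating a predicate earlier can be illustrated with a small Python sketch. It is a toy model only, not Hive's optimizer: the tables a and b, their contents, and the nested-loop join are hypothetical stand-ins for the simple inner join query discussed in this thread. It shows that filtering before the join gives the same rows as filtering after it, while the join has less input to process.

```python
# Hypothetical tables a and b as lists of dicts; names mirror the query in the thread.
a = [
    {"key": 1, "some_col": "x"},
    {"key": 2, "some_col": "y"},
    {"key": 3, "some_col": "x"},
]
b = [{"key": 1}, {"key": 3}, {"key": 3}, {"key": 4}]


def inner_join(left, right):
    """Naive nested-loop inner join on 'key'; returns left rows and a comparison count."""
    out, comparisons = [], 0
    for left_row in left:
        for right_row in right:
            comparisons += 1
            if left_row["key"] == right_row["key"]:
                out.append(left_row)
    return out, comparisons


def predicate(row):
    """Stand-in for the WHERE clause on a.some_col."""
    return row["some_col"] == "x"


# Without pushdown: join everything, then apply the predicate.
joined, cost_late = inner_join(a, b)
late = [row for row in joined if predicate(row)]

# With pushdown: apply the predicate to a first, then join the smaller input.
early, cost_early = inner_join([row for row in a if predicate(row)], b)

print(late == early, cost_late, cost_early)
```

This rewrite is only safe because the predicate refers to columns of a alone and the inner join does not alter those columns; that is the kind of condition the "catch" above is about.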