Hi, Mark.

Thank you for your reply.

I have read the User Guide, but I'm still wondering what I can do in the
following scenario:
----
1. Suppose I have a table t_customer_info in Hive, which includes lots of
information about our customers.
2. Now I would like to cluster those customers into different groups so
that customers within a group are highly similar to each other, but very
dissimilar to customers in other groups.
3. This is a classical clustering problem in the data mining field. I think
such a job cannot be done with a query language alone; it needs a data
mining algorithm instead.
----
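
To make the scenario concrete, here is a minimal sketch of the kind of
algorithm I mean (plain k-means), written in Python for a single machine.
The feature columns (age, monthly_spend) and the sample rows are hypothetical
and are not taken from t_customer_info; seeding the centroids with the first
k rows is also just an assumption to keep the example deterministic. It is
not parallel, which is exactly the gap I am asking about.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means over a list of equal-length numeric tuples."""
    # Assumption: seed centroids with the first k points (deterministic,
    # but not a robust initialisation for real data).
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


# Hypothetical customer rows: (age, monthly_spend)
customers = [(25, 40.0), (27, 45.0), (61, 300.0), (58, 320.0), (24, 38.0)]
labels, centers = kmeans(customers, k=2)
print(labels)
```

In practice the features would need scaling first, since a column with a
large range (spend) otherwise dominates the distance.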

When we look "back" at traditional DBMSs, there are lots of data mining
or BI tools that can connect to the DBMS and apply some canonical
algorithms to the data stored in it. So I started to wonder: are there
similar tools for Hive?

If not, what's the most common way to do data mining over Hadoop?

2012/6/8 Mark Grover <grover.markgro...@gmail.com>

> Hi Jason,
> Hive is a data warehouse system that sits on top of Hadoop. The key
> selling point here is that it allows users to write SQL-like queries to
> query their large scale data. These queries get compiled into Map Reduce
> which is then run on the Hadoop cluster just like any other Map Reduce jobs.
>
> Hadoop does all the parallel processing for you. All you have to do is set
> up a Hadoop cluster, install Hive on the cluster and run your Hive queries.
> All underlying processing will happen in parallel where possible.
>
> This is a good place to get started and learn more about Hive:
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
>
> Welcome and good luck!
>
> Mark
>
>
> On Thu, Jun 7, 2012 at 10:10 PM, jason Yang <lin.yang.ja...@gmail.com> wrote:
>
>> Hi, dear friends.
>>
>> I was wondering what's the popular way to do data mining on Hive?
>>
>> Since the data in Hive is distributed over the cluster, is there any tool
>> or solution that could parallelize the data mining?
>>
>> Any suggestion would be appreciated.
>>
>> --
>> YANG, Lin
>>
>>
>


-- 
YANG, Lin