Hi Jason,
Hive is a data warehouse system that sits on top of Hadoop. Its key selling
point is that it lets users write SQL-like queries against their
large-scale data. These queries are compiled into MapReduce jobs, which then
run on the Hadoop cluster just like any other MapReduce job.

Hadoop does all the parallel processing for you. All you have to do is set
up a Hadoop cluster, install Hive on the cluster and run your Hive queries.
All underlying processing will happen in parallel where possible.
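
To give a rough feel for what "compiled into MapReduce" means, here is a toy
sketch in plain Python. It is not Hive's actual generated plan, only an
illustration of how an aggregation such as
SELECT user, COUNT(*) FROM page_views GROUP BY user
could be split into a map phase that runs independently on each chunk of data
and a reduce phase that combines the results. The page_views table, its
columns, and all function names below are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of a "page_views" table, already divided into splits,
# roughly the way Hadoop divides input files into blocks.
SPLITS = [
    [("alice", "/home"), ("bob", "/cart")],
    [("alice", "/cart"), ("carol", "/home")],
    [("bob", "/home"), ("alice", "/home")],
]


def map_split(rows):
    """Map phase: emit a (user, 1) pair for every row in one split."""
    return [(user, 1) for user, _page in rows]


def shuffle(mapped_splits):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: sum the counts for a single key."""
    return key, sum(values)


def count_views_per_user(splits):
    # Each split is mapped independently, so the map calls can run in
    # parallel. A thread pool stands in for the cluster's worker nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_views_per_user(SPLITS))
```

On a real cluster you would only write the query; Hive and Hadoop take care
of the splitting, scheduling and combining steps sketched here.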

This is a good place to get started and learn more about Hive:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Welcome and good luck!

Mark

On Thu, Jun 7, 2012 at 10:10 PM, jason Yang <lin.yang.ja...@gmail.com> wrote:

> Hi, dear friends.
>
> I was wondering what's the popular way to do data mining on Hive?
>
> Since the data in Hive is distributed over the cluster, is there any tool
> or solution that could parallelize the data mining?
>
> Any suggestion would be appreciated.
>
> --
> YANG, Lin
>
>
