Hi Jason,

Hive is a data warehouse system that sits on top of Hadoop. Its key selling point is that it lets users write SQL-like queries against their large-scale data. These queries get compiled into MapReduce jobs, which are then run on the Hadoop cluster just like any other MapReduce job.
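
To make the "SQL-like query becomes MapReduce" idea concrete, here is a rough sketch in plain Python of how a simple GROUP BY count could break down into map, shuffle, and reduce steps. This is only a conceptual illustration, not what Hive actually generates: the table name `sales`, the column `category`, and all function names below are made up for the example, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical query, for illustration only:
#   SELECT category, COUNT(*) FROM sales GROUP BY category;
# ROWS stands in for the (assumed) contents of the "sales" table.
ROWS = [
    {"category": "books", "amount": 12.0},
    {"category": "music", "amount": 7.5},
    {"category": "books", "amount": 3.0},
]


def map_phase(row):
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    yield row["category"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate all values for one key; summing the 1s gives COUNT(*)."""
    return key, sum(values)


def run_query(rows):
    """Run map, shuffle, and reduce in sequence over the given rows."""
    pairs = [pair for row in rows for pair in map_phase(row)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_query(ROWS))  # {'books': 2, 'music': 1}
```

On a real cluster the map calls would run on many nodes at once, each over its own slice of the data, and the reduce calls would be spread across nodes by key. That is where the parallelism comes from.
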
Hadoop does all the parallel processing for you. All you have to do is set up a Hadoop cluster, install Hive on the cluster, and run your Hive queries. All underlying processing will happen in parallel where possible.

This is a good place to get started and learn more about Hive:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Welcome and good luck!

Mark

On Thu, Jun 7, 2012 at 10:10 PM, jason Yang <lin.yang.ja...@gmail.com> wrote:

> Hi, dear friends.
>
> I was wondering what's the popular way to do data mining on Hive?
>
> Since the data in Hive is distributed over the cluster, is there any tool
> or solution could parallelize the data mining?
>
> Any suggestion would be appreciated.
>
> --
> YANG, Lin