Re: How to apply data mining on Hive?

jason Yang Sat, 09 Jun 2012 09:41:48 -0700

Dear Mark and Sukhendu,

Thank you very much for your advice. I will look into the approaches you
mentioned.


2012/6/9 Sukhendu Chakraborty <sukhendu.chakrabo...@gmail.com>

> If you are interested, you can also look at Apache Hama, which provides an
> MPI-like interface on top of Hadoop MapReduce.
>
> http://incubator.apache.org/hama/
> On Jun 8, 2012 4:55 PM, "Mark Grover" <grover.markgro...@gmail.com> wrote:
>
>> Hi Jason,
>> Hive does expose a JDBC interface which can be used by tools and applications.
>> You would need to check individual tools to see if they support Hadoop (I use
>> the word Hadoop and not Hive since an application doesn't need Hive to run
>> Map Reduce jobs on data in HDFS).
>>
>> Apache Mahout, as Sreenath mentioned, is also an interesting open source
>> project which combines canonical machine learning algorithms with the power
>> of Hadoop. That might fit your bill too.
>>
>> Good luck,
>> Mark
>>
>> On Fri, Jun 8, 2012 at 1:25 AM, jason Yang <lin.yang.ja...@gmail.com> wrote:
>>
>>> Hi, Mark.
>>>
>>> Thank you for your reply.
>>>
>>> I have read the User Guide, but I'm still wondering what I can do in
>>> the following scenario:
>>> ----
>>> 1. Suppose I have a table t_customer_info in Hive, which includes lots
>>> of information about our customers.
>>> 2. Now I would like to cluster those customers into different groups so
>>> that customers within a group are highly similar, but very dissimilar
>>> to customers in other groups.
>>> 3. This is a classical clustering problem in the data mining field. I
>>> think such a job cannot be done with a query language alone, and instead
>>> requires data mining algorithms.
>>> ----
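[Editorial sketch] The clustering task described in the scenario above is often approached with k-means. The following is a minimal single-machine illustration in plain Python, not a Hive or Mahout API. The two features (age, monthly spend) and the sample rows are hypothetical stand-ins for columns of t_customer_info, and a real deployment would run a distributed implementation instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct rows as initial centroids
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical rows: (age, monthly_spend). Not taken from any real table.
customers = [
    (25, 30.0), (27, 35.0), (24, 28.0),
    (52, 210.0), (55, 190.0), (49, 205.0),
]
centroids, labels = kmeans(customers, k=2)
```

In practice the features would usually be scaled before clustering so that no single column dominates the distance.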
>>>
>>> When we look "back" at the traditional DBMS, there are lots of data
>>> mining tools or BI tools which can connect to the DBMS and apply some
>>> canonical algorithms to the data in the DBMS. So I started to wonder whether
>>> there are similar tools for Hive.
>>>
>>> If not, what's the most commonly used way to do data mining over Hadoop?
>>>
>>> 2012/6/8 Mark Grover <grover.markgro...@gmail.com>
>>>
>>>> Hi Jason,
>>>> Hive is a data warehouse system that sits on top of Hadoop. The key
>>>> selling point here is that it allows users to write SQL-like queries to
>>>> query their large scale data. These queries get compiled into Map Reduce
>>>> jobs, which are then run on the Hadoop cluster just like any other Map
>>>> Reduce job.
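[Editorial sketch] To make the idea of a query becoming a Map Reduce job concrete, here is a conceptual single-process imitation in Python of how an aggregation such as `SELECT city, SUM(amount) FROM t_orders GROUP BY city` can be expressed as map, shuffle, and reduce phases. The table t_orders, its columns, and the sample rows are hypothetical, and this is not the plan Hive actually generates.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, value) pair per input row: key = city, value = amount."""
    for city, amount in rows:
        yield city, amount


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each key's values; here the aggregate is SUM."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical rows of t_orders: (city, amount).
rows = [("Beijing", 10), ("Shanghai", 5), ("Beijing", 7)]
totals = reduce_phase(shuffle(map_phase(rows)))
```

On a real cluster the map and reduce functions run in parallel on different nodes, and the shuffle moves data between them over the network.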
>>>>
>>>> Hadoop does all the parallel processing for you. All you have to do is
>>>> set up a Hadoop cluster, install Hive on the cluster and run your Hive
>>>> queries. All underlying processing will happen in parallel where possible.
>>>>
>>>> This is a good place to get started and learn more about Hive:
>>>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
>>>>
>>>> Welcome and good luck!
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Thu, Jun 7, 2012 at 10:10 PM, jason Yang
>>>> <lin.yang.ja...@gmail.com> wrote:
>>>>
>>>>> Hi, dear friends.
>>>>>
>>>>> I was wondering what the popular way to do data mining on Hive is.
>>>>>
>>>>> Since the data in Hive is distributed over the cluster, is there any
>>>>> tool or solution that could parallelize the data mining?
>>>>>
>>>>> Any suggestion would be appreciated.
>>>>>
>>>>> --
>>>>> YANG, Lin
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> YANG, Lin
>>>
>>>
>>


-- 
YANG, Lin
