In a similar way, ML algorithms can be put into a Hive UDAF (user-defined
aggregate function). I'm working on this at the moment, and it has proved
quite straightforward to integrate liblinear into a UDAF. As Igor notes, by
setting the number of reducers, you can set the number of parallel learners.
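
To make the shape of a learner-in-a-UDAF concrete, here is a minimal sketch in
plain Python. It is not Hive or liblinear code. The class and function names
are hypothetical; the method names only mirror the
init/iterate/terminatePartial/merge/terminate lifecycle of a UDAF evaluator.
The learner (one SGD step of logistic regression per row) and the merge rule
(count-weighted averaging of weights) are assumptions chosen for brevity, and
slicing the row list stands in for the shuffle to reducers.

```python
import math


class LearnerAggregate:
    """Plain-Python stand-in for a UDAF evaluator that trains a model.

    Hypothetical class: nothing here calls Hive or liblinear.
    """

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.count = 0
        self.learning_rate = learning_rate

    def iterate(self, features, label):
        """Consume one row: a single SGD step of logistic regression (label 0 or 1)."""
        z = sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        error = 1.0 / (1.0 + math.exp(-z)) - label
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.count += 1

    def terminate_partial(self):
        """Return the partial state that would be shipped between tasks."""
        return list(self.weights), self.count

    def merge(self, partial):
        """Fold in another partial state by count-weighted averaging (an assumption)."""
        other_weights, other_count = partial
        total = self.count + other_count
        if total == 0:
            return
        self.weights = [(w * self.count + o * other_count) / total
                        for w, o in zip(self.weights, other_weights)]
        self.count = total

    def terminate(self):
        """Return the final model weights."""
        return list(self.weights)


def train_partitioned(rows, n_partitions, n_features):
    """Train one learner per partition, then merge the partial states."""
    partials = []
    for p in range(n_partitions):
        learner = LearnerAggregate(n_features)
        for features, label in rows[p::n_partitions]:  # stand-in for the shuffle
            learner.iterate(features, label)
        partials.append(learner.terminate_partial())
    combined = LearnerAggregate(n_features)
    for partial in partials:
        combined.merge(partial)
    return combined.terminate(), combined.count
```

Calling train_partitioned(rows, 2, n_features) plays the role of running with
two reducers: each partition sees only its own rows, and the merged weights
are returned together with the total row count.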

Robin
www.baynote.com

From: Igor Tatarinov <i...@decide.com>
Reply-To: user@hive.apache.org
Date: Thursday, January 17, 2013 1:29 PM
To: user@hive.apache.org
Subject: Re: question about machine learning on Hive

Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

We use a similar approach and I think that Pig, being somewhat lower-level with 
better support of nested objects, is a better tool than Hive. It should be 
possible to do something similar with Hive but we haven't tried. The trick is 
to implement the learner as a serializer. Then, the number of reducers will 
determine how many parallel learners (bags) you can run.
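
The reducers-as-bags idea can be sketched in a few lines of plain Python.
This is an illustration under stated assumptions, not the code from the paper
or from any Pig or Hive API: the function names are hypothetical, a perceptron
stands in for the real learner, round-robin slicing stands in for the shuffle,
and the per-bag models are combined by majority vote.

```python
def score(weights, features):
    """Dot product of a weight vector and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_perceptron(rows, n_features, epochs=5):
    """Train a perceptron on (features, label) rows with labels in {-1, +1}."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in rows:
            if label * score(weights, features) <= 0:  # misclassified: update
                weights = [w + label * x for w, x in zip(weights, features)]
    return weights


def train_bags(rows, n_reducers, n_features):
    """Train one model per 'reducer'; each sees only its own partition."""
    partitions = [rows[r::n_reducers] for r in range(n_reducers)]
    return [train_perceptron(part, n_features) for part in partitions if part]


def predict(models, features):
    """Majority vote over the per-bag models; returns -1 or +1."""
    votes = sum(1 if score(w, features) > 0 else -1 for w in models)
    return 1 if votes > 0 else -1
```

Changing n_reducers changes how many models are trained and how much data
each one sees, which is the knob the paragraph above describes. An odd number
of bags avoids tied votes.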

igor
decide.com



On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <qiaoresearc...@gmail.com> wrote:

How can I run machine learning algorithms (any ML algorithm) directly in
Hive? Assume the input and output are already stored as Hive tables.

PS: I know Mahout is available, but I would prefer to run machine learning
algorithms directly in Hive.

Many thanks,
