Hello all,

We have released a new version of Hivemall, v0.3.2.

Hivemall provides machine learning functionality as Hive UDFs/UDAFs/UDTFs.
It is easy to use because every machine learning step is done
within HiveQL.

   https://github.com/myui/hivemall

In the latest release (v0.3.2), we introduced

   o Anomaly Detection using Local Outlier Factor, and
   o Polynomial features, which are useful for non-linear
     regression/classification.
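
For readers new to polynomial features, here is a minimal sketch of the general idea in plain Python. It is not Hivemall's implementation: the function name, the dict-based input, and the "a*b" naming of the product terms are all assumptions made for illustration. The point is that a linear model trained on the expanded features can fit non-linear relationships in the original ones.

```python
from itertools import combinations_with_replacement


def polynomial_features(features, degree=2):
    """Expand {"name": value} into all monomials up to `degree`.

    Hypothetical helper for illustration only; the "a*b" key format
    is an assumption, not Hivemall's output format.
    """
    expanded = {}
    names = sorted(features)
    for d in range(1, degree + 1):
        # Every multiset of d feature names yields one product term.
        for combo in combinations_with_replacement(names, d):
            value = 1.0
            for name in combo:
                value *= features[name]
            expanded["*".join(combo)] = value
    return expanded


if __name__ == "__main__":
    # Toy input, not taken from the sensor table in this mail.
    print(polynomial_features({"x": 2.0, "y": 3.0}))
```

With degree 2, two input features become five: the originals, their squares, and their cross term.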

Anomaly Detection in Hivemall [1] is very easy to use.

1) Just prepare a table (e.g., a table containing sensor data) as follows.

| rowid | features
|-------| ----------
| 1     | ["reflectance:0.5252967","specific_heat:0.19863537","weight:0.0"]
| 2     | ["reflectance:0.5950446","specific_heat:0.09166764","weight:0.052084323"]
| 3     | ["reflectance:0.6797837","specific_heat:0.12567581","weight:0.13255163"]
| 4     | ...

2) Run a query to find the top-K outliers. The result lists the outlier candidates.

| rowid | LOF value
| ----- | -------------
|  87   | 3.031143750623693  (<- rowid 87 is an outlier in this case)
|  16   | 1.975556449228491
|  1    | 1.8415763677073722
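
To give a feel for what the LOF value means, below is a rough brute-force sketch in plain Python. It is not the Hivemall query (see [1] for that) and makes several simplifying assumptions: the function name `lof_scores` is hypothetical, the toy points are made up, exactly k neighbours are used even when distances tie, and duplicate points are not handled. A value close to 1 means a point is about as dense as its neighbours, while a value well above 1 marks an outlier candidate.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (brute force, O(n^2)).

    Illustrative sketch only: uses exactly k neighbours on ties and
    assumes no duplicate points.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (lrd[i] * k) for i in range(n)]


if __name__ == "__main__":
    # Made-up toy data: a tight square plus one far-away point.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    scores = lof_scores(points, k=2)
    # Top-K outlier candidates, highest LOF first.
    top = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:3]
    for i in top:
        print(i, scores[i])
```

In this toy data the far-away point gets by far the highest score, and the four points of the square score about 1.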

Hope you enjoy the release! Feedback and pull requests are welcome.

Last but not least, we changed the license of Hivemall from LGPL
v2 to Apache License v2 as of v0.3.1.

[1] https://github.com/myui/hivemall/wiki/Outlier-Detection-using-Local-Outlier-Factor

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
