Hello all,

We have released a new version of Hivemall, v0.3.2.
Hivemall provides machine learning functionality through Hive UDFs/UDAFs/UDTFs. It is easy to use because every machine learning step is done within HiveQL.

https://github.com/myui/hivemall

In the latest release (v0.3.2), we introduced

 o Anomaly Detection using Local Outlier Factor, and
 o Polynomial features, which are useful for non-linear regression/classification.

Anomaly Detection in Hivemall [1] is very easy to use.

1) Just prepare a table (e.g., a table containing sensor data) as follows.

| rowid | features                                                                    |
|-------|-----------------------------------------------------------------------------|
| 1     | ["reflectance:0.5252967","specific_heat:0.19863537","weight:0.0"]           |
| 2     | ["reflectance:0.5950446","specific_heat:0.09166764","weight:0.052084323"]  |
| 3     | ["reflectance:0.6797837","specific_heat:0.12567581","weight:0.13255163"]   |
| 4     | ...                                                                         |

2) Run a query to find the top-K outliers. You then get the outlier candidates.

| rowid | LOF value          |
|-------|--------------------|
| 87    | 3.031143750623693  | (<- rowid 87 is an outlier in this case)
| 16    | 1.975556449228491  |
| 1     | 1.8415763677073722 |

Hope you enjoy the release! Feedback and pull requests are welcome.

Last but not least, the license of Hivemall changed from LGPL v2 to Apache License v2 as of v0.3.1.

[1] https://github.com/myui/hivemall/wiki/Outlier-Detection-using-Local-Outlier-Factor

Thanks,
Makoto

--
Makoto YUI
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
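
P.S. For readers who have not met Local Outlier Factor before, here is a minimal Python sketch of what an LOF value measures, applied to rows in the same "name:value" feature format as the table above. This is not Hivemall's implementation and it does not use HiveQL; the function names (parse_features, lof_scores, top_k_outliers), the neighbourhood size k, and the toy rows are all assumptions made for illustration. It also simplifies the textbook definition by taking exactly k neighbours even when distances tie, and it assumes no two rows are identical.

```python
import math


def parse_features(features):
    """Turn ["name:value", ...] into a dict mapping name -> float."""
    parsed = {}
    for item in features:
        name, _, value = item.rpartition(":")
        parsed[name] = float(value)
    return parsed


def distance(a, b):
    """Euclidean distance; a feature missing from one row counts as 0.0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(key, 0.0) - b.get(key, 0.0)) ** 2 for key in keys))


def lof_scores(rows, k):
    """Return {rowid: LOF value} for rows given as {rowid: ["name:value", ...]}.

    Simplifications (assumptions of this sketch): exactly k neighbours are
    used even if several rows tie at the k-distance, and rows are distinct.
    """
    if not 0 < k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    points = {rowid: parse_features(feats) for rowid, feats in rows.items()}

    # k nearest neighbours and k-distance of every row
    neighbours, k_distance = {}, {}
    for p in points:
        dists = sorted((distance(points[p], points[q]), q) for q in points if q != p)
        neighbours[p] = dists[:k]
        k_distance[p] = dists[k - 1][0]

    # local reachability density: inverse of the mean reachability distance
    lrd = {}
    for p in points:
        reach = [max(k_distance[q], d) for d, q in neighbours[p]]
        lrd[p] = len(reach) / sum(reach)

    # LOF: mean density of the neighbours divided by the row's own density
    return {
        p: sum(lrd[q] for _, q in neighbours[p]) / (len(neighbours[p]) * lrd[p])
        for p in points
    }


def top_k_outliers(rows, k, top):
    """Return the `top` (rowid, LOF value) pairs with the largest LOF."""
    scores = lof_scores(rows, k)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Made-up toy rows, not the sensor data from the announcement.
    toy_rows = {
        1: ["weight:0.0"],
        2: ["weight:1.0"],
        3: ["weight:2.0"],
        4: ["weight:3.0"],
        5: ["weight:10.0"],
    }
    for rowid, score in top_k_outliers(toy_rows, k=2, top=3):
        print(rowid, score)
```

A value close to 1 means a row is about as densely surrounded as its neighbours, while a value well above 1 (like rowid 87 in the table above) marks a row that sits in a much sparser region than its neighbours. In the toy rows, the isolated row 5 is the one that ranks first.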