Gábor Hermann created FLINK-4613:
------------------------------------
Summary: Extend ALS to handle implicit feedback datasets
Key: FLINK-4613
URL: https://issues.apache.org/jira/browse/FLINK-4613
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Gábor Hermann
Assignee: Gábor Hermann
The Alternating Least Squares (ALS) implementation should be extended to handle
_implicit feedback_ datasets. These datasets do not contain explicit ratings by
users. Instead, they are built by collecting user behavior (e.g. user listened
to artist X for Y minutes), and they require a slightly different optimization
objective. See [Hu et al.|http://dx.doi.org/10.1109/ICDM.2008.22] for details.
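
For reference, the objective from the paper linked above can be written roughly
as follows. The symbols are the paper's, not names used in this issue: r_ui is
the raw observation (e.g. minutes listened), p_ui the binary preference, c_ui
the confidence, alpha and lambda are tuning parameters, and x_u, y_i are the
user and item factor vectors.

```latex
\min_{X,\,Y} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{T} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}
```

Unlike the explicit-feedback objective, the sum runs over all (u, i) pairs, not
only the observed ones, which is why the update step needs the full Y^T * Y.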
We do not need to modify much in the original ALS algorithm. The [Spark ALS
implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala]
could serve as a basis for this extension. Only the factor update step changes,
and most of the changes are in the local parts of the algorithm (i.e. UDFs). In
fact, the only non-local modification is precomputing the matrix product
Y^T * Y (where Y is the factor matrix held fixed during the current half-step)
and broadcasting it to all the nodes, which we can do with broadcast DataSets.
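
To illustrate the split between the non-local and local parts, here is a
minimal sketch of one user update in plain Python. It is not Flink code and not
the proposed implementation: lists stand in for DataSets, and the function
names (gram, solve, update_user) and the parameters alpha and lam are made up
for this example. It assumes the closed-form update from Hu et al.,
x_u = (Y^T Y + Y^T (C^u - I) Y + lam * I)^-1 Y^T C^u p(u), where only Y^T Y
depends on all items.

```python
# Sketch only: plain Python lists stand in for Flink DataSets and broadcast sets.

def gram(Y):
    """Non-local step: compute Y^T * Y for an n x k matrix given as a list of rows."""
    k = len(Y[0])
    return [[sum(row[a] * row[b] for row in Y) for b in range(k)] for a in range(k)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def update_user(YtY, Y, observed, alpha, lam):
    """Local step for one user (the part that would live in a UDF).

    YtY      -- precomputed Y^T * Y, shared by every user update
    observed -- dict: item index -> raw observation r_ui > 0
    alpha    -- confidence scaling, c_ui = 1 + alpha * r_ui (hypothetical name)
    lam      -- regularization constant (hypothetical name)
    """
    k = len(YtY)
    # Start from Y^T Y + lam * I, then add corrections for observed items only.
    A = [[YtY[a][b] + (lam if a == b else 0.0) for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for i, r in observed.items():
        c = 1.0 + alpha * r
        y = Y[i]
        for a in range(k):
            rhs[a] += c * y[a]  # Y^T C^u p(u), with p_ui = 1 for observed items
            for b in range(k):
                A[a][b] += (c - 1.0) * y[a] * y[b]  # Y^T (C^u - I) Y
    return solve(A, rhs)


# Example: three items with two latent factors, one user who touched items 0 and 2.
Y = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
YtY = gram(Y)  # computed once, then shared with every update
x_u = update_user(YtY, Y, {0: 2.0, 2: 5.0}, alpha=40.0, lam=0.1)
```

In Flink, the YtY value would presumably be attached to the update operator
with withBroadcastSet and read inside a rich function through
getRuntimeContext().getBroadcastVariable, while update_user corresponds to the
local UDF logic. The item update is symmetric, with X^T * X in place of Y^T * Y.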
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)