Vasia Kalavri created FLINK-2452:
------------------------------------

             Summary: Add a playcount threshold to the MusicProfiles examples
                 Key: FLINK-2452
                 URL: https://issues.apache.org/jira/browse/FLINK-2452
             Project: Flink
          Issue Type: Improvement
          Components: Gelly
    Affects Versions: 0.10
            Reporter: Vasia Kalavri
            Assignee: Vasia Kalavri
            Priority: Minor


In the MusicProfiles example, when creating the user-user similarity graph, an 
edge is created between any two users who have listened to the same song (even 
if only once). Depending on the input data, this can produce a projection graph 
with many more edges than the original user-song graph.
To make this computation more efficient, this issue proposes adding a 
user-defined parameter that filters out songs that a user has listened to only 
a few times. Essentially, it is a playcount threshold: a user is considered to 
like a song only if their playcount for it is above the threshold.
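
As a rough illustration of the idea (not the actual MusicProfiles code), the 
sketch below applies the threshold to user-song edges before building the 
user-user projection. It is plain Java and does not use the Gelly API; the 
class PlaycountThresholdSketch, the Listen type, the method names and the 
sample data are all hypothetical. It keeps an edge only when its playcount is 
strictly above the threshold, following the wording above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical, Flink-free sketch of the proposed playcount threshold.
public class PlaycountThresholdSketch {

    /** One user-song edge with the playcount as its value (hypothetical type). */
    static final class Listen {
        final String user;
        final String song;
        final int playcount;

        Listen(String user, String song, int playcount) {
            this.user = user;
            this.song = song;
            this.playcount = playcount;
        }
    }

    /** Keeps only the edges whose playcount is strictly above the threshold. */
    static List<Listen> filterByPlaycount(List<Listen> listens, int threshold) {
        return listens.stream()
                .filter(l -> l.playcount > threshold)
                .collect(Collectors.toList());
    }

    /** Projects user-song edges onto user-user pairs that share at least one song. */
    static Set<String> similarUserPairs(List<Listen> listens) {
        // Group the distinct users of each song.
        Map<String, TreeSet<String>> usersBySong = new HashMap<>();
        for (Listen l : listens) {
            usersBySong.computeIfAbsent(l.song, s -> new TreeSet<>()).add(l.user);
        }
        // Emit every unordered pair of users per song; the set removes duplicates.
        Set<String> pairs = new TreeSet<>();
        for (TreeSet<String> users : usersBySong.values()) {
            List<String> sorted = new ArrayList<>(users);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairs.add(sorted.get(i) + "-" + sorted.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Listen> listens = Arrays.asList(
                new Listen("u1", "s1", 45),
                new Listen("u2", "s1", 1),
                new Listen("u3", "s1", 60),
                new Listen("u2", "s2", 31),
                new Listen("u3", "s2", 2));

        // Without a threshold, every co-listener pair becomes an edge.
        System.out.println(similarUserPairs(listens));
        // With a threshold of 30, only users who played a song often are paired.
        System.out.println(similarUserPairs(filterByPlaycount(listens, 30)));
    }
}
```

Filtering before the projection matters because a song with n listeners 
contributes up to n*(n-1)/2 user-user edges, so dropping low-playcount edges 
first shrinks the most expensive step.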

For reference, with a threshold value of 30, the whole Last.fm dataset is 
analyzed on my laptop in a few minutes, while running without a threshold 
takes several hours.

There are many solutions to this problem, but since this is just an example 
(not a library method), I think that keeping it simple is important.

Thanks to [~andralungu] for spotting the inefficiency!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
