[ https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri resolved FLINK-2452.
----------------------------------
    Resolution: Fixed

> Add a playcount threshold to the MusicProfiles example
> ------------------------------------------------------
>
>                 Key: FLINK-2452
>                 URL: https://issues.apache.org/jira/browse/FLINK-2452
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly
>    Affects Versions: 0.10
>            Reporter: Vasia Kalavri
>            Assignee: Vasia Kalavri
>            Priority: Minor
>             Fix For: 0.10
>
>
> In the MusicProfiles example, when creating the user-user similarity graph,
> an edge is created between any 2 users that have listened to the same song
> (even if once). Depending on the input data, this might produce a projection
> graph with many more edges than the original user-song graph.
> To make this computation more efficient, this issue proposes adding a
> user-defined parameter that filters out songs that a user has listened to
> only a few times. Essentially, it is a threshold for playcount, above which a
> user is considered to like a song.
> For reference, with a threshold value of 30, the whole Last.fm dataset is
> analyzed on my laptop in a few minutes, while no threshold results in a
> runtime of several hours.
> There are many solutions to this problem, but since this is just an example
> (not a library method), I think that keeping it simple is important.
> Thanks to [~andralungu] for spotting the inefficiency!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
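
For readers who want to see the effect of the playcount threshold described in the issue, here is a minimal sketch in plain Python. It is not the Gelly implementation and uses no Flink API: the function names `filter_by_playcount` and `user_similarity_edges`, the toy triplets, and the strict "greater than" comparison are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def filter_by_playcount(triplets, threshold):
    """Keep (user, song, playcount) triplets whose playcount is above the threshold.

    Assumption: "above" is read as strictly greater than.
    """
    return [(user, song, count) for (user, song, count) in triplets if count > threshold]


def user_similarity_edges(triplets):
    """Build the user-user projection: one edge per pair of users sharing a song."""
    listeners = defaultdict(set)
    for user, song, _ in triplets:
        listeners[song].add(user)

    edges = set()
    for users in listeners.values():
        # Sort so each undirected edge is stored once as (smaller, larger).
        for a, b in combinations(sorted(users), 2):
            edges.add((a, b))
    return edges


# Toy data, invented for this sketch (not taken from the Last.fm dataset).
triplets = [
    ("alice", "song1", 45),
    ("bob", "song1", 1),
    ("carol", "song1", 60),
    ("bob", "song2", 35),
    ("carol", "song2", 31),
]

all_edges = user_similarity_edges(triplets)
liked_edges = user_similarity_edges(filter_by_playcount(triplets, threshold=30))
```

Filtering before the projection is the point of the sketch: a song with n listeners can contribute up to n(n-1)/2 user-user edges, so dropping low-playcount listeners first shrinks the projected graph before the pairwise step runs.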