Re: ALS implementation

2015-06-09 Thread Till Rohrmann
I think I found the possible error. I suspect that the empirical risk calculation causes the problem with the *Hash join exceeded maximum number of recursions*. What you do for this calculation is to provide the training data set DataSet[(Int, Int, Double)] and you calculate for each item the predi

Re: ALS implementation

2015-06-08 Thread Till Rohrmann
Hi Felix, I tried to reproduce the problem with the *Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident.* exception. I used the same data set and the same settings for ALS. However, on my machine it runs through without this exception. Could yo

Re: ALS implementation

2015-06-05 Thread Till Rohrmann
I'll look into it to find the responsible join operation. On Jun 5, 2015 10:50 AM, "Stephan Ewen" wrote: > There are two different issues here: > > 1) Flink does figure out how much memory a join gets, but that memory may > be too little for the join to accept it. Flink plans highly conservative

Re: ALS implementation

2015-06-05 Thread Stephan Ewen
There are two different issues here: 1) Flink does figure out how much memory a join gets, but that memory may be too little for the join to accept it. Flink plans highly conservatively right now, often too conservatively, which is something we have on the immediate roadmap to fix. 2) The "Hash Join

Re: ALS implementation

2015-06-05 Thread Fabian Hueske
Hi, the problem with the "maximum number of recursions" is the distribution of join keys. If a partition does not fit into memory, HybridHashJoin tries to solve this problem by recursively partitioning the partition using a different hash function. If join keys are heavily skewed, this strategy mi

Re: ALS implementation

2015-06-05 Thread Felix Neutatz
Shouldn't Flink figure out on its own how much memory there is for the join? The detailed trace for the NullPointerException can be found here: https://github.com/FelixNeutatz/IMPRO-3.SS15/blob/8b679f1c2808a2c6d6900824409fbd47e8bed826/NullPointerException.txt Best regards, Felix 2015-06-04

Re: ALS implementation

2015-06-04 Thread Till Rohrmann
I think it is not a problem of join hints, but rather of too little memory for the join operator. If you set the temporary directory, then the job will be split into smaller parts and thus each operator gets more memory. Alternatively, you can increase the memory you give to the Task Managers. The p

Re: ALS implementation

2015-06-04 Thread Chiwan Park
question is, which join in the ALS implementation is the problem :) > > 2015-06-04 19:09 GMT+02:00 Andra Lungu : > >> Hi Felix, >> >> Passing a JoinHint to your function should help. >> see: >> >> http://ma

Re: ALS implementation

2015-06-04 Thread Felix Neutatz
now the question is, which join in the ALS implementation is the problem :) 2015-06-04 19:09 GMT+02:00 Andra Lungu : > Hi Felix, > > Passing a JoinHint to your function should help. > see: > > http://mail-archives.apache.org/mod_mbox/flin

Re: ALS implementation

2015-06-04 Thread Andra Lungu
Hi Felix, Passing a JoinHint to your function should help. see: http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E Cheers, Andra On Thu, Jun 4, 2015 at 7:07 PM, Felix Neutatz wrote: > after bug fix: > > for 1

Re: ALS implementation

2015-06-04 Thread Felix Neutatz
After the bug fix, for 100 blocks and standard JVM heap space: Caused by: java.lang.RuntimeException: Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident. Probably cause: Too many duplicate keys. at org.apache.flink.runtime.operators.hash.MutableHa

Re: ALS implementation

2015-06-04 Thread Felix Neutatz
Yes, I will try it again with the newest update :) 2015-06-04 10:17 GMT+02:00 Till Rohrmann : > If the first error is not fixed by Chiwan's PR, then we should create a JIRA > for it to not forget it. > > @Felix: Chiwan's PR is here [1]. Could you try to run ALS again with this > version? > > Cheer

Re: ALS implementation

2015-06-04 Thread Till Rohrmann
If the first error is not fixed by Chiwan's PR, then we should create a JIRA for it to not forget it. @Felix: Chiwan's PR is here [1]. Could you try to run ALS again with this version? Cheers, Till [1] https://github.com/apache/flink/pull/751 On Thu, Jun 4, 2015 at 10:10 AM, Chiwan Park wrote:

Re: ALS implementation

2015-06-04 Thread Chiwan Park
Hi. The second bug is fixed by the recent change in the PR, but there is just no test case for the first bug. Regards, Chiwan Park > On Jun 4, 2015, at 5:09 PM, Ufuk Celebi wrote: > > I think both are bugs. They are triggered by the different memory > configurations. > > @chiwan: is the 2nd error fixe

Re: ALS implementation

2015-06-04 Thread Ufuk Celebi
I think both are bugs. They are triggered by the different memory configurations. @chiwan: is the 2nd error fixed by your recent change? @felix: if yes, can you try the 2nd run again with the changes? On Thursday, June 4, 2015, Felix Neutatz wrote: > Hi, > > I played a bit with the ALS recomme

ALS implementation

2015-06-04 Thread Felix Neutatz
Hi, I played a bit with the ALS recommender algorithm. I used the MovieLens dataset: http://files.grouplens.org/datasets/movielens/ml-latest-README.html The rating matrix has 21,063,128 entries (ratings). I ran the algorithm with 3 configurations: 1. standard JVM heap space: val als = ALS()