Re: BlockManager issues

2014-09-22 Thread Christoph Sawade
Hey all. We also had the problem described by Nishkam, in almost the same big data setting. We fixed the fetch failure by increasing the timeout for acks in the driver: set("spark.core.connection.ack.wait.timeout", "600") // 10 minutes timeout for acks between nodes Cheers, Christoph 2014-09

Re: I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

2014-09-19 Thread Christoph Sawade
Hey Deb, NDCG is the "Normalized Discounted Cumulative Gain" [1]. Another popular measure is "Expected Reciprocal Rank" (ERR) [2]; it is based on a probabilistic user model, where the user scans the presented list of search results or recommendations and chooses the first that is sufficiently rele
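
To make the two measures concrete, here is a minimal sketch in plain Python (not MLlib code). It assumes graded relevance labels, the exponential gain 2^g - 1, and a log2 rank discount for NDCG. For ERR it assumes the common mapping of a grade g to a stopping probability (2^g - 1) / 2^g_max. The function names dcg, ndcg and err are hypothetical and chosen only for illustration.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(grades, start=1)
    )


def ndcg(grades):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


def err(grades, max_grade):
    """Expected reciprocal rank under a cascade user model.

    The user scans the list top-down and stops at the first item that
    satisfies them; an item with grade g satisfies the user with
    probability (2^g - 1) / 2^max_grade (an assumed mapping).
    """
    p_reach = 1.0  # probability that the user reaches the current rank
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)
        total += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return total
```

For example, ndcg([3, 2, 1]) is 1.0 because the list is already in ideal order, while err discounts items further down the list by how likely the user was to stop earlier.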

Re: Adding abstraction in MLlib

2014-09-12 Thread Christoph Sawade
I totally agree, and we also discovered some drawbacks in the implementation of the classification models that are based on GLMs: - There is no distinction between predicting scores, classes, and calibrated scores (probabilities). For these models it is common to have access to all of them and the pred
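
The distinction between scores, probabilities, and classes can be sketched as follows. This is a hypothetical plain-Python illustration for a binary logistic model, not the MLlib API: the class name BinaryLogisticModel and its three methods are assumptions made only to show how the three kinds of output relate.

```python
import math


class BinaryLogisticModel:
    """Hypothetical GLM-based classifier exposing three kinds of output."""

    def __init__(self, weights, intercept=0.0, threshold=0.5):
        self.weights = list(weights)
        self.intercept = intercept
        self.threshold = threshold

    def predict_score(self, features):
        """Raw linear score (margin): w . x + b."""
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept

    def predict_probability(self, features):
        """Score mapped through the logistic link to a probability."""
        return 1.0 / (1.0 + math.exp(-self.predict_score(features)))

    def predict_class(self, features):
        """Class label obtained by thresholding the probability."""
        return 1 if self.predict_probability(features) >= self.threshold else 0
```

Keeping the three methods separate lets a caller pick the output a task needs, for instance raw scores for ranking or probabilities for cost-sensitive decisions, without retraining or reaching into the model's internals.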