ClassCastExceptions when using Spark shell

2014-05-29 Thread Sebastian Schelter
Hi, I have trouble running some custom code on Spark 0.9.1 in standalone mode on a cluster. I built a fat jar (excluding Spark) that I'm adding to the classpath with ADD_JARS=... When I start the Spark shell, I can instantiate classes, but when I run Spark code, I get strange ClassCastExcepti

Re: Problem with the Item-Based Collaborative Filtering Recommendation Algorithms in spark

2014-04-24 Thread Sebastian Schelter
Quin, I'm not sure that I understand your source code correctly, but the common problem with item-based collaborative filtering at scale is that comparing all pairs of item vectors requires quadratic effort and therefore does not scale. A common approach to this problem is to selectively d
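
To make the quadratic cost mentioned above concrete, here is a minimal sketch in plain Python (not Spark, and not the code from the thread). It compares every unordered pair of item vectors with cosine similarity. The names `cosine`, `all_pairs_similarity`, and `pair_count`, and the dict-based vector layout, are assumptions made for illustration. It shows only the naive all-pairs baseline, not the remedy the message goes on to describe.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {user: rating} dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def all_pairs_similarity(items):
    """Naive baseline: compare every unordered pair of items.

    `items` maps an item id to its sparse rating vector (hypothetical layout).
    """
    return {
        (i, j): cosine(items[i], items[j])
        for i, j in combinations(sorted(items), 2)
    }


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

With n items the baseline performs n * (n - 1) / 2 comparisons, so `pair_count(1000)` is 499500 and `pair_count(1000000)` is roughly 5 * 10^11, which is why the naive approach stops being practical as the catalogue grows.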

Re: Re: Random Forest on Spark

2014-04-18 Thread Sebastian Schelter
Hi, Stratosphere does not have a real RF implementation yet; there is only a prototype, developed by students in a university course, that is far from production usage at this stage. --sebastian On 04/18/2014 10:31 AM, Sean Owen wrote: Mahout RDF is fairly old code. If you try

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Sebastian Schelter
The Mahout implementation is just a straightforward port of the paper. No changes have been made. On 03/12/2014 08:36 AM, Nick Pentreath wrote: It would be helpful to know what parameter inputs you are using. If the regularization schemes are different (by a factor of alpha, which can often b

Aggregators in GraphX

2014-03-09 Thread Sebastian Schelter
Hi, does GraphX currently support Giraph/Pregel's "aggregator" feature? I was thinking of implementing a PageRank version that correctly handles dangling vertices (i.e. vertices with no outlinks). To do so, I would have to globally sum up the rank associated with them in every iteration,
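
The global sum described in this message can be sketched as follows. This is a single-machine illustration in plain Python, not GraphX or Giraph code. The function name `pagerank`, the adjacency-dict input, and the choice to spread the dangling mass uniformly over all vertices are assumptions (one common way of handling dangling vertices, not necessarily what the author had in mind).

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """PageRank that redistributes the rank of dangling vertices.

    `out_links` maps each vertex to a list of the vertices it links to
    (hypothetical input layout). Vertices that only appear as targets are
    treated as dangling.
    """
    vertices = set(out_links)
    for targets in out_links.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(iterations):
        # The "aggregator" step: one global sum over all dangling vertices.
        dangling_mass = sum(rank[v] for v in vertices if not out_links.get(v))

        # Every vertex receives the teleport share plus an equal share
        # of the dangling mass (assumed uniform redistribution).
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in vertices}

        # Non-dangling vertices split their rank evenly over their outlinks.
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share

        rank = new_rank

    return rank
```

Because the dangling mass is added back in every iteration, the ranks keep summing to 1. In a vertex-centric system the `dangling_mass` line is the part that needs a global aggregate, since it depends on all vertices rather than on a vertex's neighbours.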