Thanks, Deb. But I'm looking at org.apache.spark.examples.SparkALS, which is not in the mllib examples, and does not take any file parameters.
I don't see the class you refer to in the examples... however, if I did want to run that example, where would I find the file in question? It would be great if this were documented, perhaps in the source code. I'll add a JIRA.

Thanks,
Diana

On Mon, Apr 28, 2014 at 1:41 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> Diana,
>
> Here are the parameters:
>
> ./bin/spark-class org.apache.spark.mllib.recommendation.ALS
> Usage: ALS <master> <ratings_file> <rank> <iterations> <output_dir>
> [<lambda>] [<implicitPrefs>] [<alpha>] [<blocks>]
>
> master: local or deployed Spark cluster master
> ratings_file: Netflix-format data
> rank: reduced dimension of the user and product factors
> iterations: how many ALS iterations you would like to run
> output_dir: where to write the user and product factors
>
> lambda: regularization for nuclear norm
> implicitPrefs: true runs the implicit-feedback algorithm from Koren's Netflix Prize paper
> alpha: only used when implicitPrefs is true
> blocks: how many blocks to partition your rating, user and product factor matrices into
>
> I have run it with a 64 GB JVM on 20M users, 1.6M ratings and 50 factors; you should be able to go even beyond that if you increase the JVM size.
>
> The scalability issue comes from the fact that each JVM has to collect either the user or the product factors before doing a LAPACK posv solve; that seems to be the bottleneck step. Switching from double to float is one way to scale it even further.
>
> Thanks.
> Deb
>
> On Mon, Apr 28, 2014 at 10:30 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> Hi everyone. I'm trying to run some of the Spark example code, and most
>> of it appears to be undocumented (unless I'm missing something). Can
>> someone help me out?
>>
>> I'm particularly interested in running SparkALS, which wants parameters:
>> M U F iter slices
>>
>> What are these variables? They appear to be integers, and the defaults
>> for the first three are 100, 500 and 10, respectively, but beyond that... huh?
>>
>> Thanks!
>>
>> Diana
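Deb's bottleneck remark and Diana's question about M, U and F can both be illustrated with a tiny single-machine sketch. It assumes, as we read the SparkALS source, that M is the number of movies, U the number of users and F the number of latent features (the rank), and that iter is the number of ALS iterations; slices only controls how Spark partitions the work, so it has no counterpart here. Each ALS half-step solves a small symmetric positive-definite system per row, which is what a posv call does in a real implementation. The synthetic dense ratings matrix, the function names (`run_als`, `solve_row`, `cholesky_solve`), the `lam` value and the small default sizes are all hypothetical choices for illustration, not Spark's API or its actual code.

```python
import math
import random


def cholesky_solve(a, b):
    """Solve a x = b for a symmetric positive-definite matrix a.

    Same factor-then-substitute idea as LAPACK's posv: factor a = L L^T,
    then solve L y = b (forward) and L^T x = y (backward).
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def solve_row(ratings, fixed, rank, lam):
    """One ALS update: minimize sum_j (r_j - x.f_j)^2 + lam*|x|^2 over x.

    Builds the normal equations (F^T F + lam I) x = F^T r from the fixed
    factors F (one row per rating), then hands them to cholesky_solve.
    """
    ata = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
    atb = [0.0] * rank
    for r, f in zip(ratings, fixed):
        for p in range(rank):
            atb[p] += r * f[p]
            for q in range(rank):
                ata[p][q] += f[p] * f[q]
    return cholesky_solve(ata, atb)


def random_matrix(rows, cols, rng):
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rmse(r, movies, users):
    err = sum((r[i][j] - dot(movies[i], users[j])) ** 2
              for i in range(len(r)) for j in range(len(r[0])))
    return math.sqrt(err / (len(r) * len(r[0])))


def run_als(m=20, u=30, f=3, iters=5, lam=0.01, seed=42):
    """Hypothetical stand-in for SparkALS: M movies, U users, F features."""
    rng = random.Random(seed)
    # Synthetic dense M x U ratings matrix generated from hidden rank-F factors.
    true_m, true_u = random_matrix(m, f, rng), random_matrix(u, f, rng)
    r = [[dot(true_m[i], true_u[j]) for j in range(u)] for i in range(m)]
    movies, users = random_matrix(m, f, rng), random_matrix(u, f, rng)
    history = []
    for _ in range(iters):
        # Hold user factors fixed and solve for each movie (row i of R).
        movies = [solve_row(r[i], users, f, lam) for i in range(m)]
        # Hold movie factors fixed and solve for each user (column j of R).
        users = [solve_row([r[i][j] for i in range(m)], movies, f, lam)
                 for j in range(u)]
        history.append(rmse(r, movies, users))
    return history


if __name__ == "__main__":
    print(run_als())
```

The defaults are deliberately much smaller than the 100, 500 and 10 Diana mentions, so the pure-Python loops finish quickly. Because every entry of this synthetic matrix is observed, the error drops to nearly zero almost immediately; real ratings data is sparse, and each row's solve would only use the factors of the items or users it actually rated. In a distributed run, gathering the fixed factors that feed each of these small solves is the step Deb identifies as the bottleneck.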