Thanks, Deb.  But I'm looking at org.apache.spark.examples.SparkALS, which
is not in the MLlib examples and does not take any file parameters.

I don't see the class you refer to in the examples. However, if I did
want to run that example, where would I find the ratings file in question?

It would be great if this were documented, perhaps in the source code.
I'll file a JIRA.

Thanks,
Diana


On Mon, Apr 28, 2014 at 1:41 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> Diana,
>
> Here are the parameters:
>
> ./bin/spark-class org.apache.spark.mllib.recommendation.ALS
> Usage: ALS <master> <ratings_file> <rank> <iterations> <output_dir>
> [<lambda>] [<implicitPrefs>] [<alpha>] [<blocks>]
>
> Master: Local/Deployed spark cluster master
> ratings_file: Netflix format data
> rank: Reduced dimension of the User and Product factors
> iterations: How many ALS iterations you would like to run
> output_dir: Where to generate the user and product factors
>
> lambda: regularization for nuclear norm
> implicitPrefs: true runs the implicit-feedback algorithm from Koren's
> Netflix Prize paper
> alpha: applies only when implicitPrefs is true
> blocks: How many blocks to partition the ratings, user factor and
> product factor matrices into
>
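
To make the usage string above concrete, here is a minimal sketch in Python that writes a tiny ratings file and assembles the matching argument list. It only prints the command and does not run Spark. The comma-separated "user,product,rating" layout is an assumption about what "Netflix format" means here, and the helper names, the sample ratings, the paths, the "local[2]" master and the parameter values are all hypothetical.

```python
import os
import tempfile

# Hypothetical sample ratings: (user_id, product_id, rating).
SAMPLE_RATINGS = [(1, 10, 5.0), (1, 20, 3.0), (2, 10, 4.0), (3, 30, 1.0)]


def write_ratings(path, ratings):
    """Write one 'user,product,rating' record per line (assumed layout)."""
    with open(path, "w") as f:
        for user, product, rating in ratings:
            f.write(f"{user},{product},{rating}\n")


def build_als_command(master, ratings_file, rank, iterations, output_dir,
                      lambda_=None, implicit_prefs=None, alpha=None,
                      blocks=None):
    """Assemble the arguments in the order given by the usage string.

    The optional arguments are positional, so they are appended only up to
    the first one that is left unset.
    """
    cmd = ["./bin/spark-class", "org.apache.spark.mllib.recommendation.ALS",
           master, ratings_file, str(rank), str(iterations), output_dir]
    for value in (lambda_, implicit_prefs, alpha, blocks):
        if value is None:
            break
        # Booleans are written as "true"/"false" rather than "True"/"False".
        cmd.append(str(value).lower() if isinstance(value, bool) else str(value))
    return cmd


workdir = tempfile.mkdtemp()
ratings_path = os.path.join(workdir, "ratings.csv")
write_ratings(ratings_path, SAMPLE_RATINGS)

command = build_als_command("local[2]", ratings_path, rank=10, iterations=5,
                            output_dir=os.path.join(workdir, "factors"),
                            lambda_=0.01)
print(" ".join(command))
```

Because the optional arguments are positional, setting alpha or blocks in this sketch also requires setting every optional argument that precedes it.
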
> I have run it in a 64 GB JVM with 20M users, 1.6M ratings and 50
> factors. You should be able to go even beyond that if you increase
> the JVM size.
>
> The scalability issue comes from the fact that each JVM has to collect
> either the user or the product factors before doing a LAPACK posv solve.
> That seems to be the bottleneck step, but switching from double to float
> is one way to scale it even further.
>
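
As a rough back-of-the-envelope check on the double versus float point, the sketch below estimates the size of a dense user-factor matrix for the run described above (20M users, 50 factors). It assumes a plain dense layout with 8 bytes per double and 4 bytes per float and ignores JVM object overhead, so real memory use will be higher. The function name is hypothetical.

```python
def factor_matrix_bytes(n_rows, rank, bytes_per_value):
    """Size of a dense n_rows x rank factor matrix, ignoring JVM overhead."""
    return n_rows * rank * bytes_per_value


N_USERS = 20_000_000  # "20M users" from the run described above
RANK = 50             # "50 factors"

as_double = factor_matrix_bytes(N_USERS, RANK, 8)  # 64-bit doubles
as_float = factor_matrix_bytes(N_USERS, RANK, 4)   # 32-bit floats

print(f"double: {as_double / 1e9:.1f} GB")  # double: 8.0 GB
print(f"float:  {as_float / 1e9:.1f} GB")   # float:  4.0 GB
```

Under these assumptions, collecting the full user-factor matrix costs about 8 GB as doubles and about 4 GB as floats, which is why halving the element width helps when each JVM has to hold one side's factors.
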
> Thanks.
> Deb
>
>
>
> On Mon, Apr 28, 2014 at 10:30 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> Hi everyone.  I'm trying to run some of the Spark example code, and most
>> of it appears to be undocumented (unless I'm missing something).  Can
>> someone help me out?
>>
>> I'm particularly interested in running SparkALS, which wants parameters:
>> M U F iter slices
>>
>> What are these variables?  They appear to be integers and the default
>> values are 100, 500 and 10 respectively but beyond that...huh?
>>
>> Thanks!
>>
>> Diana
>>
>
>
