Re: ALS failure with size > Integer.MAX_VALUE

2014-12-15 Thread Bharath Ravi Kumar
… be integers. Specifically, the input to ALS is an RDD[Rating] and Rating is an (Int, Int, Double). I am wondering if perhaps one of your identifiers exceeds MAX_INT, could you …

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-15 Thread Xiangrui Meng
… RDD[Rating] and Rating is an (Int, Int, Double). I am wondering if perhaps one of your identifiers exceeds MAX_INT, could you write a quick check for that? …
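
A minimal sketch of the kind of quick check being suggested here (not taken from the thread); the input path and the comma-separated (user, item, rating) layout are assumptions:

    import org.apache.spark.SparkContext

    // Parse identifiers as Long so values above Int.MaxValue survive parsing,
    // then count records that would not fit into MLlib's Rating(Int, Int, Double).
    def countOverflowingIds(sc: SparkContext, path: String): Long = {
      val ids = sc.textFile(path).map { line =>
        val fields = line.split(",")
        (fields(0).toLong, fields(1).toLong)
      }
      ids.filter { case (user, item) =>
        user > Int.MaxValue || item > Int.MaxValue ||
        user < Int.MinValue || item < Int.MinValue
      }.count()
    }

If the count is non-zero, the raw identifiers would need to be remapped to a dense Int range before being fed to ALS.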

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-14 Thread Bharath Ravi Kumar
… perhaps one of your identifiers exceeds MAX_INT, could you write a quick check for that? I have been running a very similar use case to yours (with more constrained …

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-03 Thread Bharath Ravi Kumar
… wondering if perhaps one of your identifiers exceeds MAX_INT, could you write a quick check for that? I have been running a very similar use case to yours (with more constrained …

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-02 Thread Xiangrui Meng
… for that? I have been running a very similar use case to yours (with more constrained hardware resources) and I haven’t seen this exact problem but I’m sure we’ve seen similar issues. Please let me know if you have o…

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-01 Thread Bharath Ravi Kumar
… From: Bharath Ravi Kumar  Date: Thursday, November 27, 2014 at 1:30 PM  To: "user@spark.apache.org"  Subject: ALS failure with size > Integer.MAX_VALUE  We're training a recommender with ALS in mllib 1.1 against a dataset of …

Re: ALS failure with size > Integer.MAX_VALUE

2014-11-30 Thread Sean Owen
… Date: Thursday, November 27, 2014 at 1:30 PM  To: "user@spark.apache.org"  Subject: ALS failure with size > Integer.MAX_VALUE  We're training a recommender with ALS in mllib 1.1 against a dataset of 150M users and 4.5K items, with the total number of train…

Re: ALS failure with size > Integer.MAX_VALUE

2014-11-29 Thread Ganelin, Ilya
… questions. From: Bharath Ravi Kumar <reachb...@gmail.com>  Date: Thursday, November 27, 2014 at 1:30 PM  To: "user@spark.apache.org"  Subject: ALS failure with size > Integer.MAX_VALUE  We're training …

Re: ALS failure with size > Integer.MAX_VALUE

2014-11-28 Thread Bharath Ravi Kumar
Any suggestions to address the described problem? In particular, given the skewed degree of some of the item nodes in the graph, I believe it should be possible to define block sizes that reflect that skew, but I am unsure how to arrive at those sizes. …
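
A hedged sketch (not from the thread) of one way to quantify the item-degree skew mentioned above; ratings is assumed to be the training RDD[Rating]:

    import org.apache.spark.SparkContext._   // RDD implicits for Spark 1.x
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.recommendation.Rating

    // Number of ratings per item ("degree" of each item node); a max far above
    // the mean indicates a few very hot items whose updates dominate their blocks.
    def itemDegreeSkew(ratings: RDD[Rating]): Unit = {
      val degrees = ratings.map(r => (r.product, 1L)).reduceByKey(_ + _)
      val stats = degrees.values.map(_.toDouble).stats()   // count, mean, stdev, max, min
      println(s"item degree stats: $stats")
    }

Those statistics can then inform how many user/product blocks to request when training (see the sketch after the original post below).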

ALS failure with size > Integer.MAX_VALUE

2014-11-27 Thread Bharath Ravi Kumar
We're training a recommender with ALS in mllib 1.1 against a dataset of 150M users and 4.5K items, with the total number of training records being 1.2 billion (~30GB data). The input data is spread across 1200 partitions on HDFS. For the training, rank=10, and we've configured {number of user data …
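
A minimal sketch of the kind of setup described in this post (mllib 1.1 era); rank=10 comes from the post, but the iteration count, lambda, and block count are assumptions, not the poster's actual configuration:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

    def trainRecommender(ratings: RDD[Rating]): MatrixFactorizationModel = {
      val rank = 10          // as stated in the post
      val iterations = 10    // assumption
      val lambda = 0.01      // assumption
      val blocks = 1200      // assumption: one block per input partition
      // The five-argument ALS.train lets the caller set the number of
      // user/product blocks explicitly instead of the default (-1 = auto).
      ALS.train(ratings, rank, iterations, lambda, blocks)
    }

With 150M users the user factors alone are large, so the choice of block count directly affects the size of individual shuffle blocks, which is where the 2GB (Integer.MAX_VALUE) per-block limit tends to surface.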