Btw, explicit ALS doesn't need persist because each intermediate factor is only used once. -Xiangrui
On Sun, Apr 6, 2014 at 9:13 PM, Xiangrui Meng <men...@gmail.com> wrote:
> The persist used in implicit ALS doesn't help the StackOverflow problem.
> Persist doesn't cut lineage. We need to call count() and then
> checkpoint() to cut the lineage. Did you try the workaround mentioned
> in https://issues.apache.org/jira/browse/SPARK-958:
>
> "I tune JVM thread stack size to 512k via option -Xss512k and it works."
>
> Best,
> Xiangrui
>
> On Sun, Apr 6, 2014 at 10:21 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>> At the head I see a persist option in implicitPrefs, but given cases like
>> the ones mentioned above, why don't we use a similar technique in explicit
>> runs as well and take an input for which iterations to persist?
>>
>> for (iter <- 1 to iterations) {
>>   // perform ALS update
>>   logInfo("Re-computing I given U (Iteration %d/%d)".format(iter, iterations))
>>   products = updateFeatures(users, userOutLinks, productInLinks,
>>     partitioner, rank, lambda, alpha, YtY = None)
>>   logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
>>   users = updateFeatures(products, productOutLinks, userInLinks,
>>     partitioner, rank, lambda, alpha, YtY = None)
>> }
>>
>> Say I want to persist every k iterations out of N iterations of explicit
>> ALS; there should be an option to do that. Implicit ALS currently persists
>> at every iteration.
>>
>> Does this option make sense, or would you prefer this issue to be fixed in
>> a different way?
>>
>> I definitely see that for my 25M x 3M run, with 64 GB executor memory,
>> something goes wrong after the 5th iteration, and I wanted to run for 10
>> iterations. So my k is 4 or 5 for this particular problem.
>>
>> I can send a PR after testing the fix on the dataset I have. I will also
>> try to see if we can make such datasets public for more research.
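The every-k-iterations idea above can be sketched as a small loop wrapper. This is only an illustration, not the ALS API: `iterateWithCheckpoints`, `shouldCheckpoint`, and the interval parameter `k` are hypothetical names, and `update` stands in for one explicit-ALS round (both `updateFeatures` calls). The `checkpoint()` followed by `count()` pair is the standard Spark pattern for actually cutting the lineage, since `checkpoint()` alone only marks the RDD.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of ALS: should iteration `iter` cut the
// lineage, given a user-supplied checkpoint interval k?
def shouldCheckpoint(iter: Int, k: Int): Boolean = k > 0 && iter % k == 0

// Run `iterations` update steps, cutting the lineage every k-th result.
// Requires that sc.setCheckpointDir(...) has been called on the driver.
def iterateWithCheckpoints[T](initial: RDD[T], iterations: Int, k: Int)
                             (update: RDD[T] => RDD[T]): RDD[T] = {
  var current = initial
  for (iter <- 1 to iterations) {
    current = update(current)
    if (shouldCheckpoint(iter, k)) {
      current.checkpoint() // mark this RDD for checkpointing
      current.count()      // first action materializes the checkpoint and
                           // truncates the lineage, avoiding the StackOverflow
    }
  }
  current
}
```

With iterations = 10 and k = 5, lineage would be cut after iterations 5 and 10, which matches the 25M x 3M case described above.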
>>
>> For the LDA problem mentioned earlier in this email chain, k is 10. NMF
>> can generate topics similar to LDA as well; the Carrot2 project uses it.
>>
>> On Thu, Mar 27, 2014 at 3:20 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>
>>> Hi Matei,
>>>
>>> I am hitting similar problems with 10 ALS iterations. I am running with
>>> 24 GB executor memory on 10 nodes for a 20M x 3M matrix with rank = 50.
>>>
>>> The first iteration of flatMaps runs fine, which means the per-iteration
>>> memory requirements are good.
>>>
>>> If I checkpoint the RDDs, most likely the remaining 9 iterations will
>>> also run fine and I will get the results.
>>>
>>> Is there a plan to add a checkpoint option to ALS for such large
>>> factorization jobs?
>>>
>>> Thanks,
>>> Deb
>>>
>>> On Tue, Jan 28, 2014 at 11:10 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>
>>>> That would be great to add. Right now it would be easy to change it to
>>>> use another Hadoop FileSystem implementation at the very least (I think
>>>> you can just pass the URL for that), but for Cassandra you'd have to use
>>>> a different InputFormat or some direct Cassandra access API.
>>>>
>>>> Matei
>>>>
>>>> On Jan 28, 2014, at 5:02 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>
>>>> > By the way, is there any plan to make a pluggable backend for
>>>> > checkpointing? We might be interested in writing, for example, a
>>>> > Cassandra backend.
>>>> >
>>>> > On Sat, Jan 25, 2014 at 9:49 PM, Xia, Junluan <junluan....@intel.com> wrote:
>>>> >> Hi all,
>>>> >>
>>>> >> The description of this bug, submitted by Matei, is as follows:
>>>> >>
>>>> >> The tipping point seems to be around 50. We should fix this by
>>>> >> checkpointing the RDDs every 10-20 iterations to break the lineage
>>>> >> chain, but checkpointing currently requires HDFS installed, which
>>>> >> not all users will have.
>>>> >>
>>>> >> We might also be able to fix DAGScheduler to not be recursive.
>>>> >>
>>>> >> regards,
>>>> >> Andrew
>>>> >
>>>> > --
>>>> > Evan Chan
>>>> > Staff Engineer
>>>> > e...@ooyala.com |
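For reference, wiring up checkpointing in a driver only takes a checkpoint directory on a Hadoop-supported FileSystem, which is also what makes the pluggable-backend idea above plausible. A minimal sketch, with a placeholder path and master, and a toy RDD standing in for the ALS factor RDDs:

```scala
import org.apache.spark.SparkContext

// Sketch only: master and checkpoint path are placeholders. Any Hadoop
// FileSystem URL works here (an hdfs:// URL in production, a local path
// for testing), which is why a non-HDFS backend is mostly a matter of
// passing a different URL.
val sc = new SparkContext("local[2]", "checkpoint-demo")
sc.setCheckpointDir("/tmp/spark-checkpoints")

val factors = sc.parallelize(1 to 100).map(_ * 2) // toy stand-in for factors
factors.checkpoint() // mark the RDD for checkpointing
factors.count()      // first action writes the checkpoint and cuts the lineage

sc.stop()
```

Note that `checkpoint()` must be called before the first action on the RDD, or the data will be recomputed when the checkpoint is written.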