RE: About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
stage, it behaves like there is a "persist(StorageLevel.DISK_ONLY)" called implicitly? Regards, Kang Liu From: Liu, Raymond Date: 2014-06-27 11:02 To: user@spark.apache.org Subject: RE: About StorageLevel I think there is a shuffle stage involved. And the future count job will depend on
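A minimal sketch of the behavior Raymond describes, assuming a stock spark-shell session with no explicit persist anywhere: groupByKey introduces a shuffle, the map-stage output of that shuffle is written to local disk, and later jobs over the same RDD reuse those shuffle files instead of recomputing, which can look like an implicit DISK_ONLY persist:

    val r  = sc.parallelize(0 to 50)
    val r2 = r.keyBy(x => x).groupByKey(10) // groupByKey introduces a shuffle boundary

    r2.count() // first job: runs the map stage and writes shuffle files to disk
    r2.count() // later jobs: the map stage is skipped because its shuffle output
               // already exists on disk -- shuffle-file reuse, not a persist of r2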

RE: About StorageLevel

2014-06-26 Thread Liu, Raymond
: Friday, June 27, 2014 10:08 AM To: user Subject: Re: About StorageLevel Thank you Andrew, that's very helpful. I still have some doubts about a simple trial: I opened a Spark shell in local mode and typed in val r = sc.parallelize(0 to 50) and val r2 = r.keyBy(x => x).groupByKey(10), and then I inv
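One way to check what Kang's trial actually persisted is getStorageLevel, which is part of the RDD API (the session below reconstructs the trial; the final action is count(), per Raymond's reply above):

    val r  = sc.parallelize(0 to 50)
    val r2 = r.keyBy(x => x).groupByKey(10)
    r2.getStorageLevel // StorageLevel.NONE: nothing is persisted explicitly;
                       // only the shuffle files from groupByKey exist on disk
    r2.count()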

Re: About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
es) The first job obviously takes more time than the later ones. Is there some magic underneath? Regards, Kang Liu From: Andrew Or Date: 2014-06-27 02:25 To: user Subject: Re: About StorageLevel Hi Kang, You raise a good point. Spark does not automatically cache all your RDDs. Why? Simply bec
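A hedged way to observe the timing difference Kang describes, reusing r2 from his trial (the time helper below is not part of Spark; it is a hypothetical shell utility):

    def time[A](body: => A): A = {
      val t0 = System.nanoTime
      val result = body
      println(f"elapsed: ${(System.nanoTime - t0) / 1e6}%.1f ms")
      result
    }

    time { r2.count() } // first run: full computation plus shuffle write
    time { r2.count() } // later runs: shuffle output is reused, so noticeably faster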

Re: About StorageLevel

2014-06-26 Thread Andrew Or
best which RDDs they are most interested in, so it makes sense to give them control over caching behavior. Best, Andrew 2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com : > Hi all, > > I have a newbie question about StorageLevel in Spark. I came across > these sentences in the Spark docum
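Andrew's point about user control, illustrated with the explicit caching API (a sketch that reuses r2 from the trial above):

    import org.apache.spark.storage.StorageLevel

    r2.persist(StorageLevel.MEMORY_ONLY) // the same level cache() uses
    r2.count()     // the first action computes r2 and fills the cache
    r2.count()     // later actions read the cached partitions from memory
    r2.unpersist() // drop the cached data once it is no longer needed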

About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
Hi all, I have a newbie question about StorageLevel in Spark. I came across these sentences in the Spark documentation: "If your RDDs fit comfortably with the default storage level (MEMORY_ONLY), leave them that way. This is the most CPU-efficient option, allowing operations on the RDDs to run as fast as possible."
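For reference, the storage levels the documentation is comparing (rdd1 through rdd4 are placeholder RDDs; each RDD can have at most one level set, so these lines show alternatives, not a sequence on one RDD):

    import org.apache.spark.storage.StorageLevel

    rdd1.persist(StorageLevel.MEMORY_ONLY)     // the default; partitions that do not fit are recomputed
    rdd2.persist(StorageLevel.MEMORY_AND_DISK) // partitions that do not fit are spilled to disk
    rdd3.persist(StorageLevel.DISK_ONLY)       // partitions are stored only on disk
    rdd4.persist(StorageLevel.MEMORY_ONLY_SER) // stored serialized: less memory, more CPU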