stage, it behaves as if "persist(StorageLevel.DISK_ONLY)" were called implicitly?
Regards,
Kang Liu
From: Liu, Raymond
Date: 2014-06-27 11:02
To: user@spark.apache.org
Subject: RE: About StorageLevel
I think there is a shuffle stage involved. And the future count jobs will
depend on the shuffle output, which is already materialized, so they do not
need to recompute the map stage.
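A quick way to see that shuffle dependency, assuming the spark shell session
from your trial below (r2 already defined):
println(r2.toDebugString)  // the lineage shows the ShuffledRDD introduced by groupByKey
r2.count()  // first run executes the shuffle map stage and the result stage
r2.count()  // the map output of the shuffle is reused, so only the result stage runs again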
From: tomsheep...@gmail.com
Sent: Friday, June 27, 2014 10:08 AM
To: user
Subject: Re: About StorageLevel
Thank you Andrew, that's very helpful.
I still have some doubts about a simple trial: I opened a spark shell in local
mode and typed in:
val r = sc.parallelize(0 to 50)
val r2 = r.keyBy(x => x).groupByKey(10)
and then I invoked count on r2 (several times).
The first job obviously takes more time than the later ones. Is there some
magic underneath?
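For reference, a rough way to compare the runs in the shell ("time" here is
just a hypothetical helper I am sketching, not something from Spark):
def time[T](body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"took ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}
time { r2.count() }  // first run: both stages of the job execute
time { r2.count() }  // later runs: noticeably faster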
Regards,
Kang Liu
From: Andrew Or
Date: 2014-06-27 02:25
To: user
Subject: Re: About StorageLevel
Hi Kang,
You raise a good point. Spark does not automatically cache all your RDDs. Why?
Simply because users know best what RDDs they are most interested
in, so it makes sense to give them control over caching behavior.
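For example (just a sketch with made-up RDDs), you cache only the RDD you know
you will reuse:
import org.apache.spark.storage.StorageLevel
val base = sc.parallelize(1 to 1000000)
val expensive = base.map(x => (x % 100, x.toLong * x))  // pretend this map is costly to recompute
// Spark will not cache this for you; you opt in because you know it will be reused.
expensive.persist(StorageLevel.MEMORY_ONLY)  // same as expensive.cache()
expensive.count()  // the first action computes and caches the partitions
expensive.count()  // later actions read from the in-memory cache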
Best,
Andrew
2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com :
> Hi all,
>
> I have a newbie question about StorageLevel in Spark. I came across
> these sentences in the Spark documentation:
Hi all,
I have a newbie question about StorageLevel in Spark. I came across these
sentences in the Spark documentation:
If your RDDs fit comfortably with the default storage level (MEMORY_ONLY),
leave them that way. This is the most CPU-efficient option, allowing operations
on the RDDs to run as fast as possible.
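A small illustration of that default, as a spark shell sketch:
import org.apache.spark.storage.StorageLevel
val small = sc.parallelize(0 to 50)
small.persist()  // no argument means the default level, MEMORY_ONLY (cache() is the same)
// If the data may not fit in memory, pick a different level up front;
// the storage level of an RDD cannot be changed once it has been set.
val big = sc.parallelize(0 to 50).map(x => (x, x.toString * 10000))
big.persist(StorageLevel.MEMORY_AND_DISK)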