Hi Stuart,

You're close!
Just add a () after the cache, like:

data.cache()

...and then run the .count() action on it and you should be good to see it in the Storage UI!

- Sameer

On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman <stuart.hors...@gmail.com> wrote:

> Sorry, too quick to pull the trigger on my original email. I should have
> added that I've tried using persist() and cache() but no joy.
>
> I'm doing this:
>
> data = sc.textFile("somedata")
>
> data.cache
>
> data.count()
>
> but I still can't see anything in the storage?
>
> On 31 October 2014 10:42, Sameer Farooqui <same...@databricks.com> wrote:
>
>> Hey Stuart,
>>
>> The RDD won't show up under the Storage tab in the UI until it's been
>> cached. Basically, Spark doesn't know what the RDD will look like until
>> it's cached, because up until then the RDD is just on disk (external to
>> Spark). If you launch some transformations + an action on an RDD that is
>> purely on disk, then Spark will read it from disk, compute against it and
>> write the results back to disk or show you the results at the
>> Scala/Python shell. But when you run Spark workloads against purely
>> on-disk files, the RDD won't show up in Spark's Storage UI. Hope that
>> makes sense...
>>
>> - Sameer
>>
>> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman <stuart.hors...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> When I load an RDD with:
>>>
>>> data = sc.textFile("somefile")
>>>
>>> I don't see the resulting RDD in the SparkContext GUI on localhost:4040
>>> in /storage.
>>>
>>> Is there something special I need to do to allow me to view this? I
>>> tried both the Scala and Python shells but got the same result.
>>>
>>> Thanks
>>>
>>> Stuart
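
P.S. Putting the pieces together, here is a minimal sketch of the working sequence in the PySpark shell (assuming the shell's built-in SparkContext sc and the same placeholder path "somedata"):

    # Minimal sketch, assuming the pyspark shell's SparkContext `sc` and a
    # readable text file at the placeholder path "somedata".
    data = sc.textFile("somedata")   # lazy: defines the RDD, reads nothing yet
    data.cache()                     # note the (): marks the RDD for caching (still lazy)
    data.count()                     # first action materializes the RDD and fills the
                                     # cache; it should now appear under the Storage tab
                                     # at localhost:4040

cache() (like persist()) only marks the RDD for storage; nothing actually lands in memory until an action such as count() forces the computation.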