Hi Sean/Sameer,

It seems you're both right. In the Python shell I need to explicitly include the empty parens, data.cache(), then run an action, and the RDD appears in the Storage tab. In the Scala shell I can just call data.cache without the parens, run an action, and that works.
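A minimal sketch of why the parens matter in Python but not in Scala. It uses a hypothetical stand-in class, `LazyDataset`, not PySpark itself, and only mimics two behaviours from this thread: `cache()` merely marks the data for caching, and nothing is stored until an action such as `count()` runs. In Python, `data.cache` without parens just evaluates to a bound method object, so the call never happens. In Scala, a no-arg method can be invoked without parens, which is why `data.cache` works there.

```python
# Hypothetical stand-in for an RDD; NOT the PySpark API. It only mimics:
# (1) cache() is lazy, (2) nothing is stored until an action like count() runs.
class LazyDataset:
    def __init__(self, lines):
        self._lines = lines
        self.cache_requested = False  # set when cache() is actually called
        self.materialized = False     # set when an action computes cached data

    def cache(self):
        # Like RDD.cache(): only marks the dataset for caching and returns self.
        self.cache_requested = True
        return self

    def count(self):
        # An action: computes the data; if caching was requested, it is now "stored".
        if self.cache_requested:
            self.materialized = True
        return len(self._lines)

    def in_storage(self):
        # Rough analogue of the dataset showing up in the Storage tab.
        return self.cache_requested and self.materialized


# Without parens, Python only looks up the bound method and discards it.
a = LazyDataset(["x", "y", "z"])
a.cache
a.count()
print(a.in_storage())  # False

# With parens, cache() is actually invoked before the action.
b = LazyDataset(["x", "y", "z"])
b.cache()
b.count()
print(b.in_storage())  # True
```

In PySpark itself, data.getStorageLevel() reflects what cache() set, but the RDD only shows up in the Storage tab once an action has computed it.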
Thanks for your help.

Stu

On 31 October 2014 19:19, Sean Owen <so...@cloudera.com> wrote:
> No, empty parens do not matter when calling no-arg methods in Scala.
> This invocation should work as-is and should result in the RDD showing
> in Storage. I see that when I run it right now.
>
> Since it really does/should work, I'd look at other possibilities --
> is it maybe taking a short time to start caching? Looking at a
> different/old Storage tab?
>
> On Fri, Oct 31, 2014 at 1:17 AM, Sameer Farooqui <same...@databricks.com> wrote:
> > Hi Stuart,
> >
> > You're close!
> >
> > Just add a () after the cache, like: data.cache()
> >
> > ...and then run the .count() action on it and you should be able to see
> > it in the Storage UI!
> >
> > - Sameer
> >
> > On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman <stuart.hors...@gmail.com> wrote:
> >>
> >> Sorry, I was too quick to pull the trigger on my original email. I should
> >> have added that I've tried using persist() and cache() but no joy.
> >>
> >> I'm doing this:
> >>
> >> data = sc.textFile("somedata")
> >>
> >> data.cache
> >>
> >> data.count()
> >>
> >> but I still can't see anything in the storage?
> >>
> >> On 31 October 2014 10:42, Sameer Farooqui <same...@databricks.com> wrote:
> >>>
> >>> Hey Stuart,
> >>>
> >>> The RDD won't show up under the Storage tab in the UI until it's been
> >>> cached. Basically Spark doesn't know what the RDD will look like until
> >>> it's cached, because up until then the RDD is just on disk (external to
> >>> Spark). If you launch some transformations + an action on an RDD that is
> >>> purely on disk, then Spark will read it from disk, compute against it and
> >>> then write the results back to disk or show you the results at the
> >>> Scala/Python shells. But when you run Spark workloads against purely
> >>> on-disk files, the RDD won't show up in Spark's Storage UI. Hope that
> >>> makes sense...
> >>>
> >>> - Sameer
> >>>
> >>> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman <stuart.hors...@gmail.com> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> When I load an RDD with:
> >>>>
> >>>> data = sc.textFile("somefile")
> >>>>
> >>>> I don't see the resulting RDD in the SparkContext GUI on localhost:4040
> >>>> in /storage.
> >>>>
> >>>> Is there something special I need to do to allow me to view this? I
> >>>> tried both the Scala and Python shells but got the same result.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Stuart