Hi Stuart,

You're close!

Just add () after cache, like data.cache(). Without the parentheses, Python just references the method instead of calling it.

...and then run the .count() action on it and you should be good to see it
in the Storage UI!
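
In case it's useful, here's the full sequence I'd expect to work in the PySpark shell (a minimal sketch; "somedata" is just the placeholder path from your example):

data = sc.textFile("somedata")   # lazy: nothing is read or cached yet
data.cache()                     # marks the RDD for caching (returns the same RDD)
data.count()                     # an action forces evaluation, which materializes and caches the RDD

After the count() finishes, the RDD should show up under the Storage tab at localhost:4040/storage.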


- Sameer

On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman <stuart.hors...@gmail.com>
wrote:

> Sorry, too quick to pull the trigger on my original email.  I should have
> added that I've tried using persist() and cache() but no joy.
>
> I'm doing this:
>
> data = sc.textFile("somedata")
>
> data.cache
>
> data.count()
>
> but I still can't see anything in the Storage tab?
>
>
>
> On 31 October 2014 10:42, Sameer Farooqui <same...@databricks.com> wrote:
>
>> Hey Stuart,
>>
>> The RDD won't show up under the Storage tab in the UI until it's been
>> cached. Basically Spark doesn't know what the RDD will look like until it's
>> cached, b/c up until then the RDD is just on disk (external to Spark). If
>> you launch some transformations + an action on an RDD that is purely on
>> disk, then Spark will read it from disk, compute against it and then write
>> the results back to disk or show you the results at the scala/python
>> shells. But when you run Spark workloads against purely on disk files, the
>> RDD won't show up in Spark's Storage UI. Hope that makes sense...
>>
>> - Sameer
>>
>> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman <stuart.hors...@gmail.com
>> > wrote:
>>
>>> Hi All,
>>>
>>> When I load an RDD with:
>>>
>>> data = sc.textFile("somefile")
>>>
>>> I don't see the resulting RDD in the SparkContext gui on localhost:4040
>>> in /storage.
>>>
>>> Is there something special I need to do to allow me to view this?  I
>>> tried both the scala and python shells but got the same result.
>>>
>>> Thanks
>>>
>>> Stuart
>>>
>>
>>
>
