You didn't import the class — you need `from pyspark import StorageLevel`
before you can reference it.

Also note that persist() does not save data across sessions; it only
caches the RDD within the current one. To make the data available the
next time you start pyspark, write it out with a method such as
saveAsTextFile (or the .write methods on a DataFrame) and reload it from
that path.

On Sat, Nov 27, 2021 at 9:13 PM Kendall Wagner <kendawag...@gmail.com>
wrote:

> Hello,
>
> Sorry, I am a Spark newbie.
> In a pyspark session, I want to store an RDD so that the next time I
> run pyspark, the RDD can be reloaded.
>
> I tried this:
>
> >>> fruit.count()
> 1000
>
> >>> fruit.take(5)
> [('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
>
> >>> fruit.persist(StorageLevel.DISK_ONLY)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'StorageLevel' is not defined
>
>
> The RDD.persist method seems not to be working for me.
> How do I store an RDD to disk, and how can I reload it later?
>
>
> Thank you in advance.
> Kendall
>
>
>
