Hello,
Sorry, I am a Spark newbie.
In a pyspark session, I want to store an RDD so that the next time I run
pyspark, the RDD can be reloaded.
I tried this:
>>> fruit.count()
1000
>>> fruit.take(5)
[('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
>>> fruit.persist(StorageLevel.DISK_ONLY)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'StorageLevel' is not defined
The RDD.persist method does not seem to work for me.
How can I store an RDD to disk, and how can I reload it later?
Thank you in advance.
Kendall
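
For reference, a minimal sketch of one way this might be done. Two points seem relevant. First, the NameError only means StorageLevel was never imported; `from pyspark import StorageLevel` should make that name available. Second, persist() keeps the data only for the lifetime of the running Spark application, so as far as I understand it would not bring the RDD back in a new pyspark session anyway. To reload later, the RDD has to be written out as files. The sketch below uses saveAsPickleFile and pickleFile for that. The helper names save_rdd and load_rdd and the path /tmp/fruit_rdd are made up for illustration, and `sc` is assumed to be the SparkContext that the pyspark shell provides.

```python
# Hypothetical helpers (not part of Spark) wrapping two real PySpark calls.

def save_rdd(rdd, path):
    """Write the RDD to `path` as a directory of pickled Python objects.

    Assumption: `path` does not exist yet; Spark normally refuses to
    write into an existing output directory.
    """
    rdd.saveAsPickleFile(path)


def load_rdd(sc, path):
    """Rebuild an RDD from a directory written by save_rdd.

    `sc` is a SparkContext, e.g. the one the pyspark shell creates.
    """
    return sc.pickleFile(path)
```

In the shell this would be used as save_rdd(fruit, "/tmp/fruit_rdd") in the first session and fruit = load_rdd(sc, "/tmp/fruit_rdd") in a later one. The order of elements after reloading is not guaranteed to match the original.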