Question on RDD storage

Kendall Wagner Sat, 27 Nov 2021 19:13:06 -0800

Hello,

Sorry, I am a Spark newbie.
In a pyspark session, I want to store an RDD so that the next time I run
pyspark, the RDD can be reloaded.


I tried this:

>>> fruit.count()
1000

>>> fruit.take(5)
[('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]

>>> fruit.persist(StorageLevel.DISK_ONLY)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'StorageLevel' is not defined


The RDD.persist method does not seem to work for me.
How can I store an RDD to disk, and how can I reload it later?


Thank you in advance.
Kendall
