You didn't import the class — add `from pyspark import StorageLevel` first, and the NameError goes away. Also note that persist() only caches the RDD within the current session; it does not save it across sessions. To keep the data, write it out explicitly with a method like saveAsTextFile or saveAsPickleFile (whichever fits your data), or use the .write methods on a DataFrame, and then load it back in the next session.
On Sat, Nov 27, 2021 at 9:13 PM Kendall Wagner <kendawag...@gmail.com> wrote:
> Hello,
>
> Sorry I am a spark newbie.
> In pyspark session, I want to store the RDD so that next time I run
> pyspark again, the RDD will be reloaded.
>
> I tried this:
>
> >>> fruit.count()
> 1000
>
> >>> fruit.take(5)
> [('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
>
> >>> fruit.persist(StorageLevel.DISK_ONLY)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'StorageLevel' is not defined
>
> RDD.persist method seems not working for me.
> How to store a RDD to disk and how can I reload it again?
>
> Thank you in advance.
> Kendall