Re: Question on RDD storage

2021-11-27 Thread Kendall Wagner
Hello,

I tried saveAsTextFile, but it saves the structure as text. After reading the text file back, I can't access the structure directly. How can I get the structure back? Thanks again.

On Sun, Nov 28, 2021 at 1:24 PM Sean Owen wrote:
> You didn't import the class.
> persist() does not save across sessions. You need
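One possible way to recover the structure, sketched here as an assumption rather than as an answer given in the thread: saveAsTextFile writes the string form of each record, one per line, so a line such as "('peach', 1)" can be turned back into a tuple with Python's ast.literal_eval. The helper name parse_record is hypothetical, and the sample records are the ones shown in the original question.

```python
import ast


def parse_record(line):
    """Turn one saved line such as "('peach', 1)" back into a Python tuple."""
    # literal_eval only accepts Python literals, so it is safer than eval().
    return ast.literal_eval(line)


# Simulate what saveAsTextFile stores: the str() form of each record.
records = [('peach', 1), ('apricot', 2), ('apple', 3)]
saved_lines = [str(rec) for rec in records]

# Parse every line back into its original structure.
restored = [parse_record(line) for line in saved_lines]
print(restored)
```

In a pyspark session the same helper could be applied with something like sc.textFile(path).map(parse_record), where path is whatever directory was passed to saveAsTextFile. This only works for records made of plain Python literals (strings, numbers, tuples, lists, dicts).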

Re: Question on RDD storage

2021-11-27 Thread Sean Owen
You didn't import the class. persist() does not save across sessions. You need to write with methods like saveAsTextFile or whatever is appropriate, or .write methods on a DataFrame.

On Sat, Nov 27, 2021 at 9:13 PM Kendall Wagner wrote:
> Hello,
>
> Sorry I am a spark newbie.
> In pyspark sessio
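As a rough sketch of the "write it out, read it back" approach described above, the two helpers below wrap PySpark's RDD.saveAsPickleFile and SparkContext.pickleFile, which keep Python objects such as tuples intact between sessions. Choosing the pickle format is an assumption made for this illustration (the reply only names saveAsTextFile and DataFrame .write), and the helper names and the example path are hypothetical.

```python
def save_rdd(rdd, path):
    """Write the RDD to `path` as pickled Python objects.

    Spark refuses to overwrite an existing output directory,
    so `path` must not exist yet.
    """
    rdd.saveAsPickleFile(path)


def load_rdd(sc, path):
    """Read back an RDD previously written by save_rdd()."""
    return sc.pickleFile(path)


# Intended use in a pyspark shell, where `sc` is predefined
# ("/tmp/fruit_rdd" is a made-up location):
#   save_rdd(fruit, "/tmp/fruit_rdd")        # in the first session
#   fruit = load_rdd(sc, "/tmp/fruit_rdd")   # in a later session
```

Unlike persist(), which only caches data for the lifetime of the running application, the files written this way stay on disk (or HDFS) until they are deleted.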

Question on RDD storage

2021-11-27 Thread Kendall Wagner
Hello,

Sorry I am a spark newbie.
In pyspark session, I want to store the RDD so that next time I run pyspark again, the RDD will be reloaded.

I tried this:

>>> fruit.count()
1000
>>> fruit.take(5)
[('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
>>> fruit.persist(Sto