Re: Question on RDD storage

2021-11-28 Thread Mich Talebzadeh
I forgot: you need to import the following.

    from pyspark.sql.types import *

You can also get the history of commands in Python as follows:

    import readline
    for i in range(readline.get_current_history_length()):
        print(readline.get_history_item(i + 1))

HTH

Re: Question on RDD storage

2021-11-28 Thread Mich Talebzadeh
Hi Kendall,

We had the following before:

    # read that saved file
    content = sc.textFile(file_path)
    >>> content.collect()
    ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Let us define a schema for this list first:

    Schema = StructType([StructField("ID", IntegerType(), False)])

Next read

Re: Question on RDD storage

2021-11-28 Thread Sean Owen
Please read the docs. There is also saveAsObjectFile, for example, but you almost surely want to handle this as a DataFrame. You can .write.format("...") as desired.

On Sun, Nov 28, 2021 at 3:58 PM Kendall Wagner wrote:
> Thanks Mich
> As you show, after reading back from textFile the int becom
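A minimal sketch of the DataFrame route suggested above, assuming an existing SparkSession `spark` and a DataFrame `df`. The helper names `save_dataframe` and `load_dataframe`, the path, and the choice of Parquet as the format are illustrative assumptions, not something stated in the thread. Parquet stores the schema alongside the data, so an integer column should come back as an integer rather than a string.

```python
def save_dataframe(df, path, fmt="parquet"):
    """Write df to path so the data outlives the current session."""
    # mode("overwrite") replaces any existing output at path.
    df.write.format(fmt).mode("overwrite").save(path)


def load_dataframe(spark, path, fmt="parquet"):
    """Read the data back, for example in a later session."""
    return spark.read.format(fmt).load(path)


# Usage in a pyspark shell (hypothetical path):
#   save_dataframe(df, "/tmp/ids.parquet")
#   df2 = load_dataframe(spark, "/tmp/ids.parquet")
```

Both helpers take the session or DataFrame as an argument, so they can be pasted into a pyspark shell where `spark` already exists.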

Re: Question on RDD storage

2021-11-28 Thread Kendall Wagner
Thanks Mich

As you show, after reading back from textFile the int becomes str. Do I need another map to translate them?

Regards
Kendall

> Hi,
>
> In Pyspark you can persist storage of a Dataframe (df) to disk by using
> the following command
>
> df.persist(pyspark.StorageLevel.DISK_ONLY)
>
> not
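For the question about needing another map: textFile always yields one string per line, so a conversion step is one way to get the integers back. The sketch below assumes an existing SparkContext `sc` and a file with one integer per line; the helper names `to_int` and `read_ints` are hypothetical.

```python
def to_int(line):
    """Convert one line of text back to an int, ignoring surrounding whitespace."""
    return int(line.strip())


def read_ints(sc, path):
    """Read a file written by saveAsTextFile and return an RDD of ints."""
    return sc.textFile(path).map(to_int)


# Usage in a pyspark shell, where file_path points at the saved output:
#   read_ints(sc, file_path).collect()
```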

Re: Question on RDD storage

2021-11-28 Thread Mich Talebzadeh
Hi,

In Pyspark you can persist storage of a DataFrame (df) to disk by using the following command:

    df.persist(pyspark.StorageLevel.DISK_ONLY)

Note pyspark.StorageLevel above. But that only stores the DataFrame df in temporary storage (a work area) for Spark, akin to using the swap area on a Li

Re: Question on RDD storage

2021-11-27 Thread Kendall Wagner
Hello

I tried saveAsTextFile, but this saves the structure as text. After reading from the text file I can't access the structure directly. So how?

Thanks again.

On Sun, Nov 28, 2021 at 1:24 PM Sean Owen wrote:
> You didn't import the class.
> persist() does not save across sessions. You need
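One possible way to recover the structure from a text file, sketched under the assumption that each saved element was a plain Python literal (a number, tuple, list, or dict of those) so that its text form can be parsed back with `ast.literal_eval`. The helper names `parse_record` and `read_records` are hypothetical, and this will not work for elements whose text form is not a valid literal, such as bare strings.

```python
import ast


def parse_record(line):
    """Rebuild a Python literal (number, tuple, list, dict) from its text form."""
    return ast.literal_eval(line)


def read_records(sc, path):
    """Read a file written by saveAsTextFile and return an RDD of parsed records."""
    return sc.textFile(path).map(parse_record)


# Usage in a pyspark shell (hypothetical path):
#   read_records(sc, "/tmp/records").collect()
```

PySpark also provides rdd.saveAsPickleFile(path) and sc.pickleFile(path), which keep Python objects intact without a parsing step and may be a better fit when the text format is not needed.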

Re: Question on RDD storage

2021-11-27 Thread Sean Owen
You didn't import the class. persist() does not save across sessions. You need to write with methods like saveAsTextFile or whatever is appropriate, or .write methods on a DataFrame.

On Sat, Nov 27, 2021 at 9:13 PM Kendall Wagner wrote:
> Hello,
>
> Sorry I am a spark newbie.
> In pyspark sessio