Very helpful!
On Wed, May 8, 2024 at 9:07 AM Mich Talebzadeh wrote:
*Potential reasons*
- Data Serialization: Spark needs to serialize the DataFrame into an
in-memory format suitable for storage. This process can be time-consuming,
especially for large datasets like 3.2 GB with complex schemas (see the
sketch after this list).
- Shuffle Operations: If your transformations involve shuffle operations
(joins, aggregations, repartitions), Spark has to redistribute data across
the cluster before the result can be materialized, which adds network and
disk overhead.
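A minimal sketch of what the serialization bullet means in practice,
assuming df is the 3.2 GB DataFrame from the original question (the storage
level and the count() action are illustrative choices, not from the thread):

    import org.apache.spark.storage.StorageLevel

    // MEMORY_AND_DISK_SER keeps the cached partitions in serialized form
    // and spills to disk if they do not fit in executor memory; the
    // serialization cost is paid once, when the cache is first filled.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Any action materializes the cache; count() is just one way to
    // trigger it eagerly instead of on the first write.
    df.count()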
Could anyone help me here?
Sent from my iPhone
> On May 7, 2024, at 4:30 PM, Prem Sahoo wrote:
Hello Folks,
In Spark I have read a file, done some transformations, and finally written
the result to HDFS.
Now I am interested in writing the same DataFrame to MapRFS, but for this
Spark will execute the full DAG again (recompute all the previous steps:
the read plus all the transformations).
I don't want this recomputation.
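One common way to avoid that recomputation is to persist the DataFrame
before the first write, so the second write reads from the cache instead of
re-running the DAG. A minimal sketch, with hypothetical paths and a
stand-in transformation in place of the ones in the question:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("dual-write").getOrCreate()

    // Hypothetical read + transformation standing in for the real pipeline.
    val df = spark.read.parquet("hdfs:///data/input")
      .filter("value IS NOT NULL")

    // MEMORY_AND_DISK spills to disk if the data does not fit in memory,
    // so cached partitions are not silently dropped and recomputed.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    // The first write runs the DAG once and fills the cache as a side
    // effect; the second write then reuses the cached partitions.
    df.write.mode("overwrite").parquet("hdfs:///data/out")
    df.write.mode("overwrite").parquet("maprfs:///data/out")

    df.unpersist()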