Very helpful!
On Wed, May 8, 2024 at 9:07 AM Mich Talebzadeh wrote:
*Potential reasons*
- Data Serialization: Spark needs to serialize the DataFrame into an
in-memory format suitable for storage. This process can be time-consuming,
especially for large datasets like 3.2 GB with complex schemas (see the
sketch after this list).
- Shuffle Operations: If your transformations involve shuffle operations
(joins, aggregations, repartitions), Spark has to redistribute data across
the cluster before the result can be materialized, which adds network and
disk overhead.
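A minimal sketch of what the serialization bullet means in practice,
assuming df is the 3.2 GB DataFrame from the original question (the storage
level and the count() action are illustrative choices, not from the thread):

    import org.apache.spark.storage.StorageLevel

    // MEMORY_AND_DISK_SER keeps the cached partitions in serialized form
    // and spills to disk if they do not fit in executor memory; the
    // serialization cost is paid once, when the cache is first filled.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Any action materializes the cache; count() is just one way to
    // trigger it eagerly instead of on the first write.
    df.count()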
Could anyone help me here?
Sent from my iPhone
> On May 7, 2024, at 4:30 PM, Prem Sahoo wrote:
Hello Folks,
In Spark I have read a file, done some transformations, and finally written
the result to HDFS.
Now I am interested in writing the same DataFrame to MapRFS, but for this
Spark will execute the full DAG again (recompute all the previous steps:
the read plus all the transformations).
I don't want this recomputation.
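One common way to avoid that recomputation is to persist the DataFrame
before the first write, so the second write reads from the cache instead of
re-running the DAG. A minimal sketch, with hypothetical paths and a
stand-in transformation in place of the ones in the question:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("dual-write").getOrCreate()

    // Hypothetical read + transformation standing in for the real pipeline.
    val df = spark.read.parquet("hdfs:///data/input")
      .filter("value IS NOT NULL")

    // MEMORY_AND_DISK spills to disk if the data does not fit in memory,
    // so cached partitions are not silently dropped and recomputed.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    // The first write runs the DAG once and fills the cache as a side
    // effect; the second write then reuses the cached partitions.
    df.write.mode("overwrite").parquet("hdfs:///data/out")
    df.write.mode("overwrite").parquet("maprfs:///data/out")

    df.unpersist()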