Hi Prem,
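You could write the DataFrame to HDFS first, then read that output back from HDFS and write it to MinIO.
That way the transformations run only once instead of once per write.
Alternatively, you could persist the DataFrame with the DISK_ONLY storage level before doing the two writes.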
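Here is a rough PySpark sketch of both options, in case it helps. The function names, the "overwrite" save mode, and the Parquet output format are my assumptions for illustration, not something from your job. I am also assuming MinIO is reached through the s3a connector, so minio_path would be an s3a:// URI.

```python
def write_via_hdfs(spark, df, hdfs_path, minio_path):
    """Option 1: write to HDFS once, then copy that output to MinIO."""
    # The transformations run here, exactly once.
    df.write.mode("overwrite").parquet(hdfs_path)
    # The second write reads the already-materialized files from HDFS,
    # so the original lineage is not re-executed.
    spark.read.parquet(hdfs_path).write.mode("overwrite").parquet(minio_path)


def write_with_disk_persist(df, hdfs_path, minio_path, storage_level):
    """Option 2: persist the DataFrame, then write it to both sinks."""
    # storage_level is expected to be pyspark.StorageLevel.DISK_ONLY.
    persisted = df.persist(storage_level)
    try:
        # The first write computes the partitions and stores them on
        # the executors' local disks.
        persisted.write.mode("overwrite").parquet(hdfs_path)
        # The second write can reuse the persisted partitions. Spark may
        # still recompute any partition whose executor was lost.
        persisted.write.mode("overwrite").parquet(minio_path)
    finally:
        # Release the disk space once both writes are done.
        persisted.unpersist()
```

With a real session, the second option would be called along the lines of write_with_disk_persist(df, hdfs_path, minio_path, StorageLevel.DISK_ONLY), after "from pyspark import StorageLevel". Note that the two writes still run one after the other in both options; neither writes to the two targets in parallel.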
Regards,
Vibhor
From: Prem Sahoo
Date: Tuesday, 21 May 2024 at 8:16 AM
To: Spark dev list
Subject: EXT
Hello Team,
I am planning to write to two data sources at the same time.
Scenario:-
Write the same DataFrame to HDFS and MinIO without re-executing the
transformations and without using cache(). How can we make this faster?
The job reads a Parquet file, applies a few transformations, and writes the
result to HDFS and MinIO.