Hi,

For clarification, are those 12 / 14 minutes cumulative CPU time or wall-clock
time? How many executors executed those 1 / 375 tasks?

Cheers,
Enrico
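A toy illustration of why the distinction matters (all numbers hypothetical): the same cumulative task time can translate into very different wall-clock times depending on how many tasks run concurrently. The sketch assumes 8 concurrent task slots (e.g. 2 m5.xlarge nodes with 4 vCPUs each, one core per task), which may not match the actual executor configuration. It approximates the scheduler by handing each task to whichever slot frees up first; the function names are hypothetical.

```python
import heapq


def cumulative_time(task_durations):
    """Total time summed over all tasks, regardless of how many ran in parallel."""
    return sum(task_durations)


def wall_clock_time(task_durations, slots):
    """Elapsed time when tasks are started in order on `slots` parallel slots,
    each task going to the slot that becomes free first (a simplified,
    hypothetical model of task scheduling)."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    free_at = [0.0] * slots  # time at which each slot next becomes free (a valid min-heap)
    for duration in task_durations:
        start = heapq.heappop(free_at)            # earliest available slot
        heapq.heappush(free_at, start + duration)  # slot is busy until the task ends
    return max(free_at)


if __name__ == "__main__":
    one_big_task = [750.0]          # hypothetical: a single 12.5-minute task
    many_small_tasks = [2.0] * 375  # hypothetical: 375 tasks of 2 s each
    for name, tasks in [("1 task", one_big_task), ("375 tasks", many_small_tasks)]:
        print(name, cumulative_time(tasks), wall_clock_time(tasks, slots=8))
```

Both workloads add up to 750 s (12.5 min) of task time, but with 8 slots the 375-task workload finishes after 94 s of wall-clock time, while the single task cannot be split and takes the full 750 s.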
-------- Original message --------
From: Shashank Rao
Date: 16.05.23 19:48 (GMT+01:00)
To: user@spark.apache.org
Subject: Understandi
Hi,
I'm trying to set up a Spark pipeline that reads data from S3 and writes
it into Google BigQuery.
Environment Details:
---
Java 8
AWS EMR-6.10.0
Spark v3.3.1
2 m5.xlarge executor nodes
S3 Directory structure:
---
bucket-name:
|---folder1:
|---folder2:
|--
Hi,
On the issue of Spark shuffle, it is generally accepted that a shuffle *often involves*
some, if not all, of the following:
- Disk I/O
- Data serialization and deserialization
- Network I/O
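To make the three costs concrete, here is a single-process toy model of a hash shuffle. It is not Spark's implementation (Spark's shuffle writer and block transfer machinery are far more involved); the function names `map_side_write` and `reduce_side_read` are hypothetical, `pickle` stands in for Spark's serializer, and the "network fetch" is simulated with a local file read.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def map_side_write(records, num_partitions, out_dir):
    """Hash-partition (key, value) records and write each bucket to its own file.

    Models two of the three costs: serialization (pickle.dump) and disk I/O.
    """
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_partitions].append((key, value))
    paths = {}
    for pid, bucket in buckets.items():
        path = os.path.join(out_dir, f"shuffle_part_{pid}.bin")
        with open(path, "wb") as f:
            pickle.dump(bucket, f)  # serialize + write to local disk
        paths[pid] = path
    return paths


def reduce_side_read(paths, partition_id):
    """Fetch one partition's bytes and deserialize them.

    On a cluster these bytes would travel over the network from the executor
    that wrote them; here a local file read stands in for that fetch.
    """
    path = paths.get(partition_id)
    if path is None:
        return []  # no records hashed to this partition
    with open(path, "rb") as f:
        raw = f.read()  # stand-in for network I/O
    return pickle.loads(raw)  # deserialization


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = map_side_write(records, num_partitions=2, out_dir=tmp)
        for pid in range(2):
            print(pid, reduce_side_read(paths, pid))
```

Every record with the same key lands in the same partition file, so each reducer sees all values for its keys; the price is one serialize/write/read/deserialize round trip per record.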
Excluding the external shuffle service, and without relying on the configuration
options provided by Spark for shuf