Hello Spark and User,
We have a Spark project that runs as a long-running Spark session and does
the following:
1. One job reads from MapR FS and writes to MapR FS.
2. Another job runs in parallel; it reads from MapR FS and writes to MinIO
object storage.
We are finding issues for a few batches of Spark jo
The actual code is not given, so I am going by the plan output and your
explanation:
- You're joining a large, bucketed table with a smaller DataFrame on a
common column (key_col).
- The subsequent window function also uses key_col.
- However, a shuffle still occurs for the window function.
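
To make the expectation behind these points concrete, here is a toy model in plain Python (not Spark code). It assumes a hypothetical bucket count and a stand-in hash function, which is not the hash Spark uses for bucketing, and the `ts` ordering column is invented for illustration. It only shows why rows bucketed on key_col are already co-located per key, so a window partitioned by key_col could in principle be computed bucket by bucket without moving data. It does not model Spark's planner or explain why the shuffle appears in your plan.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, for illustration only


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash: deterministic, but NOT the hash Spark uses for bucketing.
    return sum(key.encode("utf-8")) % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    # Assign every row to a bucket based only on its key_col value.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row["key_col"], num_buckets)].append(row)
    return buckets


def row_number_per_key(bucket_rows):
    # Emulates row_number() OVER (PARTITION BY key_col ORDER BY ts)
    # using only the rows that live in a single bucket.
    by_key = defaultdict(list)
    for row in bucket_rows:
        by_key[row["key_col"]].append(row)
    out = []
    for key in sorted(by_key):
        ordered = sorted(by_key[key], key=lambda r: r["ts"])
        for i, row in enumerate(ordered, start=1):
            out.append({**row, "rn": i})
    return out


# Invented sample data; "ts" is a hypothetical ordering column.
rows = [
    {"key_col": "a", "ts": 3},
    {"key_col": "b", "ts": 1},
    {"key_col": "a", "ts": 1},
    {"key_col": "c", "ts": 2},
    {"key_col": "b", "ts": 2},
]

buckets = write_bucketed(rows)

# Record which bucket(s) each key ended up in.
key_locations = defaultdict(set)
for bucket_id, bucket_rows in buckets.items():
    for row in bucket_rows:
        key_locations[row["key_col"]].add(bucket_id)

# Compute the window per bucket, with no data exchanged between buckets.
result = [
    row
    for bucket_id in sorted(buckets)
    for row in row_number_per_key(buckets[bucket_id])
]

for row in result:
    print(row)
```

Since every key lands in exactly one bucket, the per-bucket result matches what a global window over key_col would produce. Whether Spark actually reuses the bucketing for the window depends on the output partitioning it reports after the join, which is what the plan needs to be checked for.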
Hi Spark community,
Please review the cleansed plan below. It is the result of joining a large,
bucketed table with a smaller DF, and then applying a window function. Both the
join and the window function use the same column, which is also the bucket
column of the table ("key_col" in the plan).