date:20240815

Spark Reads from MapR and Write to MinIO fails for few batches

2024-08-15 Thread Prem Sahoo

Hello Spark and User, we have a Spark project with a long-running Spark session that does the following: 1. It reads from MapR FS and writes to MapR FS. 2. Another parallel job reads from MapR FS and writes to MinIO object storage. We are finding issues for a few batches of Spark jo

Re: Redundant(?) shuffle after join

2024-08-15 Thread Mich Talebzadeh

The actual code is not given, so I am going with the plan output and your explanation:
- You're joining a large, bucketed table with a smaller DataFrame on a common column (key_col).
- The subsequent window function also uses key_col.
- However, a shuffle occurs for the window function

Redundant(?) shuffle after join

2024-08-15 Thread Shay Elbaz

Hi Spark community, Please review the cleansed plan below. It is the result of joining a large, bucketed table with a smaller DF, and then applying a window function. Both the join and the window function use the same column, which is also the bucket column of the table ("key_col" in the plan).
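
A shuffle appears in a Spark physical plan as an Exchange operator, so one way to check whether the window step adds a shuffle on top of the join is to list the Exchange nodes in the plan text. The following is a minimal sketch in plain Python. The helper name `shuffle_exchanges` and the `SAMPLE_PLAN` fragment are hypothetical: the fragment is hand-written and simplified for illustration, it is not the plan from this thread, and the broadcast join in it is an assumption.

```python
import re

# Matches shuffle exchanges such as "Exchange hashpartitioning(key_col#1, 200)".
# The lookbehind skips operators whose names merely end in "Exchange",
# for example BroadcastExchange or ReusedExchange.
_SHUFFLE = re.compile(r"(?<![A-Za-z])Exchange\s+(\w+)\(([^)]*)\)")


def shuffle_exchanges(plan_text):
    """Return (partitioning, arguments) for each shuffle Exchange in a plan string."""
    return [(m.group(1), m.group(2)) for m in _SHUFFLE.finditer(plan_text)]


# Hand-written, simplified plan fragment (hypothetical, for illustration only).
SAMPLE_PLAN = """\
Window [row_number() ...], [key_col#1]
+- Sort [key_col#1 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(key_col#1, 200)
      +- BroadcastHashJoin [key_col#1], [key_col#7], Inner, BuildRight
         :- FileScan parquet [key_col#1]
         +- BroadcastExchange HashedRelationBroadcastMode(...)
"""

for partitioning, args in shuffle_exchanges(SAMPLE_PLAN):
    print(partitioning, "->", args)
```

The plan text would have to come from wherever the job exposes it, for instance the output of `DataFrame.explain()`. If the only Exchange reported is a `hashpartitioning` on key_col directly below the Window node, that is the shuffle being asked about here.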