karuppayya commented on code in PR #52213:
URL: https://github.com/apache/spark/pull/52213#discussion_r2373774863


##########
sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala:
##########
@@ -205,6 +206,9 @@ class InjectRuntimeFilterSuite extends QueryTest with SQLTestUtils with SharedSp
     sql("analyze table bf5part compute statistics for columns a5, b5, c5, d5, e5, f5")
     sql("analyze table bf5filtered compute statistics for columns a5, b5, c5, d5, e5, f5")
 
+    // Tests depend on intermediate results that would otherwise be cleaned up when

Review Comment:
   I added logging in this [commit](https://github.com/apache/spark/pull/52213/commits/c27216c5d9d60f989eb672ed665f50fd7dbc0db1) to verify that this is a bug.
 
   
   Output
   ```
   karuppayyar: suite run 1 start
   karuppayyar: subquery started 24
   karuppayyar: query ended 24
   karuppayyar: removing shuffle 6
   karuppayyar: suite run 1 end
   
   karuppayyar: suite run 2 start
   karuppayyar: subquery started 25
   karuppayyar: subquery ended 24
   karuppayyar: query ended 25
   karuppayyar: removing shuffle 8,9
   karuppayyar: suite run 2 end
   
   karuppayyar: suite run 3 start
   17:32:07.521 ERROR org.apache.spark.storage.ShuffleBlockFetcherIterator: Failed to create input stream from local block
   java.io.IOException: Error in reading FileSegmentManagedBuffer[file=/private/var/folders/tn/62m7jt2j2b7116x0q6wtzg0c0000gn/T/blockmgr-72dd6798-f43d-48a7-8d4c-0a9c44ba09a9/35/shuffle_8_38_0.data,offset=0,length=5195]
        at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:110)
   ```
   Every subquery should finish before its main query ends.
   The log shows that subquery execution does not complete before the main query
   ends, so the main query never uses the subquery result.
   
   A side effect of the shuffle cleanup is that when the main query completes, it
   removes the shuffle of the subquery (which has not completed and whose result is
   no longer useful). The subquery then fails with a FetchFailure like the one above
   when it tries to run to completion. This is what helped surface the issue.
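
   To make the ordering problem concrete, here is a minimal sketch of a checker for the log lines above. It assumes that the number after each event is an execution id shared by a subquery and its main query; that reading is my interpretation of the output, and the names `EVENT` and `unfinished_subqueries` are hypothetical helpers, not part of Spark.

```python
import re

# Log lines copied from the output above (suite runs 1 and 2).
LOG = """\
karuppayyar: suite run 1 start
karuppayyar: subquery started 24
karuppayyar: query ended 24
karuppayyar: removing shuffle 6
karuppayyar: suite run 1 end
karuppayyar: suite run 2 start
karuppayyar: subquery started 25
karuppayyar: subquery ended 24
karuppayyar: query ended 25
karuppayyar: removing shuffle 8,9
karuppayyar: suite run 2 end
"""

# Only the three lifecycle events matter for the ordering check.
EVENT = re.compile(
    r"karuppayyar: (subquery started|subquery ended|query ended) (\d+)"
)


def unfinished_subqueries(log_text):
    """Return ids whose query ended while its subquery was still running.

    Assumption: the id printed with each event ties a subquery to the
    main query that owns it.
    """
    running = set()
    violations = []
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match is None:
            continue  # suite markers and shuffle-removal lines are ignored
        kind, exec_id = match.group(1), int(match.group(2))
        if kind == "subquery started":
            running.add(exec_id)
        elif kind == "subquery ended":
            running.discard(exec_id)
        elif exec_id in running:
            # "query ended" arrived before the matching "subquery ended".
            violations.append(exec_id)
    return violations


print(unfinished_subqueries(LOG))
```

   Under that assumption, both 24 and 25 are reported, which matches what the log shows: each main query ends while its subquery is still in flight.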
   
   I am not sure whether this is the case for all subqueries (it looks like it).
   This could result in correctness issues. cc: @dongjoon-hyun
   
   @cloud-fan @dongjoon-hyun Do you think this is a bug (in which case I can
   attempt a fix), or am I missing something here?



