ahshahid commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2069193802
########## core/src/main/scala/org/apache/spark/rdd/RDD.scala: ##########

Review Comment:
IMO, this is going to create a lot of potential issues, especially related to performance. In effect, it attempts to provide predictable behaviour for an indeterminate value, which is not worth the trouble. The performance issues that could arise are:

1) As mentioned in previous comments, query plans will be copied more frequently during tree traversal, adding extra cost for flag determination.

2) A node that may not be affected at all, or may not use the indeterminate expression, would itself become indeterminate, with a cascading effect.

3) Given the race condition I have already identified in the DAG scheduler, which @attilapiros suggests handling aggressively by failing queries instead of retrying them, more queries (mostly unnecessarily) will fail instead of being retried.

4) If I understand the round-robin partitioning issue correctly, I have a feeling it can be solved along the same lines as a HashPartitioning containing an indeterminate expression (as in my PR for 515016), i.e. by treating round-robin partitioning as indeterministic.

5) Adding an extra indeterminism flag to MapRDD etc. in the core layer will be confusing and unusable from Spark core's perspective, because only SQL has the metadata to tell an RDD that its behaviour can be indeterministic. In a core RDD, a user can write any transformation that might be indeterminate, so MapRDD's flag would convey wrong information.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
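To illustrate point 4: a minimal sketch (not Spark internals; `roundRobin` is a hypothetical stand-in for the real partitioner) of why round-robin partition assignment is inherently order-dependent, which is exactly the property that makes it indeterminate under a stage retry:

```scala
object RoundRobinSketch {
  // Round-robin assigns partitions by arrival position, so the result
  // depends on input ordering, not on row content.
  def roundRobin[T](rows: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    rows.zipWithIndex
      .groupBy { case (_, i) => i % numPartitions }
      .map { case (p, group) => p -> group.map(_._1) }

  def main(args: Array[String]): Unit = {
    val a = roundRobin(Seq("x", "y", "z", "w"), 2)
    // Same rows, different arrival order (as a retried upstream task
    // might produce):
    val b = roundRobin(Seq("y", "x", "z", "w"), 2)
    // Identical data, different partition contents -- the behaviour a
    // retry would expose, analogous to a hash partitioning over a
    // nondeterministic expression.
    assert(a != b)
  }
}
```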
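And to illustrate point 5: a sketch (the function names are illustrative, not Spark API) of why a core-layer flag cannot reflect what a user closure actually does — both closures below have the same type, so any determinism flag stored alongside them is a caller's declaration, not something the core layer can verify:

```scala
import scala.util.Random

object ClosureDeterminism {
  // Both closures share the type Int => Int; nothing visible to the
  // framework distinguishes the deterministic one from the other.
  val deterministicFn: Int => Int = x => x * 2
  val nondeterministicFn: Int => Int = x => x + Random.nextInt(10)

  def main(args: Array[String]): Unit = {
    // Re-evaluating the deterministic closure yields stable results...
    assert(deterministicFn(21) == deterministicFn(21))
    // ...while the nondeterministic one may differ across calls, which
    // is what a retried map task would observe at runtime.
    println(Seq.fill(3)(nondeterministicFn(0)))
  }
}
```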