ahshahid commented on code in PR #50757:
URL: https://github.com/apache/spark/pull/50757#discussion_r2069193802


##########
core/src/main/scala/org/apache/spark/rdd/RDD.scala:
##########


Review Comment:
   IMO, this is going to create a lot of potential issues, especially around 
performance.
   IMO, this is essentially attempting to provide predictable behaviour for an 
indeterminate value, which is not worth the trouble.
   The performance issues that could arise are:
   1) As mentioned in previous comments, QueryPlans will be copied more 
frequently during tree traversal, adding extra cost for determining the flag.
   2) A node which may not be affected at all, or may not even use the 
indeterminate expression, would itself become indeterminate, with a cascading 
effect.
   3) Given the race condition I have already identified in the DAGScheduler, 
for which @attilapiros's suggestion is to deal with it aggressively by failing 
queries instead of retrying, more queries (mostly unnecessarily) will fail 
instead of being retried.
   4) If I understand the RoundRobin partitioning issue correctly, I have a 
feeling it can be solved along the same lines as a HashPartitioning containing 
an indeterminate expression (handled in my PR for 515016), i.e. by treating 
RoundRobin partitioning as indeterminate.
   5) Adding an extra indeterminism flag to MapRDD etc. in the core layer is 
going to be confusing and unusable from Spark core's perspective, because only 
SQL has the metadata to tell an RDD that its behaviour can be indeterminate. 
With a core RDD, a user can write any transformation that may be 
nondeterministic, but the MapRDD's flag would convey wrong information.
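   To make point 5 concrete, here is a minimal sketch (assuming a live 
`SparkContext` named `sc`, e.g. in `spark-shell`; illustrative only, not a 
proposed change). In core, nothing constrains a user-supplied function to be 
deterministic, so a per-RDD flag populated from SQL metadata cannot describe 
arbitrary core transformations:

```scala
import scala.util.Random

// Sketch: the user's map function below is nondeterministic across task
// retries, but nothing in the core RDD API tells the resulting RDD that.
val rdd = sc.parallelize(1 to 10)
val mapped = rdd.map(x => x + Random.nextInt(100)) // differs on every retry

// A flag set statically on the map RDD would still claim the output is
// deterministic, i.e. convey wrong information for this transformation.
```

   For reference, core already has the `DeterministicLevel` mechanism 
(`RDD.getOutputDeterministicLevel`, which subclasses can override); the concern 
above is that a flag pushed down from the SQL layer would not cover user code 
like this.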



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
