ahshahid commented on PR #50757:
URL: https://github.com/apache/spark/pull/50757#issuecomment-2847859035

   @attilapiros : I get what you are pointing out in the RDD code...
   It's the committers' call.
   My view is:
   1) An indeterministic expression is something whose output is not predictable, and no ordering of any form should be expected from it, whether it is evaluated once or multiple times.
   2) The only requirement from the Spark side should be that if a Partitioner uses that indeterministic component, a retry must not lose rows or add extra rows.
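   To make requirement 2 concrete, here is a self-contained toy sketch (not Spark code; `partitionOnce`, the seeds, and the partition count are made up for illustration). It shows why a partitioner built on an indeterministic expression keeps the global row set intact across a full re-run, while any single partition's contents can differ between runs, so a partial retry would lose or duplicate rows:

   ```scala
   import scala.util.Random

   // Toy model: rows are routed by a nondeterministic key (think rand()).
   // A retry re-evaluates the key, so rows can move between partitions.
   object IndeterminatePartitioningSketch {
     val numPartitions = 4

     // Nondeterministic partitioner: the same row may land in a different
     // partition on each run (the seed stands in for "each task attempt").
     def partitionOnce(rows: Seq[Int], seed: Long): Map[Int, Seq[Int]] = {
       val rng = new Random(seed)
       rows.groupBy(_ => rng.nextInt(numPartitions))
     }

     def main(args: Array[String]): Unit = {
       val rows = (1 to 100)
       val firstRun = partitionOnce(rows, seed = 1L)
       val retryRun = partitionOnce(rows, seed = 2L) // a retry re-evaluates the key

       // Requirement 2 holds for a FULL re-run: no row is lost or duplicated.
       assert(retryRun.values.flatten.toSeq.sorted == rows.sorted)

       // But an individual partition is not stable across runs, so retrying
       // only one lost partition would drop some rows and duplicate others.
       val partition0Stable =
         firstRun.getOrElse(0, Nil).sorted == retryRun.getOrElse(0, Nil).sorted
       println(s"partition 0 identical across retries: $partition0Stable")
     }
   }
   ```

   This is exactly why an indeterminate shuffle output forces a retry of all partitions rather than just the failed ones.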
   
   If we stick to the above two requirements:
   1) The shuffle stage (or any stage) should just consult the RDDs it has to see whether they are deterministic or not.
   2) An RDD's determinism should be based purely on the nature of its Partitioner. If the Partitioner used by the RDD relies on an indeterministic expression, the RDD should be marked as INDETERMINATE (and as of now, AFAIK, that is possible only in the SQL layer's RDDs, where the Partitioner has that information...).
   3) In the case of core, from what I have understood, the problem is related to the round-robin partitioning logic... That is really a separate issue and should not have been tied to indeterminism... AFAIK, if the round-robin issue is not mixed with indeterminism, there is no way an RDD can have indeterminism set to true (nor is it needed).
   I might be off the mark, but if the round-robin partitioning issue needs to be resolved by piggybacking on the indeterminism logic, then the check could be as small as marking any RDD with a round-robin Partitioner as INDETERMINATE. That should solve the issue, since what is needed in that case is a retry of all partitions.
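   The rule being proposed can be sketched as a single pattern match. This is a toy model, not Spark code: `PartitionerKind` and `determinismOf` are hypothetical names, and in real Spark the hook would be the RDD's output-determinism level (Spark's `DeterministicLevel` has DETERMINATE / UNORDERED / INDETERMINATE), with the SQL layer supplying the information about the partitioning expression:

   ```scala
   // Toy model of "derive an RDD's determinism from its Partitioner".
   object DeterminismFromPartitioner {
     sealed trait PartitionerKind
     case object HashOnDeterministicKeys   extends PartitionerKind
     case object HashOnIndeterministicExpr extends PartitionerKind // e.g. partitioning by rand()
     case object RoundRobin                extends PartitionerKind

     sealed trait Level
     case object Determinate   extends Level
     case object Indeterminate extends Level // forces retry of ALL partitions

     // The whole proposed check: an indeterministic expression in the
     // Partitioner, or a round-robin Partitioner, marks the RDD
     // INDETERMINATE; everything else stays DETERMINATE.
     def determinismOf(p: PartitionerKind): Level = p match {
       case HashOnDeterministicKeys => Determinate
       case _                       => Indeterminate
     }
   }
   ```

   Under this model, the round-robin case needs no logic of its own: classifying it as indeterminate already yields the full-stage retry that fixes it.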


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

