HeartSaVioR commented on PR #54523:
URL: https://github.com/apache/spark/pull/54523#issuecomment-3994054975

   @holdenk 
   First of all, are you on board with the reasoning we described under "Why 
are the changes needed?"? Overall, we find that an API which hands an iterator 
directly to users is "too" flexible: users can easily hurt latency by their own 
hand, and it strongly restricts the future design of features, compared to 
other streaming frameworks. Unlike traditional Spark execution, in RTM it is 
more ideal to build the baseline of execution as record-to-record, and we have 
APIs which are the opposite of this.
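
   To illustrate the latency concern, here is a minimal sketch in plain Python 
(no Spark involved). All names (`buffering_fn`, `per_record_fn`, `source`, 
`run_iterator_api`, `run_record_api`) are hypothetical and exist only for this 
example. It shows how an iterator-in/iterator-out function is free to consume 
the whole input before emitting anything, while a per-record function cannot.

```python
from typing import Iterator, List


def buffering_fn(records: Iterator[int]) -> Iterator[int]:
    # User code handed a raw iterator may materialize the whole input
    # before emitting a single output record.
    buffered = list(records)
    return iter([r * 2 for r in buffered])


def per_record_fn(record: int) -> int:
    # A per-record function sees one record at a time, so it cannot
    # delay output until the input is exhausted.
    return record * 2


def source(n: int, log: List[str]) -> Iterator[int]:
    # Hypothetical source that logs each read so the ordering is visible.
    for i in range(n):
        log.append(f"read {i}")
        yield i


def run_iterator_api(n: int) -> List[str]:
    log: List[str] = []
    for out in buffering_fn(source(n, log)):
        log.append(f"emit {out}")
    return log


def run_record_api(n: int) -> List[str]:
    log: List[str] = []
    for rec in source(n, log):
        log.append(f"emit {per_record_fn(rec)}")
    return log
```

   With the iterator-style function, every read happens before the first emit. 
With the per-record function, reads and emits interleave, which is the 
record-to-record baseline described above.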
   
   Assuming you are on board, I think you raised a good point. We missed 
that, and we did not expect users to resort to a hack to work around it. 
Ideally speaking, we'd need to provide an "official" way to initialize 
resources/expensive objects and clean them up at task completion (maybe the 
traditional open/process/close interface). That warrants a new API: either 
an existing API with a new signature, or simply a separate one. That'd take 
time to go through.
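
   As a rough sketch of what an open/process/close lifecycle could look like, 
here is a plain-Python example. This is not an existing or proposed Spark API: 
`UppercaseProcessor` and `run_task` are hypothetical names, and the "resource" 
is just a dict standing in for an expensive object such as a connection.

```python
from typing import Iterable, Iterator, List, Optional


class UppercaseProcessor:
    """Hypothetical processor with an explicit per-task lifecycle."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.resource: Optional[dict] = None

    def open(self) -> None:
        # Initialize the expensive resource once per task.
        self.resource = {"prefix": ">> "}
        self.events.append("open")

    def process(self, record: str) -> str:
        # Called once per record; reuses the resource created in open().
        assert self.resource is not None, "open() must be called first"
        return self.resource["prefix"] + record.upper()

    def close(self) -> None:
        # Release the resource at task completion.
        self.resource = None
        self.events.append("close")


def run_task(processor: UppercaseProcessor,
             records: Iterable[str]) -> Iterator[str]:
    # Hypothetical driver loop: the engine, not the user, owns iteration.
    processor.open()
    try:
        for record in records:
            yield processor.process(record)
    finally:
        # Runs when the input is exhausted or when processing fails.
        processor.close()
```

   For example, `list(run_task(UppercaseProcessor(), ["a", "b"]))` yields one 
output per input record, with `open` and `close` each called once around the 
loop. The design point is that the engine controls the record loop, so users 
get a place for setup and cleanup without being handed the iterator.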
   
   That said, we feel it's better to block the problematic case first and 
take the time to work on an alternative thoughtfully. We just don't want to 
rush an alternative only because we want to block the problematic case today. 
But if you strongly require an alternative before we block the case, we can 
consider it, though it may not fit the Apache Spark 4.2 timeline.
   
   Would love to hear from you about the plan. Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

