zhengruifeng opened a new pull request, #50682:
URL: https://github.com/apache/spark/pull/50682

   ### What changes were proposed in this pull request?
   Avoid eager model removal in meta algorithms when `collectSubModels` is true.
   
   In both classic mode and Connect mode, and regardless of whether `collectSubModels` is true or false, `__del__` of models and estimators is always invoked in `_parallelFitTasks`.
   
   There seems to be an internal copy: I added logging in `__del__` to print the addresses (`id(self)` / `id(self._java_obj)`) of the objects being deleted, and found that these ids differ from those in the final `model.subModels`.
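The diagnosis above can be reproduced in miniature with plain Python: a task that works on a `copy.copy` of its input finalizes the copy, whose `id` differs from that of the object the caller keeps. `FakeModel` and `fit_task` are hypothetical stand-ins, not PySpark classes, and the sketch relies on CPython's reference counting to run `__del__` as soon as the copy becomes unreachable.

```python
import copy


class FakeModel:
    """Hypothetical stand-in for a model wrapper whose __del__ is logged."""

    deleted_ids = []  # class-level log of finalized instance ids

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Mimic the debugging approach: record id(self) on finalization.
        type(self).deleted_ids.append(id(self))


def fit_task(estimator):
    # Hypothetical task body: operates on an internal copy, which becomes
    # unreachable when the function returns and is finalized by CPython.
    internal = copy.copy(estimator)
    return internal.name


original = FakeModel("lr")
result = fit_task(original)

# One object was finalized, and it was not the one the caller still holds.
print(result, len(FakeModel.deleted_ids), id(original) in FakeModel.deleted_ids)
```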
   
   That is to say, internal copying and removal of models/estimators happen in `_parallelFitTasks`.
   This is not a problem in classic mode, since its `__del__` only **detaches** the JVM object.
   In Connect mode, however, `__del__` eagerly **deletes** the model on the server side.
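The difference between the two semantics can be illustrated with a simplified simulation, assuming that a shallow copy shares the same server-side reference (`ref_id`) as the original. `DetachingModel`, `DeletingModel`, `run`, and the dict standing in for the server-side cache are all hypothetical; the real classic and Connect wrappers are more involved.

```python
import copy


class DetachingModel:
    """Classic-style semantics (simplified): __del__ only drops the local handle."""

    def __init__(self, ref_id, cache):
        self.ref_id = ref_id
        self.cache = cache

    def __del__(self):
        # Detach: forget this wrapper's handle; the shared entry is untouched.
        self.ref_id = None


class DeletingModel:
    """Connect-style semantics (simplified): __del__ frees the server object."""

    def __init__(self, ref_id, cache):
        self.ref_id = ref_id
        self.cache = cache

    def __del__(self):
        # Delete: remove the shared server-side entry for this reference.
        self.cache.pop(self.ref_id, None)


def run(model):
    # Hypothetical task body: a shallow copy shares ref_id and cache with
    # the original and is finalized when the function returns.
    tmp = copy.copy(model)
    return tmp.ref_id


def survives(model_cls):
    cache = {"model-1": "trained-weights"}  # hypothetical server-side cache
    model = model_cls("model-1", cache)
    run(model)  # the internal copy is finalized here
    # Is the caller's model still backed by a server-side object?
    return model.ref_id in cache


print(survives(DetachingModel), survives(DeletingModel))
```

With detach semantics the caller's model still resolves after the copy is finalized; with delete semantics the copy's `__del__` removes the entry the original depends on.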
   
   So the root cause is the semantic difference between **detach** and **delete**. This PR works around it by adding an extra flag, `disable_ml_del`, to temporarily disable model deletion in `_parallelFitTasks`.
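One possible shape for such a flag is sketched below as a context manager that `__del__` consults. Only the name `disable_ml_del` comes from this PR; `RemoteModel`, the class-level switch, the dict-based cache, and `parallel_fit_tasks` are hypothetical, and the actual patch may implement the flag differently.

```python
import contextlib
import copy


class RemoteModel:
    """Simplified Connect-style wrapper whose __del__ deletes server state."""

    _del_disabled = False  # hypothetical switch consulted by __del__

    def __init__(self, ref_id, cache):
        self.ref_id = ref_id
        self.cache = cache  # hypothetical stand-in for the server-side cache

    def __del__(self):
        # Skip the server-side delete while the switch is on.
        if not type(self)._del_disabled:
            self.cache.pop(self.ref_id, None)


@contextlib.contextmanager
def disable_ml_del():
    """Hypothetical helper: suppress deletion for the duration of the block."""
    previous = RemoteModel._del_disabled
    RemoteModel._del_disabled = True
    try:
        yield
    finally:
        # Restore the previous state even if the block raises.
        RemoteModel._del_disabled = previous


def parallel_fit_tasks(model):
    # Hypothetical stand-in for _parallelFitTasks: an internal copy is
    # created and dropped while deletion is disabled.
    with disable_ml_del():
        tmp = copy.copy(model)
        del tmp  # finalized here, but the server-side entry is kept


cache = {"model-1": "trained-weights"}
model = RemoteModel("model-1", cache)
parallel_fit_tasks(model)
print("model-1" in cache, RemoteModel._del_disabled)
```

Because the switch in this sketch is process-wide, it also suppresses deletion in other threads while the block is active. In PySpark's tuning code the tasks returned by `_parallelFitTasks` are executed on a thread pool, so a real implementation has to decide how the flag should be scoped.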
   
   ### Why are the changes needed?
   For feature parity between classic mode and Connect mode.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this is a bug fix.
   
   ### How was this patch tested?
   Enabled the related tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
