zhengruifeng opened a new pull request, #50682: URL: https://github.com/apache/spark/pull/50682
### What changes were proposed in this pull request?
Avoid eager model removal in meta algorithms when `collectSubModels` is true.

In both classic mode and Connect mode, and regardless of whether `collectSubModels` is true or false, the `__del__` methods of models and estimators are always invoked in `_parallelFitTasks`. There seems to be an internal copy: I added logging in `__del__` to print the identity (`id(self)` / `id(self._java_obj)`) of each object being deleted, and found that these ids differ from those of the objects in the final `model.subModels`. That is to say, models and estimators are copied and removed internally within `_parallelFitTasks`.

This is not a problem in classic mode, since `__del__` there just **detaches** the JVM object. In Connect mode, however, `__del__` eagerly **deletes** the model on the server side.

So the root cause is the semantic difference between **detach** and **delete**. This PR works around it by adding an extra flag, `disable_ml_del`, that temporarily disables model deletion in `_parallelFitTasks`.

### Why are the changes needed?
For feature parity.

### Does this PR introduce _any_ user-facing change?
Yes, this is a bug fix.

### How was this patch tested?
Enabled tests.

### Was this patch authored or co-authored using generative AI tooling?
No.
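To illustrate the detach-versus-delete problem described above, here is a minimal, self-contained sketch. It is not the code in this PR. `RemoteModel`, `SERVER_CACHE`, and the context-manager form of `disable_ml_del` are hypothetical stand-ins, and the sketch assumes CPython, where `__del__` runs as soon as the last reference to an object is dropped.

```python
import copy
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the model cache held by a Connect server.
SERVER_CACHE = {"model-1": "fitted weights"}

# Hypothetical thread-local flag, loosely modeled on the `disable_ml_del` idea.
_state = threading.local()


@contextmanager
def disable_ml_del():
    """Temporarily turn RemoteModel.__del__ into a no-op in this thread."""
    previous = getattr(_state, "disabled", False)
    _state.disabled = True
    try:
        yield
    finally:
        _state.disabled = previous


class RemoteModel:
    """Toy client-side handle whose finalizer deletes the server-side object."""

    def __init__(self, ref_id):
        self.ref_id = ref_id

    def __del__(self):
        if getattr(_state, "disabled", False):
            return  # deletion is suppressed while the flag is set
        SERVER_CACHE.pop(self.ref_id, None)  # eager "delete" on the server


model = RemoteModel("model-1")

# With the flag set, dropping a temporary copy leaves the server object alone.
with disable_ml_del():
    temp = copy.copy(model)
    del temp
survived_inside_guard = "model-1" in SERVER_CACHE

# Without the flag, dropping a temporary copy deletes the shared server object,
# even though `model` still refers to it.
temp = copy.copy(model)
del temp
survived_without_guard = "model-1" in SERVER_CACHE
```

The second half reproduces the failure mode in miniature: the copy and the original share one server-side id, so finalizing the copy invalidates the original. A detach-only finalizer, as in classic mode, would not have this effect.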