HyukjinKwon commented on code in PR #49624:
URL: https://github.com/apache/spark/pull/49624#discussion_r1940507357


##########
python/pyspark/ml/feature.py:
##########
@@ -5069,11 +5070,19 @@ def __init__(
         self._java_obj = self._new_java_obj(
             "org.apache.spark.ml.feature.StopWordsRemover", self.uid
         )
-        self._setDefault(
-            stopWords=StopWordsRemover.loadDefaultStopWords("english"),
-            caseSensitive=False,
-            locale=self._java_obj.getLocale(),
-        )
+        if isinstance(self._java_obj, str):
+            # Skip setting the default value of 'locale', which needs to invoke a JVM method.
+            # So if users don't explicitly set 'locale', then getLocale fails.
+            self._setDefault(
+                stopWords=StopWordsRemover.loadDefaultStopWords("english"),

Review Comment:
Seems like this will still use Py4J:

```
======================================================================
ERROR [0.004s]: test_stop_words_remover (pyspark.ml.tests.connect.test_parity_feature.FeatureParityTests.test_stop_words_remover)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/tests/test_feature.py", line 866, in test_stop_words_remover
    remover = StopWordsRemover(stopWords=["b"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
    return func(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5098, in __init__
    stopWords=StopWordsRemover.loadDefaultStopWords("english"),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5231, in loadDefaultStopWords
    stopWordsObj = getattr(_jvm(), "org.apache.spark.ml.feature.StopWordsRemover")
                           ^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/util.py", line 376, in _jvm
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'
```

https://github.com/apache/spark/actions/runs/13120894941/job/36606320881
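For reference, here is a hedged sketch of one way the defaults could be guarded so the Connect path never touches Py4J. This is my own illustration, not the change proposed in this PR: `_set_stop_words_defaults` is a hypothetical helper standing in for logic that would live inside `StopWordsRemover.__init__`, while `pyspark.sql.utils.is_remote`, `Params._setDefault`, and `StopWordsRemover.loadDefaultStopWords` are existing APIs.

```python
# Rough sketch only, not the PR's actual fix: skip the JVM-backed defaults
# ('stopWords' via loadDefaultStopWords and 'locale' via getLocale) when the
# client is a Spark Connect client, since there is no Py4J gateway there.
from pyspark.sql.utils import is_remote
from pyspark.ml.feature import StopWordsRemover


def _set_stop_words_defaults(remover: StopWordsRemover) -> None:
    # Hypothetical helper for illustration; in the real code this logic would
    # sit inside StopWordsRemover.__init__.
    if is_remote():
        # Connect mode: only the pure-Python default is safe to set here.
        remover._setDefault(caseSensitive=False)
    else:
        # Classic mode: a JVM is available, so the original defaults work.
        remover._setDefault(
            stopWords=StopWordsRemover.loadDefaultStopWords("english"),
            caseSensitive=False,
            locale=remover._java_obj.getLocale(),
        )
```

In this shape, only `caseSensitive` gets a client-side default under Connect, and the JVM-backed `stopWords`/`locale` defaults stay a classic-mode concern, so `__init__` never imports the Py4J-backed `pyspark.core` module in a Connect-only environment.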
########## python/pyspark/ml/feature.py: ########## @@ -5069,11 +5070,19 @@ def __init__( self._java_obj = self._new_java_obj( "org.apache.spark.ml.feature.StopWordsRemover", self.uid ) - self._setDefault( - stopWords=StopWordsRemover.loadDefaultStopWords("english"), - caseSensitive=False, - locale=self._java_obj.getLocale(), - ) + if isinstance(self._java_obj, str): + # Skip setting the default value of 'locale', which needs to invoke a JVM method. + # So if users don't explicitly set 'locale', then getLocale fails. + self._setDefault( + stopWords=StopWordsRemover.loadDefaultStopWords("english"), Review Comment: Seems like this will still use Py4J: ``` ====================================================================== ERROR [0.004s]: test_stop_words_remover (pyspark.ml.tests.connect.test_parity_feature.FeatureParityTests.test_stop_words_remover) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/tests/test_feature.py", line 866, in test_stop_words_remover remover = StopWordsRemover(stopWords=["b"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper return func(self, **kwargs) ^^^^^^^^^^^^^^^^^^^^ File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5098, in __init__ stopWords=StopWordsRemover.loadDefaultStopWords("english"), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5231, in loadDefaultStopWords stopWordsObj = getattr(_jvm(), "org.apache.spark.ml.feature.StopWordsRemover") ^^^^^^ File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/util.py", line 376, in _jvm from pyspark.core.context import SparkContext ModuleNotFoundError: No module named 'pyspark.core' ``` https://github.com/apache/spark/actions/runs/13120894941/job/36606320881 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org