HyukjinKwon commented on code in PR #49624:
URL: https://github.com/apache/spark/pull/49624#discussion_r1940507357


##########
python/pyspark/ml/feature.py:
##########
@@ -5069,11 +5070,19 @@ def __init__(
         self._java_obj = self._new_java_obj(
             "org.apache.spark.ml.feature.StopWordsRemover", self.uid
         )
-        self._setDefault(
-            stopWords=StopWordsRemover.loadDefaultStopWords("english"),
-            caseSensitive=False,
-            locale=self._java_obj.getLocale(),
-        )
+        if isinstance(self._java_obj, str):
+            # Skip setting the default value of 'locale', which needs to invoke a JVM method.
+            # So if users don't explicitly set 'locale', then getLocale fails.
+            self._setDefault(
+                stopWords=StopWordsRemover.loadDefaultStopWords("english"),
Review Comment:
   Seems like this will still use Py4J:
   
   ```
   ======================================================================
   ERROR [0.004s]: test_stop_words_remover (pyspark.ml.tests.connect.test_parity_feature.FeatureParityTests.test_stop_words_remover)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/tests/test_feature.py", line 866, in test_stop_words_remover
       remover = StopWordsRemover(stopWords=["b"])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
       return func(self, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^
     File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5098, in __init__
       stopWords=StopWordsRemover.loadDefaultStopWords("english"),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/feature.py", line 5231, in loadDefaultStopWords
       stopWordsObj = getattr(_jvm(), "org.apache.spark.ml.feature.StopWordsRemover")
                              ^^^^^^
     File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/ml/util.py", line 376, in _jvm
       from pyspark.core.context import SparkContext
   ModuleNotFoundError: No module named 'pyspark.core'
   
   ```
   
   https://github.com/apache/spark/actions/runs/13120894941/job/36606320881
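   The failure happens because `StopWordsRemover.loadDefaultStopWords("english")` is still evaluated eagerly in `__init__`, which goes through `_jvm()` even on the Connect-only path. One general way to avoid that is to register the default as a zero-arg loader and only invoke it when the default is actually read. The sketch below illustrates that lazy-default pattern in plain Python; `LazyDefaults` and `load_english_stop_words` are hypothetical names for illustration, not pyspark API:

   ```python
   # Sketch of a lazy-default pattern: the loader (which in pyspark would touch
   # the JVM via Py4J) runs only when the default is first read, so code paths
   # that never read it never trigger the JVM call.
   # All names below are hypothetical, not part of the pyspark API.

   class LazyDefaults:
       def __init__(self):
           self._values = {}
           self._loaders = {}

       def set_default(self, key, loader):
           # Register a zero-arg callable; it is NOT invoked here.
           self._loaders[key] = loader

       def get_default(self, key):
           # Invoke the loader at most once, on first access.
           if key not in self._values:
               self._values[key] = self._loaders[key]()
           return self._values[key]


   calls = []

   def load_english_stop_words():
       # Stand-in for StopWordsRemover.loadDefaultStopWords("english"),
       # which needs a live JVM.
       calls.append("jvm")
       return ["a", "an", "the"]

   defaults = LazyDefaults()
   defaults.set_default("stopWords", load_english_stop_words)
   assert calls == []  # registering the default makes no JVM call
   assert defaults.get_default("stopWords") == ["a", "an", "the"]
   assert defaults.get_default("stopWords") == ["a", "an", "the"]
   assert calls == ["jvm"]  # loader ran exactly once, on first read
   ```

   With something like this, the Connect path could skip (or defer) both `locale` and `stopWords` defaults instead of only `locale`.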



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

