Hi all,

I'm running Spark on a c5a.16xlarge machine in the Amazon cloud (so 64 cores
and 128 GB RAM), using Spark 3.0.1.

The following Python code leads to an exception. Is this a bug, or is my
understanding of the API incorrect?

    import pyspark

    conf = pyspark.SparkConf().setMaster("local[64]").setAppName("Test1")
    sc = pyspark.SparkContext.getOrCreate(conf)

    rows = 70000
    data = list(range(rows))

    # one partition per row
    rdd = sc.parallelize(data, rows)
    assert rdd.getNumPartitions() == rows

    # filter everything out; the partition count is unchanged
    rdd0 = rdd.filter(lambda x: False)
    assert rdd0.getNumPartitions() == rows

    # collapse the (now empty) partitions into one and collect
    rdd00 = rdd0.coalesce(1)
    data = rdd00.collect()
    assert data == []

Output when started from PyCharm:

/home/ubuntu/PycharmProjects/<projekt-dir>/venv/bin/python 
/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/pydevconsole.py 
--mode=client --port=41185
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/ubuntu/PycharmProjects/<projekt-dir>'])
PyDev console: starting.
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
import os
os.environ['PYTHONHASHSEED'] = '0'
runfile('/home/ubuntu/PycharmProjects/<projekt-dir>/tests/test.py', 
wdir='/home/ubuntu/PycharmProjects/<projekt-dir>/tests')
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
(file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor 
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/04/08 12:12:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
21/04/08 12:12:29 WARN TaskSetManager: Stage 0 contains a task of very large 
size (4732 KiB). The maximum recommended task size is 1000 KiB.
[Stage 0:>                                                          (0 + 1) / 
1][423.190s][warning][os,thread] Attempt to protect stack guard pages failed 
(0x00007f43d23ff000-0x00007f43d2403000).
[423.190s][warning][os,thread] Attempt to deallocate stack guard pages failed.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f43d300b000, 
16384, 0) failed; error='Not enough space' (errno=12)
[423.231s][warning][os,thread] Failed to start thread - pthread_create failed 
(EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16384 bytes for committing 
reserved memory.
# An error report file with more information is saved as:
# /home/ubuntu/PycharmProjects/<projekt-dir>/tests/hs_err_pid17755.log
[thread 17966 also had an error]
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4b7bd81000, 
262144, 0) failed; error='Not enough space' (errno=12)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1033, in send_command
    response = connection.send_command(command)
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1211, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java 
server (127.0.0.1:42439)
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, 
in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
977, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1115, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, 
in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
977, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1115, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File 
"/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py",
 line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the 
script
  File 
"/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py",
 line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File 
"/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", 
line 992, in <module>
    test_70000()
  File 
"/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", 
line 974, in test_70000
    data=rdd00.collect()
  File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/spark/python/pyspark/traceback_utils.py", line 78, in __exit__
    self._context._jsc.setCallSite(None)
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1303, in __call__
    answer = self.gateway_client.send_command(command)
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1031, in send_command
    connection = self._get_connection()
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
979, in _get_connection
    connection = self._create_connection()
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
985, in _create_connection
    connection.start()
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1127, in start
    raise Py4JNetworkError(msg, e)
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to 
the Java server (127.0.0.1:42439)


Output of free -m:
              total        used        free      shared  buff/cache   available
Mem:         127462        5548       22680          92       99233      120605
Swap:             0           0           0

Thanks
Markus
