We already test CPython 2.6, CPython 3.4, and PyPy 2.5; the run takes more than 30 minutes (without parallelization), so I think that should be enough.
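
For anyone who wants to reproduce a similar matrix locally, a rough sketch is below. It loops over interpreters and invokes PySpark's own test runner; the SPARK_HOME and interpreter paths are placeholders, and the --python-executables / --modules options are assumptions that should be confirmed against ./python/run-tests --help for the Spark version in question.

    # Rough sketch: run PySpark's test suite under several interpreters.
    # SPARK_HOME and the interpreter paths are placeholders; confirm the
    # run-tests flags with "./python/run-tests --help" before relying on this.
    import subprocess

    SPARK_HOME = "/path/to/spark-1.5.1"
    INTERPRETERS = [
        "python2.6",
        "python3.4",
        "/path/to/pyenv/versions/pypy-2.5.1/bin/pypy",
    ]
    MODULES = "pyspark-core,pyspark-ml,pyspark-mllib,pyspark-sql,pyspark-streaming"

    for exe in INTERPRETERS:
        print("=== running PySpark tests with %s ===" % exe)
        subprocess.check_call(
            ["./python/run-tests",
             "--python-executables=%s" % exe,
             "--modules=%s" % MODULES],
            cwd=SPARK_HOME)
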
PyPy 2.2 is too old; we don't have enough resources to support it.

On Fri, Nov 6, 2015 at 2:27 AM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
> Hi, I ran ./python/run-tests to test the following modules of spark-1.5.1:
>
> ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
>
> against the following PyPy versions:
>
> pypy-2.2.1 pypy-2.3 pypy-2.3.1 pypy-2.4.0 pypy-2.5.0 pypy-2.5.1
> pypy-2.6.0 pypy-2.6.1 pypy-4.0.0
>
> Except for pypy-2.2.1, all of them pass the tests.
>
> The error message for pypy-2.2.1 is:
>
> Traceback (most recent call last):
>   File "app_main.py", line 72, in run_toplevel
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 151, in _run_module_as_main
>     mod_name, loader, code, fname = _get_module_details(mod_name)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 101, in _get_module_details
>     loader = get_loader(mod_name)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 465, in get_loader
>     return find_loader(fullname)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 475, in find_loader
>     for importer in iter_importers(fullname):
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 431, in iter_importers
>     __import__(pkg)
>   File "pyspark/__init__.py", line 41, in <module>
>     from pyspark.context import SparkContext
>   File "pyspark/context.py", line 26, in <module>
>     from pyspark import accumulators
>   File "pyspark/accumulators.py", line 98, in <module>
>     from pyspark.serializers import read_int, PickleSerializer
>   File "pyspark/serializers.py", line 400, in <module>
>     _hijack_namedtuple()
>   File "pyspark/serializers.py", line 378, in _hijack_namedtuple
>     _old_namedtuple = _copy_func(collections.namedtuple)
>   File "pyspark/serializers.py", line 376, in _copy_func
>     f.__defaults__, f.__closure__)
> AttributeError: 'function' object has no attribute '__closure__'
>
> P.S. Would you want to test different PyPy versions on your Jenkins? Maybe I could help.
>
> On Fri, Nov 6, 2015 at 2:23 AM, Josh Rosen <joshro...@databricks.com> wrote:
>>
>> You could try running PySpark's own unit tests. Try ./python/run-tests --help for instructions.
>>
>> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>>>
>>> I've tested the following PyPy versions against spark-1.5.1:
>>>
>>> pypy-2.2.1
>>> pypy-2.3
>>> pypy-2.3.1
>>> pypy-2.4.0
>>> pypy-2.5.0
>>> pypy-2.5.1
>>> pypy-2.6.0
>>> pypy-2.6.1
>>>
>>> I ran
>>>
>>> $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark
>>>
>>> and only pypy-2.2.1 failed.
>>>
>>> Any suggestion for running more advanced tests?
>>>
>>> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>>>>
>>>> Thanks for your quick reply.
>>>>
>>>> I will test several PyPy versions and report the result later.
>>>>
>>>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>>>>
>>>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's docs say that we only support PyPy 2.3+. Could you try using a newer PyPy version to see if that works?
>>>>>
>>>>> I just checked and it looks like our Jenkins tests are running against PyPy 2.5.1, so that version is known to work. I'm not sure what the actual minimum supported PyPy version is.
>>>>> Would you be interested in helping to investigate so that we can update the documentation or produce a fix to restore compatibility with earlier PyPy builds?
>>>>>
>>>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to run pyspark with pypy; it works with spark-1.3.1 but fails with spark-1.4.1 and spark-1.5.1.
>>>>>>
>>>>>> My pypy version:
>>>>>>
>>>>>> $ /usr/bin/pypy --version
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>>>
>>>>>> It works with spark-1.3.1:
>>>>>>
>>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
>>>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>> Welcome to
>>>>>>       ____              __
>>>>>>      / __/__  ___ _____/ /__
>>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>>>       /_/
>>>>>>
>>>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>>> And now for something completely different: ``Armin: "Prolog is a mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>>>
>>>>>> Error message for 1.5.1:
>>>>>>
>>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> Traceback (most recent call last):
>>>>>>   File "app_main.py", line 72, in run_toplevel
>>>>>>   File "app_main.py", line 614, in run_it
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
>>>>>>     import pyspark
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
>>>>>>     from pyspark.context import SparkContext
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
>>>>>>     from pyspark import accumulators
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
>>>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
>>>>>>     _hijack_namedtuple()
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
>>>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
>>>>>>     f.__defaults__, f.__closure__)
>>>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>>>> And now for something completely different: ``the traces don't lie''
>>>>>>
>>>>>> Is this a known issue? Any suggestion to resolve it, or how can I help to fix this problem?
>>>>>>
>>>>>> Thanks.
>>>>
>>>> --
>>>> -- 張雅軒
>>>
>>> --
>>> -- 張雅軒
>
> --
> -- 張雅軒

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
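
The AttributeError in the thread comes from the function-copying helper in pyspark/serializers.py, which reads f.__defaults__ and f.__closure__; the failing PyPy 2.2.1 build apparently only exposes the old Python 2 spellings (func_defaults, func_closure) on function objects. A minimal sketch of one possible compatibility shim follows; it is an illustration only, not the change Spark actually shipped, and _copy_func_compat is a hypothetical name. Running the same getattr/hasattr checks under each candidate interpreter would also be a quick way to pin down the minimum supported PyPy version.

    # Sketch only: a function-copying helper that tolerates interpreters whose
    # function objects lack the __defaults__/__closure__ aliases (hypothetical
    # name, not Spark's actual code).
    import types

    def _copy_func_compat(f):
        # Prefer the new-style attribute names, fall back to the Python 2 ones.
        code = getattr(f, "__code__", getattr(f, "func_code", None))
        globs = getattr(f, "__globals__", getattr(f, "func_globals", None))
        defaults = getattr(f, "__defaults__", getattr(f, "func_defaults", None))
        closure = getattr(f, "__closure__", getattr(f, "func_closure", None))
        return types.FunctionType(code, globs, f.__name__, defaults, closure)

    # Quick self-check that the copy behaves like the original:
    def greet(name, greeting="hi"):
        return "%s, %s" % (greeting, name)

    copied = _copy_func_compat(greet)
    print(copied("PyPy"))  # -> "hi, PyPy"
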