Hi Hyukjin

Thanks for the links.

At this point I have Eclipse, PyDev, Spark, and unit tests mostly working. I
can run a simple unit test from the command line or from within Eclipse. The
test creates a DataFrame from a text file and calls df.show().
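
For reference, the test is essentially the following (a minimal sketch; the
file path and expected column name are placeholders from my setup):

import unittest
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

class LoadTextFileTest(unittest.TestCase):

    def setUp(self):
        conf = SparkConf().setMaster("local[2]").setAppName("LoadTextFileTest")
        self.sparkContext = SparkContext(conf=conf)
        self.sqlContext = SQLContext(self.sparkContext)

    def tearDown(self):
        self.sparkContext.stop()

    def test_show(self):
        # "data.txt" is a placeholder; each line of the file becomes one row
        df = self.sqlContext.read.text("data.txt")
        # reading plain text yields a single string column named "value"
        self.assertEqual(["value"], df.columns)
        df.show()

if __name__ == "__main__":
    unittest.main()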

The last challenge is that pyspark.sql.functions appears to define some of
its functions at run time, lit() and col() for example. This causes problems
with my IDE:

https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16427812#comment-16427812
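
As far as I can tell, the module builds those functions with a pattern
roughly like the sketch below (a simplified approximation of what
pyspark/sql/functions.py does in 2.3, not the actual source), which is why
static analysis never sees the names:

from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

# names and docstrings of the functions to generate at import time
# (illustrative subset, not Spark's real table)
_my_functions = {
    "lit": "Creates a Column of literal value.",
    "col": "Returns a Column based on the given column name.",
}

def _create_function(name, doc=""):
    """Build a Python wrapper that forwards to the JVM-side function."""
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(_to_java_column(col))
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

# the names only exist once this loop has run, so an IDE's static
# analysis cannot discover lit(), col(), etc.
for _name, _doc in _my_functions.items():
    globals()[_name] = _create_function(_name, _doc)

The usual PyDev workaround seems to be adding pyspark (or at least
pyspark.sql.functions) to the interpreter's Forced Builtins, so code
completion is driven by a live interpreter rather than static analysis.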

Andy

P.s. I originally started my project using Jupyter notebooks. The code base
got too big to manage in notebooks, so I am in the process of refactoring the
common code into Python modules using a standard Python IDE. In the IDE I
need to be able to import all the Spark functions and to write and run unit
tests.

I chose Eclipse because I have a lot of Spark code written in Java. It's
easier for me to have one IDE for all my Java and Python code.

From:  Hyukjin Kwon <gurwls...@gmail.com>
Date:  Thursday, April 5, 2018 at 6:09 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(

> FYI, there is a PR and JIRA for virtualEnv support in PySpark
> 
> https://issues.apache.org/jira/browse/SPARK-13587
> https://github.com/apache/spark/pull/13599
> 
> 
> 2018-04-06 7:48 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:
>> FYI
>> 
>> http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext
>> 
>> From:  Andrew Davidson <a...@santacruzintegration.com>
>> Date:  Wednesday, April 4, 2018 at 5:36 PM
>> To:  "user @spark" <user@spark.apache.org>
>> Subject:  how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
>> yield from walk(
>> 
>>> I am having a heck of a time setting up my development environment. I used
>>> pip to install pyspark, and I also downloaded Spark from Apache.
>>> 
>>> My Eclipse PyDev interpreter is configured as a Python 3 virtualenv.
>>> 
>>> I have a simple unit test that loads a small DataFrame. df.show() generates
>>> the following error:
>>> 
>>> 2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
>>> org.apache.spark.SparkException:
>>> Error from python worker:
>>>   Traceback (most recent call last):
>>>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
>>>       import os
>>>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
>>>       yield from walk(new_path, topdown, onerror, followlinks)
>>>                ^
>>>   SyntaxError: invalid syntax
>>> 
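>>> Since "yield from" is Python 3 syntax, my guess is that the worker is
>>> launching a Python 2 interpreter against the virtualenv's Python 3
>>> standard library. If that is right, pointing the workers at the driver's
>>> interpreter should help; a sketch of what I mean:
>>> 
>>> import os
>>> import sys
>>> 
>>> # make the workers use the same interpreter as the driver; these must
>>> # be set before the SparkContext is created
>>> os.environ["PYSPARK_PYTHON"] = sys.executable
>>> os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
>>> 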
>>> My unittest class is derived from:
>>> 
>>> import unittest
>>> from pyspark import SparkConf, SparkContext
>>> from pyspark.sql import SQLContext
>>> 
>>> # registry of the active SparkContexts, keyed by test class name
>>> sc_values = {}
>>> 
>>> class PySparkTestCase(unittest.TestCase):
>>> 
>>>     @classmethod
>>>     def setUpClass(cls):
>>>         conf = SparkConf().setMaster("local[2]").setAppName(cls.__name__)
>>>         # .set("spark.authenticate.secret", "111111")
>>>         cls.sparkContext = SparkContext(conf=conf)
>>>         sc_values[cls.__name__] = cls.sparkContext
>>>         cls.sqlContext = SQLContext(cls.sparkContext)
>>>         print("aedwip:", SparkContext)
>>> 
>>>     @classmethod
>>>     def tearDownClass(cls):
>>>         print("....calling stop tearDownClass, the content of sc_values=",
>>>               sc_values)
>>>         sc_values.clear()
>>>         cls.sparkContext.stop()
>>> 
>>> 
>>> This is similar to the PySparkTestCase class in
>>> https://github.com/apache/spark/blob/master/python/pyspark/tests.py
>>> 
>>> 
>>> 
>>> Any suggestions would be greatly appreciated.
>>> 
>>> 
>>> 
>>> Andy
>>> 
>>> 
>>> 
>>> My downloaded version is spark-2.3.0-bin-hadoop2.7.
>>> 
>>> 
>>> 
>>> The versions in my virtualenv:
>>> 
>>> (spark-2.3.0) $ pip show pySpark
>>> Name: pyspark
>>> Version: 2.3.0
>>> Summary: Apache Spark Python API
>>> Home-page: https://github.com/apache/spark/tree/master/python
>>> Author: Spark Developers
>>> Author-email: d...@spark.apache.org
>>> License: http://www.apache.org/licenses/LICENSE-2.0
>>> Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
>>> Requires: py4j
>>> (spark-2.3.0) $
>>> 
>>> (spark-2.3.0) $ python --version
>>> Python 3.6.1
>>> (spark-2.3.0) $
>>> 
>>> 
> 

