Hi Hyukjin,

Thanks for the links.
At this point I have my Eclipse, PyDev, Spark, and unit tests more or less working. I can run a simple unit test from the command line or from within Eclipse. The test creates a data frame from a text file and calls df.show().

The last challenge is that pyspark.sql.functions appears to define some functions at run time, for example lit() and col(). This causes problems with my IDE:

https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16427812#comment-16427812

Andy

P.S. I originally started my project using Jupyter notebooks. The code base got too big to manage in notebooks, so I am in the process of refactoring common code into Python modules using a standard Python IDE. In the IDE I need to be able to import all the Spark functions and to write and run unit tests. I chose Eclipse because I have a lot of Spark code written in Java; it's easier for me to have one IDE for all my Java and Python code.

From: Hyukjin Kwon <gurwls...@gmail.com>
Date: Thursday, April 5, 2018 at 6:09 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError: yield from walk(

> FYI, there is a PR and JIRA for virtualenv support in PySpark:
>
> https://issues.apache.org/jira/browse/SPARK-13587
> https://github.com/apache/spark/pull/13599
>
> 2018-04-06 7:48 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:
>> FYI
>>
>> http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext
>>
>> From: Andrew Davidson <a...@santacruzintegration.com>
>> Date: Wednesday, April 4, 2018 at 5:36 PM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError: yield from walk(
>>
>>> I am having a heck of a time setting up my development environment. I used pip to install pyspark, and I also downloaded Spark from Apache.
>>>
>>> My Eclipse PyDev interpreter is configured as a Python 3 virtualenv.
>>>
>>> I have a simple unit test that loads a small data frame. df.show() generates the following error:
>>>
>>> 2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
>>> org.apache.spark.SparkException:
>>> Error from python worker:
>>>   Traceback (most recent call last):
>>>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
>>>       import os
>>>     File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
>>>       yield from walk(new_path, topdown, onerror, followlinks)
>>>       ^
>>>   SyntaxError: invalid syntax
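For anyone who hits the same trace later: "yield from" is Python 3 syntax, so a SyntaxError on that line of the virtualenv's own os.py means a Python 2 interpreter is being handed the Python 3.6 standard library. Spark starts its workers with whatever "python" it finds unless told otherwise. A minimal sketch of the usual fix, assuming the tests themselves already run under the virtualenv's python3 (so sys.executable is the interpreter you want the workers to use):

    # Must run before the SparkContext is created, e.g. at the top of the
    # test module or at the start of setUpClass().
    import os
    import sys

    # PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON are the standard Spark
    # environment variables that select the worker and driver interpreters.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

The same variables can also be set in an Eclipse run configuration's Environment tab instead of in code.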
>>> My unit test class is derived from:
>>>
>>> import unittest
>>>
>>> from pyspark import SparkConf, SparkContext
>>> from pyspark.sql import SQLContext
>>>
>>> # module-level registry of live SparkContexts, keyed by test class name
>>> sc_values = {}
>>>
>>> class PySparkTestCase(unittest.TestCase):
>>>
>>>     @classmethod
>>>     def setUpClass(cls):
>>>         # run locally with two worker threads, named after the test class
>>>         conf = SparkConf().setMaster("local[2]") \
>>>                           .setAppName(cls.__name__)
>>>         # conf.set("spark.authenticate.secret", "111111")
>>>         cls.sparkContext = SparkContext(conf=conf)
>>>         sc_values[cls.__name__] = cls.sparkContext
>>>         cls.sqlContext = SQLContext(cls.sparkContext)
>>>         print("aedwip:", SparkContext)
>>>
>>>     @classmethod
>>>     def tearDownClass(cls):
>>>         print("....calling stop tearDownClass, the content of sc_values=", sc_values)
>>>         sc_values.clear()
>>>         cls.sparkContext.stop()
>>>
>>> This looks similar to class PySparkTestCase in
>>> https://github.com/apache/spark/blob/master/python/pyspark/tests.py
>>>
>>> Any suggestions would be greatly appreciated.
>>>
>>> Andy
>>>
>>> My downloaded version is spark-2.3.0-bin-hadoop2.7.
>>>
>>> My virtualenv version is:
>>>
>>> (spark-2.3.0) $ pip show pySpark
>>> Name: pyspark
>>> Version: 2.3.0
>>> Summary: Apache Spark Python API
>>> Home-page: https://github.com/apache/spark/tree/master/python
>>> Author: Spark Developers
>>> Author-email: d...@spark.apache.org
>>> License: http://www.apache.org/licenses/LICENSE-2.0
>>> Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
>>> Requires: py4j
>>> (spark-2.3.0) $
>>>
>>> (spark-2.3.0) $ python --version
>>> Python 3.6.1
>>> (spark-2.3.0) $
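For what it's worth, the test I can now run from the command line and from inside Eclipse looks roughly like the sketch below. The derived class, the file name "sampleData.txt", and the column names are made up for illustration; the #@UnresolvedImport comment is PyDev's suppression marker, which is one way to quiet the warning on the runtime-generated pyspark.sql.functions imports (the lit() and col() issue mentioned above) without changing behavior:

    import unittest

    # lit() and col() are generated at run time (see SPARK-23878), so PyDev
    # cannot resolve them statically; the marker suppresses that warning.
    from pyspark.sql.functions import col, lit  #@UnresolvedImport

    # PySparkTestCase is the base class shown earlier in this thread;
    # "sampleData.txt" is a made-up input file.
    class SimpleDataFrameTest(PySparkTestCase):

        def test_show(self):
            # read.text() returns a data frame with a single "value" column
            df = self.sqlContext.read.text("sampleData.txt")
            df = df.withColumn("source", lit("sampleData.txt"))
            df.select(col("value"), col("source")).show()

    if __name__ == "__main__":
        unittest.main()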