I hope someone can help with a problem I am having. I previously set up a CentOS VM on Windows with Hadoop and Spark (all single-node) and it was working perfectly.
I am now running a multi-node setup with a second computer, both machines running CentOS natively. Hadoop is installed successfully and running on both machines. I then installed Spark (version spark-2.2.1-bin-hadoop2.7) with the following in .bashrc:

    export SPARK_HOME=/opt/spark/spark-2.2.1-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin
    export PATH="/home/hadoop/anaconda2/bin:$PATH"

I am using Anaconda (Python 2.7) to install the pyspark packages.

The files under $SPARK_HOME/conf are set up as follows. The slaves file contains:

    datanode1

(the hostname of the node I use to do the processing on), and spark-env.sh contains:

    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
    export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.8.3/etc/hadoop
    export SPARK_WORKER_CORES=6

The idea is that I then connect Spark to the PyCharm IDE and do my work there. In PyCharm I have set the environment variables (under Run -> Edit Configurations) as:

    PYTHONPATH   /opt/spark/spark-2.2.1-bin-hadoop2.7/python/lib
    SPARK_HOME   /opt/spark/spark-2.2.1-bin-hadoop2.7

An image of the environment variables:
http://apache-spark-user-list.1001560.n3.nabble.com/file/t9029/UKaNp.png

I have also pointed the PyCharm Python interpreter at the Anaconda Python directory.

With all of this set up, I get errors whenever I create either an SQLContext or a SparkSession, for example with:

    conf = SparkConf().setMaster("local[*]")
    sc = SparkContext(conf=conf)
    sql_sc = SQLContext(sc)

or:

    spark = SparkSession.builder.master("local").appName("PythonTutPrac").config("spark.executor.memory", "2gb").getOrCreate()

The error given:

    File "/home/hadoop/Desktop/PythonPrac/CollaborativeFiltering.py", line 72, in <module>
        .config("spark.executor.memory", "2gb") \
    File "/opt/spark/spark-2.2.1-bin-hadoop2.7/python/pyspark/sql/session.py", line 183, in getOrCreate
        session._jsparkSession.sessionState().conf().setConfString(key, value)
    File "/home/hadoop/anaconda2/lib/python2.7/site-packages/py4j/java_gateway.py", line 1160, in __call__
        answer, self.gateway_client, self.target_id, self.name)
    File "/opt/spark/spark-2.2.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco
        raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
    pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':"
    Unhandled exception in thread started by

    Process finished with exit code 1

An image of the error:
http://apache-spark-user-list.1001560.n3.nabble.com/file/t9029/A3D0u.png

I do not know why this error appears; when I ran the same code on my single-node VM it worked fine. I then removed datanode1 from the multi-node setup and ran everything again as a single node on my main computer (hostname: master), but I still get the same errors.

I hope someone can help. I have followed other guides for setting up PyCharm with PySpark but cannot figure out what is going wrong. Thanks!
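P.S. For completeness, here is a minimal, self-contained sketch of the script that triggers the error on my machine. The paths, master, app name, and memory setting are the ones from my setup above; I try the two variants in separate runs, not together, and I added an appName to the first variant only so the sketch is runnable on its own.

    # Minimal reproduction, run from PyCharm with the Anaconda Python 2.7
    # interpreter and with SPARK_HOME / PYTHONPATH set as described above.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext, SparkSession

    # Variant 1: plain SparkContext plus SQLContext
    conf = SparkConf().setMaster("local[*]").setAppName("PythonTutPrac")
    sc = SparkContext(conf=conf)
    sql_sc = SQLContext(sc)

    # Variant 2 (tried in a separate run): SparkSession builder; this is the
    # call that fails in getOrCreate() with the SessionStateBuilder error.
    spark = (SparkSession.builder
             .master("local")
             .appName("PythonTutPrac")
             .config("spark.executor.memory", "2gb")
             .getOrCreate())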