I've worked around this by dropping the jars into a directory (spark_jars)
and then creating a spark-defaults.conf file in conf containing this:
spark.driver.extraClassPath /home/mj/apps/spark_jars/*
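With the jar on the driver classpath, a rough sanity check along these lines should confirm it is being picked up. This is only a sketch: it assumes a Spark release where sqlContext.read is available (1.4+), and the CSV path is made up - point it at any real file.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext('local', 'classpath_check')
sqlContext = SQLContext(sc)

# com.databricks.spark.csv is the data source provided by the spark-csv jar;
# if the jar is on the classpath this should load without a ClassNotFoundException.
df = (sqlContext.read
      .format('com.databricks.spark.csv')
      .option('header', 'true')
      .load('/home/mj/apps/data/sample.csv'))  # example path only
df.show()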
Thank you for your response; however, I'm afraid I still can't get it to
work. This is my code:
jar_path = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'
spark_config = SparkConf().setMaster('local').setAppName('data_frame_test').set("spar
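For what it's worth, here is a minimal sketch of how the jar might be passed through SparkConf directly. The spark.jars and spark.driver.extraClassPath property names are standard Spark configuration keys; everything else just mirrors your snippet.

from pyspark import SparkConf, SparkContext

jar_path = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'

spark_config = (SparkConf()
                .setMaster('local')
                .setAppName('data_frame_test')
                .set('spark.jars', jar_path)                    # ship the jar with the job
                .set('spark.driver.extraClassPath', jar_path))  # put it on the driver classpath
sc = SparkContext(conf=spark_config)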
Hi,
I'm trying to figure out how to use a third-party jar inside a Python
program which I'm running via PyCharm in order to debug it. I am normally
able to run Spark code in Python such as this:
spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)
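One approach that is sometimes suggested for running under PyCharm is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created, so the extra jar goes through the same path as spark-submit --jars. A rough sketch, assuming the jar path from this thread (newer Spark versions require the trailing "pyspark-shell" token; older ones ignore it):

import os
from pyspark import SparkConf, SparkContext

# Must be set before the SparkContext starts, since it is read when the JVM gateway launches.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar pyspark-shell'
)

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)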
Hi,
I'm having trouble using the --packages option for spark-shell.cmd. I have
to use Windows at work and have been issued a username with a space in it,
which means that when I use the --packages option it fails with this message:
"Exception in thread "main" java.net.URISyntaxException: Illegal charac
Hi,
I'm trying to use PySpark to save a simple RDD to a text file (code below),
but it keeps throwing an error.
- Python Code -
items=["Hello", "world"]
items2 = sc.parallelize(items)
items2.coalesce(1).saveAsTextFile('c:/tmp/python_out.csv')
- Error -
C:\Python27\py
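Without the full traceback it's hard to say, but saveAsTextFile on Windows often fails when Hadoop's winutils.exe isn't available to the JVM. As a purely Python workaround for a small RDD, you can sidestep the Hadoop output path entirely; this is just a sketch and the output path is an example:

# Workaround sketch: collect the (small) RDD on the driver and write it with plain Python,
# avoiding the Hadoop output committer altogether.
items = ["Hello", "world"]
items2 = sc.parallelize(items)

with open('c:/tmp/python_out.csv', 'w') as out:  # example path
    for line in items2.collect():
        out.write(line + '\n')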
You could try using zipWithIndex (links below to the API docs). For example, in
Python:
items = ['a', 'b', 'c']
items2 = sc.parallelize(items)
print(items2.first())
items3 = items2.map(lambda x: (x, x + "!"))
print(items3.first())
items4 = items3.zipWithIndex()
print(items4.first())
items5 = items4.map(lamb
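The original mail is cut off at the items5 line, but as a purely hypothetical continuation, a common next step is to flip each pair so the index comes first, which is handy for sorting or joining by position:

# Hypothetical continuation of the truncated items5 line.
# items4 elements look like ((value, value + "!"), index) after zipWithIndex.
items5 = items4.map(lambda pair: (pair[1], pair[0]))  # -> (index, (value, value + "!"))
print(items5.first())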
I'm running
PySpark via PyCharm and the information for my environment is:
OS: Windows 7
Python version: 2.7.9
Spark version: 1.1.1
Java version: 1.8
I've also included the py file I am using. I'd appreciate any help you can
give me,
MJ.
ERROR MESSAGE--