I've worked around this by dropping the jars into a directory (spark_jars)
and then creating a spark-defaults.conf file in conf containing this:
spark.driver.extraClassPath /home/mj/apps/spark_jars/*
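With the jar on the driver classpath, a rough sanity check along these lines should confirm it is being picked up. This is only a sketch: it assumes a Spark release where sqlContext.read is available (1.4+), and the CSV path is made up - point it at any real file.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext('local', 'classpath_check')
sqlContext = SQLContext(sc)

# com.databricks.spark.csv is the data source provided by the spark-csv jar;
# if the jar is on the classpath this should load without a ClassNotFoundException.
df = (sqlContext.read
      .format('com.databricks.spark.csv')
      .option('header', 'true')
      .load('/home/mj/apps/data/sample.csv'))  # example path only
df.show()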
Thank you for your response; however, I'm afraid I still can't get it to
work. This is my code:
jar_path = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'
spark_config = SparkConf().setMaster('local').setAppName('data_frame_test').set("spar
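For what it's worth, here is a minimal sketch of how the jar might be passed through SparkConf directly. The spark.jars and spark.driver.extraClassPath property names are standard Spark configuration keys; everything else just mirrors your snippet.

from pyspark import SparkConf, SparkContext

jar_path = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'

spark_config = (SparkConf()
                .setMaster('local')
                .setAppName('data_frame_test')
                .set('spark.jars', jar_path)                    # ship the jar with the job
                .set('spark.driver.extraClassPath', jar_path))  # put it on the driver classpath
sc = SparkContext(conf=spark_config)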
Hi,
I'm trying to figure out how to use a third-party jar inside a Python
program which I'm running via PyCharm in order to debug it. I am normally
able to run Spark code in Python such as this:
spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)
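One approach that is sometimes suggested for running under PyCharm is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created, so the extra jar goes through the same path as spark-submit --jars. A rough sketch, assuming the jar path from this thread (newer Spark versions require the trailing "pyspark-shell" token; older ones ignore it):

import os
from pyspark import SparkConf, SparkContext

# Must be set before the SparkContext starts, since it is read when the JVM gateway launches.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar pyspark-shell'
)

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)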
Hi,
I'm having trouble using the --packages option for spark-shell.cmd. I have
to use Windows at work and have been issued a username with a space in it,
which means that when I use the --packages option it fails with this message:
"Exception in thread "main" java.net.URISyntaxException: Illegal charac
Hi,
I'm trying to use PySpark to save a simple RDD to a text file (code below),
but it keeps throwing an error.
- Python Code -
items=["Hello", "world"]
items2 = sc.parallelize(items)
items2.coalesce(1).saveAsTextFile('c:/tmp/python_out.csv')
- Error -
C:\Python27\py
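Without the full traceback it's hard to say, but saveAsTextFile on Windows often fails when Hadoop's winutils.exe isn't available to the JVM. As a purely Python workaround for a small RDD, you can sidestep the Hadoop output path entirely; this is just a sketch and the output path is an example:

# Workaround sketch: collect the (small) RDD on the driver and write it with plain Python,
# avoiding the Hadoop output committer altogether.
items = ["Hello", "world"]
items2 = sc.parallelize(items)

with open('c:/tmp/python_out.csv', 'w') as out:  # example path
    for line in items2.collect():
        out.write(line + '\n')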
You could try using zipWithIndex (links below to the API docs). For example, in
Python:
items = ['a', 'b', 'c']
items2 = sc.parallelize(items)
print(items2.first())
items3 = items2.map(lambda x: (x, x + "!"))
print(items3.first())
items4 = items3.zipWithIndex()
print(items4.first())
items5 = items4.map(lamb
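The original mail is cut off at the items5 line, but as a purely hypothetical continuation, a common next step is to flip each pair so the index comes first, which is handy for sorting or joining by position:

# Hypothetical continuation of the truncated items5 line.
# items4 elements look like ((value, value + "!"), index) after zipWithIndex.
items5 = items4.map(lambda pair: (pair[1], pair[0]))  # -> (index, (value, value + "!"))
print(items5.first())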
I'm running
PySpark via PyCharm and the information for my environment is:
OS: Windows 7
Python version: 2.7.9
Spark version: 1.1.1
Java version: 1.8
I've also included the py file I am using. I'd appreciate any help you can
give me,
MJ.
ERROR MESSAGE--