Re: spark.kryo.classesToRegister

2016-01-28 Thread Jim Lohse
You are only required to register classes with Kryo if you enable a specific setting: // require registration of all classes with Kryo .set("spark.kryo.registrationRequired","true") Here's an example of my setup; I think this is the best approach because it forces me to really think about
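The setting quoted above can be sketched as a SparkConf fragment. This is illustrative only: `MyCustomClass` is a placeholder for your own types, and the snippet assumes Spark's Java API with the Kryo serializer enabled.

```java
import org.apache.spark.SparkConf;

// Illustrative fragment -- MyCustomClass stands in for your own classes.
// With registrationRequired=true, Kryo fails fast at runtime on any class
// you forgot to register, which is what forces the inventory described above.
SparkConf conf = new SparkConf()
        .setAppName("kryo-registration-example")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrationRequired", "true")
        .registerKryoClasses(new Class<?>[]{ MyCustomClass.class });
```

Without `registrationRequired`, unregistered classes still serialize, but Kryo writes the full class name with each object instead of a compact ID.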

Contrib to Docs: Re: SparkContext SyntaxError: invalid syntax

2016-01-18 Thread Jim Lohse
I don't think you have to build the docs, just fork them on GitHub and submit the pull request? What I have been able to do is submit a pull request just by editing the markdown file; I am just confused whether I am supposed to merge it myself or wait for notification and/or wait for someone else to mer

Re: Fwd: how to submit multiple jar files when using spark-submit script in shell?

2016-01-12 Thread Jim Lohse
Thanks for your answer; you are correct, it's just a different approach than the one I am asking for :) Building an uber- or assembly-jar goes against the idea of placing the jars on all workers. Uber-jars increase network traffic, while using local:/ in the classpath reduces it. Eve
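A hedged sketch of the approach being described: pre-distribute the dependency jars to the same path on every worker, then reference them with the local:/ scheme so spark-submit serves them from each worker's filesystem instead of shipping them over the network. All paths, the class name, and the master URL below are placeholders.

```
spark-submit \
  --class com.example.Main \
  --master spark://your-master:7077 \
  --jars local:/opt/spark-libs/dep1.jar,local:/opt/spark-libs/dep2.jar \
  /opt/apps/app.jar
```

The trade-off is operational: local:/ assumes every worker already has identical copies at that path, so jar distribution becomes part of cluster provisioning rather than job submission.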

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Jim Lohse
Hey Python 2.6, don't let the door hit you on the way out! haha. Drop it, no problem. On 01/05/2016 12:17 AM, Reynold Xin wrote: Does anybody here care about us dropping support for Python 2.6 in Spark 2.0? Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compar

Re: feedback on the use of Spark’s gateway hidden REST API (standalone cluster mode) for application submission

2016-01-02 Thread Jim Lohse
There is a lot of interesting info about this API here: https://issues.apache.org/jira/browse/SPARK-5388 I got that from a comment thread on the last link in your PR. Thanks for bringing this up! I knew you could check status via REST per http://spark.apache.org/docs/latest/monitoring.html#res
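For reference, the hidden REST gateway discussed in SPARK-5388 listens on the standalone master, on port 6066 by default. A hedged example of a submission and a status check follows; the host, jar path, class name, Spark version, and driver ID are all placeholders, and the field names follow the request format described in that JIRA.

```
curl -X POST http://your-master:6066/v1/submissions/create \
  --header "Content-Type:application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "file:/opt/apps/app.jar",
    "mainClass": "com.example.Main",
    "appArgs": [],
    "clientSparkVersion": "1.6.0",
    "environmentVariables": {},
    "sparkProperties": {
      "spark.master": "spark://your-master:6066",
      "spark.app.name": "rest-submit-example",
      "spark.submit.deployMode": "cluster"
    }
  }'

# Check the status of a submission using the submissionId from the response
curl http://your-master:6066/v1/submissions/status/driver-20160102123456-0000
```

Note this is the submission gateway, a separate endpoint from the read-only monitoring REST API documented at the monitoring.html link above.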

Re: Dynamic jar loading

2015-12-18 Thread Jim Lohse
I am going to say no, but have not actually tested this. Just going on this line in the docs: http://spark.apache.org/docs/latest/configuration.html: `spark.driver.extraClassPath` (default: none): Extra classpath entries to prepend to the classpath of the driver. Note: In client mode, this config mus
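The docs note being quoted matters because in client mode the driver JVM has already started by the time your SparkConf code runs, so the extra classpath has to come from spark-defaults.conf or the command line rather than from code. A hedged illustration, with placeholder paths:

```
# spark-defaults.conf -- illustrative paths only
spark.driver.extraClassPath    /opt/extra-libs/*
spark.executor.extraClassPath  /opt/extra-libs/*
```

Or equivalently on the command line in client mode: `spark-submit --driver-class-path /opt/extra-libs/* ...`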

Re: Pyspark submitted app just hangs

2015-12-02 Thread Jim Lohse
Is this the stderr output from a worker? Are any files being written? Can you run in debug mode and see how far it's getting? Without the actual logs from $SPARK_HOME or the stderr from the worker UI, this doesn't give me a direction to look. Just imho; maybe someone knows what this means, but it

Confirm this won't parallelize/partition?

2015-11-28 Thread Jim Lohse
Hi, I got a good answer to the main question elsewhere; would anyone please confirm the first code is the right approach? For an MVCE I am trying to adapt this example, and it seems like I am having Java issues with types (but this is basically the right approach?): int count = spark.parallel
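One likely source of the "Java issues with types" above: in the Spark docs' Java Pi example, `JavaRDD.count()` returns `long`, so assigning it to `int` does not compile. As a hedged sketch, here is the same generate/filter/count shape in plain Java streams so it runs without a cluster; in Spark the middle section would be `sc.parallelize(samples).filter(...).count()` on a `JavaSparkContext`.

```java
import java.util.Random;
import java.util.stream.IntStream;

public class PiEstimate {
    // Plain-Java analogue of the Spark Pi example: draw numSamples random
    // points in the unit square, count those inside the quarter circle,
    // and estimate pi as 4 * (inside / total). Note the count is a long,
    // mirroring JavaRDD.count(), which also returns long rather than int.
    static double estimate(int numSamples, long seed) {
        Random rnd = new Random(seed);
        long count = IntStream.range(0, numSamples)
                .filter(i -> {
                    double x = rnd.nextDouble();
                    double y = rnd.nextDouble();
                    return x * x + y * y < 1;  // point inside unit circle?
                })
                .count();
        return 4.0 * count / numSamples;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimate(1_000_000, 42L));
    }
}
```

The Spark version distributes the filter across partitions, but the types and the overall shape are the same.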