Re: bug using kryo as closure serializer

2014-05-04 Thread Reynold Xin
Technically you only need to change the build file, and change part of a line in SparkEnv so you don't have to break your oath :) On Sun, May 4, 2014 at 10:22 PM, Soren Macbeth wrote: > that would violate my personal oath of never writing a single line of > scala, but I might be able to do tha

Re: bug using kryo as closure serializer

2014-05-04 Thread Soren Macbeth
that would violate my personal oath of never writing a single line of scala, but I might be able to do that if I can get past the issue I'm struggling with in this thread. On Sunday, May 4, 2014, Reynold Xin wrote: > Thanks. Do you mind playing with chill-scala a little bit and see if

Re: bug using kryo as closure serializer

2014-05-04 Thread Reynold Xin
Thanks. Do you mind playing with chill-scala a little bit and seeing if it actually works well for closures? One way to try is to hard-code the serializer to use Kryo with chill-scala, and then run through all the unit tests. If it works well, we can incorporate that in the next release (probably not
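
For anyone who wants to try the experiment, a minimal sketch of the kind of round-trip check involved, assuming chill-scala's KryoPool API; the object and variable names here are made up, and whether arbitrary closures actually survive the round trip is exactly the open question:

    import scala.util.Try
    import com.twitter.chill.{KryoPool, ScalaKryoInstantiator}

    object ClosureKryoCheck {
      def main(args: Array[String]): Unit = {
        // chill-scala's instantiator builds a Kryo configured with Scala-aware registrations
        val pool = KryoPool.withByteArrayOutputStream(1, new ScalaKryoInstantiator)

        val captured = 10
        val closure: Int => Int = x => x + captured  // closure over a local value

        // Serialize the closure with Kryo and deserialize it again
        val result = Try {
          val bytes = pool.toBytesWithClass(closure)
          pool.fromBytes(bytes).asInstanceOf[Int => Int].apply(5)
        }
        println(result)  // Success(15) if the closure survived the round trip
      }
    }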

Re: bug using kryo as closure serializer

2014-05-04 Thread Soren Macbeth
fwiw, it seems like it wouldn't be very difficult to integrate chill-scala, since you're already using chill-java, and you'd probably get kryo serialization of closures and all sorts of other scala stuff for free. All that would be needed would be to include the dependency and then update KryoSerializer to regi
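
Roughly, the sort of change being described; the artifact coordinates and version below are illustrative, and the registration hook is a sketch rather than Spark's actual KryoSerializer code:

    // build definition (sbt): pull in the Scala variant of chill
    libraryDependencies += "com.twitter" %% "chill" % "0.3.6"

    // wherever the Kryo instance is built, apply chill-scala's registrar so
    // Scala collections, tuples, options, etc. get sensible serializers
    import com.esotericsoftware.kryo.Kryo
    import com.twitter.chill.AllScalaRegistrar

    def newScalaAwareKryo(): Kryo = {
      val kryo = new Kryo()
      new AllScalaRegistrar().apply(kryo)
      kryo
    }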

Re: bug using kryo as closure serializer

2014-05-04 Thread Reynold Xin
Good idea. I submitted a pull request for the doc update here: https://github.com/apache/spark/pull/642 On Sun, May 4, 2014 at 3:54 PM, Soren Macbeth wrote: > Thanks for the reply! > > Ok, if that's the case, I'd recommend a note to that effect in the docs at > least. > > Just to give some more

Re: bug using kryo as closure serializer

2014-05-04 Thread Reynold Xin
Kryo does generate code for serialization, so the CPU overhead is quite a bit lower than Java's (which I think just uses reflection). As I understand it, they also have a new implementation that uses unsafe intrinsics, which should lead to even higher performance. The generated byte[] size was a lot smaller
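
A quick way to see the size difference in isolation (an illustrative micro-check with a made-up case class, not a benchmark of Spark itself):

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.io.Output

    case class Point(x: Double, y: Double, label: String)

    object SerializedSizeCheck {
      def main(args: Array[String]): Unit = {
        val data = (1 to 1000).map(i => Point(i, i * 2.0, "p" + i)).toArray

        // Java serialization: reflection-based, writes class metadata into the stream
        val bos = new ByteArrayOutputStream()
        val oos = new ObjectOutputStream(bos)
        oos.writeObject(data)
        oos.close()

        // Kryo: registered classes are written as small integer ids
        val kryo = new Kryo()
        kryo.register(classOf[Point])
        kryo.register(classOf[Array[Point]])
        val out = new Output(4096, -1)  // growable buffer, no upper bound
        kryo.writeObject(out, data)
        out.close()

        println("java: " + bos.size() + " bytes, kryo: " + out.toBytes.length + " bytes")
      }
    }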

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Nicolas Garneau
Hey AJ, I haven't tried to run on a cluster yet, only in local mode. I'll try to get something running on a cluster soon and keep you posted. Nicolas Garneau > On May 4, 2014, at 6:23 PM, Ajay Nair wrote: > > Now I got it to work .. well almost. However I needed to copy the project/ > folder to t

Re: bug using kryo as closure serializer

2014-05-04 Thread Soren Macbeth
Thanks for the reply! Ok, if that's the case, I'd recommend a note to that effect in the docs at least. Just to give some more context here, I'm working on a Clojure DSL for Spark called Flambo, which I plan to open source shortly. If I could, I'd like to focus on the initial bug that I hit. Exce

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Ajay Nair
Now I got it to work ... well, almost. However, I needed to copy the project/ folder to the spark-standalone folder, as the package build was failing because it could not find build.properties. After the copy, the build was successful. However, when I run it I get errors, but it still gives me the output.
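
For reference, the file the build seems to be missing is sbt's project/build.properties, which just pins the sbt version; its contents are a single line along these lines (version number illustrative):

    # project/build.properties
    sbt.version=0.13.2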

Re: bug using kryo as closure serializer

2014-05-04 Thread Mridul Muralidharan
On a slightly related note (apologies Soren for hijacking the thread), Reynold, how much better is Kryo from Spark's usage point of view compared to the default Java serialization (in general, not for closures)? The numbers on the Kryo site are interesting, but since you have played the most with kryo

Re: bug using kryo as closure serializer

2014-05-04 Thread Reynold Xin
I added the config option to use the non-default serializer. However, at the time, Kryo failed to serialize pretty much any closure, so that option was never really used / recommended. Since then the Scala ecosystem has developed, and some other projects are starting to use Kryo to serialize more Sc

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Nicolas Garneau
Hey AJ, If you plan to launch your job on a cluster, consider using the spark-submit command. Running this in Spark's home directory gives you help on how to use it: $ ./bin/spark-submit I haven't tried it yet, but considering this post, it will be the preferred way to launch jobs: http
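
For example, a typical invocation looks something like this (the class name, master URL, and jar path are placeholders):

    ./bin/spark-submit \
      --class org.example.MyApp \
      --master spark://my-master:7077 \
      path/to/my-app.jar arg1 arg2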

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Ajay Nair
Thank you. I am trying this now

bug using kryo as closure serializer

2014-05-04 Thread Soren Macbeth
apologies for the cross-list posts, but I've gotten zero response in the user list and I guess this list is probably more appropriate. According to the documentation, using the KryoSerializer for closures is supported. However, when I try to set `spark.closure.serializer` to `org.apache.spark.seri
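
For context, the configuration being described would look something like this in application code (app name and master are arbitrary; the full serializer class name is org.apache.spark.serializer.KryoSerializer):

    import org.apache.spark.{SparkConf, SparkContext}

    object KryoClosureRepro {
      def main(args: Array[String]): Unit = {
        // spark.closure.serializer is the setting this bug report is about
        val conf = new SparkConf()
          .setAppName("kryo-closure-repro")
          .setMaster("local[2]")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")

        val sc = new SparkContext(conf)
        // any transformation forces a closure through the closure serializer
        println(sc.parallelize(1 to 10).map(_ * 2).sum())
        sc.stop()
      }
    }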

Re: reduce, transform, combine

2014-05-04 Thread Manish Amde
Thanks DB. I will work with mapPartition for now. Question to the community in general: should we consider adding such an operation to RDDs, especially as a developer API? On Sun, May 4, 2014 at 1:41 AM, DB Tsai wrote: > You could easily achieve this by mapPartition. However, it seems that it

Re: Mailing list

2014-05-04 Thread Nicolas Lalevée
On May 4, 2014, at 06:30, Matei Zaharia wrote: > Hi Nicolas, > > Good catches on these things. > >> Your website seems a little bit incomplete. I have found this page [1] which >> lists the two main mailing lists, users and dev. But I see a reference to a >> mailing list about "issues" which t

Re: reduce, transform, combine

2014-05-04 Thread DB Tsai
You could easily achieve this with mapPartition. However, it seems that it cannot be done using an aggregate-type operation. I can see that it's a generally useful operation. For now, you could use mapPartition. Sincerely, DB Tsai --- My Blog:

reduce, transform, combine

2014-05-04 Thread Manish Amde
I am currently using the RDD aggregate operation to reduce (fold) per partition and then combine the per-partition results. def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U I need to perform a transform operation after the seqOp and before the combOp. Th
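
A sketch of the mapPartition-based workaround DB suggests above: fold each partition with seqOp, apply the extra transform to each partition-local result, then combine with combOp. The helper name and signature are made up for illustration:

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    object FoldTransformCombine {
      // aggregate-like computation with a transform step between seqOp and combOp
      def aggregateWithTransform[T, U: ClassTag](rdd: RDD[T], zero: U)(
          seqOp: (U, T) => U,
          transform: U => U,
          combOp: (U, U) => U): U = {
        rdd
          .mapPartitions { iter =>
            // fold this partition, then transform the partition-local result
            Iterator.single(transform(iter.foldLeft(zero)(seqOp)))
          }
          .reduce(combOp)
      }
    }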