Cool, thanks. One last question:

    conf = SparkConf().set(....).set(...)
    matrix = get_data(..)
    rdd = sc.parallelize(matrix)  # heap error here...

How and where do I set the storage level? SparkConf seems to be the wrong place for it, since I get this error:

    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.IllegalArgumentException: For input string: "StorageLevel.MEMORY_AND_DISK_SER"

Thanks for all the help.
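Is the intended approach to call persist() on the RDD itself rather than going through SparkConf? A rough sketch of what I'm guessing (the toy matrix stands in for my get_data(..) above, and I'm assuming pyspark.StorageLevel exposes MEMORY_AND_DISK_SER):

    from pyspark import SparkConf, SparkContext, StorageLevel

    conf = (SparkConf()
            .setAppName("heap-test")
            .set("spark.executor.memory", "10g"))
    sc = SparkContext(conf=conf)

    matrix = [[0.0] * 1000 for _ in range(1000)]   # toy stand-in for get_data(..) above
    rdd = sc.parallelize(matrix, 200)              # numSlices is a guess; more partitions -> smaller tasks
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)  # storage level is set per-RDD, not in SparkConf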
On Mon, Oct 13, 2014 at 12:15 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> You can set it like this:
>
> sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
> -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
> -XX:MaxInlineSize=300 ")
>
> Here's a benchmark example
> <https://github.com/tdas/spark-streaming-benchmark/blob/bd591dbe9e2836d9a72b87c3e63e30ffd908dfd6/Benchmark.scala#L30>
>
> Thanks
> Best Regards
>
> On Mon, Oct 13, 2014 at 12:36 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>
>> Hi Akhil,
>> Thanks for the response.
>> Another question: do you know how to use the "spark.executor.extraJavaOptions" option?
>> SparkConf.set("spark.executor.extraJavaOptions", "what value should go in here")?
>> I am trying to find an example but cannot seem to find one.
>>
>> On Mon, Oct 13, 2014 at 12:03 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> A few things to keep in mind:
>>> - I believe driver memory should not exceed executor memory
>>> - Tune spark.storage.memoryFraction (default is 0.6)
>>> - Tune spark.rdd.compress (default is false)
>>> - Always specify the level of parallelism when doing a groupBy, reduceBy, join, sortBy, etc.
>>> - If you don't have enough memory and the data is huge, set the storage level to MEMORY_AND_DISK_SER
>>>
>>> You can read more here:
>>> <http://spark.apache.org/docs/1.0.0/tuning.html>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sun, Oct 12, 2014 at 10:28 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am trying to use Spark but I am having a hard time configuring the SparkConf.
>>>> My current conf is:
>>>>
>>>> conf = SparkConf().set("spark.executor.memory", "10g").set("spark.akka.frameSize", "100000000").set("spark.driver.memory", "16g")
>>>>
>>>> but I still see the Java heap space error:
>>>>
>>>> 14/10/12 09:54:50 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
>>>>     at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:332)
>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>>>>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>     at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:34)
>>>>     at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:21)
>>>>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>     at org.apache.spark.serializer.KryoDeserializationStream.readO
>>>>
>>>> What's the right way to turn these knobs, and what other knobs can I play with?
>>>> Thanks
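P.S. Putting the tips from earlier in the thread together, this is roughly how I read them (a sketch only; the concrete values are my own guesses, not recommendations):

    from pyspark import SparkConf

    conf = (SparkConf()
            .set("spark.executor.memory", "10g")
            .set("spark.driver.memory", "10g")           # keeping driver <= executor memory, per the tip above
            .set("spark.storage.memoryFraction", "0.6")  # default 0.6; lower it if cached data crowds out execution
            .set("spark.rdd.compress", "true")           # default false; trades CPU for memory
            .set("spark.executor.extraJavaOptions",
                 "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC"))

    # Parallelism is passed per operation rather than in conf,
    # e.g. rdd.reduceByKey(func, numPartitions=200)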