Like this:

import org.apache.spark.storage.StorageLevel
val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_AND_DISK_SER)
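In PySpark, which the quoted snippet below uses, the level is likewise passed to persist() rather than set through SparkConf. A minimal sketch, assuming the standard pyspark imports (passing the level as a string through conf is what raises the IllegalArgumentException quoted below):

from pyspark import SparkConf, SparkContext, StorageLevel

conf = SparkConf().set("spark.executor.memory", "10g")
sc = SparkContext(conf=conf)

# persist() takes a StorageLevel object; putting the string
# "StorageLevel.MEMORY_AND_DISK_SER" into SparkConf is what fails below.
rdd = sc.parallelize(range(1000000)).persist(StorageLevel.MEMORY_AND_DISK_SER)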
Thanks
Best Regards

On Mon, Oct 13, 2014 at 12:50 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Cool.. Thanks.. And one last final question..
> conf = SparkConf.set(....).set(...)
> matrix = get_data(..)
> rdd = sc.parallelize(matrix) # heap error here...
> How and where do I set the storage level? It seems like conf is the
> wrong place to set this up, as I get this error:
> py4j.protocol.Py4JJavaError: An error occurred while calling
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.IllegalArgumentException: For input string:
> "StorageLevel.MEMORY_AND_DISK_SER"
> Thanks for all the help
>
> On Mon, Oct 13, 2014 at 12:15 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> You can set it like this:
>>
>> sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
>> -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
>> -XX:MaxInlineSize=300 ")
>>
>> Here's a benchmark example:
>> <https://github.com/tdas/spark-streaming-benchmark/blob/bd591dbe9e2836d9a72b87c3e63e30ffd908dfd6/Benchmark.scala#L30>
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Oct 13, 2014 at 12:36 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>>
>>> Hi Akhil,
>>> Thanks for the response.
>>> Another query: do you know how to use the "spark.executor.extraJavaOptions" option?
>>> SparkConf.set("spark.executor.extraJavaOptions", "what value should go in here")?
>>> I am trying to find an example but cannot seem to find one.
>>>
>>> On Mon, Oct 13, 2014 at 12:03 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>
>>>> A few things to keep in mind:
>>>> - I believe driver memory should not exceed executor memory
>>>> - Set spark.storage.memoryFraction (the default is 0.6)
>>>> - Set spark.rdd.compress (the default is false)
>>>> - Always specify the level of parallelism when doing a groupBy,
>>>> reduceByKey, join, sortBy, etc.
>>>> - If you don't have enough memory and the data is huge, set the
>>>> storage level to MEMORY_AND_DISK_SER
>>>>
>>>> You can read more here:
>>>> <http://spark.apache.org/docs/1.0.0/tuning.html>
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Sun, Oct 12, 2014 at 10:28 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to use Spark, but I am having a hard time configuring the
>>>>> SparkConf.
>>>>> My current conf is
>>>>> conf = SparkConf().set("spark.executor.memory", "10g").set("spark.akka.frameSize", "100000000").set("spark.driver.memory", "16g")
>>>>>
>>>>> but I still see the Java heap space error:
>>>>>
>>>>> 14/10/12 09:54:50 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>     at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
>>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
>>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
>>>>>     at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:332)
>>>>>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>>>>>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>>     at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:34)
>>>>>     at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:21)
>>>>>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>>     at org.apache.spark.serializer.KryoDeserializationStream.readO
>>>>>
>>>>> What's the right way to turn these knobs, and what other knobs can I play with?
>>>>> Thanks
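For reference, a minimal PySpark sketch that pulls the tuning suggestions from this thread together. The values and the input path are illustrative assumptions rather than tuned recommendations, and note that spark.driver.memory generally has to be set at launch time (for example via spark-submit or spark-defaults.conf) rather than on a SparkConf after the driver JVM has already started:

from pyspark import SparkConf, SparkContext

# Illustrative values only; tune them for the actual cluster and data size.
conf = (SparkConf()
        .set("spark.executor.memory", "10g")
        .set("spark.storage.memoryFraction", "0.6")   # heap fraction reserved for cached RDDs
        .set("spark.rdd.compress", "true")            # compress serialized RDD partitions
        .set("spark.executor.extraJavaOptions",
             "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC"))
sc = SparkContext(conf=conf)

pairs = (sc.textFile("hdfs:///path/to/input")          # hypothetical input path
           .map(lambda line: (line.split(",")[0], 1)))

# Give wide operations an explicit level of parallelism instead of the default.
counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=400)
print(counts.take(10))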