Cool, thanks. One final question:
conf = SparkConf().set(....).set(...)
matrix = get_data(..)
rdd = sc.parallelize(matrix) # heap error here...
How and where do I set the storage level? It seems like conf is the wrong
place to set it, as I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling
None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: For input string:
"StorageLevel.MEMORY_AND_DISK_SER"
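
Or is the storage level supposed to go on the RDD itself rather than on the
conf, something like this? (Just guessing here; sc and matrix are the same as
in the snippet above.)

from pyspark import StorageLevel

rdd = sc.parallelize(matrix)
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)  # keep partitions serialized, spilling to disk when memory runs short
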
Thanks for all the help

On Mon, Oct 13, 2014 at 12:15 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> You can set it like this:
>
> sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
> -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
> -XX:MaxInlineSize=300 ")
>
> Here's a benchmark example
> <https://github.com/tdas/spark-streaming-benchmark/blob/bd591dbe9e2836d9a72b87c3e63e30ffd908dfd6/Benchmark.scala#L30>
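>
> From PySpark the equivalent would look roughly like this (just a sketch; the
> GC flags here are only examples, tune them for your own job):
>
> conf = SparkConf().set("spark.executor.extraJavaOptions",
>                        "-XX:+UseConcMarkSweepGC -XX:+UseCompressedOops")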
>
> Thanks
> Best Regards
>
> On Mon, Oct 13, 2014 at 12:36 PM, Chengi Liu <chengi.liu...@gmail.com>
> wrote:
>
>> Hi Akhil,
>>   Thanks for the response..
>> Another query... do you know how to use the "spark.executor.extraJavaOptions"
>> option? As in,
>> SparkConf.set("spark.executor.extraJavaOptions", "what value should go in here")?
>> I am trying to find an example but cannot seem to find one.
>>
>>
>> On Mon, Oct 13, 2014 at 12:03 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> A few things to keep in mind:
>>> - I believe driver memory should not exceed executor memory
>>> - Set spark.storage.memoryFraction (the default is 0.6)
>>> - Set spark.rdd.compress (the default is false)
>>> - Always specify the level of parallelism when doing a groupBy,
>>> reduceByKey, join, sortBy, etc. (see the sketch below)
>>> - If you don't have enough memory and the data is huge, then set the
>>> storage level to MEMORY_AND_DISK_SER
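>>>
>>> For example, a rough PySpark sketch of those knobs (the values and the
>>> "data" list are only illustrative; the conf has to be built before the
>>> SparkContext is created):
>>>
>>> from pyspark import SparkConf, SparkContext
>>>
>>> conf = SparkConf().set("spark.storage.memoryFraction", "0.6") \
>>>                   .set("spark.rdd.compress", "true")
>>> sc = SparkContext(conf=conf)
>>> pairs = sc.parallelize(data, 200).map(lambda x: (x, 1))            # 200 partitions up front
>>> counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=200)  # explicit shuffle parallelism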
>>>
>>> You can read more over here.
>>> <http://spark.apache.org/docs/1.0.0/tuning.html>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sun, Oct 12, 2014 at 10:28 PM, Chengi Liu <chengi.liu...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>   I am trying to use Spark, but I am having a hard time configuring
>>>> SparkConf.
>>>> My current conf is:
>>>> conf = SparkConf() \
>>>>     .set("spark.executor.memory", "10g") \
>>>>     .set("spark.akka.frameSize", "100000000") \
>>>>     .set("spark.driver.memory", "16g")
>>>>
>>>> but I still see the java heap size error
>>>> 14/10/12 09:54:50 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
>>>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
>>>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
>>>> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>>>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:332)
>>>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>> at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:34)
>>>> at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:21)
>>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>> at org.apache.spark.serializer.KryoDeserializationStream.readO
>>>>
>>>>
>>>> What's the right way to turn these knobs, and what other knobs can I play
>>>> with?
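>>>> For instance, should I be passing these on the command line at submit time
>>>> instead, something like the following (my_script.py is just a placeholder)?
>>>>
>>>> spark-submit --driver-memory 16g --executor-memory 10g my_script.py
>>>>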
>>>> Thanks
>>>>
>>>
>>>
>>
>
