Glad to hear that. :)
On Thu, Jun 18, 2015 at 6:25 AM, Ji ZHANG wrote:
Hi,
We switched from ParallelGC to CMS, and the symptom is gone.
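For reference, the switch is just a matter of passing CMS flags through the extraJavaOptions settings; a rough sketch (the exact flags below are illustrative, not necessarily the ones we used):

import org.apache.spark.SparkConf

// Illustrative only: enable CMS for the executors. The app name is a placeholder.
val conf = new SparkConf()
  .setAppName("streaming-job")
  .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")
// In yarn-cluster mode the driver JVM is already running by the time this code
// executes, so its GC flags are normally passed at submit time instead, e.g.
// --conf spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC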
On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG wrote:
Hi,
I set spark.shuffle.io.preferDirectBufs to false in SparkConf, and this
setting can be seen in the Web UI's Environment tab. But it still eats memory:
-Xmx is set to 512M, but RES grows to 1.5G in half a day.
On Wed, Jun 3, 2015 at 12:02 PM, Shixiong Zhu wrote:
Could you set "spark.shuffle.io.preferDirectBufs" to false to turn off the
off-heap allocation of netty?
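Something like this should do it (a minimal sketch; the app name and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-streaming-job")                 // placeholder name
  .set("spark.shuffle.io.preferDirectBufs", "false") // keep Netty buffers on-heap
val ssc = new StreamingContext(conf, Seconds(10))    // placeholder batch interval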
Best Regards,
Shixiong Zhu
2015-06-03 11:58 GMT+08:00 Ji ZHANG :
Hi,
Thanks for your information. I'll give Spark 1.4 a try when it's released.
On Wed, Jun 3, 2015 at 11:31 AM, Tathagata Das wrote:
Could you try it out with Spark 1.4 RC3?
Also pinging the Cloudera folks; they may be aware of something.
BTW, the way I have debugged memory leaks in the past is as follows.
Run with a small driver memory, say 1 GB. Periodically (maybe with a script),
take snapshots of the histogram and also do memory dumps.
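For the periodic part, a rough sketch (the pid, interval, and output path are placeholders; a plain jmap loop in a shell script works just as well):

import java.io.File
import scala.sys.process._

// Illustrative snapshot loop: dump the live-object histogram of a given JVM
// every 5 minutes so the snapshots can be diffed later.
object HeapHistoSnapshots {
  def main(args: Array[String]): Unit = {
    val pid = args(0) // pid of the driver/executor to watch
    while (true) {
      val out = new File(s"/tmp/histo-${System.currentTimeMillis()}.txt")
      (Seq("jmap", "-histo:live", pid) #> out).!
      // For a full heap dump instead: jmap -dump:live,format=b,file=<path> <pid>
      Thread.sleep(5 * 60 * 1000)
    }
  }
}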
Hi,
Thanks for your reply. Here are the top 30 entries of the jmap -histo:live result:
 num     #instances          #bytes  class name
------------------------------------------------
   1:         40802       145083848  [B
   2:         99264        12716112
   3:         99264        12291480
   4:
Hi,
Unfortunately, they're still growing, both driver and executors.
I ran the same job in local mode, and everything is fine.
On Thu, May 28, 2015 at 5:26 PM, Akhil Das wrote:
Can you replace your counting part with this?
logs.filter(_.s_id > 0).foreachRDD(rdd => logger.info(rdd.count()))
Thanks
Best Regards
On Thu, May 28, 2015 at 1:02 PM, Ji ZHANG wrote:
Hi,
I wrote a simple test job; it only does very basic operations, for example:
// Parse each Kafka message into an Impression, dropping malformed lines.
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, Map(topic -> 1)).map(_._2)
val logs = lines.flatMap { line =>
  try {
    Some(parse(line).extract[Impression])
  } catch {
    case _: Exception => None
  }
}
Hi Zhang,
Could you paste your code in a gist? Not sure what you are doing inside the
code to fill up memory.
Thanks
Best Regards
On Thu, May 28, 2015 at 10:08 AM, Ji ZHANG wrote:
Hi,
Yes, I'm using createStream, but the storageLevel param is by default
MEMORY_AND_DISK_SER_2. Besides, the driver's memory is also growing. I
don't think Kafka messages will be cached in the driver.
On Thu, May 28, 2015 at 12:24 AM, Akhil Das wrote:
Are you using the createStream or createDirectStream API? If it's the
former, you can try setting the StorageLevel to MEMORY_AND_DISK (it might
slow things down though). Another way would be to try the latter one.
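Roughly like this (ssc, zkQuorum, group, and topic come from your existing job; brokers is a placeholder for your Kafka broker list):

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// Receiver-based stream with an explicit (non-default) storage level.
val lines = KafkaUtils.createStream(
  ssc, zkQuorum, group, Map(topic -> 1),
  StorageLevel.MEMORY_AND_DISK).map(_._2)

// Or the receiver-less direct stream (Spark 1.3+).
val direct = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> brokers), Set(topic)).map(_._2)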
Thanks
Best Regards
On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG wrote:
Hi Akhil,
Thanks for your reply. According to the Streaming tab of the Web UI, the
Processing Time is around 400ms and there's no Scheduling Delay, so I
suppose it's not the Kafka messages that eat up the off-heap memory. Or
maybe it is, but how can I tell?
I googled about how to check the off-heap memory.
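One approach that comes up is reading the NIO buffer-pool MXBeans from inside the JVM, roughly like this (it only shows direct ByteBuffers, e.g. Netty's, not every native allocation):

import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import scala.collection.JavaConverters._

// Print how much memory the JVM's buffer pools ("direct" and "mapped") hold.
val pools = ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
pools.foreach { p =>
  println(s"${p.getName}: used=${p.getMemoryUsed}B, capacity=${p.getTotalCapacity}B, count=${p.getCount}")
}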
After submitting the job, if you do a ps aux | grep spark-submit then you
can see all the JVM params. Are you using the high-level consumer (receiver-based)
for receiving data from Kafka? In that case, if your throughput is
high and the processing delay exceeds the batch interval, then you will hit this
memory issue.
Hi,
I'm using Spark Streaming 1.3 on CDH5.1 with yarn-cluster mode. I found out
that YARN is killing the driver and executor processes because of excessive
use of memory. Here's what I tried:
1. Xmx is set to 512M and the GC looks fine (one ygc per 10s), so the extra
memory is not used by the heap.