Hi,
I'm using Spark Streaming 1.3 on CDH5.1 with yarn-cluster mode. I find out
that YARN is killing the driver and executor process because of excessive
use of memory. Here's something I tried:
1. Xmx is set to 512M and the GC looks fine (one ygc per 10s), so the extra
memory is not used by heap.
-2-direct-approach-no-receivers>
> that comes up with spark.
>
> Thanks
> Best Regards
>
> On Wed, May 27, 2015 at 11:51 AM, Ji ZHANG wrote:
>
>> Hi,
>>
>> I'm using Spark Streaming 1.3 on CDH5.1 with yarn-cluster mode. I find
>> out that YARN is
he createStream or createDirectStream api? If its the
> former, you can try setting the StorageLevel to MEMORY_AND_DISK (it might
> slow things down though). Another way would be to try the later one.
>
> Thanks
> Best Regards
>
> On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG wrote:
>
and then print it to logs.
Thanks.
On Thu, May 28, 2015 at 3:07 PM, Akhil Das
wrote:
> Hi Zhang,
>
> Could you paste your code in a gist? Not sure what you are doing inside
> the code to fill up memory.
>
> Thanks
> Best Regards
>
> On Thu, May 28, 2015 at 10:08 AM
ogger.info(rdd.count()))
>
>
>
> Thanks
> Best Regards
>
> On Thu, May 28, 2015 at 1:02 PM, Ji ZHANG wrote:
>
>> Hi,
>>
>> I wrote a simple test job, it only does very basic operations. for
>> example:
>>
>> val lines = KafkaUtils.
On Tue, Jun 2, 2015 at 2:58 PM, Tathagata Das wrote:
> While you are running is it possible for you login into the YARN node and
> get histograms of live objects using "jmap -histo:live". That may reveal
> something.
>
>
> On Thursday, May 28, 2015, Ji ZHANG wrote:
(more the better). Diffing histo is easy. Diff two dumps can be
> done in JVisualVM, it will show the diff in the objects that got added in
> the later dump. That makes it easy to debug what is not getting cleaned.
>
> TD
>
>
> On Tue, Jun 2, 2015 at 7:33 PM, Ji ZHANG wrote:
>
you set "spark.shuffle.io.preferDirectBufs" to false to turn off the
> off-heap allocation of netty?
>
> Best Regards,
> Shixiong Zhu
>
> 2015-06-03 11:58 GMT+08:00 Ji ZHANG :
>
>> Hi,
>>
>> Thanks for you information. I'll give spark1.4 a try when it
Hi,
We switched from ParallelGC to CMS, and the symptom is gone.
On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG wrote:
> Hi,
>
> I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this
> setting can be seen in web ui's environment tab. But, it still eats memory,
&
Hi,
I'm using spark streaming 1.0. I create dstream with kafkautils and
apply some operations on it. There's a reduceByWindow operation at
last so I suppose the checkpoint interval should be automatically set
to more than 10 seconds. But what I see is it still checkpoint every 2
seconds (my batch
Hi,
I'm using Spark Streaming 1.0.
Say I have a source of website click stream, like the following:
('2014-09-19 00:00:00', '192.168.1.1', 'home_page')
('2014-09-19 00:00:01', '192.168.1.2', 'list_page')
...
And I want to calculate the page views (PV, number of logs) and unique
user (UV, identi
Hi,
I'm developing an application with spark-streaming-kafka, which
depends on spark-streaming and kafka. Since spark-streaming is
provided in runtime, I want to exclude the jars from the assembly. I
tried the following configuration:
libraryDependencies ++= {
val sparkVersion = "1.0.2"
Seq(
Hi,
Suppose I have a stream of logs and I want to count them by minute.
The result is like:
2014-10-26 18:38:00 100
2014-10-26 18:39:00 150
2014-10-26 18:40:00 200
One way to do this is to set the batch interval to 1 min, but each
batch would be quite large.
Or I can use updateStateByKey where
Hi,
I'm using Spark Streaming 1.1, and I have the following logs keep growing:
/opt/spark-1.1.0-bin-cdh4/work/app-20141029175309-0005/2/stderr
I think it is executor log, so I setup the following options in
spark-defaults.conf:
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolli
rval=daily
-Dspark.executor.logs.rolling.maxRetainedFiles=3"
Maybe in yarn mode the spark-defaults.conf would be sufficient, but
I've not tested.
On Tue, Nov 4, 2014 at 12:24 PM, Ji ZHANG wrote:
> Hi,
>
> I'm using Spark Streaming 1.1, and I have the following logs keep growing:
>
> /opt/spark-1.1.0
Hi,
Recently I'm migrating from Shark 0.9 to Spark SQL 1.2, my CDH version
is 4.5, Hive 0.11. I've managed to setup Spark SQL Thriftserver, and
normal queries work fine, but custom UDF is not usable.
The symptom is when executing CREATE TEMPORARY FUNCTION, the query
hangs on a lock request:
14/1
Hi,
I want to join a DStream with some other dataset, e.g. join a click
stream with a spam ip list. I can think of two possible solutions, one
is use broadcast variable, and the other is use transform operation as
is described in the manual.
But the problem is the spam ip list will be updated out
t])
}
Now it looks less ugly...
Anyway, I still hope there's a better solution.
On Sun, Jan 18, 2015 at 2:12 AM, Jörn Franke wrote:
> Can't you send a special event through spark streaming once the list is
> updated? So you have your normal events and a special reload even
s accessed?
> Or some reasonable interval? This is just something you write in Java/Scala.
> On Jan 17, 2015 2:06 PM, "Ji ZHANG" wrote:
>
>> Hi,
>>
>> I want to join a DStream with some other dataset, e.g. join a click
>> stream with a spam ip list. I can thin
19 matches
Mail list logo