Re: Spark streaming driver java process RSS memory constantly increasing using cassandra driver

2015-12-14 Thread Conor Fennell
is not in "Heap or non-Heap". If it is not > heap related than it has to be the native memory that is leaking. I can't say > for sure but you do have Threads working there and that could be using the > native memory. We didn't get any pics of JConsole. > >

Re: Spark streaming driver java process RSS memory constantly increasing using cassandra driver

2015-12-14 Thread Conor Fennell
Just bumping the issue I am having; can anyone provide direction? I have been stuck on this for a while now. Thanks, Conor On Fri, Dec 11, 2015 at 5:10 PM, Conor Fennell wrote: > Hi, > > I have a memory leak in the Spark driver which is not in the heap or > the non-heap.

Sporadic error after moving from kafka receiver to kafka direct stream

2015-10-22 Thread Conor Fennell
Hi, Firstly I want to say a big thanks to Cody for contributing the kafka direct stream. I have been using the receiver based approach for months but the direct stream is a much better solution for my use case. The job in question is now ported over to the direct stream doing idempotent outputs to
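For reference, a minimal sketch of what the direct-stream setup looks like with the Spark 1.3+ Kafka API; the broker addresses, topic name, and batch interval below are illustrative placeholders, not taken from the thread:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("direct-stream-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct stream: no receiver; offsets are tracked by Spark rather than Zookeeper.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("events")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

Exactly-once output still depends on the writes themselves being idempotent, as the message above notes.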

Sporadic error after moving from kafka receiver to kafka direct stream

2015-10-21 Thread Conor Fennell
Hi, Firstly I want to say a big thanks to Cody for contributing the kafka direct stream. I have been using the receiver based approach for months but the direct stream is a much better solution for my use case. The job in question is now ported over to the direct stream doing idempotent outputs to

Re: Streaming: updating broadcast variables

2015-07-06 Thread Conor Fennell
Hi James, The code below shows one way to update the broadcast variable on the executors: // ... events stream setup var startTime = new Date().getTime() var hashMap = HashMap("1" -> ("1", 1), "2" -> ("2", 2)) var hashMapBroadcast = stream.context.sparkContext.broadcas
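A fuller sketch of the pattern the truncated snippet above describes: keep the broadcast in a var on the driver and periodically re-broadcast it from inside foreachRDD. The refresh interval and the map contents are illustrative, and loadLatestLookup() is a hypothetical stand-in for however the fresh data is fetched; the events stream (ssc/stream) is assumed from the surrounding job.

    import java.util.Date
    import scala.collection.mutable.HashMap

    // ... events stream setup (ssc and stream) as in the original post
    var startTime = new Date().getTime()
    var hashMap = HashMap("1" -> ("1", 1), "2" -> ("2", 2))
    var hashMapBroadcast = stream.context.sparkContext.broadcast(hashMap)
    val refreshIntervalMs = 5 * 60 * 1000L          // illustrative refresh period

    // Placeholder: replace with however the new lookup data is actually loaded.
    def loadLatestLookup(): HashMap[String, (String, Int)] =
      HashMap("1" -> ("1", 1), "2" -> ("2", 2))

    stream.foreachRDD { rdd =>
      // This block runs on the driver each batch, so it can safely re-broadcast.
      if (new Date().getTime() - startTime > refreshIntervalMs) {
        hashMapBroadcast.unpersist()
        hashMap = loadLatestLookup()
        hashMapBroadcast = rdd.sparkContext.broadcast(hashMap)
        startTime = new Date().getTime()
      }
      rdd.foreachPartition { partition =>
        val lookup = hashMapBroadcast.value         // executors read the latest broadcast
        partition.foreach { record => /* ... use lookup ... */ }
      }
    }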

Re: Driver memory leak?

2015-04-29 Thread Conor Fennell
The memory leak could be related to this defect that was resolved in Spark 1.2.2 and 1.3.0. It also was a HashMap causing the issue. -Conor On Wed, Apr 29, 2015 at 12:01 PM, Sean Owen wrote: > Please use user@, not dev@ > > This message does

Re: Shuffle files not cleaned up (Spark 1.2.1)

2015-04-21 Thread Conor Fennell
Hi, We set the spark.cleaner.ttl to some reasonable time and also set spark.streaming.unpersist=true. Those together cleaned up the shuffle files for us. -Conor On Tue, Apr 21, 2015 at 8:18 AM, N B wrote: > We already do have a cron job in place to clean just the shuffle files. > However,
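A minimal sketch of the two settings mentioned, assuming they are set on the SparkConf before the StreamingContext is created; the TTL value is illustrative and should comfortably exceed the longest window or retention period in the job:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-job")
      .set("spark.cleaner.ttl", "3600")           // seconds; metadata/shuffle data older than this is cleaned
      .set("spark.streaming.unpersist", "true")   // unpersist generated RDDs once they are no longer needed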

Re: Streaming problems running 24x7

2015-04-21 Thread Conor Fennell
Hi, If the slow memory increase is in the driver, it could be related to this: https://issues.apache.org/jira/browse/SPARK-5967 "After some hours disk space is being consumed. There are a lot of directories with name like /tmp/spark-e3505437-f509-4b5b-92d2-ae2559badb3c" Spark doesn't auto

Re: Spark-1.2.2-bin-hadoop2.4.tgz missing

2015-04-20 Thread Conor Fennell
I am looking for that build too. -Conor On Mon, Apr 20, 2015 at 9:18 AM, Marius Soutier wrote: > Same problem here... > > > On 20.04.2015, at 09:59, Zsolt Tóth wrote: > > > > Hi all, > > > > it looks like the 1.2.2 pre-built version for hadoop2.4 is not available > on the mirror sites. Am I missi

Spark streaming job throwing ClassNotFound exception when recovering from checkpointing

2015-02-10 Thread Conor Fennell
I am getting the following error when I kill the spark driver and restart the job: 15/02/10 17:31:05 INFO CheckpointReader: Attempting to load checkpoint from > file > hdfs://hdfs-namenode.vagrant:9000/reporting/SampleJob$0.1.0/checkpoint-142358910.bk > 15/02/10 17:31:05 WARN CheckpointReader:
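For context, a sketch of the recovery path involved, assuming the standard StreamingContext.getOrCreate pattern; the app name, batch interval, and factory body are illustrative, while the checkpoint directory matches the HDFS path in the log above:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs://hdfs-namenode.vagrant:9000/reporting/SampleJob$0.1.0"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("SampleJob")
      val ssc = new StreamingContext(conf, Seconds(30))
      // ... build the DStream graph here ...
      ssc.checkpoint(checkpointDir)
      ssc
    }

    // On restart, getOrCreate deserializes the checkpointed DStream graph; a
    // ClassNotFoundException at this point usually means the job's classes are not
    // on the driver classpath in the same way they were when the checkpoint was written.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()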
