YARN worker out of disk space

2015-06-26 Thread Tarun Garg
Hi, I am running a Spark job on YARN. After 2-3 hours of execution the workers start dying, and I found a lot of files named temp_shuffle under /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1435184713615_0008/blockmgr-333f0ade-2474-43a6-9960-f08a15bcc7b7/3f. My job is kafkaStream.map…
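Those temp_shuffle files accumulate under the NodeManager's local dir until the volume fills and YARN kills the containers. A minimal watchdog sketch for catching that before it happens (the path and threshold are illustrative, not from the original post):

```python
import shutil

def check_local_dir(path: str, min_free_ratio: float = 0.10) -> bool:
    """Return True if the volume holding `path` still has enough free space."""
    usage = shutil.disk_usage(path)
    return (usage.free / usage.total) >= min_free_ratio

# Example (hypothetical path): alert when the NodeManager local dir
# drops below 10% free, before YARN starts killing containers.
# check_local_dir("/tmp/hadoop-root/nm-local-dir")
```

Pointing this at the configured yarn.nodemanager.local-dirs and alerting on a False result gives early warning that shuffle spill is outpacing cleanup.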

Spark with Spring

2015-03-09 Thread Tarun Garg
Hi, I have an existing web-based system which receives requests and processes them; it is built on the Spring framework. Now I am planning to separate out this business logic and put it in Spark Streaming. I am not sure how valuable using the Spring framework inside Spark Streaming would be. Any suggestions are we…
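One common answer to this question is to keep the business logic framework-free, so the same code can be called from a Spring controller or handed to a streaming map without dragging framework beans into the closure. A sketch of that decoupling (names are illustrative, not from the original post):

```python
def process_request(payload: dict) -> dict:
    """Pure business logic: no Spring context and no Spark context captured,
    so it is trivially serializable and reusable from either entry point."""
    return {"id": payload["id"], "total": payload["qty"] * payload["price"]}

# The web tier calls process_request() directly from its controller,
# while the streaming job passes the same function to its map step,
# e.g. kafkaStream.map(process_request).
```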

Spark Cluster health check

2014-10-13 Thread Tarun Garg
Hi All, I am doing a POC and have written a job in Java; the architecture has Kafka and Spark. Now I want a process to notify me whenever system performance degrades or resources like CPU or RAM run short. I understand org.apache.spark.streaming.scheduler.StreamingListener, but it has ver…
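StreamingListener's onBatchCompleted callback exposes per-batch scheduling and processing delays, which is usually enough for the alerting the poster wants. The decision logic itself is simple; a sketch with illustrative thresholds (the cutoffs are assumptions, not from the original thread):

```python
def should_alert(processing_delay_ms: float,
                 scheduling_delay_ms: float,
                 batch_interval_ms: float) -> bool:
    """Alert when a batch takes longer than its interval (the job is
    falling behind) or batches are queuing up before they even start."""
    return (processing_delay_ms > batch_interval_ms
            or scheduling_delay_ms > 2 * batch_interval_ms)
```

Wired into a listener, a True result would trigger the notification (email, Nagios check, etc.) instead of relying on host-level CPU/RAM metrics alone.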

RE: Spark Cluster health check

2014-10-14 Thread Tarun Garg
after Nagios. Thanks, Best Regards. On Tue, Oct 14, 2014 at 3:31 AM, Tarun Garg wrote: Hi All, I am doing a POC and have written a job in Java; the architecture has Kafka and Spark. Now I want a process to notify me whenever system performance degrades or resources run short, like CPU or…

RE: Spark Cluster health check

2014-10-14 Thread Tarun Garg
10:16 PM, Tarun Garg wrote: Thanks for your response. It is not about infrastructure, because I am using EC2 machines and Amazon CloudWatch can provide EC2 nodes' CPU usage and memory usage details, but I need to send notifications in situations like processing delay, total delay, maximum rate is low…

Spark Streaming is slower than Spark

2014-10-15 Thread Tarun Garg
Hi, I am evaluating Spark Streaming with Kafka and I found that Spark Streaming is slower than Spark. It takes more time to process the same amount of data; per the Spark console it can process 2300 records per second. Is my assumption correct? Spark Streaming has to do a lot of this along…
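The observation is expected: micro-batching pays fixed per-batch overhead (scheduling, task launch, checkpointing) on every interval, so effective throughput drops below what batch Spark achieves on the same data. The arithmetic, as a sketch (the overhead figure is an illustrative assumption):

```python
def effective_throughput(records_per_batch: int,
                         batch_interval_s: float,
                         overhead_s: float) -> float:
    """Records/sec once fixed per-batch overhead is included;
    micro-batching pays this overhead on every single batch."""
    return records_per_batch / (batch_interval_s + overhead_s)

# 2300 records in a 1 s batch with no overhead: 2300.0 records/sec.
# The same batch with 1 s of fixed overhead: 1150.0 records/sec.
```

This is why the streaming console reports a lower rate than a one-shot batch job over identical input.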

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-24 Thread Tarun Garg
Thanks for the reply. I am testing this with a small amount of data, and what is happening is that whenever there is data in the Kafka topic Spark does not throw the exception; otherwise it does. Thanks, Tarun. Date: Wed, 24 Dec 2014 16:23:30 +0800 From: lian.cs@gmail.com To: bigdat...@live.com; user@spa…

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-24 Thread Tarun Garg
Thanks. I debugged this further and below is the cause: Caused by: java.io.NotSerializableException: org.apache.spark.sql.api.java.JavaSQLContext - field (class "com.basic.spark.NumberCount$2", name: "val$sqlContext", type: "class org.apache.spark.sql.api.java.JavaSQLContext") - object…

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-25 Thread Tarun Garg
because it gets pulled into scope more often due to the implicit conversions it contains. You should try marking the variable that holds the context with the annotation @transient. On Wed, Dec 24, 2014 at 7:04 PM, Tarun Garg wrote: Thanks. I debugged this further and below is the cause: Cau…
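The @transient advice means: exclude the non-serializable context from whatever gets shipped to the workers, and re-acquire it there. The same idea can be sketched with Python's pickle protocol (the class name echoes the stack trace; the threading.Lock is just a stand-in for a non-picklable object like JavaSQLContext):

```python
import pickle
import threading

class NumberCount:
    """Analog of the @transient fix: drop the non-serializable field
    from the serialized form instead of trying to ship it."""
    def __init__(self):
        # Stand-in for JavaSQLContext: a Lock cannot be pickled,
        # just as the SQL context cannot be Java-serialized.
        self.sql_context = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("sql_context", None)   # "transient": omitted from the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.sql_context = None          # re-create lazily on the worker side
```

Without __getstate__/__setstate__, pickling this object fails exactly the way the Spark closure serializer failed on val$sqlContext.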