Re: how to make a Spark Streaming application start working on the next batch before completing the previous batch

2015-12-15 Thread Mukesh Jha
Try setting spark.streaming.concurrentJobs to the number of concurrent jobs you want to run. On 15 Dec 2015 17:35, "ikmal" wrote: > The best practice is to set the batch interval less than the processing time. I'm > sure your application is suffering from constantly increasing scheduling > delay
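
A minimal sketch of setting this property in a Java streaming app (the app name, batch interval, and socket source are placeholders; as the follow-up below notes, values above 1 trade away some ordering and fault-tolerance guarantees):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class ConcurrentJobsSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("concurrent-jobs-sketch")
            // Undocumented scheduler knob: lets up to this many streaming
            // jobs run in parallel instead of strictly one batch at a time.
            .set("spark.streaming.concurrentJobs", "2");
        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(10));
        // Placeholder source/output so the context has a job to run.
        jssc.socketTextStream("localhost", 9999).print();
        jssc.start();
        jssc.awaitTermination();
      }
    }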

Re: how to make a Spark Streaming application start working on the next batch before completing the previous batch

2015-12-15 Thread Mukesh Jha
lerance and data loss if that is > set to more than 1. > > > > On Tue, Dec 15, 2015 at 9:19 AM, Mukesh Jha > wrote: > >> Try setting spark.streaming.concurrentJobs to the number of >> concurrent jobs you want to run. >> On 15 Dec 2015 17:35, "ikma

Spark kafka integration issues

2016-09-13 Thread Mukesh Jha
y examples for the same? 3) is there a newer version to consume from kafka-0.10 & kafka-0.9 clusters -- Thanks & Regards, Mukesh Jha

Re: Spark kafka integration issues

2016-09-14 Thread Mukesh Jha
0.10 or higher. A pull request for > documenting it has been merged, but not deployed. > > On Tue, Sep 13, 2016 at 6:46 PM, Mukesh Jha > wrote: > > Hello fellow sparkers, > > > > I'm using spark to consume messages from kafka in a non streaming > fashion. &
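
A sketch of the batch-style (non-streaming) read this thread is about, using KafkaUtils.createRDD from the spark-streaming-kafka 0.8 artifact; the broker list, topic name, and offsets are placeholders:

    import java.util.HashMap;
    import java.util.Map;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;
    import org.apache.spark.streaming.kafka.OffsetRange;

    public class KafkaBatchRead {
      public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("kafka-batch-read"));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");

        // Read a fixed slice of each partition; real offsets would come from
        // wherever the application tracks its own progress.
        OffsetRange[] ranges = {
            OffsetRange.create("my-topic", 0, 0L, 1000L),
            OffsetRange.create("my-topic", 1, 0L, 1000L)
        };

        JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
            sc, String.class, String.class,
            StringDecoder.class, StringDecoder.class,
            kafkaParams, ranges);

        System.out.println("messages read: " + rdd.count());
        sc.stop();
      }
    }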

Spark driver not reusing HConnection

2016-11-18 Thread Mukesh Jha
27,888] [INFO Driver] RegionSizeCalculator: Calculating region sizes for table "message". -- Thanks & Regards, Mukesh Jha

Re: Spark driver not reusing HConnection

2016-11-20 Thread Mukesh Jha
Any ideas folks? On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha wrote: > Hi > > I'm accessing multiple regions (~5k) of an HBase table using Spark's > newAPIHadoopRDD. But the driver is trying to calculate the region size of > all the regions. > It is not even reusing t

Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
The solution is to disable the region size calculation check. hbase.regionsizecalculator.enable: false On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha wrote: > Any ideas folks? > > On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha > wrote: > >> Hi >> >> I'm access

Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
Corresponding HBase bug: https://issues.apache.org/jira/browse/HBASE-12629 On Wed, Nov 23, 2016 at 1:55 PM, Mukesh Jha wrote: > The solution is to disable the region size calculation check. > > hbase.regionsizecalculator.enable: false > > On Sun, Nov 20, 2016 at 9:29 PM, Muke
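
Putting the thread's fix together, a sketch of the driver-side setup, assuming the "message" table from the earlier log line and a plain TableInputFormat scan:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class HBaseScanSketch {
      public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("hbase-scan"));

        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "message");
        // Skip the per-region size lookup that stalls the driver on tables
        // with thousands of regions (see HBASE-12629 above).
        conf.setBoolean("hbase.regionsizecalculator.enable", false);

        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
            conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class);

        System.out.println("rows scanned: " + rows.count());
        sc.stop();
      }
    }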

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2015-01-21 Thread Mukesh Jha
atest/streaming-programming-guide.html#reducing-the-processing-time-of-each-batch > > On Tue, Dec 30, 2014 at 1:43 AM, Mukesh Jha > wrote: > > Thanks Sandy, It was the issue with the number of cores. > > > > Another issue I was facing is that tasks are not getting di

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2015-01-21 Thread Mukesh Jha
ORY_ONLY_SER())); } JavaPairDStream ks = sc.union(kafkaStreams.remove(0), kafkaStreams); On Wed, Jan 21, 2015 at 3:19 PM, Gerard Maas wrote: > Hi Mukesh, > > How are you creating your receivers? Could you post the (relevant) code? > > -kr, Gerard. > > On Wed, Jan 21, 201
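
The truncated code above follows the usual multi-receiver pattern; a sketch of it, with placeholder ZooKeeper quorum, group id, topic, and receiver count:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class MultiReceiverUnion {
      public static void main(String[] args) {
        JavaStreamingContext jssc = new JavaStreamingContext(
            new SparkConf().setAppName("multi-receiver"), Durations.seconds(10));

        Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put("my-topic", 1); // consumer threads per receiver

        // One receiver per DStream; several DStreams give parallel ingestion.
        List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
          kafkaStreams.add(KafkaUtils.createStream(
              jssc, "zkhost:2181", "my-group", topicMap,
              StorageLevel.MEMORY_ONLY_SER()));
        }

        // Union them back into a single DStream, as in the snippet above.
        JavaPairDStream<String, String> unified =
            jssc.union(kafkaStreams.remove(0), kafkaStreams);
        unified.count().print();

        jssc.start();
        jssc.awaitTermination();
      }
    }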

Re: Spark streaming app shutting down

2015-02-09 Thread Mukesh Jha
age being read into zookeeper for fault >> tolerance. In your case I think mostly the "inflight data" would be lost if >> you aren't using any of the fault tolerance mechanisms. >> >> Thanks >> Best Regards >> >> On Wed, Feb 4, 2015 at 5:24 PM, Mu

Cannot access Spark web UI

2015-02-18 Thread Mukesh Jha
e(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Powered by Jetty:// -- Thanks & Regards, Mukesh Jha

SparkStreaming failing with exception Could not compute split, block input

2015-02-24 Thread Mukesh Jha
05:32:43 WARN scheduler.TaskSetManager: Lost task 36.1 in stage 451.0 (TID 22515, chsnmphbase19.usdc2.cloud.com): java.lang.Exception: Could not compute split, block input-3-1424842355600 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51) -- Thanks & Regards, Mukesh Jha

Re: spark streaming: stderr does not roll

2015-02-24 Thread Mukesh Jha
"1024") >> .set("spark.executor.logs.rolling.maxRetainedFiles", "3") >> >> >> Yet it does not roll and continues to grow. Am I missing something >> obvious? >> >> >> thanks, >> Duc >> >> > > -- Thanks & Regards, *Mukesh Jha *

Re: Cannot access Spark web UI

2015-02-24 Thread Mukesh Jha
ou paste your spark-env.sh >> file and /etc/hosts file. >> >> Thanks >> Best Regards >> >> On Wed, Feb 18, 2015 at 2:06 PM, Mukesh Jha >> wrote: >> >>> Hello Experts, >>> >>> I am running a spark-streaming app inside YAR

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-25 Thread Mukesh Jha
My application runs fine for ~3-4 hours and then hits this issue. On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha wrote: > Hi Experts, > > My Spark Job is failing with below error. > > From the logs I can see that input-3-1424842351600 was added at 5:32:32 > and was never pu

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-26 Thread Mukesh Jha
On Wed, Feb 25, 2015 at 8:09 PM, Mukesh Jha wrote: > My application runs fine for ~3-4 hours and then hits this issue. > > On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha > wrote: > >> Hi Experts, >> >> My Spark Job is failing with below error. >>

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
> Apart from that little more information about your job would be helpful. > > Thanks > Best Regards > > On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha > wrote: > >> Hi Experts, >> >> My Spark Job is failing with below error. >> >> From the

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
Also my job is map only so there is no shuffle/reduce phase. On Fri, Feb 27, 2015 at 7:10 PM, Mukesh Jha wrote: > I'm streaming data from a Kafka topic using KafkaUtils & doing some > computation and writing records to HBase. > > Storage level is MEMORY_AND_DISK_SER > On 2

Re: Functions in Spark

2014-11-16 Thread Mukesh Jha
hare some thoughts on this? >> >> Thank You >> > > -- Thanks & Regards, Mukesh Jha

Debugging spark java application

2014-11-19 Thread Mukesh Jha
Hello experts, Is there an easy way to debug a Spark Java application? I'm putting debug logs in the map's function but there aren't any logs on the console. Also can I include my custom jars while launching spark-shell and do my PoC there? This might be a naive question but any help here is ap
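
On the second question, custom jars can be attached with --jars (the paths and master below are placeholders); as for the first, log lines emitted inside map functions land in the executors' stderr on the worker nodes, not on the driver console:

    # attach custom jars to an interactive shell for a quick PoC
    spark-shell --master yarn-client --jars /path/to/my-app.jar,/path/to/dep.jar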

Lifecycle of RDD in spark-streaming

2014-11-25 Thread Mukesh Jha
ve questions/assumptions. -- Thanks & Regards, Mukesh Jha

Re: Lifecycle of RDD in spark-streaming

2014-11-25 Thread Mukesh Jha
Any pointers guys? On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha wrote: > Hey Experts, > > I wanted to understand in detail the lifecycle of RDDs in a > streaming app. > > From my current understanding > - an RDD gets created out of the realtime input stream. > - Tr

KafkaUtils explicit acks

2014-12-09 Thread Mukesh Jha
fferent node and it will continue to receive data. 2. https://github.com/dibbhatt/kafka-spark-consumer Txz, Mukesh Jha

Re: KafkaUtils explicit acks

2014-12-10 Thread Mukesh Jha
Hello Guys, Any insights on this? If I'm not clear enough, my question is how can I use the Kafka consumer and not lose any data in case of failures with spark-streaming. On Tue, Dec 9, 2014 at 2:53 PM, Mukesh Jha wrote: > Hello Experts, > > I'm working on a spark app which re

Re: KafkaUtils explicit acks

2014-12-14 Thread Mukesh Jha
Look at the links from: > > https://issues.apache.org/jira/browse/SPARK-3129 > > > > I'm not aware of any doc yet (did I miss something ?) but you can look at > > the ReliableKafkaReceiver's test suite: > > > > > external/kafka/src/test/scala/org/apac
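
A sketch of enabling the write-ahead log that SPARK-3129 added in Spark 1.2, assuming an HDFS checkpoint path as a placeholder:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class WalSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("wal-sketch")
            // Persist every received block to a write-ahead log before it is
            // acknowledged, so receiver data survives driver failure.
            .set("spark.streaming.receiver.writeAheadLog.enable", "true");
        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(10));
        // The WAL is stored under the checkpoint directory.
        jssc.checkpoint("hdfs:///user/app/checkpoints");
        // ... create receivers here, e.g. via KafkaUtils.createStream(...) ...
        jssc.stop();
      }
    }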

Re: KafkaUtils explicit acks

2014-12-16 Thread Mukesh Jha
oth in executor-driver side, and > many other things should also be taken care of :). > > > > Thanks > > Jerry > > > > From: mukh@gmail.com [mailto:mukh@gmail.com] On Behalf Of Mukesh > Jha > Sent: Monday, December 15, 2014 1:31 PM > To: Tath

SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
mit --master spark://chsnmvproc71vm3.usdc2.oraclecloud.com:7077 --class com.oracle.ci.CmsgK2H /homext/lib/MJ-ci-k2h.jar vm.cloud.com:2181/kafka spark-standalone avro 1 5000 PS: I did go through the spark website and http://www.virdata.com/tuning-spark/, but had no luck. -- Cheers, Mukesh Jha

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
rk-submit command, it looks like you're only running with > 2 executors on YARN. Also, how many cores does each machine have? > > -Sandy > > On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha > wrote: > >> Hello Experts, >> I'm bench-marking Spark on YARN ( >>

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
And this is with Spark version 1.2.0. On Mon, Dec 29, 2014 at 11:43 PM, Mukesh Jha wrote: > Sorry Sandy, The command is just for reference but I can confirm that > there are 4 executors and a driver as shown in the Spark UI page. > > Each of these machines is an 8-core box with

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
; wrote: > >> Are you setting --num-executors to 8? >> >> On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha >> wrote: >> >>> Sorry Sandy, The command is just for reference but I can confirm that >>> there are 4 executors and a driver as shown in the spark

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
n running in standalone mode, each executor will be able to use all 8 > cores on the box. When running on YARN, each executor will only have > access to 2 cores. So the comparison doesn't seem fair, no? > > -Sandy > > On Mon, Dec 29, 2014 at 10:22 AM, Mukesh Jha > wrote: >
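
To make the comparison fair, the YARN submission can request the same parallelism explicitly; a sketch reusing the class, jar, and arguments from the standalone command earlier in this thread (executor memory is a placeholder):

    spark-submit --master yarn-cluster \
      --num-executors 4 \
      --executor-cores 8 \
      --executor-memory 4g \
      --class com.oracle.ci.CmsgK2H /homext/lib/MJ-ci-k2h.jar \
      vm.cloud.com:2181/kafka spark-standalone avro 1 5000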

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-30 Thread Mukesh Jha
though other executors are idle. I configured spark.locality.wait=50 instead of the default 3000 ms, which forced task rebalancing among nodes; let me know if there is a better way to deal with this. On Tue, Dec 30, 2014 at 12:09 AM, Mukesh Jha wrote: > Makes sense, I've also tri
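
A sketch of that setting, assuming Spark 1.x where a bare number is read as milliseconds:

    import org.apache.spark.SparkConf;

    public class LocalityWaitSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("locality-wait-sketch")
            // Default is 3000 ms; a very low wait makes the scheduler give
            // up on data-local placement quickly and spread pending tasks
            // across idle executors, at the cost of more remote reads.
            .set("spark.locality.wait", "50");
        System.out.println(conf.get("spark.locality.wait"));
      }
    }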

KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
000"); kafkaConf.put("zookeeper.session.timeout.ms", "6000"); kafkaConf.put("zookeeper.connection.timeout.ms", "6000"); kafkaConf.put("zookeeper.sync.time.ms", "2000"); kafkaConf.put("rebalance.backoff.ms", "1"); kafkaConf.put("rebalance.max.retries", "20"); -- Thanks & Regards, *Mukesh Jha *

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
, wrote: > >> Hi Mukesh, >> >> If my understanding is correct, each Stream only has a single Receiver. >> So, if you have each receiver consuming 9 partitions, you need 10 input >> DStreams to create 10 concurrent receivers: >> >> >> https://spark.ap

SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-08 Thread Mukesh Jha
tion: Invalid ContainerId: container_e01_1420481081140_0006_01_01) -- Thanks & Regards, Mukesh Jha

Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-08 Thread Mukesh Jha
On Thu, Jan 8, 2015 at 5:08 PM, Mukesh Jha wrote: > Hi Experts, > > I am running Spark inside a YARN job. > > The spark-streaming job is running fine in CDH-5.0.0 but after the upgrade > to 5.3.0 it cannot fetch containers with the below errors. Looks like the > container

Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-09 Thread Mukesh Jha
java/org/apache/hadoop/yarn/util/ConverterUtils.java > > > > Is it possible you're still including the old jars on the classpath in > some > > way? > > > > -Sandy > > > > On Thu, Jan 8, 2015 at 3:38 AM, Mukesh Jha > wrote: > >> >