Try setting spark.streaming.concurrentJobs to the number of concurrent
jobs you want to run.
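For reference, a minimal sketch of setting it programmatically; the property is undocumented, so treat it as experimental, and the app name below is just a placeholder:

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .setAppName("my-streaming-app")               // placeholder app name
    .set("spark.streaming.concurrentJobs", "4");  // how many streaming jobs the scheduler may run in parallel

The same value can also go into spark-defaults.conf or be passed with --conf on spark-submit.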
On 15 Dec 2015 17:35, "ikmal" wrote:
> The best practice is to keep the batch processing time below the batch
> interval. I'm sure your application is suffering from a constantly
> increasing scheduling delay.
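For example, the batch interval is fixed when the streaming context is created, so the per-batch processing time shown on the streaming tab of the UI needs to stay under it. A sketch, assuming a 10-second interval chosen purely for illustration and an existing SparkConf named conf:

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Keep the observed processing time per batch below the 10s interval,
// otherwise batches queue up and scheduling delay keeps growing.
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));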
> Note that this can affect fault tolerance and cause data loss if that is
> set to more than 1.
>
>
>
> On Tue, Dec 15, 2015 at 9:19 AM, Mukesh Jha
> wrote:
>
>> Try setting spark.streaming.concurrentJobs to the number of concurrent
>> jobs you want to run.
>> On 15 Dec 2015 17:35, "ikmal" wrote:
any examples for the same?
3) Is there a newer version to consume from kafka-0.10 & kafka-0.9 clusters?
--
Thanks & Regards,
*Mukesh Jha *
0.10 or higher. A pull request for
> documenting it has been merged, but not deployed.
>
> On Tue, Sep 13, 2016 at 6:46 PM, Mukesh Jha
> wrote:
> > Hello fellow sparkers,
> >
> > I'm using spark to consume messages from kafka in a non-streaming
> > fashion.
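For a batch-style read, a rough sketch against the spark-streaming-kafka-0-10 integration could look like the following; the broker, topic, group id, and offsets are placeholders, and jsc is assumed to be an existing JavaSparkContext:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.spark.streaming.kafka010.OffsetRange;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker list
kafkaParams.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "batch-consumer");          // placeholder group id

// Read a fixed, known offset range per partition instead of an open-ended stream.
OffsetRange[] ranges = {
    OffsetRange.create("my-topic", 0, 0L, 1000L)        // topic, partition, from, until (placeholders)
};

JavaRDD<ConsumerRecord<String, String>> rdd = KafkaUtils.<String, String>createRDD(
    jsc, kafkaParams, ranges, LocationStrategies.PreferConsistent());

createRDD pulls exactly the given offset ranges, so the job decides up front how much of each partition to read.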
27,888] [INFO Driver] RegionSizeCalculator: Calculating
region sizes for table "message".
--
Thanks & Regards,
*Mukesh Jha *
Any ideas folks?
On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha wrote:
> Hi
>
> I'm accessing multiple regions (~5k) of an HBase table using spark's
> newAPIHadoopRDD. But the driver is trying to calculate the region size of
> all the regions.
> It is not even reusing t
The solution is to disable the region size calculation:
hbase.regionsizecalculator.enable: false
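In code that builds the HBase configuration for newAPIHadoopRDD, that looks roughly like the sketch below; sc is assumed to be the JavaSparkContext and "message" is the table from the log above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;

Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set(TableInputFormat.INPUT_TABLE, "message");            // table from the log above
hbaseConf.setBoolean("hbase.regionsizecalculator.enable", false);  // skip per-region size lookups at planning time

JavaPairRDD<ImmutableBytesWritable, Result> hbaseRdd = sc.newAPIHadoopRDD(
    hbaseConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);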
On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha wrote:
> Any ideas folks?
>
> On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha
> wrote:
>
>> Hi
>>
>> I'm access
Corresponding HBase bug: https://issues.apache.org/jira/browse/HBASE-12629
On Wed, Nov 23, 2016 at 1:55 PM, Mukesh Jha wrote:
> The solution is to disable the region size calculation:
>
> hbase.regionsizecalculator.enable: false
>
> On Sun, Nov 20, 2016 at 9:29 PM, Muke
atest/streaming-programming-guide.html#reducing-the-processing-time-of-each-batch
>
> On Tue, Dec 30, 2014 at 1:43 AM, Mukesh Jha
> wrote:
> > Thanks Sandy, the issue was with the number of cores.
> >
> > Another issue I was facing is that tasks are not getting di
... StorageLevel.MEMORY_ONLY_SER()));
}
JavaPairDStream ks = sc.union(kafkaStreams.remove(0), kafkaStreams);
On Wed, Jan 21, 2015 at 3:19 PM, Gerard Maas wrote:
> Hi Mukesh,
>
> How are you creating your receivers? Could you post the (relevant) code?
>
> -kr, Gerard.
>
> On Wed, Jan 21, 201
age being read into zookeeper for fault
>> tolerance. In your case I think mostly the "inflight data" would be lost if
>> you aren't using any of the fault-tolerance mechanisms.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Feb 4, 2015 at 5:24 PM, Mu
e(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Powered by Jetty://
--
Thanks & Regards,
*Mukesh Jha *
05:32:43 WARN scheduler.TaskSetManager: Lost task 36.1 in stage
451.0 (TID 22515, chsnmphbase19.usdc2.cloud.com): java.lang.Exception:
Could not compute split, block input-3-1424842355600 not found
at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
--
Thanks & Regards,
*Mukesh Jha *
"1024")
>> .set("spark.executor.logs.rolling.maxRetainedFiles", "3")
>>
>>
>> Yet it does not roll and continues to grow. Am I missing something
>> obvious?
>>
>>
>> thanks,
>> Duc
>>
>>
>
>
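For what it's worth, rolling stays disabled unless a rolling strategy is set, so a sketch of the full set of properties looks like the below; the 1024-byte size is illustrative, and on older 1.x releases the size property is spark.executor.logs.rolling.size.maxBytes rather than maxSize. These are executor-side settings, so they have to be in place (SparkConf or spark-defaults.conf) before the application starts:

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .set("spark.executor.logs.rolling.strategy", "size")        // "size" or "time"; rolling is off if unset
    .set("spark.executor.logs.rolling.maxSize", "1024")          // bytes per log file before it rolls (illustrative)
    .set("spark.executor.logs.rolling.maxRetainedFiles", "3");   // keep only the newest 3 rolled files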
--
Thanks & Regards,
*Mukesh Jha *
ou paste your spark-env.sh
>> file and /etc/hosts file.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Feb 18, 2015 at 2:06 PM, Mukesh Jha
>> wrote:
>>
>>> Hello Experts,
>>>
>>> I am running a spark-streaming app inside YAR
My application runs fine for ~3-4 hours and then hits this issue.
On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha
wrote:
> Hi Experts,
>
> My Spark Job is failing with the below error.
>
> From the logs I can see that input-3-1424842351600 was added at 5:32:32
> and was never pu
On Wed, Feb 25, 2015 at 8:09 PM, Mukesh Jha wrote:
> My application runs fine for ~3-4 hours and then hits this issue.
>
> On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha
> wrote:
>
>> Hi Experts,
>>
>> My Spark Job is failing with the below error.
>>
> Apart from that, a little more information about your job would be helpful.
>
> Thanks
> Best Regards
>
> On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha
> wrote:
>
>> Hi Experts,
>>
>> My Spark Job is failing with the below error.
>>
>> From the
Also, my job is map-only, so there is no shuffle/reduce phase.
On Fri, Feb 27, 2015 at 7:10 PM, Mukesh Jha wrote:
> I'm streaming data from a kafka topic using KafkaUtils, doing some
> computation, and writing records to HBase.
>
> Storage level is memory-and-disk-ser
> On 2
hare some thoughts on this?
>>
>> Thank You
>>
>
>
--
Thanks & Regards,
*Mukesh Jha *
Hello experts,
Is there an easy way to debug a spark java application?
I'm putting debug logs in the map function, but there aren't any logs on
the console.
Also, can I include my custom jars while launching spark-shell and do my PoC
there?
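On the spark-shell question: custom jars can be put on the shell's classpath with --jars; the path below is just a placeholder, and multiple jars are comma-separated:

spark-shell --master local[*] --jars /path/to/my-app.jar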
This might be a naive question, but any help here is appreciated.
ve questions/assumptions.
--
Thanks & Regards,
*Mukesh Jha *
Any pointers guys?
On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha wrote:
> Hey Experts,
>
> I wanted to understand in detail about the lifecycle of rdd(s) in a
> streaming app.
>
> From my current understanding
> - rdd gets created out of the realtime input stream.
> - Tr
fferent node and it will continue to receive data.
2. https://github.com/dibbhatt/kafka-spark-consumer
Txz,
*Mukesh Jha *
Hello Guys,
Any insights on this??
If I'm not clear enough, my question is: how can I use the kafka consumer and not
lose any data in case of failures with spark-streaming?
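Since Spark 1.2 there is also a receiver write-ahead log that persists received blocks before they are acknowledged; a rough sketch of enabling it, assuming conf is the SparkConf used to build the streaming context, jssc is the JavaStreamingContext, and the checkpoint path is a placeholder:

conf.set("spark.streaming.receiver.writeAheadLog.enable", "true");  // must be set before the streaming context is created
jssc.checkpoint("hdfs:///tmp/streaming-checkpoint");                 // placeholder path; the WAL lives under the checkpoint directory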
On Tue, Dec 9, 2014 at 2:53 PM, Mukesh Jha wrote:
> Hello Experts,
>
> I'm working on a spark app which re
Look at the links from:
> > https://issues.apache.org/jira/browse/SPARK-3129
> >
> > I'm not aware of any doc yet (did I miss something ?) but you can look at
> > the ReliableKafkaReceiver's test suite:
> >
> >
> external/kafka/src/test/scala/org/apac
oth in executor-driver side, and
> many other things should also be taken care of.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* mukh@gmail.com [mailto:mukh@gmail.com] *On Behalf Of *Mukesh
> Jha
> *Sent:* Monday, December 15, 2014 1:31 PM
> *To:* Tath
spark-submit
--master spark://chsnmvproc71vm3.usdc2.oraclecloud.com:7077
--class com.oracle.ci.CmsgK2H /homext/lib/MJ-ci-k2h.jar
vm.cloud.com:2181/kafka spark-standalone avro 1 5000
PS: I did go through the spark website and
http://www.virdata.com/tuning-spark/, but had no luck.
--
Cheers,
Mukesh Jha
rk-submit command, it looks like you're only running with
> 2 executors on YARN. Also, how many cores does each machine have?
>
> -Sandy
>
> On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha
> wrote:
>
>> Hello Experts,
>> I'm bench-marking Spark on YARN (
>>
And this is with spark version 1.2.0.
On Mon, Dec 29, 2014 at 11:43 PM, Mukesh Jha
wrote:
> Sorry Sandy, the command is just for reference, but I can confirm that
> there are 4 executors and a driver, as shown in the Spark UI page.
>
> Each of these machines is an 8-core box with
> wrote:
>
>> Are you setting --num-executors to 8?
>>
>> On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha
>> wrote:
>>
>>> Sorry Sandy, The command is just for reference but I can confirm that
>>> there are 4 executors and a driver as shown in the spark
n running in standalone mode, each executor will be able to use all 8
> cores on the box. When running on YARN, each executor will only have
> access to 2 cores. So the comparison doesn't seem fair, no?
>
> -Sandy
>
> On Mon, Dec 29, 2014 at 10:22 AM, Mukesh Jha
> wrote:
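One way to make the comparison closer is to request the cores explicitly when submitting to YARN; a sketch reusing the class/jar from the earlier command (executor count and memory are placeholders, application arguments omitted):

spark-submit \
  --master yarn-client \
  --num-executors 4 \
  --executor-cores 8 \
  --executor-memory 4g \
  --class com.oracle.ci.CmsgK2H /homext/lib/MJ-ci-k2h.jar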
though other executors are idle.
I configured spark.locality.wait=50 instead of the default 3000 ms, which
forced task rebalancing among nodes. Let me know if there is a better
way to deal with this.
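The same thing in spark-defaults.conf form; values are milliseconds on these releases, and the per-level variants are there too if only one locality level is the problem:

spark.locality.wait           50
spark.locality.wait.process   50
spark.locality.wait.node      50
spark.locality.wait.rack      50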
On Tue, Dec 30, 2014 at 12:09 AM, Mukesh Jha
wrote:
> Makes sense, I've also tri
000");
kafkaConf.put("zookeeper.session.timeout.ms", "6000");
kafkaConf.put("zookeeper.connection.timeout.ms", "6000");
kafkaConf.put("zookeeper.sync.time.ms", "2000");
kafkaConf.put("rebalance.backoff.ms", "1");
kafkaConf.put("rebalance.max.retries", "20");
--
Thanks & Regards,
*Mukesh Jha *
, wrote:
>
>> Hi Mukesh,
>>
>> If my understanding is correct, each Stream only has a single Receiver.
>> So, if you have each receiver consuming 9 partitions, you need 10 input
>> DStreams to create 10 concurrent receivers:
>>
>>
>> https://spark.ap
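A sketch of that pattern in Java; the topic, ZK quorum, group id, and jssc (the JavaStreamingContext) are placeholders, mirroring the receiver-union example in the streaming guide:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

int numReceivers = 10;                          // one receiver per input DStream
Map<String, Integer> topicMap = new HashMap<>();
topicMap.put("my-topic", 9);                    // placeholder topic, 9 consumer threads per receiver

List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<>(numReceivers);
for (int i = 0; i < numReceivers; i++) {
  kafkaStreams.add(KafkaUtils.createStream(jssc, "zk1:2181", "my-group", topicMap));
}
// Union the per-receiver streams so the rest of the job sees a single DStream.
JavaPairDStream<String, String> unified =
    jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));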
tion: Invalid ContainerId:
container_e01_1420481081140_0006_01_01)
--
Thanks & Regards,
*Mukesh Jha *
On Thu, Jan 8, 2015 at 5:08 PM, Mukesh Jha wrote:
> Hi Experts,
>
> I am running a spark job inside YARN.
>
> The spark-streaming job is running fine in CDH-5.0.0 but after the upgrade
> to 5.3.0 it cannot fetch containers with the below errors. Looks like the
> container
java/org/apache/hadoop/yarn/util/ConverterUtils.java
> >
> > Is it possible you're still including the old jars on the classpath in
> some
> > way?
> >
> > -Sandy
> >
> > On Thu, Jan 8, 2015 at 3:38 AM, Mukesh Jha
> wrote:
> >>
>