Re: Kafka Spark Streaming integration : Relationship between DStreams and Tasks

2019-05-12 Thread Sheel Pancholi
Hello, can anyone help me understand this? We work with the Receiver-based approach and are trying to move to the Direct approach. There is no problem as such moving from the former to the latter; I am just trying to understand the inner details bottom up. Please help. Regards, Sheel On Mon 13 May, 2019
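For context, a minimal Scala sketch of the two integration styles being discussed, using the spark-streaming-kafka 0.8 API; the zookeeper host, broker, group and topic names below are placeholders, not details from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-integration-sketch")
val ssc = new StreamingContext(conf, Seconds(5))

// Receiver-based approach: a long-running receiver task consumes via the
// high-level consumer and stores blocks; the block interval, not the Kafka
// partition count, determines how many RDD partitions each batch has.
val receiverStream = KafkaUtils.createStream(
  ssc, "zkhost:2181", "example-group", Map("example-topic" -> 1))

// Direct approach: no receiver; the driver computes an OffsetRange per Kafka
// partition for each batch, so tasks read from Kafka themselves and RDD
// partitions map 1:1 to Kafka partitions.
val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker:9092"), Set("example-topic"))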

RE: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-14 Thread Mukul Gupta
Koeninger [mailto:c...@koeninger.org] Sent: Monday, March 14, 2016 9:39 PM To: Mukul Gupta Cc: user@spark.apache.org Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel So what's happening here is that print() uses take(). Take() will try to satisfy the request using on
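A minimal sketch of the behaviour being described, assuming a processed: DStream[String] built from the Kafka stream as in the poster's test; print() is backed by take(), which may compute only as many partitions as it needs to collect the requested rows, so replacing it with an output operation that touches every partition lets the Kafka partitions run in parallel:

// Instead of processed.print(90), force an action over every partition.
processed.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach { record =>
      Thread.sleep(7000) // simulate the slow per-record processing from the test
      // handle `record` here
    }
  }
}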

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-14 Thread Cody Koeninger
e link to repository: > https://github.com/guptamukul/sparktest.git > > > From: Cody Koeninger > Sent: 11 March 2016 23:04 > To: Mukul Gupta > Cc: user@spark.apache.org > Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel > > Why are

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-13 Thread Mukul Gupta
efore. Following is the link to repository: https://github.com/guptamukul/sparktest.git From: Cody Koeninger Sent: 11 March 2016 23:04 To: Mukul Gupta Cc: user@spark.apache.org Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-11 Thread Cody Koeninger
t); > > JavaDStream<String> processed = messages.map(new Function<Tuple2<String, String>, String>() { > > @Override > public String call(Tuple2<String, String> arg0) throws Exception { > > Thread.sleep(7000); > return arg0._2; > } > }); > > processed.print(90); > > try { > jssc.start(); > jssc

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-11 Thread Mukul Gupta
___ From: Cody Koeninger Sent: 11 March 2016 20:42 To: Mukul Gupta Cc: user@spark.apache.org Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel Can you post your actual code? On Thu, Mar 10, 2016 at 9:55 PM, Mukul Gupta wrote: > Hi All, I was running the following t

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-11 Thread Cody Koeninger
Can you post your actual code? On Thu, Mar 10, 2016 at 9:55 PM, Mukul Gupta wrote: > Hi All, I was running the following test: Setup: 9 VMs running spark workers > with 1 spark executor each. 1 VM running kafka and spark master. Spark > version is 1.6.0 Kafka version is 0.9.0.1 Spark is using its ow

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
Thanks. I'll keep an eye on this. Our implementation of the DStream basically accepts a function to compute current offsets. The implementation of the function fetches the list of topics from zookeeper once in a while. It then adds consumer offsets for newly added topics with the currentOffsets that's in

Re: Kafka & Spark Streaming

2015-09-25 Thread Cody Koeninger
Yes, the partition IDs are the same. As far as the failure / subclassing goes, you may want to keep an eye on https://issues.apache.org/jira/browse/SPARK-10320 , not sure if the suggestions in there will end up going anywhere. On Fri, Sep 25, 2015 at 3:01 PM, Neelesh wrote: > For the 1-1 mappin
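A hedged sketch of the pattern being confirmed here, assuming stream is the direct Kafka stream; because the direct stream's RDD partitions line up 1:1 with its offset ranges, TaskContext.get.partitionId can be used as the index:

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // offsetRanges is only defined on the RDD produced directly by the Kafka stream
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    // for the direct stream, RDD partition i corresponds to offsetRanges(i)
    val range: OffsetRange = offsetRanges(TaskContext.get.partitionId)
    // process iter, then record range.untilOffset as this partition's watermark
  }
}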

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
For the 1-1 mapping case, can I use TaskContext.get().partitionId as an index into the offset ranges? For the failure case, yes, I'm subclassing DirectKafkaInputDStream. As for failures, different partitions in the same batch may be talking to different RDBMS servers due to multitenancy - a spa

Re: Kafka & Spark Streaming

2015-09-25 Thread Cody Koeninger
Your success case will work fine, it is a 1-1 mapping as you said. To handle failures in exactly the way you describe, you'd need to subclass or modify DirectKafkaInputDStream and change the way compute() works. Unless you really are going to have very fine-grained failures (why would only a give

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
Thanks Petr, Cody. This is a reasonable place to start for me. What I'm trying to achieve: stream.foreachRDD { rdd => rdd.foreachPartition { p => Try(myFunc(...)) match { case Success(s) => update watermark for this partition //of course, expectation is that it will work only if the

Re: Kafka & Spark Streaming

2015-09-25 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers also has an example of how to close over the offset ranges so they are available on executors. On Fri, Sep 25, 2015 at 12:50 PM, Neelesh wrote: > Hi, >We are using DirectKafkaInputDS

Re: Kafka & Spark Streaming

2015-09-25 Thread Petr Novak
You can have offsetRanges on workers, e.g. object Something { var offsetRanges = Array[OffsetRange]() def create[F : ClassTag](stream: InputDStream[Array[Byte]]) (implicit codec: Codec[F]): DStream[F] = { stream transform { rdd => offsetRanges = rdd.asInstanceOf[HasOffsetRanges]

Re: kafka spark streaming with mesos

2015-06-24 Thread Akhil Das
A screenshot of your framework running would also be helpful. How many cores does it have? Did you try running it in coarse grained mode? Try to add these to the conf: sparkConf.set("spark.mesos.coarse", "true") sparkConf.set("spark.cores.max", "2") Thanks Best Regards On Wed, Jun 24, 2015 at 1

Re: kafka spark streaming working example

2015-06-18 Thread Akhil Das
.setMaster("local") set it to local[2] or local[*] Thanks Best Regards On Thu, Jun 18, 2015 at 5:59 PM, Bartek Radziszewski wrote: > hi, > I'm trying to run simple kafka spark streaming example over spark-shell: > > sc.stop > import org.apache.spark.SparkConf > import org.apache.spark.SparkCont

Re: Kafka Spark Streaming: ERROR EndpointWriter: dropping message

2015-06-09 Thread Dibyendu Bhattacharya
Hi, can you please share a more detailed stack trace from your receiver logs, and also the consumer settings you used? I have never tested the consumer with Kafka 0.7.3, so I am not sure if the Kafka version is the issue. Have you tried building the consumer using Kafka 0.7.3? Regards, Dibyendu On Wed, Jun 10, 2

Re: Kafka Spark Streaming: ERROR EndpointWriter: dropping message

2015-06-09 Thread karma243
Thank you for responding @nsalian. 1. I am trying to replicate this project on my local system. 2. Yes, kafka and brokers are on the same host. 3. I am working with kafka 0.7.3 and spark 1.3.1. Kafka 0.7.3 does not have the "--describe" command. Thou

Re: Kafka Spark Streaming: ERROR EndpointWriter: dropping message

2015-06-09 Thread nsalian
1) Could you share your command? 2) Are the kafka brokers on the same host? 3) Could you run a --describe on the topic to see if the topic is setup correctly (just to be sure)?

Re: kafka + Spark Streaming with checkPointing fails to start with

2015-05-15 Thread Alexander Krasheninnikov
I had the same problem. The solution I found was to use: JavaStreamingContext streamingContext = JavaStreamingContext.getOrCreate("checkpoint_dir", contextFactory); ALL configuration should be performed inside contextFactory. If you try to configure the streaming context after getOrCreate, you recei
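A hedged Scala sketch of the same pattern; the checkpoint directory, app name and batch interval are placeholders. Everything, including the stream, its output operations and the checkpoint call, goes inside the factory, so nothing is configured after getOrCreate:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/checkpoint_dir" // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-checkpoint-sketch")
  val ssc = new StreamingContext(conf, Seconds(10))
  // build the Kafka stream plus all transformations and output operations here
  ssc.checkpoint(checkpointDir)
  ssc
}

// On a cold start this calls createContext(); after a restart it rebuilds the
// context from the checkpoint, so no further configuration should follow it.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()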

Re: kafka + Spark Streaming with checkPointing fails to restart

2015-05-14 Thread Ankur Chauhan
Thanks everyone, that was the problem. The "create new streaming context" function was supposed to set up the stream processing as well as the checkpoint directory. I had missed the whole process of checkpoint setup. With that done, everything works as

Re: kafka + Spark Streaming with checkPointing fails to restart

2015-05-13 Thread NB
The data pipeline (DAG) should not be added to the StreamingContext in the case of a recovery scenario. The pipeline metadata is recovered from the checkpoint folder. That is one thing you will need to fix in your code. Also, I don't think the ssc.checkpoint(folder) call should be made in case of t

Re: kafka + Spark Streaming with checkPointing fails to restart

2015-05-13 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing shows setting up your stream and calling .checkpoint(checkpointDir) inside the functionToCreateContext. It looks to me like you're setting up your stream and calling checkpoint outside, after getOrCreate. I'm not

Re: Kafka + Spark streaming

2014-12-31 Thread Samya Maiti
Thanks TD. On Wed, Dec 31, 2014 at 7:19 AM, Tathagata Das wrote: > 1. Of course, a single block / partition has many Kafka messages, and > from different Kafka topics interleaved together. The message count is > not related to the block count. Any message received within a > particular block int

Re: Kafka + Spark streaming

2014-12-30 Thread Tathagata Das
1. Of course, a single block / partition has many Kafka messages, and from different Kafka topics interleaved together. The message count is not related to the block count. Any message received within a particular block interval will go in the same block. 2. Yes, the receiver will be started on an
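A small hedged sketch of the knob this implies for the receiver-based approach; the values are illustrative. The number of blocks, and therefore RDD partitions, per batch is roughly the batch interval divided by spark.streaming.blockInterval:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With a 2 s batch and a 200 ms block interval, each batch holds roughly
// 2000 / 200 = 10 blocks, i.e. about 10 partitions per receiver.
val conf = new SparkConf()
  .setAppName("receiver-block-sketch")
  .set("spark.streaming.blockInterval", "200ms")
val ssc = new StreamingContext(conf, Seconds(2))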

Re: Kafka+Spark-streaming issue: Stream 0 received 0 blocks

2014-12-01 Thread Akhil Das
ing.scala:logInfo(59)) - Removing blocks of > RDD BlockRDD[26] at BlockRDD at ReceiverInputDStream.scala:69 of time > 1417427853000 ms > > INFO [sparkDriver-akka.actor.default-dispatcher-6] > scheduler.ReceiverTracker (Logging.scala:logInfo(59)) - *Stream 0 > received 0 blocks* > >

RE: Kafka+Spark-streaming issue: Stream 0 received 0 blocks

2014-12-01 Thread m.sarosh
...@sigmoidanalytics.com] Sent: Monday, December 01, 2014 3:56 PM To: Sarosh, M. Cc: user@spark.apache.org Subject: Re: Kafka+Spark-streaming issue: Stream 0 received 0 blocks It says: 14/11/27 11:56:05 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster

Re: Kafka+Spark-streaming issue: Stream 0 received 0 blocks

2014-12-01 Thread Akhil Das
It says: 14/11/27 11:56:05 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory A quick guess would be, you are giving the wrong master url. ( spark:// 192.168.88.130:7077 ) Open the w

Re: Kafka Spark Streaming job has an issue when the worker reading from Kafka is killed

2014-10-06 Thread Bharat Venkat
TD has addressed this. It should be available in 1.2.0. https://issues.apache.org/jira/browse/SPARK-3495 On Thu, Oct 2, 2014 at 9:45 AM, maddenpj wrote: > I am seeing this same issue. Bumping for visibility. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.

Re: Kafka Spark Streaming job has an issue when the worker reading from Kafka is killed

2014-10-02 Thread maddenpj
I am seeing this same issue. Bumping for visibility.

Re: Kafka Spark Streaming on Spark 1.1

2014-09-18 Thread JiajiaJing
Yeah, I forgot to build the new jar file for spark 1.1... And now the errors are gone. Thank you very much!

Re: Kafka Spark Streaming on Spark 1.1

2014-09-18 Thread Tim Smith
What kafka receiver are you using? Did you build a new jar for your app with the latest streaming-kafka code for 1.1? On Thu, Sep 18, 2014 at 11:47 AM, JiajiaJing wrote: > Hi Spark Users, > > We just upgrade our spark version from 1.0 to 1.1. And we are trying to > re-run all the written and tes