Re: Spark streaming with Kafka

2020-11-03 Thread Kevin Pis
Hi, this is my Word Count demo. https://github.com/kevincmchen/wordcount MohitAbbi wrote on Wed, 4 Nov 2020 at 3:32 AM: > Hi, > > Can you please share the correct versions of the JAR files which you used to > resolve the issue? I'm also facing the same issue. > > Thanks > > > > > -- > Sent from: http://apache

Re: Spark streaming with Kafka

2020-11-03 Thread MohitAbbi
Hi, Can you please share the correct versions of the JAR files which you used to resolve the issue? I'm also facing the same issue. Thanks -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread Sean Owen
What supports Python in (Kafka?) 0.8? I don't think Spark ever had a specific Python-Kafka integration. But you have always been able to use it to read DataFrames as in Structured Streaming. Kafka 0.8 support is deprecated (gone in 3.0), but 0.10 means 0.10+, which works with the latest 2.x. What is the

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread German Schiavon
Hey, Maybe I'm missing some restriction with EMR, but have you tried to use Structured Streaming instead of Spark Streaming? https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html Regards On Wed, 12 Aug 2020 at 14:12, Hamish Whittal wrote: > Hi folks, > > Thought I wo
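
For reference, a minimal sketch of the Structured Streaming Kafka source the linked 2.4.5 docs describe (the broker address and topic name are placeholder values, and the spark-sql-kafka-0-10 package must be on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("KafkaStructured").getOrCreate()

    // Kafka records arrive as binary key/value columns; cast them to strings.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092") // placeholder broker
      .option("subscribe", "mytopic")                  // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Write each micro-batch to the console, just to show the wiring.
    val query = df.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()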

Spark Streaming with Kafka and Python

2020-08-12 Thread Hamish Whittal
Hi folks, Thought I would ask here because it's somewhat confusing. I'm using Spark 2.4.5 on EMR 5.30.1 with Amazon MSK. The version of Scala used is 2.11.12. I'm using this version of the libraries: spark-streaming-kafka-0-8_2.11-2.4.5.jar Now I want to read from Kafka topics using Python (

Re: Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I was able to correct the issue. It was due to the wrong version of a JAR file. I removed those JAR files, copied in the correct versions, and the error has gone away. Regards -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Spark streaming with Kafka

2020-07-02 Thread Jungtaek Lim
I can't reproduce it. Could you please make sure you're running spark-shell with the official Spark 3.0.0 distribution? Please try changing the directory and using a relative path like "./spark-shell". On Thu, Jul 2, 2020 at 9:59 PM dwgw wrote: > Hi > I am trying to stream a kafka topic from the spark shel

Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I am trying to stream a kafka topic from the spark shell but I am getting the following error. I am using *spark 3.0.0/scala 2.12.10* (Java HotSpot(TM) 64-Bit Server VM, *Java 1.8.0_212*) *[spark@hdp-dev ~]$ spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0* Ivy Default Cache s

Spark streaming with kafka input stuck in (Re-)joining group because of group rebalancing

2018-05-15 Thread JF Chen
When I terminate a spark streaming application and restart it, it always gets stuck at this step: > > Revoking previously assigned partitions [] for group [mygroup] > (Re-)joining group [mygroup] If I use a new group id, even though it works fine, I may lose the data from the last time I read the previo

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-21 Thread shyla deshpande
Thanks TD. On Tue, Mar 14, 2017 at 4:37 PM, Tathagata Das wrote: > This setting allows multiple spark jobs generated through multiple > foreachRDD to run concurrently, even if they are across batches. So output > op2 from batch X, can run concurrently with op1 of batch X+1 > This is not safe bec

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread Tathagata Das
This setting allows multiple spark jobs generated through multiple foreachRDD to run concurrently, even if they are across batches. So output op2 from batch X, can run concurrently with op1 of batch X+1 This is not safe because it breaks the checkpointing logic in subtle ways. Note that this was ne

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread shyla deshpande
Thanks TD for the response. Can you please provide more explanation? I have multiple streams in the spark streaming application (Spark 2.0.2 using DStreams). I know many people who use this setting, so your explanation will help a lot of people. Thanks On Fri, Mar 10, 2017 at 6:24 PM, Tathag

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread Tathagata Das
That config is not safe. Please do not use it. On Mar 10, 2017 10:03 AM, "shyla deshpande" wrote: > I have a spark streaming application which processes 3 kafka streams and > has 5 output operations. > > Not sure what should be the setting for spark.streaming.concurrentJobs. > > 1. If the concurr

spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread shyla deshpande
I have a spark streaming application which processes 3 kafka streams and has 5 output operations. Not sure what should be the setting for spark.streaming.concurrentJobs. 1. If the concurrentJobs setting is 4 does that mean 2 output operations will be run sequentially? 2. If I had 6 cores what wo

Re: Spark Streaming with Kafka

2016-12-12 Thread Anton Okolnychyi
thanks for all your replies, now I have a complete picture. 2016-12-12 16:49 GMT+01:00 Cody Koeninger : > http://spark.apache.org/docs/latest/streaming-kafka-0-10- > integration.html#creating-a-direct-stream > > Use a separate group id for each stream, like the docs say. > > If you're doing mul

Re: Spark Streaming with Kafka

2016-12-12 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#creating-a-direct-stream Use a separate group id for each stream, like the docs say. If you're doing multiple output operations, and aren't caching, spark is going to read from kafka again each time, and if some of those re
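
A sketch of what that advice looks like with the 0-10 direct API (Spark 2.x; the broker, topic, and group ids are placeholders, and `ssc` is an existing StreamingContext): give each stream its own group.id, and persist the transformed stream so the two output operations below don't each re-read Kafka:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "host1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "stream-a") // a second stream would use e.g. "stream-b"

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))

    // Cache each batch so the two output operations below reuse it
    // instead of pulling the same offsets from Kafka twice.
    val values = stream.map(_.value).persist(StorageLevel.MEMORY_ONLY)
    values.print()          // output operation 1
    values.count().print()  // output operation 2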

Re: Spark Streaming with Kafka

2016-12-11 Thread Oleksii Dukhno
Hi Anton, What is the command you run your spark app with? Why not work with the data instead of the stream in your second-stage operation? Can you provide logs with the issue? ConcurrentModificationException is not a spark issue; it means that you use the same Kafka consumer instance from more than o

Re: Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
sorry, I forgot to mention that I was using Spark 2.0.2, Kafka 0.10, and nothing custom. I will try to restate the initial question. Let's consider an example. 1. I create a stream and subscribe to a certain topic. val stream = KafkaUtils.createDirectStream(...) 2. I extract the actual data f

Re: Spark Streaming with Kafka

2016-12-11 Thread Timur Shenkao
Hi, Usual general questions are: -- what is your Spark version? -- what is your Kafka version? -- do you use "standard" Kafka consumer or try to implement something custom (your own multi-threaded consumer)? The freshest docs https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.ht

Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
Hi, I am experimenting with Spark Streaming and Kafka. I would appreciate it if someone could say whether the following assumption is correct. If I have multiple computations (each with its own output) on one stream (created as KafkaUtils.createDirectStream), then there is a chance to have ConcurrentMo

Re: Spark Streaming with Kafka

2016-05-24 Thread Rasika Pohankar
Regards, Rasika. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p27014.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread Cody Koeninger
If by smaller block interval you mean the value in seconds passed to the streaming context constructor, no. You'll still get everything from the starting offset until now in the first batch. On Thu, Feb 18, 2016 at 10:02 AM, praveen S wrote: > Sorry.. Rephrasing : > Can this issue be resolved b

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Sorry.. Rephrasing : Can this issue be resolved by having a smaller block interval? Regards, Praveen On 18 Feb 2016 21:30, "praveen S" wrote: > Can having a smaller block interval only resolve this? > > Regards, > Praveen > On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > >> Backpressure won't h

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Can having a smaller block interval only resolve this? Regards, Praveen On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > Backpressure won't help you with the first batch, you'd need > spark.streaming.kafka.maxRatePerPartition > for that > > On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote: > >>

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread Cody Koeninger
Backpressure won't help you with the first batch, you'd need spark.streaming.kafka.maxRatePerPartition for that On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote: > Have a look at > > spark.streaming.backpressure.enabled > Property > > Regards, > Praveen > On 18 Feb 2016 00:13, "Abhishek Anand"
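
In SparkConf terms, the two settings discussed in this thread look like this (the rate value is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("KafkaRateLimited")
      // Adapts the ingestion rate to processing speed on subsequent batches...
      .set("spark.streaming.backpressure.enabled", "true")
      // ...but only this hard cap (records per Kafka partition per second)
      // bounds the very first batch after a restart.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")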

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Have a look at spark.streaming.backpressure.enabled Property Regards, Praveen On 18 Feb 2016 00:13, "Abhishek Anand" wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has a > downtime of say 5 hour

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cody Koeninger
You can print whatever you want wherever you want, it's just a question of whether it's going to show up on the driver or the various executors logs On Wed, Feb 17, 2016 at 5:50 AM, Cyril Scetbon wrote: > I don't think we can print an integer value in a spark streaming process > As opposed to a

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cyril Scetbon
I don't think we can print an integer value in a spark streaming process, as opposed to a spark job. I think I can print the content of an rdd but not debug messages. Am I wrong? Cyril Scetbon > On Feb 17, 2016, at 12:51 AM, ayan guha wrote: > > Hi > > You can always use RDD properties, whi

Spark Streaming with Kafka Use Case

2016-02-17 Thread Abhishek Anand
I have a spark streaming application running in production. I am trying to find a solution for a particular use case when my application has a downtime of say 5 hours and is restarted. Now, when I start my streaming application after 5 hours, there would then be a considerable amount of data in the Ka

Re: Spark Streaming with Kafka Use Case

2016-02-17 Thread Cody Koeninger
Just use a kafka rdd in a batch job or two, then start your streaming job. On Wed, Feb 17, 2016 at 12:57 AM, Abhishek Anand wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has a > downtime of say
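
A sketch of that suggestion using KafkaUtils.createRDD from the 0.8 integration (`sc` is an existing SparkContext; the offsets, broker, topic, and output path are all illustrative — in practice the offset ranges come from wherever the streaming job last recorded its progress):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    // One range per Kafka partition: [fromOffset, untilOffset)
    val offsetRanges = Array(
      OffsetRange("mytopic", 0, 0L, 100000L),
      OffsetRange("mytopic", 1, 0L, 100000L))

    val kafkaParams = Map("metadata.broker.list" -> "host1:9092")

    // Drain the backlog as an ordinary batch job, then start the
    // streaming job from the untilOffsets above.
    val backlog = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)
    backlog.map(_._2).saveAsTextFile("hdfs:///backlog-output")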

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
Hi You can always use RDD properties, which already has partition information. https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html On Wed, Feb 17, 2016 at 2:36 PM, Cyril Scetbon wrote: > Your understanding i
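
Concretely, a quick check from the job itself (a Scala sketch; `stream` stands for the direct stream — the same idea works from Python via rdd.getNumPartitions()). With the direct approach the count printed should match the topic's Kafka partition count:

    stream.foreachRDD { rdd =>
      // One RDD per batch; with a direct stream, one RDD partition
      // per Kafka partition of the subscribed topic.
      println(s"partitions in this batch: ${rdd.partitions.size}")
    }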

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Your understanding is the right one (having re-read the documentation). Still wondering how I can verify that 5 partitions have been created. My job is reading from a topic in Kafka that has 5 partitions and sends the data to E/S. I can see that when there is one task to read from Kafka there ar

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
I have a slightly different understanding. Direct stream generates 1 RDD per batch, however, number of partitions in that RDD = number of partitions in kafka topic. On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon wrote: > Hi guys, > > I'm making some tests with Spark and Kafka using a Python sc

Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Hi guys, I'm making some tests with Spark and Kafka using a Python script. I use the second method that doesn't need any receiver (Direct Approach). It should adapt the number of RDDs to the number of partitions in the topic. I'm trying to verify it. What's the easiest way to verify it ? I also

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-12 Thread p pathiyil
Thanks Sebastian. I was indeed trying out FAIR scheduling with a high value for concurrentJobs today. It does improve the latency seen by the non-hot partitions, even if it does not provide complete isolation. So it might be an acceptable middle ground. On 12 Feb 2016 12:18, "Sebastian Piu" wrot

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Sebastian Piu
Have you tried using the fair scheduler and queues? On 12 Feb 2016 4:24 a.m., "p pathiyil" wrote: > With this setting, I can see that the next job is being executed before > the previous one is finished. However, the processing of the 'hot' > partition eventually hogs all the concurrent jobs. If there

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
With this setting, I can see that the next job is being executed before the previous one is finished. However, the processing of the 'hot' partition eventually hogs all the concurrent jobs. If there was a way to restrict jobs to be one per partition, then this setting would provide the per-partitio

RE: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Diwakar Dhanuskodi
Subject: Spark Streaming with Kafka: Dealing with 'slow' partitions Hi, I am looking at a way to isolate the processing of messages from each Kafka partition within the same driver. Scenario: A DStream is create

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Cody Koeninger
The real way to fix this is by changing partitioning, so you don't have a hot partition. It would be better to do this at the time you're producing messages, but you can also do it with a shuffle / repartition during consuming. There is a setting to allow another batch to start in parallel, but t

Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
Hi, I am looking at a way to isolate the processing of messages from each Kafka partition within the same driver. Scenario: A DStream is created with the createDirectStream call by passing in a few partitions. Let us say that the streaming context is defined to have a time duration of 2 seconds.

Re: Spark Streaming with Kafka - batch DStreams in memory

2016-02-02 Thread Cody Koeninger
It's possible you could (ab)use updateStateByKey or mapWithState for this. But honestly it's probably a lot more straightforward to just choose a reasonable batch size that gets you a reasonable file size for most of your keys, then use filecrush or something similar to deal with the hdfs small fi
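
For completeness, a rough sketch of the mapWithState route (Spark 1.6+; the flush threshold, types, and `keyedStream` are made up for illustration, and stateful streams require checkpointing to be enabled):

    import org.apache.spark.streaming.{State, StateSpec}

    // Buffer values per key in state; emit and clear once a key has
    // accumulated enough records (1000 is an arbitrary threshold).
    // Note: mapWithState requires ssc.checkpoint(...) to be set.
    val spec = StateSpec.function(
      (key: String, value: Option[String], state: State[Seq[String]]) => {
        val buffered = state.getOption.getOrElse(Seq.empty[String]) ++ value
        if (buffered.size >= 1000) {
          state.remove()
          Some((key, buffered)) // ready to be written out as one file
        } else {
          state.update(buffered)
          None
        }
      })

    // keyedStream: DStream[(String, String)] built from the Kafka messages.
    val flushed = keyedStream.mapWithState(spec).flatMap(_.toSeq)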

Spark Streaming with Kafka - batch DStreams in memory

2016-02-01 Thread p pathiyil
Hi, Are there any ways to store DStreams / RDD read from Kafka in memory to be processed at a later time ? What we need to do is to read data from Kafka, process it to be keyed by some attribute that is present in the Kafka messages, and write out the data related to each key when we have accumula

Re: Higher Processing times in Spark Streaming with kafka Direct

2015-12-04 Thread u...@moosheimer.com
On 04.12.2015 at 22:21, SRK wrote: > > Hi, > > Our processing times in Spark Streaming with the kafka Direct approach seem to > have increased considerably with the increase in site traffic. Would > increasing the number of kafka partitions decrease the processing times? > Any

Higher Processing times in Spark Streaming with kafka Direct

2015-12-04 Thread SRK
Hi, Our processing times in Spark Streaming with the kafka Direct approach seem to have increased considerably with the increase in site traffic. Would increasing the number of kafka partitions decrease the processing times? Any suggestions on tuning to reduce the processing times would be of

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
Of course, exactly-once receiving is not the same as exactly-once. In the case of the direct kafka stream, the data may actually be pulled multiple times. But even if the data of a batch is pulled twice because of some failure, the final result (that is, transformed data accessed through foreachRDD) will always

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
Thanks TD. As for 1), if timing is not guaranteed, how are exactly-once semantics supported? It feels like exactly-once receiving is not necessarily exactly-once processing. Chen On Tue, Jul 14, 2015 at 10:16 PM, Tathagata Das wrote: > > > On Tue, Jul 14, 2015 at 6:42 PM, Chen Song wrote: >

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
On Tue, Jul 14, 2015 at 6:42 PM, Chen Song wrote: > Thanks TD and Cody. I saw that. > > 1. By doing that (foreachRDD), does KafkaDStream checkpoint its offsets > on HDFS at the end of each batch interval? > The timing is not guaranteed. > 2. In the code, if I first apply transformations and a

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
Thanks TD and Cody. I saw that. 1. By doing that (foreachRDD), does KafkaDStream checkpoint its offsets on HDFS at the end of each batch interval? 2. In the code, if I first apply transformations and actions on the directKafkaStream and then use foreachRDD on the original KafkaDStream to commit o

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
Relevant documentation - https://spark.apache.org/docs/latest/streaming-kafka-integration.html, towards the end. directKafkaStream.foreachRDD { rdd => val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges // offsetRanges.length = # of Kafka partitions being consumed ... } On Tue,

Re: spark streaming with kafka reset offset

2015-07-14 Thread Cody Koeninger
You have access to the offset ranges for a given rdd in the stream by typecasting to HasOffsetRanges. You can then store the offsets wherever you need to. On Tue, Jul 14, 2015 at 5:00 PM, Chen Song wrote: > A follow up question. > > When using createDirectStream approach, the offsets are checkp
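
A sketch of that pattern against the direct stream (the println stands in for whatever store — ZooKeeper, a database — you commit offsets to):

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    directKafkaStream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd first, then record progress ...
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
      }
    }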

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
A follow up question. When using createDirectStream approach, the offsets are checkpointed to HDFS and it is understandable by Spark Streaming job. Is there a way to expose the offsets via a REST api to end users. Or alternatively, is there a way to have offsets committed to Kafka Offset Manager s

Re: spark streaming with kafka reset offset

2015-06-30 Thread Cody Koeninger
You can't use different versions of spark in your application vs your cluster. For the direct stream, it's not 60 partitions per executor, it's 300 partitions, and executors work on them as they are scheduled. Yes, if you have no messages you will get an empty partition. It's up to you whether i

Re: spark streaming with kafka reset offset

2015-06-30 Thread Shushant Arora
Is this 3 the number of parallel consumer threads per receiver, meaning in total we have 2*3=6 consumers in the same consumer group consuming from all 300 partitions? 3 is just parallelism on the same receiver, and the recommendation is to use 1 per receiver since consuming from kafka is not cpu bound but rather NIC (netwo

Re: spark streaming with kafka reset offset

2015-06-29 Thread Shushant Arora
1. Here you are basically creating 2 receivers and asking each of them to consume 3 kafka partitions each. - In 1.2 we have high-level consumers, so how can we restrict the number of kafka partitions to consume from? Say I have 300 kafka partitions in a kafka topic and, as above, I gave 2 receivers and 3 ka

Re: spark streaming with kafka reset offset

2015-06-29 Thread ayan guha
Hi Let me take a shot at your questions. (I am sure people like Cody and TD will correct me if I am wrong) 0. This is an exact copy from the similar question in the mail thread from Akhil D: Since you set local[4] you will have 4 threads for your computation, and since you are having 2 receivers, you are le

Re: spark streaming with kafka reset offset

2015-06-29 Thread Cody Koeninger
3. You need to use your own method, because you need to set up your job. Read the checkpoint documentation. 4. Yes, if you want to checkpoint, you need to specify a url to store the checkpoint at (s3 or hdfs). Yes, for the direct stream checkpoint it's just offsets, not all the messages. On Sun
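
A minimal sketch of that setup (the HDFS path is hypothetical and `sparkConf` is an existing SparkConf); the whole stream must be built inside the creating function so it can be reconstructed from the checkpoint:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/myapp" // hypothetical path

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(sparkConf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... create the Kafka direct stream and output operations here ...
      ssc
    }

    // Reuse the checkpointed context after a restart, or build a new one.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()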

Re: spark streaming with kafka reset offset

2015-06-28 Thread Shushant Arora
Few doubts: In 1.2 streaming, when I use a union of streams, my streaming application sometimes hangs and nothing gets printed on the driver. [Stage 2:> (0 + 2) / 2] What does the 0 + 2 / 2 here signify? 1. Does no of streams in topicsMap.put("testSparkPartitio

Re: spark streaming with kafka reset offset

2015-06-27 Thread Dibyendu Bhattacharya
Hi, There is another option to try: the Receiver-Based Low-Level Kafka Consumer, which is part of Spark-Packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ). This can be used with the WAL as well for end-to-end zero data loss. This is also a reliable receiver and commits offsets to

Re: spark streaming with kafka reset offset

2015-06-27 Thread Tathagata Das
In the receiver-based approach, if the receiver crashes for any reason (the receiver crashed or the executor crashed), the receiver should get restarted on another executor and should start reading data from the offset present in zookeeper. There is some chance of data loss, which can be alleviated using Wr

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
Read the spark streaming guide and the kafka integration guide for a better understanding of how the receiver-based stream works. Capacity planning is specific to your environment and what the job is actually doing; you'll need to determine it empirically. On Friday, June 26, 2015, Shushant Arora

Re: spark streaming with kafka reset offset

2015-06-26 Thread Shushant Arora
In 1.2, how do I handle offset management after the stream application starts, in each job? Should I commit the offset manually after job completion? And what is the recommended number of consumer threads? Say I have 300 partitions in the kafka cluster. Load is ~1 million events per second. Each event is ~500 bytes.

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
The receiver-based kafka createStream in spark 1.2 uses zookeeper to store offsets. If you want finer-grained control over offsets, you can update the values in zookeeper yourself before starting the job. createDirectStream in spark 1.3 is still marked as experimental, and subject to change. Tha

spark streaming with kafka reset offset

2015-06-26 Thread Shushant Arora
I am using spark streaming 1.2. If processing executors crash, will the receiver reset the offset back to the last processed offset? If the receiver itself crashed, is there a way to reset the offset without restarting the streaming application, other than smallest or largest? Is spark streaming 1.3 whi

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
Thanks a lot. It worked after keeping all versions the same: 1.2.0. On Wed, Jun 24, 2015 at 2:24 AM, Tathagata Das wrote: > Why are you mixing spark versions between streaming and core?? > Your core is 1.2.0 and streaming is 1.4.0. > > On Tue, Jun 23, 2015 at 1:32 PM, Shushant Arora > wrote: > >>

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
Why are you mixing spark versions between streaming and core?? Your core is 1.2.0 and streaming is 1.4.0. On Tue, Jun 23, 2015 at 1:32 PM, Shushant Arora wrote: > It throws exception for WriteAheadLogUtils after excluding core and > streaming jar. > > Exception in thread "main" java.lang.NoClass

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
It throws exception for WriteAheadLogUtils after excluding core and streaming jar. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/util/WriteAheadLogUtils$ at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:84) at org

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
You must not include spark-core and spark-streaming in the assembly. They are already present in the installation, and the presence of multiple versions of spark may throw off the classloaders in weird ways. So make the assembly marking those dependencies as scope=provided. On Tue, Jun 23, 20
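
In build.sbt terms, that looks roughly like this (versions match the 1.2.0 discussed in this thread; adjust to your cluster). Only the Kafka connector, which the cluster does not ship, goes into the assembly:

    libraryDependencies ++= Seq(
      // Already on the cluster -- keep out of the assembly jar.
      "org.apache.spark" %% "spark-core"      % "1.2.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided",
      // Not on the cluster -- must be bundled.
      "org.apache.spark" %% "spark-streaming-kafka" % "1.2.0")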

spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
hi While using spark streaming (1.2) with kafka, I am getting the below error, and receivers are getting killed but jobs get scheduled at each stream interval. 15/06/23 18:42:35 WARN TaskSetManager: Lost task 0.1 in stage 18.0 (TID 82, ip(XX)): java.io.IOException: Failed to connect to ip(XXX

Re: Spark streaming with kafka

2015-05-29 Thread Akhil Das
Thanks Best Regards On Thu, May 28, 2015 at 7:03 PM, boci wrote: > Hi guys, > > I using spark streaming with kafka... In local machine (start as java > application without using spark-submit) it's work, connect to kafka and do > the job (*). I tried to put into spark docker

Spark streaming with kafka

2015-05-28 Thread boci
Hi guys, I'm using spark streaming with kafka... On my local machine (started as a java application without using spark-submit) it works, connects to kafka and does the job (*). I tried to put it into a spark docker container (hadoop 2.6, spark 1.3.1, tried spark-submit with local[5] and yarn-client too ) bu

Re: Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
> *From:* Akhil Das > *Date:* 2015-04-15 19:12 > *To:* Shushant Arora > *CC:* user > *Subject:* Re: spark streaming with kafka > Once you start your streaming application to read from Kafka, it will > launch receivers on the executor nodes. And you can see them on the > st

Re: spark streaming with kafka

2015-04-15 Thread Shushant Arora
ver will run on a single core. > > Thanks > Best Regards > > On Wed, Apr 15, 2015 at 3:46 PM, Shushant Arora > wrote: > >> Hi >> >> I want to understand the flow of spark streaming with kafka. >> >> In spark Streaming is the executor nodes at each run

Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
.) You can say, each receiver will run on a single core. Thanks Best Regards On Wed, Apr 15, 2015 at 3:46 PM, Shushant Arora wrote: > Hi > > I want to understand the flow of spark streaming with kafka. > > In spark Streaming are the executor nodes at each run of the streaming interval &g

spark streaming with kafka

2015-04-15 Thread Shushant Arora
Hi I want to understand the flow of spark streaming with kafka. In spark Streaming, are the executor nodes at each run of the streaming interval the same, or does the cluster manager assign new executor nodes at each stream interval for processing that batch input? If yes, then at each batch interval new executors

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-07 Thread Felix C
Or you could build an uber jar ( you could google that ) https://eradiating.wordpress.com/2015/02/15/getting-spark-streaming-on-kafka-to-work/ --- Original Message --- From: "Akhil Das" Sent: April 4, 2015 11:52 PM To: "Priya Ch" Cc: user@spark.apache.org, "dev"

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-04 Thread Akhil Das
How are you submitting the application? Use a standard build tool like maven or sbt to build your project, it will download all the dependency jars, when you submit your application (if you are using spark-submit, then use --jars option to add those jars which are causing classNotFoundException). I

Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-04 Thread Priya Ch
Hi All, I configured Kafka cluster on a single node and I have streaming application which reads data from kafka topic using KafkaUtils. When I execute the code in local mode from the IDE, the application runs fine. But when I submit the same to spark cluster in standalone mode, I end up with

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Ted Yu
Can you show us the output of DStream#print() if you have it ? Thanks On Tue, Mar 31, 2015 at 2:55 AM, Nicolas Phung wrote: > Hello, > > @Akhil Das I'm trying to use the experimental API > https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/s

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Nicolas Phung
Hello, @Akhil Das I'm trying to use the experimental API https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Ted Yu
Nicolas: See if there was occurrence of the following exception in the log: errs => throw new SparkException( s"Couldn't connect to leader for topic ${part.topic} ${part.partition}: " + errs.mkString("\n")), Cheers On Mon, Mar 30, 2015 at 9:40 AM, Cody Koeninge

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Cody Koeninger
This line at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.close(KafkaRDD.scala:158) is the attempt to close the underlying kafka simple consumer. We can add a null pointer check, but the underlying issue of the consumer being null probably indicates a problem earlier. Do you see

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Akhil Das
Did you try this example? https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala I think you need to create a topic set with # partitions to consume. Thanks Best Regards On Mon, Mar 30, 2015 at 9:35 PM, Nicolas Phu

Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Nicolas Phung
Hello, I'm using spark-streaming-kafka 1.3.0 with the new consumer "Approach 2: Direct Approach (No Receivers)" ( http://spark.apache.org/docs/latest/streaming-kafka-integration.html). I'm using the following code snippets : // Create direct kafka stream with brokers and topics val messages = Kaf
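
For context, the truncated snippet is presumably the direct-stream creation; with the 1.3.0 integration it has this shape (`ssc` is an existing StreamingContext; broker and topic are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "host1:9092")
    val topics = Set("mytopic")

    // Direct approach: no receivers; one RDD partition per Kafka partition.
    val messages = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)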

Re: Spark Streaming with Kafka

2015-01-21 Thread Dibyendu Bhattacharya
On Wed, Jan 21, 2015 at 7:46 AM, firemonk9 wrote: > Hi, > >I am having similar issues. Have you found any resolution ? > > Thank you > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-t

Re: Spark Streaming with Kafka

2015-01-20 Thread firemonk9
Hi, I am having similar issues. Have you found any resolution ? Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p21276.html Sent from the Apache Spark User List mailing list archive at Nabble.com

RE: Spark Streaming with Kafka

2015-01-19 Thread Shao, Saisai
Alfaia [mailto:e.costaalf...@unibs.it] Sent: Monday, January 19, 2015 1:58 AM To: Rasika Pohankar; user@spark.apache.org Subject: Re: Spark Streaming with Kafka I have the same issue. From: Rasika Pohankar<mailto:rasikapohan...@gmail.com> Sent: 18/01/2015 1

R: Spark Streaming with Kafka

2015-01-18 Thread Eduardo Alfaia
I have the same issue. - Original message - From: "Rasika Pohankar" Sent: 18/01/2015 18:48 To: "user@spark.apache.org" Subject: Spark Streaming with Kafka I am using Spark Streaming to process data received through Kafka. The Spark version is 1.2.0. I have

Spark Streaming with Kafka

2015-01-18 Thread Rasika Pohankar
ck if the problem was in that version. But even after upgrading, it is still happening. Is this a known issue? Can someone please help. Thanking you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222.html Sent from the Apache

Re: Spark Streaming with Kafka is failing with Error

2014-11-18 Thread Tobias Pfeiffer
Hi, do you have some logging backend (log4j, logback) on your classpath? This seems a bit like there is no particular implementation of the abstract `log()` method available. Tobias

Spark Streaming with Kafka is failing with Error

2014-11-18 Thread Sourav Chandra
Hi, While running my spark streaming application built on spark 1.1.0 I am getting below error. *14/11/18 15:35:30 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.AbstractMethodError* * at org.apache.spark.Logging$class.log(Logging.scala:52)* * at

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-08 Thread Matt Narrell
I came across this: https://github.com/xerial/sbt-pack Until I found this, I was simply using the sbt-assembly plugin (sbt clean assembly) mn On Sep 4, 2014, at 2:46 PM, Aris wrote: > Thanks for answering Daniil - > > I have SBT version 0.13.5, is that an old version? Seems pretty up-to-da

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-04 Thread Aris
Thanks for answering Daniil - I have SBT version 0.13.5, is that an old version? Seems pretty up-to-date. It turns out I figured out a way around this entire problem: just use 'sbt package', and when using bin/spark-submit, pass it the "--jars" option and GIVE IT ALL THE JARS from the local ivy2 c
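
Spelled out, the described workaround looks something like the following command-line sketch (all paths, versions, and names are purely illustrative):

    # Build without bundling dependencies, then hand every dependency jar
    # from the local ivy2 cache to spark-submit yourself (comma-separated).
    sbt package
    spark-submit \
      --jars /home/me/.ivy2/cache/org.apache.spark/spark-streaming-kafka_2.10/jars/spark-streaming-kafka_2.10-1.0.2.jar,/home/me/.ivy2/cache/org.apache.kafka/kafka_2.10/jars/kafka_2.10-0.8.0.jar \
      --class MyStreamingApp \
      target/scala-2.10/myapp_2.10-0.1.jar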

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-02 Thread Daniil Osipov
What version of sbt are you using? There is a bug in early version of 0.13 that causes assembly to be extremely slow - make sure you're using the latest one. On Fri, Aug 29, 2014 at 1:30 PM, Aris wrote: > Hi folks, > > I am trying to use Kafka with Spark Streaming, and it appears I cannot do >

Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-08-29 Thread Aris
Hi folks, I am trying to use Kafka with Spark Streaming, and it appears I cannot do the normal 'sbt package' as I do with other Spark applications, such as Spark alone or Spark with MLlib. I learned I have to build with the sbt-assembly plugin. OK, so here is my build.sbt file for my extremely si

Re: Using Spark Streaming with Kafka 0.7.2

2014-07-29 Thread Andre Schumacher
can't find any documentation specifically for building spark streaming. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >

Re: Using Spark Streaming with Kafka 0.7.2

2014-07-25 Thread Tathagata Das
ating trying to build spark streaming myself but I > can't find any documentation specifically for building spark streaming. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. >

Using Spark Streaming with Kafka 0.7.2

2014-07-25 Thread maddenpj
streaming myself but I can't find any documentation specifically for building spark streaming. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-13 Thread Tathagata Das
Also, the reason the spark-streaming-kafka is not included in the spark assembly is that we do not want dependencies of external systems like kafka (which itself probably has a complex dependency tree) to cause conflicts with core spark's functionality and stability. TD On Sun, Jul 13, 2014 a

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-13 Thread Tathagata Das
In case you still have issues with duplicate files in uber jar, here is a reference sbt file with assembly plugin that deals with duplicates https://github.com/databricks/training/blob/sparkSummit2014/streaming/scala/build.sbt On Fri, Jul 11, 2014 at 10:06 AM, Bill Jay wrote: > You may try to
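
The core of such a build.sbt is an sbt-assembly merge strategy, roughly as below (the exact key name varies across sbt-assembly versions; the linked file is the authoritative reference):

    assemblyMergeStrategy in assembly := {
      // Drop duplicated manifest/metadata files pulled in by Kafka and friends.
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      // For anything else that collides, keep the first copy found.
      case _ => MergeStrategy.first
    }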
