Re: Strange behaviour of different SSCs with same Kafka topic

Tathagata Das Tue, 22 Apr 2014 13:59:27 -0700

As I said before, starting two SSCs in the same JVM is not supported, in
either local mode or cluster mode. You have two choices.
1. Run one SSC in one JVM: this will use a single Spark cluster (as it will
use a single SparkContext) for the computation, so the two computations can
share the cluster's resources. To do two different computations, you can do
one of the following:
(i) If you have to receive two different streams of data and process them
differently, then create two input DStreams and apply the appropriate
transformations to each.
(ii) If you just have to do two different transformations on the same stream
of data, then you can create one input DStream and apply two sets of
transformations to it.
// One input DStream, two independent sets of transformations on it
val inputStream = ...
val transformedStream1 = inputStream.map(....)
val transformedStream2 = inputStream.filter(....)


2. If you want the two streaming computations to run on two different
Spark clusters, then you have to run two different JVM processes, each with
its own StreamingContext.

TD


On Mon, Apr 21, 2014 at 9:17 PM, gaganbm <gagan.mis...@gmail.com> wrote:

> Yes. I am running this in local mode and the SSCs run in the same JVM.
> So, if I deploy this on a cluster, will such behavior go away? Also, is
> there any way I can start the SSCs on a local machine but in different JVMs?
> I couldn't find anything about this in the documentation.
>
> The intermingling of data seems to have gone away after I turned some of
> those external classes into Scala objects holding static maps and so on. Is
> that a good idea as far as performance is concerned?
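
On the performance question, a Scala object is a per-JVM singleton. Its body runs once, lazily, on first access. It is also not serialized together with the closures that refer to it, so every JVM that runs tasks builds its own copy. An immutable lookup map held this way therefore costs one initialization per JVM. A mutable map held this way, however, is shared by everything running in that JVM, and in local mode that includes both streaming contexts. The sketch below is plain Scala with no Spark involved. `RecordNameLookup`, the pipeline IDs, and the record names are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical lookup table (keys and names are illustrative only).
// The body of a Scala object runs once per JVM (per class loader), on
// first access; the Map is immutable, so readers cannot change it
// under each other.
object RecordNameLookup {
  // Counts how many times this initializer has run in the current JVM.
  val initCount = new AtomicInteger(0)
  initCount.incrementAndGet()

  private val names: Map[String, String] = Map(
    "pipeline1" -> "RECORD_NAME_1",
    "pipeline2" -> "RECORD_NAME_2"
  )

  def nameFor(pipelineId: String): String = names.getOrElse(pipelineId, "UNKNOWN")
}

object LookupSketch {
  def main(args: Array[String]): Unit = {
    // Many lookups share a single initialization of the object.
    val tagged = Seq("pipeline1", "pipeline2", "pipeline1").map(RecordNameLookup.nameFor)
    println(tagged)
    println(RecordNameLookup.initCount.get) // 1
  }
}
```

Keeping the shared data immutable makes it safe to read from both streams. Anything that differs per stream is better passed into each pipeline explicitly than stored in a shared mutable object.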
>
> Thanks
>
> Gagan B Mishra
>
>
> On Tue, Apr 22, 2014 at 1:59 AM, Tathagata Das [via Apache Spark User
> List] wrote:
>
>> Are you by any chance starting two StreamingContexts in the same JVM?
>> That could explain a lot of the weird mixing of data that you are seeing.
>> It's not a supported usage scenario to start multiple StreamingContexts
>> simultaneously in the same JVM.
>>
>> TD
>>
>>
>> On Thu, Apr 17, 2014 at 10:58 PM, gaganbm wrote:
>>
>>> It happens at a normal data rate too, i.e., let's say 20 records per second.
>>>
>>> Apart from that, I am also getting some more strange behavior. Let me
>>> explain.
>>>
>>> I create two SSCs and start them one after another. In each SSC I get the
>>> streams from Kafka sources and do some manipulations, for example adding a
>>> "Record_Name" field to each of the incoming records. This Record_Name is
>>> different for the two SSCs, and I get this field from some other class
>>> that is unrelated to the streams.
>>>
>>> The expected behavior is that all records in SSC1 get the field
>>> RECORD_NAME_1 and all records in SSC2 get the field RECORD_NAME_2. As far
>>> as I can tell, the two SSCs have nothing to do with each other.
>>>
>>> However, strangely enough, I find that many records in SSC1 get
>>> RECORD_NAME_2 and vice versa. Is it some kind of serialization issue?
>>> Could it be that the class which provides this RECORD_NAME gets serialized
>>> and reconstructed, and then something weird happens inside it? I am unable
>>> to figure it out.
>>>
>>> So, apart from the skewed frequency and volume of records in the two
>>> streams, I am also getting this intermingling of data between the streams.
>>>
>>> Can you help me with how to use some external data to manipulate the RDD
>>> records?
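
One possible mechanism for records picking up the other stream's name is a name that is read from shared mutable state each time a record is processed, rather than fixed when the transformation is defined. This is an assumption; the thread does not confirm the root cause. In Spark, the function passed to `map` is serialized together with the values it captures. A Scala object that the function refers to is not shipped with it; it is looked up in whichever JVM runs the task. The plain-Scala sketch below (no Spark) contrasts the two approaches. `SharedSettings` and `TaggingSketch` are hypothetical names.

```scala
// Hypothetical mutable holder standing in for "some other class" that
// supplies the record name. A Scala object is one instance per JVM.
object SharedSettings {
  @volatile var recordName: String = "UNSET"
}

object TaggingSketch {
  // Late binding: the function reads SharedSettings each time it runs,
  // so it sees whatever value was set most recently in this JVM.
  def lateBoundTagger(): String => String =
    record => s"$record,${SharedSettings.recordName}"

  // Early binding: `name` is captured when the function is created,
  // so each pipeline keeps its own value.
  def capturedTagger(name: String): String => String =
    record => s"$record,$name"

  def main(args: Array[String]): Unit = {
    // Pipeline 1 is set up first...
    SharedSettings.recordName = "RECORD_NAME_1"
    val late1 = lateBoundTagger()
    val captured1 = capturedTagger(SharedSettings.recordName)

    // ...then pipeline 2 overwrites the shared value.
    SharedSettings.recordName = "RECORD_NAME_2"
    val captured2 = capturedTagger(SharedSettings.recordName)

    println(late1("r1"))     // r1,RECORD_NAME_2 (pipeline 1 gets the wrong name)
    println(captured1("r1")) // r1,RECORD_NAME_1
    println(captured2("r2")) // r2,RECORD_NAME_2
  }
}
```

With a hypothetical `DStream[String]` named `stream`, the captured version would be used as `stream.map(TaggingSketch.capturedTagger("RECORD_NAME_1"))`, with the name passed in when each pipeline is built.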
>>>
>>> Thanks and regards
>>>
>>> Gagan B Mishra
>>>
>>>
>>> *Programmer*
>>> *560034, Bangalore*
>>> *India*
>>>
>>>
>>> On Tue, Apr 15, 2014 at 4:09 AM, Tathagata Das [via Apache Spark User
>>> List] wrote:
>>>
>>>> Does this happen at a low event rate for that topic as well, or only at
>>>> a high event rate?
>>>>
>>>> TD
>>>>
>>>>
>>>> On Wed, Apr 9, 2014 at 11:24 PM, gaganbm wrote:
>>>>
>>>>> I am really at my wits' end here.
>>>>>
>>>>> I have different StreamingContexts, let's say 2, both listening to the
>>>>> same Kafka topics. I create the KafkaStream in each of them with a
>>>>> different consumer group.
>>>>>
>>>>> Ideally, I should be seeing the Kafka events in both streams. But what
>>>>> I am getting is really unpredictable. Only one stream gets a lot of
>>>>> events, and the other gets almost nothing, or very few compared to the
>>>>> first. The frequency is also very skewed: I get a lot of events in one
>>>>> stream continuously, and only after some time do I get a few events in
>>>>> the other one.
>>>>>
>>>>> I don't know where I am going wrong. I can see consumer fetcher threads
>>>>> for both streams listening to the Kafka topics.
>>>>>
>>>>> I can give further details if needed. Any help would be great.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Strange-behaviour-of-different-SSCs-with-same-Kafka-topic-tp4050.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
