Re: How to use ManualClock with Spark streaming

2017-04-05 Thread Hemalatha A
Any updates on how I can use ManualClock other than by editing the Spark source code? On Wed, Mar 1, 2017 at 10:19 AM, Hemalatha A < hemalatha.amru...@googlemail.com> wrote: > It is certainly possible through a hack. > I was referring to below post where TD says it is possible th

How to use ManualClock with Spark streaming

2017-02-28 Thread Hemalatha A
Hi, I am running a streaming application reading data from Kafka and performing window operations on it. I have a use case where all incoming events have a fixed latency of 10s, which means data belonging to minute 10:00:00 will arrive 10s late at 10:00:10. I want to set the Spark clock to "Manualc
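The hack referenced in the follow-up thread is usually the internal `spark.streaming.clock` setting, which Spark's own streaming tests use to swap in `org.apache.spark.util.ManualClock`. A minimal sketch, assuming that internal key and class name (both are unsupported implementation details and may change between Spark versions); shown here as a plain map so it compiles without Spark on the classpath, but in a real application the entries would go on a SparkConf before creating the StreamingContext:

```scala
// Sketch: the internal, test-only clock override used by Spark's own
// streaming test suites. Not a supported public API; key and class
// name are assumptions that may break across Spark versions.
object ManualClockConf {
  val clockConf: Map[String, String] =
    Map("spark.streaming.clock" -> "org.apache.spark.util.ManualClock")
}
```

In a real job these would be applied with `sparkConf.set(key, value)` for each entry before the StreamingContext is constructed.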

Re: How does chaining of Windowed Dstreams work?

2016-09-27 Thread Hemalatha A
Hello, Can anyone please answer the question below and help me understand windowing operations? On Sun, Sep 4, 2016 at 4:42 PM, Hemalatha A < hemalatha.amru...@googlemail.com> wrote: > Hello, > > I have a set of Dstreams on which I'm performing some computation on each

How does chaining of Windowed Dstreams work?

2016-09-04 Thread Hemalatha A
Hello, I have a set of DStreams on which I'm performing some computation; each DStream is windowed on another from the base stream, based on the order of window intervals. I want to find out the best stream on which I could window a particular stream. Suppose I have a Spark DStream,
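One constraint worth knowing when chaining `window()` calls: Spark Streaming requires a windowed DStream's window and slide durations to both be multiples of the parent DStream's slide duration (the rate at which the parent emits RDDs). A minimal sketch of that alignment rule, with durations in milliseconds; the `Windowed`/`canWindowOn` names are hypothetical, not Spark API:

```scala
// Sketch of the alignment rule Spark Streaming enforces when one
// windowed stream is built on top of another: the child's window and
// slide durations must both be multiples of the parent's slide
// duration. Durations are in milliseconds.
object WindowChain {
  final case class Windowed(windowMs: Long, slideMs: Long)

  // True if a window(windowMs, slideMs) call on `parent` would be valid.
  def canWindowOn(parent: Windowed, windowMs: Long, slideMs: Long): Boolean =
    windowMs % parent.slideMs == 0 && slideMs % parent.slideMs == 0
}
```

So when choosing which stream to window a new stream on, a parent whose slide duration divides the new window's durations is the natural candidate.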

Exceptions during an action don't fail the Spark streaming batch in yarn-client mode

2016-08-07 Thread Hemalatha A
Hello, I am seeing multiple exceptions in the logs during an action, but none of them fails the Spark streaming batch in yarn-client mode, whereas the same exception is thrown in yarn-cluster mode and the application ends. I am trying to save a DataFrame to Cassandra, which results in an error du

Re: Fail a batch in Spark Streaming forcefully based on business rules

2016-07-28 Thread Hemalatha A
Another use case why I need to do this: if Exception A is caught, I should just print it and ignore it, but if Exception B occurs, I have to end the batch, fail it, and stop processing it. Is it possible to achieve this? Any hints on this, please. On Wed, Jul 27, 2016 at 10:42 AM, Hemalatha A
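One common pattern for this is to wrap the body of the output operation (e.g. the work inside `foreachRDD`) in a guard that swallows the ignorable exception type and rethrows everything else, so only the fatal type fails the batch. A minimal sketch in plain Scala; `ExceptionA`/`ExceptionB` are hypothetical stand-ins for the application-specific exception types in the post:

```scala
import scala.util.Try

// Hypothetical stand-ins for the two exception types in the post.
class ExceptionA(msg: String) extends RuntimeException(msg)
class ExceptionB(msg: String) extends RuntimeException(msg)

object BatchGuard {
  // Runs the batch body. Returns Success(Some(result)) on success,
  // Success(None) when an ignorable ExceptionA was logged and swallowed,
  // and Failure for anything else (ExceptionB included), which, if
  // rethrown inside foreachRDD, fails the batch.
  def runGuarded[T](body: => T): Try[Option[T]] =
    Try(Option(body)).recover {
      case a: ExceptionA =>
        println(s"ignoring: ${a.getMessage}") // log and move on
        None
    }
}
```

Inside `foreachRDD`, calling `runGuarded { ... }.get` would let ExceptionA pass silently while ExceptionB propagates and fails the batch.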

Fail a batch in Spark Streaming forcefully based on business rules

2016-07-26 Thread Hemalatha A
Hello, I have a use case wherein I have to fail certain batches in my streaming job, based on my application-specific business rules. Ex: if in a batch of 2 seconds I don't receive 100 messages, I should fail the batch and move on. How do I achieve this behavior? -- Regards Hemalatha
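One way to express this rule is as a driver-side check that throws when the batch is under the threshold; in a streaming job the check would typically run inside `foreachRDD` against `rdd.count()`, where an uncaught exception marks the batch as failed. A minimal sketch of just the decision logic, under that assumption; the names and the 100-message threshold come from the post, not from any Spark API:

```scala
// Sketch of the business rule from the post: a batch with fewer than
// 100 messages should be failed. Only the decision logic is shown;
// in a real job this would be called inside foreachRDD on rdd.count().
object BatchRule {
  final case class BatchFailedException(msg: String) extends RuntimeException(msg)

  val MinMessages = 100L // threshold from the post; adjust per application

  def checkBatch(messageCount: Long): Unit =
    if (messageCount < MinMessages)
      throw BatchFailedException(s"only $messageCount messages, need $MinMessages")
}
```

Note that failing a batch this way stops the output operation for that batch but, depending on settings, an uncaught exception can also terminate the whole streaming application, so "fail and move on" usually means catching the exception after recording the failure.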

How to resolve Scheduling delay in Spark streaming applications?

2016-05-10 Thread Hemalatha A
Hello, We are facing a large scheduling delay in our Spark streaming application, and we are not sure how to debug why the delay is happening. We have applied all the tuning possible on the Spark side. Can someone advise how to debug the cause of the delay, and share some tips for resolving it? -- Regards Hemalatha
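The usual cause is that batch processing time exceeds the batch interval, so each batch starts later than the last and the lateness accumulates; the streaming UI's "Processing Time" vs. "Scheduling Delay" columns show exactly this. A minimal sketch of that arithmetic (a simplified model, not Spark code), with times in milliseconds:

```scala
// Sketch of how scheduling delay accumulates: whenever a batch's
// processing time exceeds the batch interval, the next batch starts
// late, and the lateness compounds until processing catches up.
object SchedulingDelay {
  // Given the batch interval and each batch's processing time in order,
  // return the scheduling delay experienced by each batch (ms).
  def delays(batchIntervalMs: Long, processingMs: Seq[Long]): Seq[Long] =
    processingMs
      .scanLeft(0L)((delay, p) => math.max(0L, delay + p - batchIntervalMs))
      .init
}
```

For example, with a 1s interval, two 1.5s batches push the third batch's delay to 1s; a fast 0.5s batch then claws half of that back. The fixes follow from the model: reduce per-batch work (more parallelism, faster sinks) or increase the batch interval until processing time fits inside it.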

Spark streaming batch time displayed is not current system time but it is processing current messages

2016-04-16 Thread Hemalatha A
Can anyone help me debug this issue, please? On Thu, Apr 14, 2016 at 12:24 PM, Hemalatha A < hemalatha.amru...@googlemail.com> wrote: > Hi, > > I am facing a problem in Spark streaming. > Time

Spark streaming time displayed is not current system time but it is processing current messages

2016-04-13 Thread Hemalatha A
Hi, I am facing a problem in Spark streaming. The time displayed in the Spark streaming console is 4 days prior, i.e., April 10th, which is not the current system time of the cluster, but the job is processing current messages that are pushed right now, April 14th. Can anyone please advise what time does S

How Application jar is copied to worker machines?

2016-04-10 Thread Hemalatha A
Hello, I want to know, when doing spark-submit, how the application jar is copied to the worker machines. Who does the copying of the jars? Similarly, who copies the DAG from the driver to the executors? -- Regards Hemalatha

[no subject]

2016-04-02 Thread Hemalatha A
Hello, As per the Spark programming guide, "we should have 2-4 partitions for each CPU in your cluster." In this case, how does 1 CPU core process 2-4 partitions at the same time? Link - http://spark.apache.org/docs/latest/programming-guide.html (under the RDD section) Does it do context switchin
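A core does not process its partitions simultaneously: each partition becomes one task, a core runs one task at a time, and with more partitions than cores the tasks run in sequential "waves". A minimal sketch of that arithmetic (the `waves` helper is illustrative, not a Spark function):

```scala
// Sketch of how one core handles multiple partitions: one partition =
// one task, one core runs one task at a time, so extra partitions
// queue up and run in sequential waves rather than concurrently.
object TaskWaves {
  def waves(numPartitions: Int, numCores: Int): Int =
    math.ceil(numPartitions.toDouble / numCores).toInt
}
```

So with 4 cores and the guide's 2-4 partitions per core (8-16 partitions), a stage runs in 2-4 waves; the over-partitioning exists for load balancing, not parallel execution on one core.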

Side effects of using var inside a class object in a Rdd

2016-02-15 Thread Hemalatha A
Hello, I want to know the cons and performance impacts of using a var inside a class object in an RDD. Here is an example: Animal is a huge class with n val-type variables (approx. >600 variables), but frequently we will have to update Age (just 1 variable) after some computation.
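The trade-off can be sketched as follows. With all-val fields, updating one field means `copy()`, which allocates a new object carrying all the other fields; a single var mutates in place, but mutable state inside Spark closures is risky (mutations on executors are not visible back on the driver, and sharing one mutable instance across records breaks correctness). Two fields stand in for the ~600 in the original post; this is a plain-Scala sketch of the two designs, not a Spark benchmark:

```scala
// All-val design: updating age allocates a whole new Animal. Safe in
// RDD transformations because nothing is mutated in place.
final case class AnimalImmutable(name: String, age: Int) {
  def withAge(a: Int): AnimalImmutable = copy(age = a)
}

// Single-var design: age is mutated in place, avoiding the copy, but
// the mutation only exists on the executor that performed it, and
// reusing one instance across records would corrupt results.
final class AnimalMutable(val name: String, var age: Int)
```

A common middle ground is to keep the class immutable and restructure so the frequently changing field lives in a small companion value (e.g. pairing a stable `Animal` with its current age), so updates copy only the small part.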