Lower Consistency level : Retry

2015-09-28 Thread Samya
Hi Team, Is there an easier way to get a hook to /lower the consistency level for every retry/? In the implementation mentioned in *MultipleRetryPolicy.scala*, we are using the same consistency level. One possible way would be to use a custom CassandraConnectionFactory instead of DefaultConnectionFa
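
A minimal sketch of the downgrade step, assuming the usual DataStax driver enum; the object name and the level ladder are purely illustrative, and it would still need to be wired into a custom RetryPolicy (returning RetryDecision.retry(newLevel)) registered through your own CassandraConnectionFactory:

    import com.datastax.driver.core.ConsistencyLevel

    // Illustrative helper: pick a weaker consistency level for each retry attempt.
    // Adjust the ladder to your keyspace's replication factor and durability needs
    // (ANY is only meaningful for writes).
    object ConsistencyDowngrade {
      private val ladder = Seq(
        ConsistencyLevel.QUORUM,
        ConsistencyLevel.LOCAL_QUORUM,
        ConsistencyLevel.ONE,
        ConsistencyLevel.ANY)

      def forRetry(nbRetry: Int, initial: ConsistencyLevel): ConsistencyLevel = {
        val start = math.max(ladder.indexOf(initial), 0)
        ladder(math.min(start + nbRetry, ladder.size - 1))
      }
    }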

Stop a Dstream computation

2015-09-24 Thread Samya
Hi Team, I have a piece of code as follows. try { someDstream.someaction(...) //Step1 } catch { case ex: Exception => { someDstream.someaction(...) //Step2 } } When I get an exception for the current batch, Step2 executes as expected.
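
If the goal is to stop the whole streaming computation when a batch-level action fails, one hedged option (someDstream and the actions above are placeholders) is to catch the failure inside the output operation and stop the context from a separate thread, so the batch that is currently running is not blocked by its own shutdown request:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.StreamingContext

    // Sketch only: run the per-batch action, and shut the stream down if it throws.
    def runOrStop[T](ssc: StreamingContext, rdd: RDD[T])(action: RDD[T] => Unit): Unit =
      try {
        action(rdd)                        // Step1: the normal per-batch action
      } catch {
        case ex: Exception =>              // Step2: give up and stop the computation
          new Thread("streaming-stopper") {
            override def run(): Unit =
              ssc.stop(stopSparkContext = true, stopGracefully = false)
          }.start()
      }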

RE: Getting parent RDD

2015-09-16 Thread Samya MAITI
From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Wednesday, September 16, 2015 12:24 PM To: Samya MAITI Cc: user@spark.apache.org Subject: Re: Getting parent RDD How many RDDs are you having in that stream? If it's a single RDD then you could do a .foreach and log the message, something lik
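
A hedged sketch of that ".foreach and log" suggestion; the stream is assumed to be the DStream[String] from the question below, and the handler is a placeholder. The logger is obtained inside foreachPartition so nothing non-serializable is captured in the closure, and each record gets its own try/catch so the offending message can be logged and skipped:

    import org.apache.log4j.Logger
    import org.apache.spark.streaming.dstream.DStream

    // Sketch only: process each record, logging the one that caused the failure.
    def processAndLogFailures(msgStream: DStream[String])(handle: String => Unit): Unit =
      msgStream.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          val log = Logger.getLogger("stream-errors")
          records.foreach { msg =>
            try handle(msg)
            catch { case ex: Exception => log.error(s"Failed on record: $msg", ex) }
          }
        }
      }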

Getting parent RDD

2015-09-15 Thread Samya
Hi Team, I have the below situation. val ssc = val msgStream = . //SparkKafkaDirectAPI val wordCountPair = TransformStream.transform(msgStream) /wordCountPair.foreachRDD(rdd => try{ //Some action that causes exception }catch { case ex1 : Exception => {

RE: Exception Handling : Spark Streaming

2015-09-11 Thread Samya MAITI
!!! - Sam From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Friday, September 11, 2015 8:05 PM To: Samya MAITI Cc: user Subject: Re: Exception Handling : Spark Streaming Was your intention that the exception from rdd.saveToCassandra() be caught? In that case you can place try / catch around that call
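
A hedged sketch of that suggestion: wrap the saveToCassandra() call itself in try / catch inside foreachRDD, so a failed write surfaces on the driver for that batch and can be handled explicitly. Keyspace, table and column names are placeholders:

    import com.datastax.spark.connector._          // saveToCassandra, SomeColumns
    import org.apache.spark.streaming.dstream.DStream

    // Sketch only: catch per-batch Cassandra write failures.
    def saveStream(wordCounts: DStream[(String, Int)]): Unit =
      wordCounts.foreachRDD { rdd =>
        try {
          rdd.saveToCassandra("my_keyspace", "word_counts", SomeColumns("word", "count"))
        } catch {
          case ex: Exception =>
            // decide here whether to log and continue, retry, or stop the context
            println(s"Cassandra write failed for this batch: ${ex.getMessage}")
        }
      }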

Exception Handling : Spark Streaming

2015-09-11 Thread Samya
Hi Team, I am facing this issue wherein I can't figure out why the exception is handled the first time an exception is thrown in the stream processing action, but is ignored the second time. PFB my code base. object Boot extends App { //Load the configuration val config = LoadConfig.get

RE: Maintaining Kafka Direct API Offsets

2015-09-10 Thread Samya
Thanks Ameya. From: ameya [via Apache Spark User List] [mailto:ml-node+s1001560n24650...@n3.nabble.com] Sent: Friday, September 11, 2015 4:12 AM To: Samya MAITI Subject: Re: Maintaining Kafka Direct API Offsets So I added something like this: Runtime.getRuntime().addShutdownHook(new Thread
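
A hedged sketch of that shutdown-hook approach: on JVM shutdown, stop the StreamingContext gracefully so in-flight batches complete (and their offsets and results are committed) before the process exits:

    import org.apache.spark.streaming.StreamingContext

    // Sketch only: request a graceful stop when the JVM receives a shutdown signal.
    def installShutdownHook(ssc: StreamingContext): Unit =
      Runtime.getRuntime.addShutdownHook(new Thread("streaming-shutdown") {
        override def run(): Unit =
          ssc.stop(stopSparkContext = true, stopGracefully = true)
      })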

Re: Maintaining Kafka Direct API Offsets

2015-09-09 Thread Samya
Hi Ameya, Please suggest: when you say graceful shut-down, what exactly did you handle? Thanks, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246p24636.html Sent from the Apache Spark User List mailing l

RE: Spark streaming -> cassandra : Fault Tolerance

2015-09-09 Thread Samya MAITI
From: Cody Koeninger [mailto:c...@koeninger.org] Sent: Thursday, September 10, 2015 1:13 AM To: Samya MAITI Cc: user@spark.apache.org Subject: Re: Spark streaming -> cassandra : Fault Tolerance It's been a while since I've looked at the cassandra connector, so I can't give y

Spark streaming -> cassandra : Fault Tolerance

2015-09-09 Thread Samya
Hi Team, I have a sample Spark application which reads from Kafka using the direct API & then does some transformation & stores to Cassandra (using saveToCassandra()). If Cassandra goes down, then the application logs a NoHostAvailable exception (as expected). But in the meantime the new incoming mes

RE: Relation between threads and executor core

2015-08-26 Thread Samya MAITI
user control? Regards, Sam From: Jem Tucker [mailto:jem.tuc...@gmail.com] Sent: Wednesday, August 26, 2015 2:26 PM To: Samya MAITI ; user@spark.apache.org Subject: Re: Relation between threads and executor core Hi Samya, When submitting an application with spark-submit, the cores per executor can
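
A hedged note on the usual behaviour: each executor core runs one task (one thread) at a time, so --executor-cores effectively bounds how many task threads run concurrently per executor; Spark does not expose a separate threads-per-core knob. The equivalent programmatic setting:

    import org.apache.spark.SparkConf

    // Sketch only: same effect as spark-submit --executor-cores 4.
    val conf = new SparkConf()
      .setAppName("ThreadsPerExecutorDemo")
      .set("spark.executor.cores", "4")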

Relation between threads and executor core

2015-08-26 Thread Samya
Hi All, A few basic queries: 1. Is there a way we can control the number of threads per executor core? 2. Does the parameter “executor-cores” also have a say in deciding how many threads are run? Regards, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com

Spark-Cassandra-connector

2015-08-20 Thread Samya
(one) connection object per executor, that is shared between tasks? 2. If the above answer is YES, is there a way to create a connection pool for each executor, so that multiple tasks can dump data to Cassandra in parallel? Regards, Samya -- View this message in context: http://apache-spark-user-
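
A hedged sketch of how this is commonly handled with the connector: CassandraConnector keeps a session cache per executor JVM, so tasks on the same executor reuse the underlying connections, and withSessionDo borrows one of those shared sessions from each task. Keyspace, table and the CQL below are placeholders:

    import com.datastax.spark.connector.cql.CassandraConnector
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD

    // Sketch only: every partition (task) writes through the executor's shared sessions.
    def writeInParallel(conf: SparkConf, rdd: RDD[(String, Int)]): Unit = {
      val connector = CassandraConnector(conf)
      rdd.foreachPartition { rows =>
        connector.withSessionDo { session =>
          rows.foreach { case (word, count) =>
            session.execute(
              "INSERT INTO my_keyspace.word_counts (word, count) VALUES (?, ?)",
              word, Int.box(count))
          }
        }
      }
    }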

Re: Spark Streaming - Design considerations/Knobs

2015-05-24 Thread Maiti, Samya
Really good list to brush up on the basics. Just one input, regarding * An RDD's processing is scheduled by the driver's JobScheduler as a job. At a given point in time only one job is active. So, if one job is executing, the other jobs are queued. We can have multiple jobs running in a given applicat
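
For reference, the knob usually mentioned for this (it is undocumented, so treat the following as a hedged sketch) lets the streaming JobScheduler run more than one job of a batch at a time; note it can break ordering assumptions between output operations:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sketch only: allow two streaming jobs to run concurrently instead of one.
    val conf = new SparkConf()
      .setAppName("ConcurrentJobsDemo")
      .set("spark.streaming.concurrentJobs", "2")
    val ssc = new StreamingContext(conf, Seconds(10))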

Re: Writing to a single file from multiple executors

2015-03-12 Thread Maiti, Samya
Hi TD, I want to append my record to an Avro file which will later be used for querying. Having a single file is not mandatory for us, but then how can we make the executors append the Avro data to multiple files? Thanks, Sam On Mar 12, 2015, at 4:09 AM, Tathagata Das <mailto:t...@databricks.com>

Re: Kafka + Spark streaming

2014-12-31 Thread Samya Maiti
Thanks TD. On Wed, Dec 31, 2014 at 7:19 AM, Tathagata Das wrote: > 1. Of course, a single block / partition has many Kafka messages, and > from different Kafka topics interleaved together. The message count is > not related to the block count. Any message received within a > particular block int

Re: Can we say 1 RDD is generated every batch interval?

2014-12-30 Thread Maiti, Samya
Thanks, Sean. That was helpful. Regards, Sam On Dec 30, 2014, at 4:12 PM, Sean Owen wrote: > The DStream model is one RDD of data per interval, yes. foreachRDD > performs an operation on each RDD in the stream, which means it is > executed once* for the one RDD in each interval. > > * ignoring th
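
A hedged sketch illustrating that point: the body of foreachRDD runs on the driver once per batch interval, against the single RDD generated for that interval (lines is a placeholder DStream):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time
    import org.apache.spark.streaming.dstream.DStream

    // Sketch only: one invocation per batch interval, one RDD per invocation.
    def logBatchSizes(lines: DStream[String]): Unit =
      lines.foreachRDD { (rdd: RDD[String], time: Time) =>
        println(s"Batch at $time contains ${rdd.count()} records")
      }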