Re: Spark Streaming - Design considerations/Knobs

2015-05-24 Thread Maiti, Samya
Really good list for brushing up on the basics. Just one input, regarding: * An RDD's processing is scheduled by the driver's JobScheduler as a job. At any given point in time only one job is active, so if one job is executing, the other jobs are queued. We can have multiple jobs running in a given applicat
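
The queueing behaviour described in this message can be illustrated with a small sketch. This is a toy model in plain Python, not Spark code: the function `run_jobs`, its parameter `max_concurrent_jobs`, and the job names are all hypothetical, and a single-worker thread pool merely stands in for a scheduler that runs one job at a time while later jobs wait in a queue.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_jobs(job_ids, max_concurrent_jobs=1):
    """Run jobs through a pool and report start order and peak concurrency.

    With max_concurrent_jobs=1 (an assumed default for this sketch), each
    job starts only after the previous one has finished; the rest wait in
    the pool's internal queue.
    """
    active = 0      # jobs currently executing
    peak = 0        # highest number of jobs seen executing at once
    order = []      # order in which jobs actually started
    lock = threading.Lock()

    def job(job_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
            order.append(job_id)
        time.sleep(0.01)  # stand-in for the real work of the job
        with lock:
            active -= 1

    # Leaving the "with" block waits for every submitted job to finish.
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        for job_id in job_ids:
            pool.submit(job, job_id)
    return order, peak


order, peak = run_jobs(["batch-1", "batch-2", "batch-3"])
print(order, peak)
```

With one worker, the jobs start in submission order and the peak concurrency stays at 1. Raising `max_concurrent_jobs` in this toy model lets jobs overlap, which is the trade-off the message goes on to raise.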

Re: Writing to a single file from multiple executors

2015-03-12 Thread Maiti, Samya
Hi TD, I want to append my record to an AVRO file which will later be used for querying. Having a single file is not mandatory for us, but then how can we make the executors append the AVRO data to multiple files? Thanks, Sam On Mar 12, 2015, at 4:09 AM, Tathagata Das <t...@databricks.com>

Re: Can we say 1 RDD is generated every batch interval?

2014-12-30 Thread Maiti, Samya
Thanks, Sean. That was helpful. Regards, Sam On Dec 30, 2014, at 4:12 PM, Sean Owen wrote: > The DStream model is one RDD of data per interval, yes. foreachRDD > performs an operation on each RDD in the stream, which means it is > executed once* for the one RDD in each interval. > > * ignoring th
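
The "one RDD per interval, one call per RDD" model in the quoted reply can be sketched as follows. This is a toy model in plain Python, not the Spark API: `batches_by_interval` and `foreach_batch` are hypothetical helpers, a plain list stands in for an RDD, and emitting an empty batch for an interval with no records is an assumption of the sketch.

```python
def batches_by_interval(records, batch_interval, num_intervals):
    """Group (timestamp, value) records into one list per batch interval."""
    batches = [[] for _ in range(num_intervals)]
    for timestamp, value in records:
        index = int(timestamp // batch_interval)
        if 0 <= index < num_intervals:
            batches[index].append(value)
    return batches


def foreach_batch(batches, func):
    """Call func exactly once per batch, in interval order."""
    for index, batch in enumerate(batches):
        func(index, batch)


# Four records spread over four one-second intervals; interval 2 is empty.
records = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
calls = []
foreach_batch(
    batches_by_interval(records, batch_interval=1.0, num_intervals=4),
    lambda index, batch: calls.append((index, list(batch))),
)
print(calls)
```

The callback runs four times, once per interval, regardless of how many records each interval received.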