Really good list to brush up basics.
Just one input, regarding:
* An RDD's processing is scheduled by the driver's job scheduler as a job. At a
given point in time only one job is active. So, if one job is executing, the
other jobs are queued.
We can have multiple jobs running in a given applicat
Hi TD,
I want to append my records to an Avro file, which will later be used for querying.
Having a single file is not mandatory for us, but in that case how can we make the
executors append the Avro data to multiple files?
Thanks,
Sam
On Mar 12, 2015, at 4:09 AM, Tathagata Das <mailto:t...@databricks.com>
Thanks, Sean.
That was helpful.
Regards,
Sam
On Dec 30, 2014, at 4:12 PM, Sean Owen wrote:
> The DStream model is one RDD of data per interval, yes. foreachRDD
> performs an operation on each RDD in the stream, which means it is
> executed once* for the one RDD in each interval.
>
> * ignoring th