Re: Writing to a single file from multiple executors

2015-03-12 Thread Tathagata Das
If you use DStream.saveAsHadoopFiles (or the equivalent RDD ops) with the appropriate output format (for Avro), then each partition of the RDDs will be written to a different file. However, there is probably going to be a large number of small files, and you may have to run a separate compaction phase to
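To illustrate the pattern described above (one output file per partition, followed by a separate compaction pass), here is a minimal sketch in plain Python. It is not Spark code and does not use Avro: the helper names `write_partition` and `compact`, the file naming scheme, and the newline-delimited JSON format are all assumptions made only for illustration. The point is that each writer owns its own file, so no two writers ever contend for one file, and the small files are merged afterwards.

```python
import json
import os
import tempfile


def write_partition(out_dir, batch_id, partition_id, records):
    """Write one partition's records to its own part file (hypothetical helper).

    Each (batch, partition) pair gets a distinct file name, so concurrent
    writers never share a file.
    """
    name = f"batch-{batch_id}-part-{partition_id:05d}.jsonl"
    path = os.path.join(out_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def compact(parts_dir, target_path):
    """Merge all small part files into one file, then delete the parts.

    Returns the number of records copied. target_path is assumed to live
    outside parts_dir so it is never picked up as an input.
    """
    parts = sorted(p for p in os.listdir(parts_dir) if p.endswith(".jsonl"))
    count = 0
    with open(target_path, "w", encoding="utf-8") as out:
        for name in parts:
            with open(os.path.join(parts_dir, name), encoding="utf-8") as f:
                for line in f:
                    out.write(line)
                    count += 1
    # Remove the inputs only after the merged file has been fully written.
    for name in parts:
        os.remove(os.path.join(parts_dir, name))
    return count


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    parts_dir = os.path.join(base, "parts")
    os.mkdir(parts_dir)
    write_partition(parts_dir, 0, 0, [{"k": 1}, {"k": 2}])
    write_partition(parts_dir, 0, 1, [{"k": 3}])
    merged = os.path.join(base, "compacted.jsonl")
    print(compact(parts_dir, merged), "records compacted into", merged)
```

In a real job the per-partition write would be done by the output format and the compaction would typically be a separate batch job over the output directory; the sketch only shows the shape of the two phases.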

Re: Writing to a single file from multiple executors

2015-03-12 Thread Maiti, Samya
Hi TD, I want to append my records to an Avro file, which will later be used for querying. Having a single file is not mandatory for us, but then how can we make the executors append the Avro data to multiple files? Thanks, Sam On Mar 12, 2015, at 4:09 AM, Tathagata Das <mailto:t...@databricks.com>

Re: Writing to a single file from multiple executors

2015-03-11 Thread Tathagata Das
Why do you have to write a single file? On Wed, Mar 11, 2015 at 1:00 PM, SamyaMaiti wrote: > Hi Experts, > > I have a scenario wherein I want to write to an Avro file from a streaming > job that reads data from Kafka. > > But the issue is, as there are multiple executors and when all try to w