If you use DStream.saveAsHadoopFiles (or the equivalent RDD ops) with the
appropriate output format for Avro, then each partition of the RDDs will
be written to a different file. However, you will probably end up with a
large number of small files, and you may have to run a separate compaction
phase to merge them into larger files.
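
For what it's worth, here is a minimal sketch of that approach, assuming
a stream of Avro GenericRecords and the new-API Hadoop output format.
The function name saveAsAvro, the schema argument, and the output path
are hypothetical; AvroKeyOutputFormat looks up the writer schema in the
Hadoop configuration, which AvroJob.setOutputKeySchema registers:

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits (older Spark)
import org.apache.spark.streaming.dstream.DStream

// `records` is assumed to be a DStream[GenericRecord] you have already
// built from the Kafka input; `outputPrefix` is a hypothetical HDFS path.
def saveAsAvro(records: DStream[GenericRecord],
               schema: Schema,
               outputPrefix: String): Unit = {
  // AvroKeyOutputFormat reads the writer schema from the Hadoop conf,
  // so register it there first.
  val job = Job.getInstance()
  AvroJob.setOutputKeySchema(job, schema)

  records
    .map(r => (new AvroKey[GenericRecord](r), NullWritable.get()))
    .saveAsNewAPIHadoopFiles(
      outputPrefix,                         // e.g. "hdfs:///data/events/part"
      "avro",                               // file suffix
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      classOf[AvroKeyOutputFormat[GenericRecord]],
      job.getConfiguration)
}

Each batch interval then writes one Avro file per RDD partition under a
timestamped directory, so no two executors ever append to the same file.
The compaction pass can be an ordinary batch job that reads the
accumulated small files, coalesces them into a few partitions, and writes
them back out with the same output format.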
Hi TD,
I want to append my records to an Avro file that will later be used for querying.
Having a single file is not mandatory for us, but then how can we make the
executors append the Avro data to multiple files?
Thanks,
Sam
On Mar 12, 2015, at 4:09 AM, Tathagata Das <t...@databricks.com> wrote:
Why do you have to write a single file?
On Wed, Mar 11, 2015 at 1:00 PM, SamyaMaiti wrote:
> Hi Experts,
>
> I have a scenario wherein I want to write to an Avro file from a streaming
> job that reads data from Kafka.
>
> But the issue is that, as there are multiple executors, when all of them try to write to the same file