Re: Append existing Avro file - HDFS Sink

2018-10-12 Thread Mike Percy
Also consider setting up a Spark job or similar (Impala, Hive) to periodically read the Avro files and write them out in a columnar format (Parquet or ORC), which would give you small-files compaction (assuming you delete the source files periodically) and better analytical read performance on the columnar
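A minimal sketch of the kind of periodic Spark job suggested above, written in PySpark. It is an illustration under stated assumptions, not the setup anyone in this thread actually ran: the function name, the paths, and the `num_files` parameter are hypothetical, and the `"avro"` format name assumes the external spark-avro module is available on the Spark classpath (older setups used a different format name).

```python
def compact_avro_to_parquet(spark, src_dir, dst_dir, num_files=1):
    """Read every Avro file under src_dir and rewrite the data as Parquet.

    `spark` is an existing SparkSession. All names and paths here are
    hypothetical; adjust them to your own layout.
    """
    # Assumption: the spark-avro module is on the classpath, so the short
    # format name "avro" resolves to the Avro data source.
    df = spark.read.format("avro").load(src_dir)

    # coalesce() reduces the number of output partitions, so many small
    # Avro inputs end up as a handful of larger Parquet files.
    df.coalesce(num_files).write.mode("overwrite").parquet(dst_dir)
    return dst_dir
```

From a scheduled job this could be called as `compact_avro_to_parquet(spark, "/flume/events/avro", "/warehouse/events_parquet", num_files=4)`, with both paths being placeholders. Deleting the source Avro files after a successful run is left out on purpose, since when that is safe depends on how Flume rolls and closes its files.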

Re: Append existing Avro file - HDFS Sink

2018-10-12 Thread Rickard Cardell
On Fri, 20 Apr 2018 at 20:49, Nitin Kumar wrote: > Hi All, > > I am using Flume v1.8 in which the Flume agent comprises a Kafka Channel & > HDFS Sink. > I am able to write data in an Avro file on HDFS into an external HIVE table, > but the problem is whenever Flume gets restarted it closes that file and > o

Re: Append existing Avro file - HDFS Sink

2018-05-02 Thread Nitin Kumar
Thanks, Matt. On Sat, Apr 21, 2018 at 12:43 AM, Matt Sicker wrote: > It's not a Flume native solution, but an alternative I used in the past > was Kafka Connect using the HDFS connector plugin. That plugin provides > configuration regarding how often to roll over Avro files. > > On 20 April 2018 a

Re: Append existing Avro file - HDFS Sink

2018-04-20 Thread Matt Sicker
It's not a Flume native solution, but an alternative I used in the past was Kafka Connect using the HDFS connector plugin. That plugin provides configuration regarding how often to roll over Avro files. On 20 April 2018 at 13:49, Nitin Kumar wrote: > Hi All, > > I am using Flume v1.8 in which Fl