Re: Append existing Avro file - HDFS Sink

2018-10-12 Thread Rickard Cardell
On Fri, 20 Apr 2018 at 20:49, Nitin Kumar wrote: > Hi All, > > I am using Flume v1.8, in which the Flume agent comprises a Kafka Channel and an > HDFS Sink. > I am able to write data as Avro files on HDFS into an external Hive table, > but the problem is that whenever Flume gets restarted it closes that file and > o

Re: Append existing Avro file - HDFS Sink

2018-10-12 Thread Mike Percy
Also consider setting up a Spark job or similar (Impala, Hive) to periodically read the Avro files and write them out in a columnar format (Parquet or ORC), which would give you small-file compaction (assuming you delete the source files periodically) and better analytical read performance on the columnar
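
A minimal PySpark sketch of the kind of periodic compaction job suggested above, not a definitive implementation. Several things here are assumptions that do not come from the thread: the helper names `closed_avro_files` and `compact_to_parquet`, the `.avro` file suffix (it presumes the sink sets `hdfs.fileSuffix = .avro`), the example paths, and the availability of the spark-avro module on the Spark classpath. Files that Flume is still writing carry the sink's in-use suffix (`.tmp` by default), so filtering on `.avro` skips them.

```python
from typing import Iterable, List

# Assumption: the HDFS sink is configured with hdfs.fileSuffix = .avro,
# so a finished file ends in ".avro" and an open one in ".avro.tmp".
CLOSED_SUFFIX = ".avro"


def closed_avro_files(paths: Iterable[str], suffix: str = CLOSED_SUFFIX) -> List[str]:
    """Return only finished Avro files, skipping files Flume still has open."""
    return sorted(p for p in paths if p.endswith(suffix))


def compact_to_parquet(spark, avro_paths: Iterable[str], parquet_dir: str,
                       num_output_files: int = 1) -> int:
    """Read many small Avro files and append them to a Parquet directory.

    `spark` is an existing SparkSession. Returns the number of input files
    read. Deleting the source files is left to the caller, and should only
    happen after the Parquet write has been verified.
    """
    paths = closed_avro_files(avro_paths)
    if not paths:
        return 0  # nothing to compact; do not touch Spark at all
    df = spark.read.format("avro").load(paths)
    # coalesce() reduces the number of output files without a full shuffle.
    df.coalesce(num_output_files).write.mode("append").parquet(parquet_dir)
    return len(paths)
```

A scheduler (cron, Oozie, or similar) could run this once per partition directory, listing the directory first and passing the file names in. The Hive table used for analytics would then point at the Parquet directory rather than at the raw Avro files.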