Also consider setting up a Spark job or similar (Impala, Hive) to
periodically read the Avro files and rewrite them in a columnar format
(Parquet or ORC). That would give you small-files compaction (assuming you
delete the source files periodically) and better analytical read
performance from the columnar layout.
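The compaction job described above could be sketched in PySpark roughly as follows. This is an illustrative sketch, not from the thread: the HDFS paths, the coalesce factor, and the app name are placeholder assumptions, and reading `format("avro")` requires the spark-avro package on the classpath (bundled as an external module since Spark 2.4).

```python
# Hypothetical sketch of an Avro -> Parquet compaction job.
# Paths and the coalesce factor are placeholders; requires the
# spark-avro package (e.g. --packages org.apache.spark:spark-avro_2.12).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("avro-to-parquet-compaction")
         .getOrCreate())

# Read the many small Avro files the Flume HDFS sink produced.
df = spark.read.format("avro").load("hdfs:///flume/events/avro/")

# coalesce() merges partitions so the output lands in a few large
# Parquet files instead of many small ones.
(df.coalesce(8)
   .write.mode("append")
   .parquet("hdfs:///warehouse/events_parquet/"))

spark.stop()
```

After a successful run you would delete (or archive) the source Avro files so the small-files problem does not simply move to a second directory.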
On Fri, 20 Apr 2018 at 20:49, Nitin Kumar wrote:
> Hi All,
>
> I am using Flume v1.8, with a Flume agent comprising a Kafka Channel and
> an HDFS Sink.
> I am able to write data as Avro files on HDFS into an external Hive
> table, but the problem is that whenever Flume gets restarted it closes
> that file and opens a new one.
Thanks Matt
On Sat, Apr 21, 2018 at 12:43 AM, Matt Sicker wrote:
> It's not a Flume native solution, but an alternative I used in the past
> was Kafka Connect using the HDFS connector plugin. That plugin provides
> configuration regarding how often to roll over Avro files.
>
> On 20 April 2018 at 13:49, Nitin Kumar wrote:
It's not a Flume native solution, but an alternative I used in the past was
Kafka Connect using the HDFS connector plugin. That plugin provides
configuration regarding how often to roll over Avro files.
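For the Kafka Connect route, a minimal sink configuration might look like the sketch below. The property names come from Confluent's HDFS sink connector; the connector name, topic, HDFS URL, and rollover thresholds are illustrative placeholders, not values from this thread.

```properties
# Hypothetical example config for the Confluent HDFS sink connector.
name=events-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=events
hdfs.url=hdfs://namenode:8020
# Commit (close) a file after this many records...
flush.size=100000
# ...or rotate it after this much wall-clock time (ms).
rotate.interval.ms=600000
# Write output as Avro files.
format.class=io.confluent.connect.hdfs.avro.AvroFormat
```

The `flush.size` and `rotate.interval.ms` settings are what give you explicit control over file rollover, which is the behavior the thread is missing from the Flume HDFS sink across restarts.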
On 20 April 2018 at 13:49, Nitin Kumar wrote:
> Hi All,
>
> I am using Flume v1.8, with a Flume agent comprising a Kafka Channel and
> an HDFS Sink.