On Fri, 20 Apr 2018 at 20:49, Nitin Kumar <nitin.kumar2...@gmail.com> wrote:
> Hi All,
>
> I am using Flume v1.8, in which the Flume agent comprises a Kafka channel and an HDFS sink.
> I am able to write data to Avro files on HDFS backing an external Hive table, but the problem is that whenever Flume is restarted it closes the current file and opens a new one, which is why I see many small files. (Data is partitioned by date.)
>
> Can't Flume append to the existing file to avoid creating a new one?

Hi

No, not with the hdfs-sink at least.

> Also, how can I solve this problem, which leads to the creation of too many small files?

We also used the hdfs-sink, but because of the high maintenance we switched to the hbase-sink instead, which also gave us deduplication. The major drawback is that it requires an extra step, an HBase-to-HDFS job.

Your many-small-files problem might be solved with an extra step, e.g. an Oozie job, that merges the smaller files into larger ones. That would also solve the problem of the leftover temp files that Flume doesn't clean up in some circumstances.

/Rickard

> Any help would be appreciated.
>
> --
>
> Regards,
> Nitin Kumar
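P.S. As a rough sketch of what such a merge step could run (paths and file names below are hypothetical, adjust to your own partition layout): note that Avro files carry their own header and sync markers, so you can't simply byte-concatenate them; the avro-tools `concat` command merges files that share the same schema and codec.

```shell
# Hypothetical date partition of the external Hive table.
PART=/user/hive/warehouse/events/dt=2018-04-20

# Pull the small Avro files down, merge them into one file,
# then push the merged file back up.
mkdir -p ./work
hdfs dfs -get "$PART"/*.avro ./work/
java -jar avro-tools-1.8.2.jar concat ./work/*.avro merged.avro
hdfs dfs -put -f merged.avro "$PART"/merged.avro

# Only after verifying the merged file, remove the originals
# (FlumeData is the hdfs-sink's default file prefix).
hdfs dfs -rm "$PART"/FlumeData.*
```

Wrapped in an Oozie shell or Java action scheduled per partition, this also gives you a natural place to delete the stale .tmp files the sink sometimes leaves behind.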