I believe the way this works is that Flume creates a meta directory to track which file is currently being read. If the agent is restarted, the entire file will be re-read, which will create some duplicate events.
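On the original question of partially copied files: the Flume User Guide says the spooling directory source expects only immutable, uniquely named files. If a file is written to after it has been placed in the spooling directory, Flume logs an error and stops processing. So rather than relying on the source to notice an in-progress copy, the safer approach is for the producer never to expose one. A common way to do that is to copy the file into a staging directory on the same filesystem, then rename it into the spooling directory in one atomic step. Below is a minimal Java sketch under those assumptions. `SpoolDelivery`, `deliverToSpool`, and the `staging`/`spool` directory names are hypothetical and not part of Flume.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class SpoolDelivery {

    /**
     * Hypothetical helper: stage a finished log file outside the spooling
     * directory, then rename it in with a single atomic move so the spooling
     * source never sees a partially written file.
     * Assumption: stagingDir and spoolDir are on the same filesystem;
     * otherwise ATOMIC_MOVE throws AtomicMoveNotSupportedException.
     * File names should be unique, since Flume rejects reused names.
     */
    static Path deliverToSpool(Path source, Path stagingDir, Path spoolDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(spoolDir);
        Path fileName = source.getFileName();

        // Slow step: copy the (possibly large) file into the staging area,
        // which the Flume source is not watching.
        Path staged = stagingDir.resolve(fileName);
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Fast step: one atomic rename makes the complete file appear
        // in the spooling directory all at once.
        Path target = spoolDir.resolve(fileName);
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo in a throwaway temp directory so staging and spool share a filesystem.
        Path base = Files.createTempDirectory("spool-demo");
        Path log = base.resolve("app-2014-07-22.log");
        Files.write(log, Arrays.asList("line 1", "line 2"), StandardCharsets.UTF_8);

        Path delivered = deliverToSpool(log, base.resolve("staging"), base.resolve("spool"));
        System.out.println("Delivered " + delivered.getFileName());
    }
}
```

The same effect can be had from a shell with `cp` into the staging directory followed by `mv` into the spooling directory, as long as both directories are on the same filesystem (a cross-filesystem `mv` is a copy plus delete, not a rename).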
https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java#L474

On Tue, Jul 22, 2014 at 6:15 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
> Hi,
>
> I am planning to use the spooling directory source to move log files to an HDFS sink.
>
> I'd like to know how Flume identifies whether a file we are moving into the spool
> directory is complete, or only partial with the move still in progress.
>
> Suppose a file is large and we have started moving it into the spooling
> directory: how does Flume identify whether the complete file has been
> transferred or the transfer is still in progress?
>
> Please help me out here.
>
> Thanks,
> Saravana