Thanks Ashish, I have already read this. But I couldn't find any explanation in the Flume User Guide of how Flume differentiates between a file whose copy is still in progress and a fully copied file.

On Wed, Jul 23, 2014 at 10:59 AM, Ashish <paliwalash...@gmail.com> wrote:

> This is specified in Flume's User Guide
>
> "Unlike the Exec source, this source is reliable and will not miss data,
> even if Flume is restarted or killed. In exchange for this reliability,
> only immutable, uniquely-named files must be dropped into the spooling
> directory. Flume tries to detect these problem conditions and will fail
> loudly if they are violated:
>
> 1. If a file is written to after being placed into the spooling
> directory, Flume will print an error to its log file and stop processing.
> 2. If a file name is reused at a later time, Flume will print an error
> to its log file and stop processing.
>
> To avoid the above issues, it may be useful to add a unique identifier
> (such as a timestamp) to log file names when they are moved into the
> spooling directory."
>
>
> On Wed, Jul 23, 2014 at 10:17 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>
>> Hi Jeff,
>>
>> Thanks of your comments.But what I am really looking for is , consider
>> we are copying a file of 1 GB to spool directory , if suppose copy is in
>> progress , how flume recognize that the complete file is copied into the
>> spool directory and the file is ready for processing ?
>>
>> how flume make sure it doesnt start processing the partially copied file.
>>
>>
>> On Tue, Jul 22, 2014 at 11:15 PM, Jeff Lord <jl...@cloudera.com> wrote:
>>
>>> I believe the way this works is that flume creates a meta directory to
>>> track which file is being read.
>>> In the event of a restart of the agent the entire file will be re-read
>>> which will create some duplicate events.
>>>
>>> https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java#L474
>>>
>>>
>>> On Tue, Jul 22, 2014 at 6:15 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am planning to use spooling directory to move logfiles in hdfs sink.
>>>>
>>>> I like to know how flume identifies the file we are moving to spool
>>>> directory is complete file or partial & its move still in progress.
>>>>
>>>> if suppose a file is of large size and we started moving it to spooler
>>>> directory , how flume identifies that the complete file is transferred or
>>>> is still in progress.
>>>>
>>>> Please help me out here.
>>>>
>>>> Thanks,
>>>> saravana
>>>>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
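
The user guide passage quoted in this thread suggests that Flume does not itself wait for a copy to finish: it expects a file to be complete and immutable from the moment it appears in the spooling directory. One common way to meet that requirement is to copy the file somewhere else first and then rename it into place, because a rename within a single filesystem is atomic on POSIX systems. Below is a minimal sketch of that idea, not anything shipped with Flume. The function name `drop_into_spool`, the `staging_dir` parameter and the timestamp naming scheme are all hypothetical, and it assumes the staging and spool directories live on the same filesystem.

```python
import os
import shutil
import time


def drop_into_spool(src_path, staging_dir, spool_dir):
    """Place a complete copy of src_path into spool_dir in one atomic step.

    Hypothetical helper, not part of Flume. Assumes staging_dir and
    spool_dir are on the same filesystem so that os.rename is atomic.
    """
    base = os.path.basename(src_path)
    # A millisecond timestamp keeps names unique, in line with the user
    # guide's advice to add a unique identifier to each file name.
    unique_name = "{}.{}".format(base, int(time.time() * 1000))

    staged_path = os.path.join(staging_dir, unique_name)
    final_path = os.path.join(spool_dir, unique_name)

    # Slow step: the file grows here, outside the directory Flume watches.
    shutil.copyfile(src_path, staged_path)

    # Fast step: the file shows up in the spool directory already complete.
    # On POSIX, os.rename raises OSError if the two directories are on
    # different filesystems, rather than silently falling back to a copy.
    os.rename(staged_path, final_path)
    return final_path
```

A call might look like `drop_into_spool("/var/log/app.log", "/data/flume/staging", "/data/flume/spool")`, where all three paths are made up for illustration. An alternative some setups use is to copy directly into the spool directory under a temporary suffix and have the source skip that suffix through its `ignorePattern` property, if the Flume version in use supports it; the rename step is the same either way.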