Re: how spooling directory source identifies the complete file

2014-07-23 Thread SaravanaKumar TR
Business Park, Hook, Hampshire RG27 9UP > D 01256 75 3362 > I welcome VSRE emails. Learn more at http://vsre.info/ > > > -- > *From:* SaravanaKumar TR [mailto:saran0081...@gmail.com] > *Sent:* 23 July 2014 06:38 > *To:* user@flume.apache.org &

RE: how spooling directory source identifies the complete file

2014-07-23 Thread Needham, Guy
06:38 To: user@flume.apache.org Subject: Re: how spooling directory source identifies the complete file Thanks Ashish, I have already referred to this info. But I couldn't find any explanation in the Flume User Guide of how Flume differentiates between a file whose copy is still in progress and a fully copied file. On W

Re: how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Thanks Ashish, I have already referred to this info. But I couldn't find any explanation in the Flume User Guide of how Flume differentiates between a file whose copy is still in progress and a fully copied file. On Wed, Jul 23, 2014 at 10:59 AM, Ashish wrote: > This is specified in Flume's User Guide > > "Unlike the

Re: how spooling directory source identifies the complete file

2014-07-22 Thread Ashish
This is specified in Flume's User Guide "Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these pro
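
The requirement quoted above (only immutable, uniquely-named files in the spooling directory) is commonly met on the producer side by writing the file somewhere else first and then renaming it into place. The following is a minimal sketch of that pattern, not something taken from the thread or from Flume itself. It assumes the staging directory and the spool directory are on the same filesystem, so that the rename is atomic; the function name `deliver_to_spool` and the directory layout are hypothetical.

```python
import os
import shutil


def deliver_to_spool(src_path, staging_dir, spool_dir):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Assumption: staging_dir and spool_dir live on the same filesystem,
    so os.replace() is an atomic rename and a reader of spool_dir never
    observes a partially written file.
    """
    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(spool_dir, name)

    # The quoted guide text asks for uniquely named files, so refuse
    # to reuse a name that is already present in the spool directory.
    if os.path.exists(final):
        raise FileExistsError(f"{final} already exists in the spool directory")

    # The slow part (the copy) happens outside the spool directory.
    shutil.copyfile(src_path, staged)

    # The file appears in the spool directory in one step, fully written.
    os.replace(staged, final)
    return final
```

With this approach the 1 GB copy never takes place inside the spool directory, so the question of detecting a half-copied file does not arise for the reader. If the two directories are on different filesystems the rename is no longer a single atomic step, and the sketch does not cover that case.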

Re: how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Hi Jeff, thanks for your comments. But what I am really asking is this: suppose we are copying a 1 GB file to the spool directory and the copy is still in progress. How does Flume recognize that the complete file has been copied into the spool directory and is ready for processing? How does Flume make

Re: how spooling directory source identifies the complete file

2014-07-22 Thread Jeff Lord
I believe the way this works is that Flume creates a meta directory to track which file is being read. If the agent is restarted, the entire file will be re-read, which will create some duplicate events. https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apac

how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Hi, I am planning to use the spooling directory source to move log files to an HDFS sink. I would like to know how Flume identifies whether a file being moved into the spool directory is complete, or is partial with its move still in progress. Suppose a file is large and we have started moving it to the spool directory, ho