unsubscribe

2014-07-22 Thread Kartashov, Andy

how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Hi, I am planning to use the spooling directory source to move log files into an HDFS sink. I would like to know how Flume determines whether a file being moved into the spool directory is complete or still partial, with the move in progress. Suppose a file is large and we have started moving it to the spool directory, ho

Re: Flume stops processing event after a while

2014-07-22 Thread SaravanaKumar TR
Ashish/nattty, The above solution has worked fine so far. I am planning to move from the exec source to the spooling directory source for reliability. I am planning to use the spooling directory to move log files into an HDFS sink. I would like to know how Flume determines whether a file being moved into the spool directory is a complete fil

Re: how spooling directory source identifies the complete file

2014-07-22 Thread Jeff Lord
I believe the way this works is that Flume creates a meta directory to track which file is being read. If the agent is restarted, the entire file will be re-read, which will create some duplicate events. https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apac

Re: how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Hi Jeff, Thanks for your comments. But what I am really looking for is this: suppose we are copying a 1 GB file to the spool directory and the copy is still in progress. How does Flume recognize that the complete file has been copied into the spool directory and that the file is ready for processing? how flume make

Re: how spooling directory source identifies the complete file

2014-07-22 Thread Ashish
This is specified in Flume's User Guide "Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these pro

Re: how spooling directory source identifies the complete file

2014-07-22 Thread SaravanaKumar TR
Thanks Ashish, I already referred to this info. But I couldn't find any explanation in the Flume User Guide of how Flume differentiates between a file whose copy is still in progress and a fully copied file. On Wed, Jul 23, 2014 at 10:59 AM, Ashish wrote: > This is specified in Flume's User Guide > > "Unlike the
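
The previews above do not show an answer to the copy-in-progress question. One common way to satisfy the User Guide's requirement that only immutable, uniquely-named files be dropped into the spooling directory is to copy the file into a separate staging directory first and then rename it into the spool directory. The Python sketch below illustrates that idea. It is not taken from the thread. The function name `deliver_to_spool`, the staging directory, and the `unique_name` argument are all hypothetical, and it assumes the staging and spool directories are on the same POSIX filesystem, where a rename is atomic.

```python
import os
import shutil


def deliver_to_spool(src_path, staging_dir, spool_dir, unique_name):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Hypothetical helper: assumes staging_dir and spool_dir are on the
    same POSIX filesystem, so the final rename is atomic and a reader
    of spool_dir never sees a partially written file.
    """
    staged = os.path.join(staging_dir, unique_name)
    final = os.path.join(spool_dir, unique_name)

    # Refuse to reuse a name that is already present in the spool directory.
    if os.path.exists(final):
        raise FileExistsError(final)

    # The slow copy happens outside the spool directory.
    shutil.copyfile(src_path, staged)

    # The file appears in the spool directory only once it is complete.
    os.rename(staged, final)
    return final
```

A caller would pass a name it knows to be unique, for example one that includes a timestamp or sequence number, and must not modify the file after the rename.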

Re: Skippin those gost darn 0 byte diles

2014-07-22 Thread Bertrand Dechoux
The best option would be to get hold of a Flume developer. I am not entirely sure of all the differences between sync/flush/hsync/hflush across the different Hadoop versions. It might be the case that you are only flushing on the client side. Even if it was a clean strategy, creation+flush is unlikely to b