Thanks a lot. This answer fits my question perfectly. Let me try using mv instead of cp.
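For reference, here is a minimal Python sketch of the mv-instead-of-cp approach. It is an illustration under stated assumptions, not anything from the Flume docs. `drop_into_spool` and the timestamp naming scheme are hypothetical, and the sketch assumes the staging directory and the spooling directory are on the same filesystem. Within one filesystem, `os.replace` (like `mv`) is a single atomic rename, so the spooling source sees either no file or the complete file. Across filesystems, `mv` falls back to copy-then-delete, which is not atomic. In that case `os.replace` raises an `OSError` instead of copying.

```python
import os
import tempfile
import time


def drop_into_spool(src_path: str, spool_dir: str) -> str:
    """Atomically move a fully written file into a spooling directory.

    Assumes src_path and spool_dir are on the same filesystem, so that
    os.replace() is a single rename (like `mv`) rather than a copy.
    """
    # Hypothetical naming scheme: "<nanosecond timestamp>-<original name>",
    # so a file name is not reused in the spooling directory.
    unique_name = f"{time.time_ns()}-{os.path.basename(src_path)}"
    dst_path = os.path.join(spool_dir, unique_name)

    # Best-effort guard: os.replace() would silently overwrite an existing file.
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)

    # One rename: readers see either no file or the complete file.
    # Raises OSError (EXDEV on POSIX) instead of copying if the two paths
    # are on different filesystems.
    os.replace(src_path, dst_path)
    return dst_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        spool = os.path.join(root, "spool")
        os.makedirs(staging)
        os.makedirs(spool)

        # Write the whole file in the staging area first...
        src = os.path.join(staging, "app.log")
        with open(src, "w", encoding="utf-8") as f:
            f.write("event 1\nevent 2\n")

        # ...then hand it to the spooling directory in one step.
        print(drop_into_spool(src, spool))
```

The design choice is to write the file completely somewhere Flume is not watching and only then rename it into place. The timestamp prefix follows the User Guide's suggestion of adding a unique identifier to file names.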
On Wed, Jul 23, 2014 at 1:16 PM, Needham, Guy <guy.need...@virginmedia.co.uk> wrote:

> Hi Saravana,
>
> Flume will check the size and the time of the last edit to the file when
> it starts reading it and when it has finished reading. If the two sets of
> values differ between the start and end of the file reading process, Flume
> will fail noisily. This means that you must move a fully written file to
> the directory or it will not be ingested into your workflow. If you're
> running it on a Unix system, you can't use a cp command to drop the file
> into the directory, as cp uses incremental writes whereas mv will move the
> file in one go.
>
> Regards,
> Guy Needham | Data Discovery
> Virgin Media | Enterprise Data, Design & Management
> Bartley Wood Business Park, Hook, Hampshire RG27 9UP
> D 01256 75 3362
> I welcome VSRE emails. Learn more at http://vsre.info/
>
> ------------------------------
> *From:* SaravanaKumar TR [mailto:saran0081...@gmail.com]
> *Sent:* 23 July 2014 06:38
> *To:* user@flume.apache.org
> *Subject:* Re: how spooling directory source identifies the complete file
>
> Thanks Ashish, I had already read this information.
>
> But I couldn't find any explanation in the Flume User Guide of how Flume
> differentiates between a file whose copy is still in progress and a fully
> copied file.
>
> On Wed, Jul 23, 2014 at 10:59 AM, Ashish <paliwalash...@gmail.com> wrote:
>
>> This is specified in Flume's User Guide:
>>
>> "Unlike the Exec source, this source is reliable and will not miss
>> data, even if Flume is restarted or killed. In exchange for this
>> reliability, only immutable, uniquely-named files must be dropped into the
>> spooling directory. Flume tries to detect these problem conditions and will
>> fail loudly if they are violated:
>>
>> 1. If a file is written to after being placed into the spooling
>> directory, Flume will print an error to its log file and stop processing.
>> 2. If a file name is reused at a later time, Flume will print an
>> error to its log file and stop processing.
>>
>> To avoid the above issues, it may be useful to add a unique identifier
>> (such as a timestamp) to log file names when they are moved into the
>> spooling directory."
>>
>> On Wed, Jul 23, 2014 at 10:17 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> Thanks for your comments. But what I am really looking for is this:
>>> suppose we are copying a 1 GB file into the spool directory and the copy
>>> is still in progress. How does Flume recognize that the complete file has
>>> been copied into the spool directory and is ready for processing?
>>>
>>> How does Flume make sure it doesn't start processing a partially copied
>>> file?
>>>
>>> On Tue, Jul 22, 2014 at 11:15 PM, Jeff Lord <jl...@cloudera.com> wrote:
>>>
>>>> I believe the way this works is that Flume creates a meta directory to
>>>> track which file is being read.
>>>> In the event of a restart of the agent, the entire file will be re-read,
>>>> which will create some duplicate events.
>>>>
>>>> https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java#L474
>>>>
>>>> On Tue, Jul 22, 2014 at 6:15 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am planning to use the spooling directory source to move log files
>>>>> into an HDFS sink.
>>>>>
>>>>> I would like to know how Flume identifies whether a file we are moving
>>>>> into the spool directory is complete, or partial with the move still in
>>>>> progress.
>>>>>
>>>>> Suppose a file is large and we have started moving it into the spooling
>>>>> directory: how does Flume identify whether the complete file has been
>>>>> transferred or the transfer is still in progress?
>>>>>
>>>>> Please help me out here.
>>>>>
>>>>> Thanks,
>>>>> saravana
>>
>> --
>> thanks
>> ashish
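Guy's description of the safety check can be sketched in a few lines of Python. The spooling source compares the file's size and last-modified time when reading starts and when it finishes, and fails loudly if they differ. This sketch is a simplified illustration of that idea, not Flume's Java implementation in `ReliableSpoolingFileEventReader`. `FileSnapshot`, `snapshot`, `read_spooled_file` and `FileChangedWhileReading` are hypothetical names. A check like this only catches writes that land while the file is being read. That is why the thread recommends moving complete files in rather than relying on detection.

```python
import os
import tempfile
from typing import List, NamedTuple


class FileSnapshot(NamedTuple):
    """Size and last-modified time of a file at one instant."""
    size: int
    mtime_ns: int


class FileChangedWhileReading(RuntimeError):
    """Raised when a spooled file was modified during ingestion."""


def snapshot(path: str) -> FileSnapshot:
    st = os.stat(path)
    return FileSnapshot(st.st_size, st.st_mtime_ns)


def read_spooled_file(path: str) -> List[str]:
    """Read a spooled file, failing loudly if it changed mid-read."""
    before = snapshot(path)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    after = snapshot(path)
    # A writer (e.g. an unfinished cp) was still appending: refuse the data.
    if before != after:
        raise FileChangedWhileReading(
            f"{path}: {before} at start, {after} at end of read")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "app.log")
        with open(path, "w", encoding="utf-8") as f:
            f.write("event 1\nevent 2\n")
        # Nothing writes to the file while it is read, so this succeeds.
        print(read_spooled_file(path))
```

Comparing both size and modification time covers the common case of an appending writer (the size grows) and also an in-place rewrite that keeps the same length (the timestamp moves).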