Hi,
I am planning to use the spooling directory source to move log files into an HDFS sink. I would like to know how Flume identifies whether a file we are moving into the spool directory is a complete file or a partial one whose move is still in progress. For example, if a file is large and we have only started moving it into the spool directory, how does Flume know when the copy is finished and the file is ready for processing?
Ashish/nattty,
The above solution has worked fine so far. I am planning to move from the exec source to the spooling directory source because of its reliability, and I would like to know how Flume identifies whether a file being moved into the spool directory is complete or still only partially copied.
I believe the way this works is that Flume creates a meta directory to track which file is being read. In the event of an agent restart, the entire file will be re-read, which will create some duplicate events.
https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apac
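To make that concrete, here is a toy illustration of the tracker/meta-file idea (this is not Flume's actual code, just a sketch of the concept; the directory and file names are made up):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Toy sketch of a tracker ("meta") file: it remembers which spooled file is
// being read and the last committed position, so a reader can decide after a
// restart whether to resume or re-read. Illustration only, not Flume's code.
public class ToyTracker {
    private final Path metaFile;

    public ToyTracker(Path trackerDir) throws IOException {
        Files.createDirectories(trackerDir);
        this.metaFile = trackerDir.resolve("current.meta");
    }

    // Record the file currently being read and how far we have got.
    public void commit(String fileName, long position) throws IOException {
        byte[] entry = (fileName + "\t" + position + "\n").getBytes(StandardCharsets.UTF_8);
        // Write a temp file first, then rename it over the old entry, so a
        // crash never leaves a half-written tracker entry behind.
        Path tmp = metaFile.resolveSibling("current.meta.tmp");
        Files.write(tmp, entry);
        Files.move(tmp, metaFile, StandardCopyOption.REPLACE_EXISTING);
    }

    // On restart: return {fileName, position} from the last commit, or null.
    public String[] lastCommitted() throws IOException {
        if (!Files.exists(metaFile)) return null;
        String line = new String(Files.readAllBytes(metaFile), StandardCharsets.UTF_8).trim();
        return line.isEmpty() ? null : line.split("\t");
    }
}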
Hi Jeff,
Thanks for your comments. But what I am really looking for is this: consider that we are copying a 1 GB file into the spool directory. If the copy is still in progress, how does Flume recognize that the complete file has been copied into the spool directory and that the file is ready for processing?
This is specified in Flume's User Guide:
"Unlike the Exec source, this source is reliable and will not miss data,
even if Flume is restarted or killed. In exchange for this reliability,
only immutable, uniquely-named files must be dropped into the spooling
directory. Flume tries to detect these problem conditions and will fail loudly if they are violated."
Thanks Ashish, I have already referred to that info. But I couldn't see any explanation in the Flume user guide of how Flume differentiates between a copy-in-progress file and a fully copied file.
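As far as I can tell, the source does not actually try to detect a copy that is still in progress; the contract in the quoted guide is that only complete, immutable files ever land in the spool directory. The usual way to honour that is to copy the big file into a staging directory on the same filesystem and rename it into the spool directory once the copy has finished, so the file appears there all at once. A rough sketch of that pattern in Java (the paths are just placeholders):

import java.io.IOException;
import java.nio.file.*;

public class SpoolDropOff {
    public static void main(String[] args) throws IOException {
        // Placeholder paths -- adjust to your layout. Staging and spool dirs
        // must be on the same filesystem: with ATOMIC_MOVE, a cross-filesystem
        // move throws AtomicMoveNotSupportedException instead of copying.
        Path source  = Paths.get("/var/log/app/app.log.2014-07-22");
        Path staging = Paths.get("/data/flume/staging");
        Path spool   = Paths.get("/data/flume/spool");

        Files.createDirectories(staging);
        Files.createDirectories(spool);

        // 1) Do the slow 1 GB copy outside the spool directory, so the
        //    spooling source never sees a partially written file.
        Path staged = staging.resolve(source.getFileName());
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // 2) Rename the finished copy into the spool directory. The file
        //    appears there in one step, already complete, and is never
        //    modified again, which is what the user guide asks for.
        Files.move(staged, spool.resolve(source.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);
    }
}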
The best thing would be to get hold of a Flume developer. I am not strictly sure of all the differences between sync/flush/hsync/hflush across the different Hadoop versions. It might be the case that you are only flushing on the client side. Even if it were a clean strategy, creation+flush is unlikely to be ...
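For reference, my rough understanding of the client-side distinction being discussed, against the Hadoop 2.x FileSystem API (treat this as a sketch; the path is a placeholder): hflush() pushes buffered data out to the datanodes so new readers can see it, while hsync() additionally asks the datanodes to persist it to disk before returning.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushVsSync {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        try (FSDataOutputStream out = fs.create(new Path("/tmp/flush-demo.txt"))) {
            out.writeBytes("first batch\n");
            // hflush(): flush client-side buffers to the datanode pipeline;
            // new readers can see the data, but it may still only be in
            // datanode memory, not on disk.
            out.hflush();

            out.writeBytes("second batch\n");
            // hsync(): like hflush(), but also asks the datanodes to sync
            // the data to disk before returning.
            out.hsync();
        }
    }
}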