RE: how flume identifies a file transfer is complete or not

Soheil Eizadi Sat, 26 Jul 2014 07:02:28 -0700

I am new to Flume, but this is how I would solve your problem. The community 
may have further feedback.


I assume that in your use case each transfer results in one HDFS file, so you 
can set the value of idleTimeout:

agent.sinks.hdfs1.hdfs.idleTimeout = 1

The default value is zero, which disables closing idle files. You will see that 
Flume opens a file with a temporary name (ending in .tmp by default) while the 
transfer is happening. The name changes to the final name when the transfer is 
complete and the file is closed, as defined in your spec:

agent.sinks.hdfs1.hdfs.filePrefix = log

agent.sinks.hdfs1.hdfs.fileSuffix = .s1.avro


As part of your exec script, you can check for the final name in HDFS to figure 
out whether the transfer is complete.
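
A minimal sketch of such a check, assuming the filePrefix and fileSuffix shown 
above, the default in-use suffix .tmp, and one HDFS file per transfer. The 
function name transfer_complete is hypothetical. It reads `hadoop fs -ls` 
output on stdin and takes the last field of each entry as the path, so paths 
containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 when the listing on stdin contains at least one
# finished file (prefix "log", suffix ".s1.avro") and no in-use ".tmp" file.
transfer_complete() {
    awk -v prefix="log" -v suffix=".s1.avro" -v inuse=".tmp" '
        NF >= 8 {
            # The last field of an "ls" entry is the full path; keep the base name.
            n = split($NF, parts, "/")
            name = parts[n]
            if (index(name, prefix) != 1) next
            if (substr(name, length(name) - length(inuse) + 1) == inuse)
                busy++
            else if (substr(name, length(name) - length(suffix) + 1) == suffix)
                finished++
        }
        END { exit !(finished > 0 && busy == 0) }
    '
}
```

It could be used as `hadoop fs -ls /user/flumedata/ | transfer_complete && echo 
complete`, with the directory taken from the original question.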

-Soheil
________________________________
From: Anandkumar Lakshmanan [an...@orzota.com]
Sent: Saturday, July 26, 2014 1:34 AM
To: user@flume.apache.org
Subject: Re: how flume identifies a file transfer is complete or not


Thanks Sharninder for the suggestions.

Let me use the spool directory source. I will let you know how it works for me.

But can anyone tell me whether there is any way to find out that the transfer is 
complete?

Thanks,
Anand.


On 07/26/2014 01:38 PM, Sharninder wrote:
If you really want to add files to HDFS, use the spool directory source, which 
is much more reliable. If you do want to use the exec source, there is no point 
using cat, since that's as good as cp'ing the file to HDFS; use tail -f instead.
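
A minimal sketch of a spool directory source for comparison. The agent and 
source names are carried over from the original question; the channel name 
myChannel and the directory /home/haas/spool are hypothetical. By default the 
source renames each fully ingested file with a .COMPLETED suffix, which shows 
that Flume has read the file into the channel (not necessarily that the HDFS 
sink has finished writing it).

```properties
# Hypothetical names and paths; only the property keys are Flume's.
myAgent.sources.mySource.type = spooldir
# Directory that Flume watches for new, immutable files
myAgent.sources.mySource.spoolDir = /home/haas/spool
myAgent.sources.mySource.channels = myChannel
```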

--
Sharninder



On Sat, Jul 26, 2014 at 9:34 AM, Anandkumar Lakshmanan <an...@orzota.com> wrote:
Hi Natty,

Thanks for the Reply.

So far, I have been verifying whether the transfer is complete only by checking 
the file in the destination, as you mentioned.

Thanks
Anand.

On 07/25/2014 11:22 PM, Jonathan Natkins wrote:
Hi Anand,

What you're doing is a slightly odd way to use Flume. With the exec source, 
Flume will execute that command, and consume the output as events. Often the 
exec source is used to tail -F a file, which allows you to pipe more data to 
the file and ingest additional events. By using cat, Flume will cat the file, 
but then the source will become useless, because the command will have 
finished, and there's no way that I'm aware of to get an agent to start a new 
command. By using tail -F, the command persists, and if you do `ps aux | grep 
flume`, you would see a running tail -F command.
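
For illustration, the exec source from the original question would look roughly 
like this with tail -F in place of cat. The agent and source names are the ones 
from that question; this is a sketch, not a tested configuration.

```properties
myAgent.sources.mySource.type = exec
# tail -F keeps running and follows the file by name, so later appends
# are ingested as new events
myAgent.sources.mySource.command = tail -F /home/haas/file2.txt
```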

As for figuring out when the transfer is complete, I don't think there's a 
really good way other than checking the file itself, or looking to see if the 
cat command is still running.

Does that help?

Thanks,
Natty


On Thu, Jul 24, 2014 at 2:00 AM, Anandkumar Lakshmanan <an...@orzota.com> wrote:
Hi,

I am new to Flume.

I am using the exec source to cat a file into HDFS.
While running it manually, I can see that the file has been transferred 
completely, but Flume is still in the running state.
How do I find out when the transfer is complete?

Example:

My flume.conf

myAgent.sources.mySource.type = exec
myAgent.sources.mySource.command = cat /home/haas/file2.txt


At the moment I check whether the transfer is complete only by manually running 
the following command and comparing the file size:

hadoop fs -ls /user/flumedata/

Is there a way to know when the transfer is completed?

Thanks.
Anand
