Alright, I went through a couple of combinations, and none worked flawlessly. It baffled me that there seems to be no way to get Flume working with HDFS unless both come from the Cloudera distribution. So this afternoon I launched a fresh Ubuntu Precise (12.04) instance and started over with Cloudera. Here is the combination that seems to be working in pseudo-distributed mode (uses CDH4):
1. Hadoop 2.0.0-cdh4.1.1: follow the instructions, without skipping anything, from here -- https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode#InstallingCDH4onaSingleLinuxNodeinPseudo-distributedMode-InstallingCDH4withYARNonaSingleLinuxNodeinPseudodistributedmode

2. Flume 1.2.0-cdh4.1.1: in step #1 you have already added the Cloudera apt repo, so start from here -- https://ccp.cloudera.com/display/CDH4DOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages

Config files go under /etc/hadoop/conf and /etc/flume-ng/conf.

This combination works as expected. So, expect:

1. When you set

   hdfs.rollSize = 0
   hdfs.rollInterval = 0
   hdfs.rollCount = 0

   you get a zero-byte .tmp file in the HDFS UI until you kill Flume. So you CANNOT aggregate logs from all the app servers into one file and tail/watch it in the UI. It would have been great if Flume understood that when all the roll* settings are zero, the user does not want to roll the file at all -- that is, do not create a .tmp file, and keep flushing data into the final file based on the hdfs.batchSize setting.

2. The good news is that if I kill Flume, the .tmp extension is removed and the UI shows the populated file.

So, next up is Pig. Let's see how that goes.

Thanks for the responses.

Nishant
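For anyone following along, here is a minimal sketch of the HDFS-sink portion of a flume.conf with the roll settings described above. The agent, sink, and channel names (agent1, hdfsSink, memChannel) and the HDFS path are placeholders I made up; only the hdfs.* property names come from the Flume NG HDFS sink:

```properties
# Hypothetical names: agent1 / hdfsSink / memChannel are placeholders.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
agent1.sinks.hdfsSink.hdfs.path = hdfs://localhost:8020/flume/events

# All roll* settings set to zero -- rolling is disabled, but (as noted
# above) Flume still writes into a .tmp file until the agent is stopped.
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0

# Events are flushed to HDFS in batches of this size.
agent1.sinks.hdfsSink.hdfs.batchSize = 100
```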
