Thanks for the responses.
a: Got rid of all the CDH stuff (basically, started on a fresh AWS instance).
b: Installed from binary files.
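For reference, the environment after unpacking the binary tarballs looks roughly like this (the exact directory names under $HOME are my illustrative assumptions, not copied from the actual box):

```shell
# Rough sketch of the environment after dropping the binary tarballs in
# $HOME; directory names are assumptions for illustration only.
export HADOOP_HOME="$HOME/hadoop-1.0.4"
export FLUME_HOME="$HOME/apache-flume-1.2.0-bin"
export PATH="$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin"
echo "$FLUME_HOME"
```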
It DID NOT work. Here is what I observed:

flume-ng version: Flume 1.2.0
Hadoop: 1.0.4

This is my configuration:

agent1.sinks.fileSink1.type = hdfs
agent1.sinks.fileSink1.channel = memChannel1
agent1.sinks.fileSink1.hdfs.path = hdfs://localhost:54310/flume/agg1/%y-%m-%d
agent1.sinks.fileSink1.hdfs.filePrefix = agg2
agent1.sinks.fileSink1.hdfs.rollInterval = 0
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 0
agent1.sinks.fileSink1.hdfs.fileType = DataStream
agent1.sinks.fileSink1.hdfs.writeFormat = Text
#agent1.sinks.fileSink1.hdfs.batchSize = 10

#1: startup error
-----------------------------------
With the new installation, I see this exception when Flume starts (it does not stop me from adding data to HDFS):

2012-10-19 19:48:32,191 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)] Creating instance of sink: fileSink1, type: hdfs
2012-10-19 19:48:32,296 (conf-file-poller-0) [DEBUG - org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)] java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:214)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
    at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:516)
    at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:238)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSinks(PropertiesFileConfigurationProvider.java:373)
    at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:223)
    at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
    at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
    at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
-- snip --

#2: the old issue continues
------------------------------------
When I start loading the source, the console shows that events get generated, but the HDFS GUI shows a 0 KB file with a .tmp extension. Adding hdfs.batchSize has no effect; I would assume it should flush the content to the temp file, but no. I tried smaller and bigger values of hdfs.batchSize, with no effect. When I shut down Flume, I see the data gets flushed to the temp file, BUT the file still keeps the .tmp extension. So, basically, there is NO WAY TO HAVE ONE SINGLE AGGREGATED FILE of all the logs. If I set rollSize to a positive value, things start to work, but that defeats the purpose. Even with a non-zero roll value, the last file stays as .tmp when I close Flume.

#3: shutdown throws exception
------------------------------------
Closing Flume ends with this exception (the data in the file looks OK, though):

2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:247)] Closing hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp
2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:253)] failed to close() HDFSWriter for file (hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp). Exception follows.
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
    at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
    at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:250)
    at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:48)
    at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:236)
    at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:233)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
    at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:233)
    at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:747)
    at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:744)
-- snip --

A couple of side notes:

#1: For weird reasons, I did not have to prefix hdfs://localhost:54310 in my previous config (the one using the CDH4 version), and things were as good as in this installation, except there were not as many exceptions.

#2: I have
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

#3: I did not create a special hadoop:hduser this time. I just dumped the files in $HOME, changed the config files (*-site.xml, -env.sh, flume.sh) and exported the appropriate variables.

#4:
here is what my config files look like:

<!-- core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

#5: /home/ubuntu/hadoop/tmp has chmod 777 (I tried 750 as well).

Thanks for your time,
Nishant

On Fri, Oct 19, 2012 at 4:30 AM, Hari Shreedharan <[email protected]> wrote:

> Nishant,
>
> CDH4+ Flume is built against Hadoop-2, and may not work correctly against
> Hadoop-1.x, since Hadoop's interfaces changed in the meantime. You could
> also use Apache Flume-1.2.0 or the upcoming Apache Flume-1.3.0 directly
> against Hadoop-1.x without issues, as they are built against Hadoop-1.x.
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
> On Thursday, October 18, 2012 at 1:18 PM, Nishant Neeraj wrote:
>
> I am working on a POC using
>
> flume-ng version Flume 1.2.0-cdh4.1.1
> Hadoop 1.0.4
>
> The config looks like this:
>
> #Flume agent configuration
> agent1.sources = avroSource1
> agent1.sinks = fileSink1
> agent1.channels = memChannel1
>
> agent1.sources.avroSource1.type = avro
> agent1.sources.avroSource1.channels = memChannel1
> agent1.sources.avroSource1.bind = 0.0.0.0
> agent1.sources.avroSource1.port = 4545
>
> agent1.sources.avroSource1.interceptors = b
> agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
>
> agent1.channels.memChannel1.type = memory
> agent1.channels.memChannel1.capacity = 1000
> agent1.channels.memChannel1.transactionCapacity = 1000
>
> Basically, I do not want to roll the file at all. I just want to tail
> the logs and watch the show from the Hadoop UI. The problem is that it
> does not work. The console keeps saying
>
> agg.1350590350462.tmp 0 KB 2012-10-18 19:59
>
> The Flume console shows events getting pushed. When I stop Flume, I see
> the file gets populated, but the '.tmp' is still in the file name. And I
> see this exception on close.
>
> 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)] Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)] failed to close() HDFSWriter for file (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
>     at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
>     at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
>     at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
>     at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
>
> Thanks
> Nishant
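An aside on the dated directories above: the %y-%m-%d escapes in hdfs.path are resolved from the event's timestamp header (set here by the TimestampInterceptor), and the number embedded in the file name is an epoch-milliseconds value. A quick local sketch of that mapping, assuming GNU date and a UTC host clock (which the log timestamps suggest):

```shell
# The bucket directory is the event timestamp formatted per the %y-%m-%d
# escapes; the stamp in the file name agg.1350590350462.tmp is epoch millis.
ts_ms=1350590350462
day=$(date -u -d "@$((ts_ms / 1000))" +%y-%m-%d)
echo "/flume/agg1/$day/agg.$ts_ms.tmp"   # /flume/agg1/12-10-18/agg.1350590350462.tmp
```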

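Since nothing in this setup renames the last file on shutdown, one stopgap (my own sketch, not something from the thread) is to strip the .tmp suffix afterwards. Shown here with plain mv against a local scratch directory; against HDFS the same loop would use hadoop fs -mv:

```shell
# Simulate the leftover bucket file in a scratch directory, then drop the
# .tmp suffix. On HDFS the move would be "hadoop fs -mv" instead of mv.
dir=$(mktemp -d)
touch "$dir/agg.1350590350462.tmp"
for f in "$dir"/*.tmp; do
  mv "$f" "${f%.tmp}"
done
ls "$dir"    # agg.1350590350462
```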