Nishant, a: if CDH4 was working for you, you could use it with hadoop-2.x, or CDH3u5 with hadoop-1.x. b: Looks like your rollSize/rollCount/rollInterval are all 0. Can you increase rollCount to, say, 1000 or so? As documented here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink, if you set all the roll* configuration params to 0, the files are never rolled. Files that are not rolled are never closed, so HDFS shows them as 0-sized files. Once a roll happens, the HDFS GUI will show you the real file size. Any one of the three roll* config parameters is enough to roll the files.
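For example, keeping your other settings but rolling a new file every 1000 events would look something like this (sink name taken from your config; adjust the count to your event rate):

agent1.sinks.fileSink1.hdfs.rollInterval = 0
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 1000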
Thanks,
Hari

--
Hari Shreedharan

On Friday, October 19, 2012 at 1:29 PM, Nishant Neeraj wrote:

> Thanks for the responses.
>
> a: Got rid of all the CDH stuff (basically, started on a fresh AWS instance).
> b: Installed from binary files.
>
> It DID NOT work. Here is what I observed:
> flume-ng version: Flume 1.2.0
> Hadoop: 1.0.4
>
> This is my configuration:
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = hdfs://localhost:54310/flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg2
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
> #agent1.sinks.fileSink1.hdfs.batchSize = 10
>
> #1: startup error
> -----------------------------------
> With the new installation, I see this exception when Flume starts (it
> does not stop me from adding data to HDFS):
>
> 2012-10-19 19:48:32,191 (conf-file-poller-0) [INFO -
> org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)]
> Creating instance of sink: fileSink1, type: hdfs
> 2012-10-19 19:48:32,296 (conf-file-poller-0) [DEBUG -
> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)]
> java.io.IOException: config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:214)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
> at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
> at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:516)
> at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:238)
> at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSinks(PropertiesFileConfigurationProvider.java:373)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:223)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
> -- snip --
>
> #2: the old issue continues
> ------------------------------------
> When I start loading the source, the console shows that events get generated.
> But the HDFS GUI shows a 0 KB file with a .tmp extension. Adding hdfs.batchSize has no
> effect; I would assume this should have flushed the content to the temp file.
> But no. I tried with smaller and bigger values of hdfs.batchSize, no effect.
>
> When I shut down Flume, I see the data gets flushed to the temp file. BUT the
> temp file still has the .tmp extension. So, basically, there is NO WAY TO HAVE
> ONE SINGLE AGGREGATED FILE of all the logs. If I set rollSize to a
> positive value, things start to work, but that defeats the purpose.
>
> Even with a non-zero roll value, the last file stays as .tmp when I close Flume.
>
> #3: Shutdown throws exception
> ------------------------------------
> Closing Flume ends with this exception (the data in the file looks OK,
> though):
>
> 2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [DEBUG -
> org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:247)]
> Closing hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp
> 2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [WARN -
> org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:253)]
> failed to close() HDFSWriter for file
> (hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp).
> Exception follows.
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
> at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
> at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:250)
> at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:48)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:236)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:233)
> at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
> at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:233)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:747)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:744)
> -- snip --
>
> A couple of side notes:
>
> #1: For weird reasons, I did not have to prefix hdfs://localhost:54310 in my
> previous config (the one using the CDH4 version), and things were as good as in
> this installation, except there were not as many exceptions.
>
> #2: I have
> java version "1.6.0_24"
> OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
>
> #3: I did not create a special hadoop:hduser this time. I just dumped the files
> in $HOME, changed the config files (*-site.xml, -env.sh, flume.sh), and
> exported the appropriate variables.
>
> #4: Here is what my config files look like:
>
> <!-- core-site.xml -->
> <configuration>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/home/ubuntu/hadoop/tmp</value>
> <description>A base for other temporary directories.</description>
> </property>
>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://localhost:54310</value>
> </property>
> </configuration>
>
> <!-- hdfs-site.xml -->
> <configuration>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> </property>
> </configuration>
>
> <!-- mapred-site.xml -->
> <configuration>
> <property>
> <name>mapred.job.tracker</name>
> <value>localhost:54311</value>
> </property>
> </configuration>
>
> #5: /home/ubuntu/hadoop/tmp has chmod 777 (tried 750 as well)
>
> Thanks for your time.
> - Nishant
>
> On Fri, Oct 19, 2012 at 4:30 AM, Hari Shreedharan <[email protected]> wrote:
> > Nishant,
> >
> > CDH4+ Flume is built against Hadoop-2, and may not work correctly against
> > Hadoop-1.x, since Hadoop's interfaces changed in the meantime. You could
> > also use Apache Flume-1.2.0 or the upcoming Apache Flume-1.3.0 directly
> > against Hadoop-1.x without issues, as they are built against Hadoop-1.x.
> >
> > Thanks,
> > Hari
> >
> > --
> > Hari Shreedharan
> >
> > On Thursday, October 18, 2012 at 1:18 PM, Nishant Neeraj wrote:
> >
> > > I am working on a POC using
> > > flume-ng version Flume 1.2.0-cdh4.1.1
> > > Hadoop 1.0.4
> > >
> > > The config looks like this:
> > >
> > > #Flume agent configuration
> > > agent1.sources = avroSource1
> > > agent1.sinks = fileSink1
> > > agent1.channels = memChannel1
> > >
> > > agent1.sources.avroSource1.type = avro
> > > agent1.sources.avroSource1.channels = memChannel1
> > > agent1.sources.avroSource1.bind = 0.0.0.0
> > > agent1.sources.avroSource1.port = 4545
> > >
> > > agent1.sources.avroSource1.interceptors = b
> > > agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
> > >
> > > agent1.sinks.fileSink1.type = hdfs
> > > agent1.sinks.fileSink1.channel = memChannel1
> > > agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> > > agent1.sinks.fileSink1.hdfs.filePrefix = agg
> > > agent1.sinks.fileSink1.hdfs.rollInterval = 0
> > > agent1.sinks.fileSink1.hdfs.rollSize = 0
> > > agent1.sinks.fileSink1.hdfs.rollCount = 0
> > > agent1.sinks.fileSink1.hdfs.fileType = DataStream
> > > agent1.sinks.fileSink1.hdfs.writeFormat = Text
> > >
> > > agent1.channels.memChannel1.type = memory
> > > agent1.channels.memChannel1.capacity = 1000
> > > agent1.channels.memChannel1.transactionCapacity = 1000
> > >
> > > Basically, I do not want to roll the file at all. I just want to
> > > tail it and watch the show from the Hadoop UI. The problem is it does not
> > > work. The console keeps saying,
> > >
> > > agg.1350590350462.tmp 0 KB 2012-10-18 19:59
> > >
> > > The Flume console shows events getting pushed. When I stop Flume, I see
> > > the file gets populated, but the '.tmp' is still in the file name. And I
> > > see this exception on close.
> > >
> > > 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG -
> > > org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)]
> > > Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> > > 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN -
> > > org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)]
> > > failed to close() HDFSWriter for file
> > > (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> > > java.io.IOException: Filesystem closed
> > > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> > > at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> > > at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
> > > at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> > > at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
> > > at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
> > > at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
> > > at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
> > > at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
> > > at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
> > > at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
> > > at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
> > > at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > at java.lang.Thread.run(Thread.java:679)
> > >
> > > Thanks
> > > Nishant
