Hi Mike, that makes sense: the replication factor really is lower than recommended. We test Hadoop on 2 large machines, so replication is set to 1, but HDFS seems to ignore the config and still tries to replicate blocks 3 times. I was confused by the small files being generated *before* the normal large files, but if Flume has some counter for replication attempts, that explains it. Thanks.
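
In case it helps anyone else hitting this: as far as I understand, dfs.replication is a client-side setting, so it has to be visible on the Flume agent's classpath as well, not only on the namenode and datanodes. Otherwise the sink creates files with the Hadoop default of 3 replicas, which a 2-node cluster can never satisfy. A minimal hdfs-site.xml for the agent would look roughly like this (a sketch of our setup, adjust values as needed):

hdfs-site.xml
-----------------

<configuration>
  <property>
    <!-- must not exceed the number of datanodes, or every block
         is reported as under-replicated -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Newer Flume versions also seem to expose an hdfs.minBlockReplicas setting on the HDFS sink; I'm not sure our 1.3 build has it, but if it does, setting it to 1 should sidestep the check as well.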
On Thu, Aug 22, 2013 at 1:13 PM, Mike Percy <[email protected]> wrote:

> Are you sure your HDFS cluster is configured properly? How big is the
> cluster?
>
> It's complaining that your HDFS blocks are not replicated enough based on
> your configured replication factor, and tries to get a sufficiently
> replicated pipeline by closing the current file and opening a new one to
> write to. Finally it gives up.
>
> That code is still there on trunk...
>
> Mike
>
> Sent from my iPhone
>
> On Aug 20, 2013, at 3:11 AM, Andrei <[email protected]> wrote:
>
> Hi,
>
> I have a Flume agent with a spool directory as source and an HDFS sink. I
> have configured the sink to roll files only when they reach some (quite
> large) size (see full config below). However, when I *restart* Flume, it
> first generates ~15 small files (~500 bytes) and only after that starts
> writing the large file. In the Flume logs, at the time the small files are
> generated, I see the message "Block Under-replication detected. Rotating
> file".
>
> From the source code I've figured out several things:
>
> 1. This message is specific to Flume 1.3 and doesn't exist in the latest
> version.
> 2. It comes from the BucketWriter.shouldRotate() method, which in turn
> calls HDFSWriter.isUnderReplicated(); if that returns true, the above
> message is logged and the file is rotated.
>
> My questions are: why does this happen, and how do I fix it?
>
> Flume 1.3, CDH 4.3
>
> flume.config
> -----------------
>
> agent.sources = my-src
> agent.channels = my-ch
> agent.sinks = my-sink
>
> agent.sources.my-src.type = spooldir
> agent.sources.my-src.spoolDir = /flume/data
> agent.sources.my-src.channels = my-ch
> agent.sources.my-src.deletePolicy = immediate
> agent.sources.my-src.interceptors = tstamp-int
> agent.sources.my-src.interceptors.tstamp-int.type = timestamp
>
> agent.channels.my-ch.type = file
> agent.channels.my-ch.checkpointDir = /flume/checkpoint
> agent.channels.my-ch.dataDirs = /flume/channel-data
>
> agent.sinks.my-sink.type = hdfs
> agent.sinks.my-sink.hdfs.path = hdfs://my-hdfs:8020/logs
> agent.sinks.my-sink.hdfs.filePrefix = Log
> agent.sinks.my-sink.hdfs.batchSize = 10
> agent.sinks.my-sink.hdfs.rollInterval = 3600
> agent.sinks.my-sink.hdfs.rollCount = 0
> agent.sinks.my-sink.hdfs.rollSize = 134217728
> agent.sinks.my-sink.hdfs.fileType = DataStream
> agent.sinks.my-sink.channel = my-ch
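
P.S. For the archives, here is a simplified, self-contained sketch of the rotation logic as I understand it from the 1.3 source; it is paraphrased for illustration, not the exact code, and the names are approximate:

// Paraphrased and simplified from Flume 1.3's BucketWriter; an
// illustration of the behavior discussed above, not the exact source.
public class RotationSketch {

  // Cap on consecutive under-replication rotations before Flume gives up.
  private static final int MAX_CONSEC_UNDER_REPL_ROTATIONS = 30;

  private int consecUnderReplRotateCount = 0;
  private long eventCounter = 0;   // events written to the current file
  private long rollCount = 0;      // 0 = count-based rolling disabled

  // Stand-in for HDFSWriter.isUnderReplicated(): true while the current
  // block has fewer replicas than the configured replication factor.
  private boolean isUnderReplicated() {
    return false; // stub for the sketch
  }

  public boolean shouldRotate() {
    boolean doRotate = false;

    if (isUnderReplicated()) {
      if (consecUnderReplRotateCount < MAX_CONSEC_UNDER_REPL_ROTATIONS) {
        consecUnderReplRotateCount++;
        // Logs "Block Under-replication detected. Rotating file." and
        // closes the current file; this is what produces the small files.
        doRotate = true;
      }
      // Otherwise Flume stops rotating and keeps the current file open.
    } else {
      consecUnderReplRotateCount = 0;
    }

    // Normal count-based roll (size and interval rolls happen elsewhere).
    if (rollCount > 0 && rollCount <= eventCounter) {
      doRotate = true;
    }
    return doRotate;
  }
}

So on a cluster that can never reach the expected replication factor, every newly opened file is immediately considered under-replicated and rotated, until the counter hits its cap and Flume gives up. That matches the burst of small files on restart, and the "finally it gives up" behavior Mike described.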
