Flume 1.3.0 + HDFS Sink + S3N + avro_event + Hive…?

Matt Wise Wed, 08 May 2013 10:43:01 -0700

We're still working on getting our Flume POC up and running. Right now, log 
events enter our Flume nodes via a Syslog input and are happily sent off to 
ElasticSearch for indexing. We're also sending these events to S3, but the 
resulting files can't be read with the Avro tools.


> # S3 Output Sink
> agent.sinks.s3.type = hdfs
> agent.sinks.s3.channel = fc1
> agent.sinks.s3.hdfs.path = s3n://XXX:XXX@our_bucket/flume/events/%y-%m-%d/%H
> agent.sinks.s3.hdfs.rollInterval = 600
> agent.sinks.s3.hdfs.rollSize = 0
> agent.sinks.s3.hdfs.rollCount = 10000
> agent.sinks.s3.hdfs.batchSize = 10000
> agent.sinks.s3.hdfs.serializer = avro_event
> agent.sinks.s3.hdfs.fileType = SequenceFile
> agent.sinks.s3.hdfs.timeZone = UTC


When we try to look at the avro-serialized files, we get this error:

> [localhost avro]$ java -jar avro-tools-1.7.4.jar getschema FlumeData.1367857371493
> Exception in thread "main" java.io.IOException: Not a data file.
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
>         at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:48)
>         at org.apache.avro.tool.Main.run(Main.java:80)
>         at org.apache.avro.tool.Main.main(Main.java:69)
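
One quick sanity check is to look at the first few bytes of a FlumeData file. 
Avro object container files begin with the bytes "Obj" followed by 0x01, while 
Hadoop SequenceFiles begin with "SEQ" followed by a version byte. The sketch 
below is only an illustration: the helper names detect_container and sniff are 
made up for this example, and it assumes the file has already been copied from 
S3 to local disk.

```python
AVRO_MAGIC = b"Obj\x01"   # Avro object container files start with these 4 bytes
SEQFILE_MAGIC = b"SEQ"    # Hadoop SequenceFiles start with "SEQ" plus a version byte


def detect_container(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header.startswith(AVRO_MAGIC):
        return "avro"
    if header.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"


def sniff(path: str) -> str:
    """Read the first 4 bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return detect_container(f.read(4))
```

Calling sniff("FlumeData.1367857371493") on a local copy would then tell us 
which container the sink actually wrote. A result of "sequencefile" would be 
consistent with the hdfs.fileType = SequenceFile setting above and would 
explain why avro-tools reports "Not a data file".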

At this point we're unclear on how we're supposed to read these FlumeData 
files with the normal Avro tools. Is there something we're missing?

--Matt
