Re: How to write avro objects to HDFS?

Mike Thomsen Tue, 02 Jun 2015 04:05:36 -0700

You can take the patch I wrote and apply it to a copied-and-pasted version of
the HDFS bolt from storm-hdfs. Then you just need to add this to main() in
your topology, where "conf" is the topology Config object:

    Map<String, Object> hdfsConfig = new HashMap<String, Object>();
    hdfsConfig.put("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
    hdfsConfig.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
    hdfsConfig.put("io.serializations",
            "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.avro.hadoop.io.AvroSerialization");
    conf.put("storm.hdfs.config", hdfsConfig);

I would caution you against going this route. HDFS sequence files are really
not a good match for Storm + Avro. You can easily end up with duplicates in
them if you're not careful, because processing Avro data is a lot more
CPU-intensive than typical uses of Storm, and tuples that time out get
replayed and written again. So you'll want to make sure you give yourself
some extra room in the timeouts and max pending tuples.
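
As a rough sketch of what "extra room" could look like: the two settings
involved are the tuple timeout and the cap on pending tuples per spout. The
class name and the values 120 and 500 below are placeholders I made up for
illustration, not recommendations, and a plain HashMap stands in for the
topology Config so the snippet compiles without Storm on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a plain HashMap stands in for Storm's Config
// (which is itself a Map), so this compiles without Storm dependencies.
final class StormTuningSketch {

    // Placeholder values; tune them against your own topology's latency.
    static final int MESSAGE_TIMEOUT_SECS = 120;
    static final int MAX_SPOUT_PENDING = 500;

    static Map<String, Object> tuningSettings() {
        Map<String, Object> conf = new HashMap<String, Object>();
        // How long a tuple tree may take before Storm fails and replays it.
        conf.put("topology.message.timeout.secs", MESSAGE_TIMEOUT_SECS);
        // Upper bound on un-acked tuples in flight per spout task.
        conf.put("topology.max.spout.pending", MAX_SPOUT_PENDING);
        return conf;
    }

    public static void main(String[] args) {
        // Print the settings that would be merged into the topology config.
        System.out.println(tuningSettings());
    }
}
```

In a real topology you could merge these with
conf.putAll(StormTuningSketch.tuningSettings()), or call
conf.setMessageTimeoutSecs(...) and conf.setMaxSpoutPending(...) on the
Config object directly.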

My understanding is that Apache Parquet supports Avro, and it seems to be a
lot better than HDFS sequence files. It's worth a look before you get deep
into this.

On Tue, Jun 2, 2015 at 5:42 AM, Filli Alem <[email protected]> wrote:

>  Hi,
>
> I'm struggling with writing Avro objects to HDFS. Is this possible yet? If
> so, how?
>
> I'm able to read messages from Kafka and output them to the console, but I
> have no idea how to write them.
>
>
>
> I found this commit but it doesn’t seem to be in the code base yet:
>
> https://patch-diff.githubusercontent.com/raw/apache/storm/pull/347.patch
>
>
>
> Any help is much appreciated.
>
> Alem
