Parquet appears to have its own API for that. You'll have to look at how it handles Avro; I believe I saw Avro listed as a supported serialization type.

On Wed, Jun 3, 2015 at 9:06 AM, Filli Alem <[email protected]> wrote:

> Hey Mike,
>
> Thanks for your quick response!
>
> I looked into the Parquet + Avro solution; it is a possibility for us to try.
> I still have the same problem though: how can I serialize with Parquet?
>
> Thanks
> Alem
>
> From: Mike Thomsen [mailto:[email protected]]
> Sent: Tuesday, June 2, 2015 13:04
> To: [email protected]
> Subject: Re: How to write avro objects to HDFS?
>
> You can take the patch I wrote and apply it to a copied-and-pasted version
> of the HDFS bolt from storm-hdfs. Then you just need to add this to main()
> in your topology, where "conf" is the topology Config object:
>
>     Map<String, Object> hdfsConfig = new HashMap<String, Object>();
>     hdfsConfig.put("fs.file.impl",
>         "org.apache.hadoop.fs.LocalFileSystem");
>     hdfsConfig.put("fs.hdfs.impl",
>         "org.apache.hadoop.hdfs.DistributedFileSystem");
>     hdfsConfig.put("io.serializations",
>         "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.avro.hadoop.io.AvroSerialization");
>     conf.put("storm.hdfs.config", hdfsConfig);
>
> I would caution you not to go this route. HDFS sequence files are really
> not a good match for Storm + Avro. You can easily end up with duplicates in
> them if you're not careful, because processing Avro data is a lot more
> CPU-intensive than typical uses of Storm. So you'll want to make sure you
> give yourself some extra room in the timeouts and max pending tuples.
>
> My understanding is that Apache Parquet supports Avro, and it seems to be a
> lot better than HDFS sequence files. It's worth a look before you get deep
> into this.
>
> On Tue, Jun 2, 2015 at 5:42 AM, Filli Alem <[email protected]> wrote:
>
> Hi,
>
> I'm struggling with writing Avro objects to HDFS. Is this possible yet? If
> so, how?
>
> I'm able to read messages from Kafka and output them to the console, but I
> have no idea how to write them.
>
> I found this commit, but it doesn’t seem to be in the code base yet:
>
> https://patch-diff.githubusercontent.com/raw/apache/storm/pull/347.patch
>
> Any help is much appreciated.
>
> Alem
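
For reference, here is a minimal sketch of how the Hadoop settings quoted above could sit next to the "extra room in the timeouts and max pending tuples" advice. It is only an illustration under stated assumptions: a plain HashMap stands in for Storm's Config object (which is itself a HashMap<String, Object>) so the snippet runs without Storm on the classpath, the class name TopologyTuningSketch is made up, and the values 120 and 500 are placeholders rather than recommendations from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of Storm or storm-hdfs.
class TopologyTuningSketch {

    // Builds a map shaped like the topology config described in the thread.
    static Map<String, Object> buildConf() {
        // Stand-in for Storm's Config, which extends HashMap<String, Object>.
        Map<String, Object> conf = new HashMap<String, Object>();

        // Hadoop settings from the quoted reply, grouped under one key.
        Map<String, Object> hdfsConfig = new HashMap<String, Object>();
        hdfsConfig.put("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
        hdfsConfig.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        hdfsConfig.put("io.serializations",
                "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.avro.hadoop.io.AvroSerialization");
        conf.put("storm.hdfs.config", hdfsConfig);

        // Assumed, illustrative values: a longer tuple timeout and a cap on
        // in-flight tuples, so slow Avro processing is less likely to cause replays.
        conf.put("topology.message.timeout.secs", 120);
        conf.put("topology.max.spout.pending", 500);
        return conf;
    }

    public static void main(String[] args) {
        // Print the resulting settings for inspection.
        System.out.println(buildConf());
    }
}
```

In a real topology the same keys would go on the Config object passed at submission time; suitable values depend on how long each Avro record takes to process.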
