Alan,
The reason I am trying to write to the same file is that I don't want to
persist each entry as a small file to HDFS. That would make Hive loading very
inefficient, right? (Although I could do file merging in a separate job.)
My current thought is that I probably could set up a timer (say, 6 min) i
You may find Summingbird relevant; I'm still investigating it:
https://blog.twitter.com/2013/streaming-mapreduce-with-summingbird
On Tue, Jan 7, 2014 at 11:39 AM, Alan Gates wrote:
I am not wise enough in the ways of Storm to tell you how you should partition
data across bolts. However, there is no need in Hive for all data for a
partition to be in the same file, only in the same directory. So if each bolt
creates a file for each partition and then all those files are placed in that
partition's directory.
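To illustrate the point above, here is a minimal sketch of how each bolt could map an epoch timestamp to a per-partition directory and write its own file inside it. The table location, the `dt=` partition-column layout, and the `bolt-<id>.dat` naming are all hypothetical, not anything from the thread; the only point is that files from different bolts land in the same directory, which is all Hive needs.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionPath {
    // Hypothetical layout: /warehouse/events/dt=YYYY-MM-DD-HH/bolt-<id>.dat
    // Many bolts write many files, but all files for one hour share one directory.
    static String partitionDir(long epochSeconds) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH")
                                                 .withZone(ZoneOffset.UTC);
        return "/warehouse/events/dt=" + fmt.format(Instant.ofEpochSecond(epochSeconds));
    }

    // Each bolt appends only to its own file, so no writer is shared.
    static String fileFor(long epochSeconds, int boltId) {
        return partitionDir(epochSeconds) + "/bolt-" + boltId + ".dat";
    }

    public static void main(String[] args) {
        // Epoch for 2014-01-07 11:39 UTC, the timestamp of Alan's mail.
        System.out.println(fileFor(1389094740L, 3));
    }
}
```

With this layout there is no need to route all of a partition's tuples to one bolt: bolt 3 and bolt 7 write separate files, and Hive reads every file under the `dt=` directory as one partition.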
Alan,
The problem is that the data is partitioned by epoch, ten hourly, and I want
all data belonging to that partition to be written into one file named after
that partition. How can I share the file writer across different bolts?
Should I route data within the same partition to the same bolt?
Thanks
You shouldn't need to write each record to a separate file. Each Storm bolt
should be able to write to its own file, appending records as it goes. As
long as you only have one writer per file this should be fine. You can then
close the files every 15 minutes (or whatever works for you) and have them
loaded into Hive.
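The "one writer per file, roll it periodically" idea can be sketched as follows. This is not Storm API: `RotatingAppender`, `rollIntervalMs`, and the `part-N.log` naming are all illustrative, and a real bolt would call `append()` from `execute()` and ship each closed file to HDFS. Time is passed in explicitly here so the rolling logic is easy to test; a bolt would use the wall clock or a tick tuple instead.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RotatingAppender {
    private final Path dir;
    private final long rollIntervalMs;  // e.g. 15 minutes, per Alan's suggestion
    private long openedAt = -1;
    private Writer out;
    private int fileSeq = 0;

    public RotatingAppender(Path dir, long rollIntervalMs) {
        this.dir = dir;
        this.rollIntervalMs = rollIntervalMs;
    }

    // Append one record, rolling to a new file once the interval has elapsed.
    public void append(String record, long nowMs) throws IOException {
        if (out == null || nowMs - openedAt >= rollIntervalMs) {
            roll(nowMs);
        }
        out.write(record);
        out.write('\n');
        out.flush();
    }

    private void roll(long nowMs) throws IOException {
        if (out != null) out.close();  // a closed file is safe to move to HDFS
        Path f = dir.resolve("part-" + (fileSeq++) + ".log");
        out = Files.newBufferedWriter(f, StandardOpenOption.CREATE,
                                         StandardOpenOption.APPEND);
        openedAt = nowMs;
    }

    public void close() throws IOException {
        if (out != null) out.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("rot");
        RotatingAppender a = new RotatingAppender(tmp, 15 * 60 * 1000L);
        a.append("rec1", 0L);               // first file opens
        a.append("rec2", 1000L);            // still within the interval
        a.append("rec3", 15 * 60 * 1000L);  // interval elapsed: rolls to a new file
        a.close();
        System.out.println(Files.readAllLines(tmp.resolve("part-0.log")));
        System.out.println(Files.readAllLines(tmp.resolve("part-1.log")));
    }
}
```

Because only one writer ever touches a given file, there is no coordination problem between bolts; each bolt owns its own `RotatingAppender`.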