Hi,
  I get a 300 MB compressed file (structured CSV data) in the spool directory
every 3 minutes from each collector, and I have around 6 collectors. Every 15
minutes I move the data from the spool directories into an HDFS directory and
add it as a Hive partition. I then run several aggregation queries on that
partition and post the results to HBase and Mongo, so each query runs over
roughly 9 GB of compressed data (300 MB x 5 batches x 6 collectors). For this
volume I need to work out how many cluster nodes are required to finish all
the aggregation queries in time, i.e. within the 15-minute partition window.
What is the best way to evaluate this?

                Also, is there any way I can post the aggregated data to both
Mongo and HBase in one pass, i.e. write the same query result to multiple
tables, instead of running the same query multiple times and inserting into
only a single table at a time?
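
For reference, what I have in mind is roughly Hive's multi-table insert over
external tables backed by the HBase and Mongo storage handlers. This is only a
sketch: the Mongo handler class comes from the mongo-hadoop connector, and the
table, column and partition names are placeholders, not my actual schema.

  -- External table backed by HBase (HBaseStorageHandler ships with Hive)
  CREATE EXTERNAL TABLE agg_hbase (rowkey STRING, total BIGINT)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:total')
  TBLPROPERTIES ('hbase.table.name' = 'agg_results');

  -- External table backed by Mongo (assumes the mongo-hadoop Hive connector
  -- is installed; handler class and mongo.uri are my assumption)
  CREATE EXTERNAL TABLE agg_mongo (rowkey STRING, total BIGINT)
  STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
  TBLPROPERTIES ('mongo.uri' = 'mongodb://host:27017/db.agg_results');

  -- One scan of the 15-minute partition feeding both inserts
  FROM (
    SELECT key_col     AS rowkey,
           SUM(metric) AS total
    FROM   events
    WHERE  dt = '2014-01-01' AND win = '0000'
    GROUP  BY key_col
  ) agg
  INSERT INTO TABLE agg_hbase SELECT rowkey, total
  INSERT INTO TABLE agg_mongo SELECT rowkey, total;

My understanding is that the FROM clause is scanned once and both inserts
share its output, but please correct me if that is not how it behaves with
storage-handler tables.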

Thanks,
Chandra
