Hi, I receive a 300 MB compressed file (structured CSV data) in a spool directory every 3 minutes from each collector, and I have around 6 collectors. Every 15 minutes I move the accumulated data from the spool directory to an HDFS directory and add it as a Hive partition covering that 15-minute window (roughly as in the sketch below). I then run several aggregation queries on that partition and post the results to HBase and MongoDB. So each query works on roughly 9 GB of compressed data (6 collectors x 5 files x 300 MB per 15-minute window). For this volume I need to estimate how many cluster nodes are required to finish all the aggregation queries in time, i.e. within the 15-minute partition window. What is the best way to evaluate this?
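For context, the partition step is roughly the following (the table name, partition column, and HDFS path are only placeholders for illustration, not my real ones):

    -- Register a 15-minute batch that has already been copied to HDFS
    -- as a new partition of an external Hive table.
    ALTER TABLE raw_events
      ADD IF NOT EXISTS PARTITION (batch_window='2015-06-01-1015')
      LOCATION '/data/raw_events/2015-06-01-1015';

The aggregation queries for that cycle then run only against this latest partition.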
Also, is there any way I can post the aggregated data to both MongoDB and HBase in a single pass, i.e. write the same query result to multiple tables instead of running the same query once per target and inserting into only one table each time?
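For example, would something like Hive's multi-table INSERT work here if the two targets are Hive tables backed by the HBase and MongoDB storage handlers? The table and column names below are just placeholders; I have not set this up yet:

    -- Assumes agg_hbase was created with
    --   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    -- and agg_mongo with
    --   STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    -- so that one scan of the source partition is fanned out to both targets.
    FROM (
      SELECT user_id, SUM(bytes) AS total_bytes
      FROM raw_events
      WHERE batch_window = '2015-06-01-1015'
      GROUP BY user_id
    ) agg
    INSERT INTO TABLE agg_hbase SELECT agg.user_id, agg.total_bytes
    INSERT INTO TABLE agg_mongo SELECT agg.user_id, agg.total_bytes;

That way each 9 GB partition would only be scanned once per aggregation instead of once per target table. Thanks, Chandra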