Hi, I receive a 300 MB compressed file (structured CSV data) in a spool directory every 3 minutes from each collector, and I have around 6 collectors. Every 15 minutes I move the accumulated data from the spool directory to an HDFS directory and add it as a Hive partition covering that 15-minute window (roughly as in the sketch below). I then run several aggregation queries on that partition and post the results to HBase and MongoDB. So each query works on roughly 9 GB of compressed data (6 collectors x 5 files x 300 MB per 15-minute window). For this volume I need to estimate how many cluster nodes are required to finish all the aggregation queries in time, i.e. within the 15-minute partition window. What is the best way to evaluate this?
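For context, the partition step is roughly the following (the table name, partition column, and HDFS path are only placeholders for illustration, not my real ones):

    -- Register a 15-minute batch that has already been copied to HDFS
    -- as a new partition of an external Hive table.
    ALTER TABLE raw_events
      ADD IF NOT EXISTS PARTITION (batch_window='2015-06-01-1015')
      LOCATION '/data/raw_events/2015-06-01-1015';

The aggregation queries for that cycle then run only against this latest partition.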
Also, is there any way I can post the aggregated data to both MongoDB and HBase in a single pass, i.e. write the same query result to multiple tables instead of running the same query once per target and inserting into only one table each time?
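For example, would something like Hive's multi-table INSERT work here if the two targets are Hive tables backed by the HBase and MongoDB storage handlers? The table and column names below are just placeholders; I have not set this up yet:

    -- Assumes agg_hbase was created with
    --   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    -- and agg_mongo with
    --   STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    -- so that one scan of the source partition is fanned out to both targets.
    FROM (
      SELECT user_id, SUM(bytes) AS total_bytes
      FROM raw_events
      WHERE batch_window = '2015-06-01-1015'
      GROUP BY user_id
    ) agg
    INSERT INTO TABLE agg_hbase SELECT agg.user_id, agg.total_bytes
    INSERT INTO TABLE agg_mongo SELECT agg.user_id, agg.total_bytes;

That way each 9 GB partition would only be scanned once per aggregation instead of once per target table. Thanks, Chandra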