On the first part of your question, what the cluster size should be: it
depends entirely on
1) what type of queries you are running
2) what type of cluster you have, i.e. whether it is shared or dedicated to
you only
3) the compressed file format, which drives query performance depending on
whether the compression t
Hi,
I get a 300 MB compressed file (structured CSV data) in the spool directory
every 3 minutes from a collector. I have around 6 collectors. I move the data
from the spool directory to an HDFS directory and add each 15 minutes of data
as a Hive partition. Then I run different aggregation queries and post data to
Hba
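
For sizing, it may help to put rough numbers on the ingest described above. The following is only a back-of-the-envelope sketch in Python. It assumes that each of the 6 collectors delivers its own 300 MB file every 3 minutes (the message does not say whether 300 MB is per collector or in total), and the function names are made up for illustration. Sizes are compressed; the uncompressed volume depends on the codec and the data.

```python
# Rough ingest-volume estimate for the pipeline described above.
# Assumption: every collector drops its own 300 MB compressed file
# every 3 minutes. If 300 MB is the total across collectors, set
# COLLECTORS = 1.

FILE_SIZE_MB = 300         # compressed size of one spool file
FILE_INTERVAL_MIN = 3      # a new file arrives this often
COLLECTORS = 6             # "around 6 collectors"
PARTITION_WINDOW_MIN = 15  # one Hive partition per 15 minutes of data


def files_per_window(window_min: int) -> int:
    """Number of spool files arriving within a window of window_min minutes."""
    return (window_min // FILE_INTERVAL_MIN) * COLLECTORS


def compressed_mb(window_min: int) -> int:
    """Compressed data volume in MB arriving within the window."""
    return files_per_window(window_min) * FILE_SIZE_MB


if __name__ == "__main__":
    print(f"files per partition: {files_per_window(PARTITION_WINDOW_MIN)}")
    print(f"MB per partition:    {compressed_mb(PARTITION_WINDOW_MIN)}")
    print(f"MB per hour:         {compressed_mb(60)}")
    print(f"MB per day:          {compressed_mb(24 * 60)}")
```

Under these assumptions each 15-minute partition holds 30 files and about 9,000 MB of compressed data, which comes to roughly 36,000 MB per hour and 864,000 MB per day.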