Hello everyone,

I am interested in collecting statistics (mainly the time spent) for the MapReduce task phases, such as split, read, spill and aggregate, in both the map and the reduce tasks. I was told to use Hive or Pig, as they are good tools for statistical analysis. I have installed Hive and can run queries, which the underlying framework translates into MapReduce jobs. However, I am not sure how to get these per-phase statistics using Hive. Could someone please give me some hints, for example a parameter I can set to see the memory usage or the time spent in each of these phases? Any help would be appreciated.
Thanking you,
Yours faithfully,
Ranjan Banerjee