Hi Chinna,

We envisioned this as something external to the Hive codebase.  It would 
consist of:

* datasets (synthetic such as TPC-H, plus real-world if possible, perhaps 
contributed by a company such as Facebook after sufficient anonymization had 
been applied)

* data loader scripts, plus scripts for other operations such as 
purging/archiving old data

* query scripts with expected results

* configurable test harness for running the various load/query scripts either 
individually or as concurrent mixed workloads; validating results; and 
collecting performance data

* processes for collecting system data such as cluster load, memory usage, 
etc.
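
To make the "query scripts with expected results" piece concrete, the validation step might look roughly like this sketch (file names and layout here are hypothetical; it assumes the Hive CLI's -f flag for running a script file):

```python
# Sketch of running a query script and validating its output against a
# stored expected-results file. The paths and conventions are illustrative,
# not an existing harness.
import subprocess


def run_query(sql_path):
    # Run a Hive query script via the CLI and capture stdout.
    result = subprocess.run(["hive", "-f", sql_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def validate(actual_output, expected_path):
    # Compare actual output with the stored expected results,
    # ignoring leading/trailing whitespace.
    with open(expected_path) as f:
        expected = f.read()
    return actual_output.strip() == expected.strip()
```

A harness would then just loop over (script, expected) pairs, optionally in concurrent workers for the mixed-workload case, and report mismatches along with the collected performance data.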

The idea is that this could be used for testing of changes to either Hive or 
Hadoop.  If we were able to pool resources for a shared cluster, we could run 
through patches and configurations in order to catch regressions or potential 
problems early.

A good home for this might be the new BigTop project:

http://wiki.apache.org/incubator/BigtopProposal

JVS

On Jun 24, 2011, at 1:31 AM, Chinna wrote:

> 
> Hi All,
> 
>  In the Hive Roadmap (http://wiki.apache.org/hadoop/Hive/Roadmap)
> we saw the following proposal:
> 
> 3.4. Test, Error Messages and Debugging
>    [P0] Heavy-duty test infrastructure
> 
> 
> Our team is interested in working on this task.
> We need some details about the expectations of this task.
> 
> Please add your valuable comments.
> 
> Thanks & Regards,
> Chinna Rao lalam
> 
> 
