We are planning a migration from a large PostgreSQL-based data warehouse (DWH) to Hadoop/Hive. The main reason for this migration is the massive growth of the data we need to analyze (5.6 TB and growing). At this scale, PostgreSQL, as an MVCC-based RDBMS, has pitfalls with heavy updates and with executing queries over large quantities of data. (We have already done a lot of query tuning and server optimization, with only a minor effect on query latency.)

So we have evaluated Hadoop and run some tests combining it with Hive and HBase, and the performance we obtained is awesome.

Can you give us some advice on developing a good plan for this?

Environment:
- OS: CentOS 5.5, 64-bit
- Java version: 1.6 Update 20
- Hardware: 8 nodes
    - CPU: AMD Opteron 4130 (quad-core)
    - RAM: 8 GB
    - Disk: 1 TB HDD

Regards

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (Large-Scale Distributed Systems)
 University of Information Sciences,
 La Habana, Cuba
 Linux User # 418229
 http://about.me/marcosortiz
