We are planning a migration from a large PostgreSQL-based data warehouse
(DWH) to Hadoop/Hive. The main reason for this migration is the massive
growth of the data we analyze (5.6 TB and growing). An MVCC-based RDBMS
like PostgreSQL has its pitfalls with heavy updates and with executing
queries over such large quantities of data. (We have already done a lot
of query tuning and server optimization, with only a minor effect on
query latency.)
So we have looked at Hadoop and run some tests combining it with Hive
and HBase, and the performance we obtained is awesome.
Can you give us some advice on developing a good plan for this?
Environment:
- OS: CentOS 5.5, 64-bit
- Java version: 1.6 Update 20
- Hardware: 8 nodes
  - CPU: AMD Opteron 4130 (quad-core)
  - RAM: 8 GB
  - Disk: 1 TB HDD
Regards
--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scale Distributed Systems)
University of Information Sciences,
La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz