Thanks Saptarshi

I am among of many who runs large DHIS2 database on MySQL. I always think to 
move back to PostgreSQL. This communication is of great enlightening to me.

Congratulations

Lungo



________________________________
 From: Saptarshi Purkayastha <sun...@gmail.com>
To: DHIS 2 developers <dhis2-devs@lists.launchpad.net>; 
dhis2-us...@lists.launchpad.net 
Sent: Tuesday, May 7, 2013 3:53 PM
Subject: [Dhis2-devs] Migrating from MySQL to PostgreSQL
 


Hello DHIS users and devs,

Apologies for the long email... Should have probably been multiple blogpost to 
reduce its length...

I recently encountered a situation with a very large implementation of DHIS2 
having problems generating data mart.
Thus no reports were generated and only data entry was being done. I thought 
I'd share some of the experiences to solve these issues, so that it might be 
useful to other implementers.
Some changes will be needed in the DHIS2 source, so sending this to the dev 
list also, where dev-related discussions can follow-up.

While PostgreSQL is our recommended database, many implementations have also 
used MySQL.
My findings clearly highlight that DHIS2 performs much better on PostgreSQL and 
there are also some bugs related to MySQL dialect.
Total org units - 26303
Total Monthly datasets - 9
Total Daily dataset - 1
Total Yearly dataset - 3

The implementation has about 34 million datavalues (non-zero)... but I pruned 
it for my benchmarking. I added 1-million datavalues and ran the data mart.
The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned JVM; 
tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
Using DHIS2 2.11. When doing MySQL benchmark turned off all services including 
postgres and vice versa.

Java Opts:  -Xmx3G -Xms768m -XX:MaxPermSize=512m

Java version: 1.7.0_21 x64

Java vendor: Oracle Corporation
OS name: WindowsMySQL = datamart completed in 3hrs 46min 12sec
PostgreSQL = datamart completed in 2hrs 5min 16sec

So, it is obvious that PostgreSQL is doing datamart much faster. The advantages 
might scale better if larger number of datavalues
One could argue MySQL 5.6 has many performance improvements, I didn't have time 
to explore that.
The migration to PostgreSQL has some challenges. Following are steps I followed:

 - Take the mysqldump
 - replace bit(1) to tinyint(1) in the SQL file
 - You'll see that column names are camelCase. This is an issue because 
postgres will added a double quotes around to get case-sensitivity, which MySQL 
by default nicely excludes.
So you'll have to make all column names to lowercase and remove the quote 
characters. I did this with a simple java program. There are 150-odd column 
names that need changes.
 - Used Navcat premium (trial version or SQLSquirrel also has this feature). 
"Data transfer" is the name of the feature that will move data from MySQL to 
Postgres
 - In MySQL non-standard use of boolean (which came only a few yrs back), its 
converted to smallint in Postgres. I wrote a JDBC program to change column type 
from smallint to boolean. A single table example is as follows that can be made 
into a looping procedure as well in pure PSQL.
ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
    USING CASE WHEN annualized = 0 THEN FALSE
           WHEN annualized = 1 THEN TRUE
           ELSE NULL
    END;
 - remove NULL values from minimumvalue column of minmaxdatalement table
DELETE from minmaxdatalement WHERE minimumvalue=NULL
 - remove NULL values from maximumvalue column of minmaxdatalement table
DELETE from minmaxdatalement WHERE maximumvalue=NULL
 - remove NULL values from name column of relationshiptype table
DELETE from relationshiptype WHERE name=NULL
 - blobs to bytea conversion is a mess and I had to truncate. Probably a JDBC 
based connector program will do better conversion, but I just truncated it and 
accepted the data loss to systemsetting and usersetting :-)

Devs:
We need to make all column names lowercase in hbm.xml files in code. This will 
ensure portability and is generally a good practice.
We should also have a convention of using last_updated instead of lastUpdated 
in column names, as is the common practice.


---
Regards,
Saptarshi PURKAYASTHA

My Tech Blog:  http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp

Reply via email to