Hi Devs,

I am seeking some advice on how to decrease disk usage with DHIS2.
Here is a list of the biggest tables in the system:

  public.datavalue                | 2316 MB
  public.datavalue_pkey           | 1230 MB
  public.in_datavalue_lastupdated |  680 MB

There are a lot more tables, and all in all the database occupies about 5.4 GB without analytics. This represents about 30 million data rows, so not that big a database really.

This server runs on a DigitalOcean virtual server with 60 GB of disk space. The only things on the server are Linux, PostgreSQL and Tomcat, nothing else. Without analytics, and with everything installed for the system, we have about 23% of that 60 GB free.

When analytics runs, it keeps the existing analytics tables (analytics_XXXX) in place and creates temp tables like analytics_temp_2004. When the build finishes and the indexes are built, the tables are swapped. This ensures that analytics stays available while it is being rebuilt, but the downside is that a lot more disk space is required: we effectively end up with two copies of each table, along with all their indexes, which are quite large themselves (up to 60% of the size of the table itself).

Here is what happens when analytics is run:

  public.analytics_temp_2015 | 1017 MB
  public.analytics_temp_2014 |  985 MB
  public.analytics_temp_2011 |  952 MB
  public.analytics_temp_2010 |  918 MB
  public.analytics_temp_2013 |  885 MB
  public.analytics_temp_2012 |  835 MB
  public.analytics_temp_2009 |  804 MB

Each analytics table now takes about 1 GB of space. In the end, total disk usage climbs past the 60 GB available and analytics fails to complete.

So, while I understand the need for this functionality, I am wondering whether we need a system option to drop the analytics tables prior to regenerating them, or more control over the order in which they are generated (for instance, generating specific periods). I realize this can be done from the API or the scheduler, but only for the past three relative years.

The reason I am asking is that it's a bit of a pain (at the moment) when using DigitalOcean as a service provider, since their stock disk allocation is 60 GB. With other VPS providers (Amazon, Linode) it's a bit easier, but DigitalOcean only supports block storage in two regions at the moment.

Regardless, it seems somewhat wasteful to need such a large amount of disk space for such a relatively small database. Is this something we just need to plan for and perhaps document better, or should we think about offering better functionality for people running smaller servers?

Regards,
Jason
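P.S. For anyone wanting to check their own instance: the sizes above come straight out of the PostgreSQL catalogs. A query along these lines (plain Postgres, nothing DHIS2-specific) lists the largest relations, with tables and their indexes as separate rows, just like the output I pasted:

  -- Ten largest relations (tables and indexes listed separately)
  SELECT n.nspname || '.' || c.relname AS relation,
         pg_size_pretty(pg_relation_size(c.oid)) AS size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(c.oid) DESC
  LIMIT 10;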
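The index overhead I mentioned (indexes at up to 60% of the table size) can be confirmed the same way; on Postgres 9.0+ something like this compares the heap size to the combined index size for each analytics table:

  -- Table size vs. combined size of its indexes, analytics tables only
  SELECT c.relname,
         pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
         pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
  FROM pg_class c
  WHERE c.relkind = 'r'
    AND c.relname LIKE 'analytics\_%' ESCAPE '\'
  ORDER BY pg_total_relation_size(c.oid) DESC;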
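And to make the "drop before regenerating" idea concrete: in the meantime I imagine the space could be freed by hand with something like the sketch below before kicking off a full analytics run. This is untested, it assumes every table matching analytics_% is safe to drop (including any analytics_temp_% leftovers from failed runs), and of course it means no analytics at all until the rebuild completes:

  -- Untested sketch: drop resident analytics tables to reclaim disk
  -- before a full rebuild. Analytics is unavailable until it finishes.
  DO $$
  DECLARE
    t text;
  BEGIN
    FOR t IN SELECT tablename FROM pg_tables
             WHERE schemaname = 'public'
               AND tablename LIKE 'analytics\_%' ESCAPE '\'
    LOOP
      EXECUTE format('DROP TABLE IF EXISTS public.%I', t);
    END LOOP;
  END $$;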