Hi, One (presumably) simple solution is to stagger analytics on a year by year basis - i.e. run and complete 2009 before processing 2010. That would reduce temp disk space requirements significantly while (presumably) not changing the general design.
Regards Calle On 26 July 2016 at 10:24, Jason Pickering <jason.p.picker...@gmail.com> wrote: > Hi Devs, > I am seeking some advice on how to try and decrease the amount of disk > usage with DHIS2. > > Here is a list of the biggest tables in the system. > > public.datavalue | 2316 MB > public.datavalue_pkey | 1230 MB > public.in_datavalue_lastupdated | 680 MB > > > There are a lot more tables, and all in all, the database occupies about > 5.4 GB without analytics. > > This represents about 30 million data rows, so not that big of a database > really. This server is being run off of a Digital Ocean virtual server with > 60 GB of disk space. The only thing on the server really is Linux, > Postgresql and Tomcat. Nothing else. With out analytics and everything > installed for the system, we have about 23% of that 60 GB free. > > When analytics runs, it maintains a copy of the main analytics tables ( > analytics_XXXX) and creates temp tables like analytics_temp_2004. When > things are finished and the indexes are built, the tables are swapped. This > ensures that analytics resources are available while analytics are being > built, but the downside of this is that A LOT more disk space is required, > as now we effectively have two copies of the tables along with all their > indexes, which are quite large themselves (up to 60% the size of the table > itself). Here's what happens when analytics is run > > public.analytics_temp_2015 | 1017 MB > public.analytics_temp_2014 | 985 MB > public.analytics_temp_2011 | 952 MB > public.analytics_temp_2010 | 918 MB > public.analytics_temp_2013 | 885 MB > public.analytics_temp_2012 | 835 MB > public.analytics_temp_2009 | 804 MB > > Now each analytics table is taking about 1 GB of space. In the end, it > adds up to more than 60 GB and analytics fails to complete. > > So, while I understand the need for this functionality, I am wondering if > we need a system option to allow the analytics tables to be dropped prior > to regenerating them, or to have more control over the order in which they > are generated (for instance to generate specific periods). I realize this > can be done from the API or the scheduler, but only for the past three > relative years. > > The reason I am asking for this is because its a bit of a pain (at the > moment) when using Digital Ocean as a service provider, since their stock > disk storage is 60 GB. With other VPS providers (Amazon, Linode), its a bit > easier, but DigitalOcean only supports block storage in two regions at the > moment. Regardless, it would seem somewhat wasteful to have to have such a > large amount of disk space, for such a relatively small database. > > Is this something we just need to plan for and maybe provide better > documentation on, or should we think about trying to offer better > functionality for people running smaller servers? > > Regards, > Jason > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-devs > Post to : dhis2-devs@lists.launchpad.net > Unsubscribe : https://launchpad.net/~dhis2-devs > More help : https://help.launchpad.net/ListHelp > > -- ******************************************* Calle Hedberg 46D Alma Road, 7700 Rosebank, SOUTH AFRICA Tel/fax (home): +27-21-685-6472 Cell: +27-82-853-5352 Iridium SatPhone: +8816-315-19119 Email: calle.hedb...@gmail.com Skype: calle_hedberg *******************************************
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp