On 20 May 2010 15:56, Bob Jolliffe <bobjolli...@gmail.com> wrote:
> 2010/5/20 Ola Hodne Titlestad <olati...@gmail.com>:
>>
>> 2010/5/20 Lars Helge Øverland <larshe...@gmail.com>
>>>
>>> Data elements derive their period type from the data sets they are members
>>> of.
>
> Restated (what I just sent Lars only by mistake): a datavalue derives
> its period type from the data set of which its data element is a
> member :-)
>
>>
>> And when they are members of two datasets with different period types they
>> have multiple period types, right?
>
> It's important to remain aware that it is values ultimately which have
> periods (and hence period types).
>
> And when you look at a value you can derive its period type in one of
> two ways - via dataset or via period.  Potentially these could
> disagree.  The one which derives from its period should be considered
> authoritative, i.e. if the period is 2009-Jan then regardless of what
> the dataset might say this really must be monthly.  Of course we hope
> these always agree.  Incidentally the lookup from
> dataelement-to-dataset-to-period looks like a greater complexity than
> the lookup from period->periodType.
>
>>
>> The key thing to look out for in data entry and data import is to avoid
>> overlaps in data values that will cause duplication when aggregating data
>> periods.
>> E.g. if the SAME ORGUNIT registers values for the same data element for two
>> different period types that have overlapping periods, e.g. Jan-10 and Q1-10,
>> then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all show
>> an incorrect value since the value for Jan-10 is counted twice.
>
> OK. That's a good concrete constraint to have.
>
>>
>> One way to enforce this constraint is to monitor which datasets an orgunit
>> is assigned to, and not allow orgunits to be assigned to two datasets that
>> have the same data element AND different period types.
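As a rough illustration of the constraint Ola describes (an orgunit should not be assigned two datasets that share a data element but differ in period type), here is a minimal Java sketch. The classes `DataSetSketch` and `AssignmentCheck` are hypothetical stand-ins invented for this example, not the DHIS2 model classes, and period types are plain strings for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a data set; not the DHIS2 class.
final class DataSetSketch {
    final String name;
    final String periodType;        // e.g. "Monthly", "Quarterly"
    final Set<String> dataElements; // data element identifiers

    DataSetSketch(String name, String periodType, Set<String> dataElements) {
        this.name = name;
        this.periodType = periodType;
        this.dataElements = dataElements;
    }
}

final class AssignmentCheck {
    // Returns true if assigning 'candidate' to an orgunit that already has the
    // 'assigned' data sets would give some shared data element two different
    // period types (the situation that leads to double counting).
    static boolean conflicts(List<DataSetSketch> assigned, DataSetSketch candidate) {
        for (DataSetSketch existing : assigned) {
            if (existing.periodType.equals(candidate.periodType)) {
                continue; // same period type: this pair cannot clash
            }
            Set<String> shared = new HashSet<>(existing.dataElements);
            shared.retainAll(candidate.dataElements);
            if (!shared.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

With a monthly immunisation data set already assigned, `conflicts` would report a clash for a quarterly data set containing the same data elements, but not for a quarterly data set with unrelated data elements.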
>
> Agreed, though this constraint should probably be imposed on forms
> rather than datasets.
>
>> As far as I am aware,
>> we are not checking for this today. During data import it could be checked
>> on data element level by looking up the period type the way Bob has shown,
>> but that sounds like a lot of lookups and time-consuming validation, or?
>
> On data import we don't really validate at all, beyond whatever
> constraints the db imposes.  For efficiency we simply pop the values in
> with multiple insert statements.  So this validation would have to
> happen as a stage before the actual import or would have to be
> constrained within the db.  In fact it can't be validated easily
> before the import as it is dependent on existing values within the db.
>
>>
>> A relatively normal use case that we probably have to find a way to support,
>> and I think they are struggling with in Vietnam, is that different provinces
>> can use different period types for the same data elements (even for complete
>> data sets). E.g. the national data flow policy says to report on
>> immunisation data every quarter, so that becomes the minimum requirement for
>> all provinces. Then some of the provinces decide that all their facilities
>> have to collect this data monthly anyway, and then at the province level
>> they simply send the quarterly aggregates to national level (in the
>> paper-based or Excel world). At the same time other provinces just collect
>> quarterly data at the facility level as in the minimum national requirement.
>> At the national level there is a need to consolidate all this data, even
>> data at the facility level, so ideally a national DHIS database should be
>> able to store both monthly and quarterly raw data values for the same data
>> elements, but for different orgunits.
>> The national information users can
>> then easily generate quarterly reports on immunisation for all provinces,
>> while in some provinces they can do monthly data analysis if they want to
>> collect data using that frequency.
>>
>> We support the above scenario by allowing the same data elements to be
>> assigned to different data sets with different period types, but we don't
>> control for misuse of this flexibility which can lead to duplication and
>> inconsistent aggregated data values as pointed out above.
>
> Thinking further ... I really think the problem arises because we
> have a dataset concept which represents a form and is also used to
> constrain periodtypes on dataelements.  Thinking of the use case you
> have just described, it should be the case that one can have a paper
> form which national level expects to collect quarterly, and the same
> form be used at a lower level to collect data monthly.  If we wanted
> to mirror that use case electronically we would have to divorce the
> form from the periodtype - i.e. a form would collect datavalues of a
> certain period, but the same form could be used in different orgunits
> for collecting data at a different frequency.
>
> So (leaving dataset aside for the moment) if we can't assign a
> periodtype to a form and we can't assign one to a dataelement and it's
> too inefficient to validate on a one by one datavalue basis what is a
> girl to do?
>
> I suspect the correct answer is to refactor datavalue and create a
> datavalueset type - note: a set of datavalues rather than a set of
> dataelements.  Designing out loud, a datavalueset would have the
> following fields/attributes:
>
> 1. a formid - the collection instrument used - roughly corresponds to
> current dataset
> 2. an orgunitid - where the datavalues come from
> 3. a periodid - the period of all the datavalues
> couple of other useful attributes I can think of
>
> Datavalue now becomes slightly simpler (which is always a good thing).
> It only has:
> value, dataelementid, categorycombooption, datasetid
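The datavalueset idea above, together with the rule stated further down (do not persist a second set with the same formid, the same orgunitid and an overlapping period), could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class names are hypothetical, the store is an in-memory list rather than a database, and the periodid is replaced by explicit inclusive start and end dates so that the overlap test is visible.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed datavalueset; not an existing DHIS2 class.
final class DataValueSetSketch {
    final int formId;      // the collection instrument used
    final int orgUnitId;   // where the datavalues come from
    final LocalDate start; // first day of the period (inclusive)
    final LocalDate end;   // last day of the period (inclusive)

    DataValueSetSketch(int formId, int orgUnitId, LocalDate start, LocalDate end) {
        this.formId = formId;
        this.orgUnitId = orgUnitId;
        this.start = start;
        this.end = end;
    }

    // Same form, same orgunit, and the two date ranges share at least one day.
    boolean clashesWith(DataValueSetSketch other) {
        return formId == other.formId
            && orgUnitId == other.orgUnitId
            && !start.isAfter(other.end)
            && !other.start.isAfter(end);
    }
}

// In-memory stand-in for persistence, used only to show the check.
final class DataValueSetStore {
    private final List<DataValueSetSketch> saved = new ArrayList<>();

    // Returns false (and stores nothing) if the candidate clashes with a saved set.
    boolean save(DataValueSetSketch candidate) {
        for (DataValueSetSketch existing : saved) {
            if (existing.clashesWith(candidate)) {
                return false;
            }
        }
        saved.add(candidate);
        return true;
    }
}
```

Under this sketch, once a January 2010 set is saved for a form and orgunit, a Q1 2010 set for the same form and orgunit is refused, while the same Q1 set for a different orgunit is accepted, which matches the Vietnam scenario of provinces reporting at different frequencies.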
Afterthought: At the risk of adding complexity to what is otherwise a
simplification, my life could become even simpler if datavalueset also had
a categorycombo attribute, which would imply that a dataset was linked to
a formsectionid rather than a formid.

So a form has sections.  Sections have dataelements.  And sections have a
datavalueset as a model - which implies a uniform categorycombo within the
section.  There isn't really a need for dataelements to have a
categorycombo.  And in lots of ways it's good that they don't.  Then I am
reducing complexity rather than adding to it :-)

Consider that one orgunit has collected malaria deaths disaggregated by
age.  Another has collected values for the same dataelement, but not
disaggregated by age.  The datavalues will come from a datavalueset so
will have a categorycombo.  It is possible to aggregate or compare these
datavalues, from different datavaluesets, but using the lowest common
denominator of categorycombo, i.e. in both cases you have access to
malaria deaths - in the one case you have to "roll-up" the categorycombo,
which does of course assume that the sum of category options makes a
sensible whole, but Ola has mentioned this one many times.

Regards
Bob

>
> We can relatively efficiently validate that a dataset object is not
> persisted which has the same formid, orgunitid and an overlapping
> period.
>
> There is no longer any ambiguity about periodtype of a datavalue.
>
> stored_by, timestamp, comment might go either way.  Probably they need
> to stay on datavalue.  I notice comment is rarely used but it's really
> useful to have a comment on datavalueset for import purposes.
>
> 'nuff designing out loud.  Got to go.
>
> Regards
> Bob
>
>>
>>
>> Ola
>> ---------
>>
>>>
>>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad <olati...@gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> After Kim Anh's email about the use of the same data elements with
>>>> different period types I dug up this old discussion from March 2009.
>>>>
>>>> What is the status on this work, or did we not conclude this?
>>>>
>>>> Ola
>>>> ----------
>>>>
>>>> 2009/3/20 Bob Jolliffe <bobjolli...@gmail.com>
>>>>>
>>>>> 2009/3/20 Lars Helge Øverland <larshe...@gmail.com>:
>>>>> >
>>>>> >>
>>>>> >> Yes this is true.  But what do you think of the idea to enforce
>>>>> >> DataSet membership having a default DataSet for all the delinquents?
>>>>> >> I'm not sure if it can be enforced by the schema, but at least by the
>>>>> >> application.
>>>>> >
>>>>> > OK but what does this give us in terms of PeriodType-determining if
>>>>> > this default DataSet has a null PeriodType?
>>>>>
>>>>> Nothing really.  The only effect would be you have an index on the
>>>>> unassigned DataElements for what it's worth.  Mainly it would be useful
>>>>> for determining easily the available DataElements which can be added
>>>>> to a DataSet.  Maybe it's a nonsense idea - I was just trying to think
>>>>> of ways to make editing DataSets reasonably straightforward.
>>>>>
>>>>> >
>>>>> >>
>>>>> >> I don't know if it's about right or wrong.  There are pros and cons of
>>>>> >> both approaches.  What you gain on the swings you lose on the
>>>>> >> roundabouts :-)
>>>>> >>
>>>>> >> In the explicit case the application will have to enforce that
>>>>> >> DataSet members all have the same periodType.
>>>>> >>
>>>>> >> In the implicit case the application will have to enforce that
>>>>> >> DataElements can only be members of multiple groups if these share
>>>>> >> the same PeriodType.
>>>>> >>
>>>>> >> The net result as far as the Data API is concerned can and must be
>>>>> >> the same.  Perhaps we should define exactly what extra methods we
>>>>> >> want in the API first.  We have already identified a few.  Then
>>>>> >> decide whether a database change is necessitated by these.
>>>>> >
>>>>> > Yes.
>>>>> > We need at least a service method:
>>>>> >
>>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>>>>> >
>>>>> > and a getter on the DataElement object:
>>>>> >
>>>>> > PeriodType getPeriodType()
>>>>> >
>>>>> >
>>>>> > I guess we could make a branch, start coding and see how it works out.
>>>>>
>>>>> Sure.  So long as we are adding methods we won't be breaking anything
>>>>> in terms of backward compatibility.  Just enforcing application level
>>>>> constraints.  Then we can really encourage (enforce?) upper layers to
>>>>> strictly interact with the data via the API.  Even if this might
>>>>> occasionally mean making some lightweight API methods which bypass the
>>>>> ORM.
>>>>>
>>>>> >
>>>>> > Another issue would arise in the (exotic) situation where someone
>>>>> > assigns a DataElement to a DataSet, enters data for it, then removes
>>>>> > it from the DataElement.  The data is there, but how do we deal with
>>>>> > it in regard to the mentioned required functionality (trend analysis,
>>>>> > datamart)?
>>>>> >
>>>>>
>>>>> Yes this gets a bit weird (I presume you mean removes it from the
>>>>> DataSet).  I'm guessing you haven't lost the data because the
>>>>> dataValues each have a PeriodID which in turn is linked to a
>>>>> PeriodType.  I suppose that (in such an exotic headspace) DataElements
>>>>> can in fact change their PeriodTypes over time, though I imagine it's
>>>>> not a great idea.
>>>>>
>>>>> The effect would be the same in the explicit relationship case, if
>>>>> someone assigns a DataElement to a DataSet, enters data for it, then
>>>>> changes the PeriodType of the DataElement ...
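For what it's worth, the two methods proposed above (`getDataElementsByPeriodType` and `getPeriodType`) might look roughly like this under the implicit approach, where a DataElement takes its PeriodType from the DataSets it belongs to. Everything here is a simplified assumption for illustration: the `...Stub` and `...Sketch` types are hypothetical, the service method takes the candidate collection as an extra parameter because there is no database behind it, and throwing on disagreeing memberships is just one possible way to surface the inconsistency.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins for the real model classes.
enum PeriodTypeSketch { MONTHLY, QUARTERLY, YEARLY }

final class DataSetStub {
    final PeriodTypeSketch periodType;

    DataSetStub(PeriodTypeSketch periodType) {
        this.periodType = periodType;
    }
}

final class DataElementStub {
    final String name;
    final List<DataSetStub> dataSets = new ArrayList<>();

    DataElementStub(String name) {
        this.name = name;
    }

    // Implicit derivation: the period type shared by all data sets this
    // element is a member of. Returns null for an unassigned element and
    // throws if the memberships disagree.
    PeriodTypeSketch getPeriodType() {
        PeriodTypeSketch found = null;
        for (DataSetStub dataSet : dataSets) {
            if (found != null && found != dataSet.periodType) {
                throw new IllegalStateException(
                    name + " belongs to data sets with different period types");
            }
            found = dataSet.periodType;
        }
        return found;
    }
}

final class DataElementServiceSketch {
    // Filters 'all' down to the elements whose derived period type matches.
    static Collection<DataElementStub> getDataElementsByPeriodType(
            Collection<DataElementStub> all, PeriodTypeSketch type) {
        List<DataElementStub> result = new ArrayList<>();
        for (DataElementStub element : all) {
            if (element.getPeriodType() == type) {
                result.add(element);
            }
        }
        return result;
    }
}
```

Note that an element removed from all its DataSets ends up with a null period type in this sketch, which is exactly the exotic case discussed above: its existing values still carry a period, so the period remains the more reliable source.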
>>>>>
>>>>> Cheers
>>>>> Bob
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to     : dhis2-devs@lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help   : https://help.launchpad.net/ListHelp