On Fri, 17 Oct 2014, Ranjan Maitra wrote:


What I mean is that R has the capability of generating PDFs, and R has
the capability of calculating various goodness of fit measures, but if
you want to check goodness of fit measures against, say, 50 PDFs, then
you have to write the package.  It's easier for me to use easyfit than
write the package.

Never having heard of "easyfit" before now, I guess I am confused as to what 
you mean when you say fitting a pdf. What is the form of the pdfs that you want to fit? 
It is very unusual to want to fit 50 different parametric pdfs, unless what you mean is 
something totally different. In that case, have you considered going the (nonparametric) 
density estimation route?

Many thanks,
Ranjan


Well, this isn't really a fedora thing, but since I think it's interesting I'll 
impose a little longer.

Here's the problem.  Let's say you have a set of data and you want to 
characterize it in order to use it as the basis of a model.  In order to do 
that, you really need to know the underlying PDF.
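In practice that means fitting a set of candidate distributions and comparing 
goodness of fit -- the loop that a tool like EasyFit automates.  A minimal 
sketch in Python with scipy (the candidate list and the sample data below are 
invented for illustration):

```python
# Sketch of the "fit many candidate distributions" loop that tools like
# EasyFit automate. The candidate names and the sample data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=500)  # illustrative sample

candidates = ["norm", "gamma", "lognorm", "expon", "uniform"]

results = []
for name in candidates:
    dist = getattr(stats, name)
    params = dist.fit(data)                     # maximum-likelihood fit
    ks = stats.kstest(data, name, args=params)  # Kolmogorov-Smirnov GoF
    results.append((name, ks.statistic, ks.pvalue))

# Rank candidates by KS statistic: smaller means a better fit.
results.sort(key=lambda r: r[1])
best = results[0][0]
```

Extending `candidates` to 50 names is just a longer list -- scipy.stats ships 
with on the order of a hundred continuous distributions -- but as noted above, 
somebody still has to write and maintain that loop.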

Here are two simple examples that I've run into in the past couple of years.  
I'm a forensic pathologist, and investigate unnatural death.  One common 
problem in the field is the issue of abusive head trauma -- can you tell from 
the injuries on a child whether they *must* have been inflicted by another 
person, or could they have come from an accident of some sort?

There has been a great deal of biomechanical modeling around this issue. 
Some of these models are based on physical measurements of the amount of force 
it takes to fracture the skull of a child.  One very commonly cited study 
actually uses a very small data set of donated skulls.  The data are reported 
as if they were Gaussian, but if you actually look at them, the distribution 
is uniform.

It's a uniform distribution because the investigators took one or two skulls from infants 
of varying ages -- and what they are really measuring is the change in skull properties 
over time.  It's as if they did a study on "average human height" and then took 
one sample from humans at each month from birth to 3 years old.

In situations like this, it's important to see and understand the underlying PDF, 
because the investigators then use the data *as if it were Gaussian* to create 
biomechanical models.  And it's wrong to do that -- it's wrong to apply the 
"average" and "standard deviation" of the heights of people from birth to 3 years 
as the supposed "average" height of a newborn baby.  If you look at the 
distribution, the error becomes obvious.
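The height analogy can be made concrete with a toy simulation (all numbers 
invented): one measurement per month of age, a steady growth trend, and small 
child-to-child variation.

```python
# Toy version of the height analogy: one "height" measurement per month of
# age from 0 to 36 months. All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
ages = np.arange(37)                    # months, 0..36
mean_height = 50.0 + 1.0 * ages         # assumed ~1 cm/month growth trend
samples = rng.normal(mean_height, 0.5)  # one child measured at each age

pooled_mean = samples.mean()            # near the 18-month value, i.e. the
                                        # "average height" of no actual age
pooled_sd = samples.std(ddof=1)         # dominated by the age trend, not by
                                        # child-to-child variation (sd = 0.5)
newborn = samples[0]                    # far below the pooled "average"
```

The pooled sample is spread roughly uniformly across the age range, so quoting 
its mean and standard deviation as properties of a newborn is exactly the 
error described above.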

A second example occurred when a group attempted to apply Benford's Law to look 
for bias in manner of death determination in forensic death investigation.  The 
investigators looked at the number of homicides, suicides, accidents, and 
natural deaths in their jurisdiction each month over a period of a couple of 
years, and it *seemed* as if it followed Benford's Law.

However, it was an artifact of their workload.  My office has about twice their 
workload, so all of my monthly counts -- and therefore their first digits -- are 
scaled by two.  The distribution is really a pretty simple Gaussian 
distribution, and such distributions tend not to follow Benford's Law.  Thus, 
knowing that the distribution fits a normal well is an argument that manner 
determination should *not* follow Benford's Law.  If, however, the data fit 
something like the gamma distribution well, that would *not* argue against the 
applicability of Benford's Law.
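The scaling argument is easy to demonstrate with invented numbers: first digits 
of counts drawn from a normal distribution cluster around the mean's leading 
digits, and doubling the caseload just moves the cluster, whereas Benford's Law 
puts most of its mass on 1 and 2 regardless of scale.

```python
# First digits of normally distributed monthly counts vs. Benford's Law.
# The mean of 45 cases/month and sd of 8 are invented for illustration.
import numpy as np

def first_digit(n):
    return int(str(int(n))[0])

# Benford's Law: P(first digit = d) = log10(1 + 1/d)
benford = {d: np.log10(1 + 1 / d) for d in range(1, 10)}

rng = np.random.default_rng(2)
counts = rng.normal(45, 8, size=1000).round().clip(min=1)  # one office
doubled = 2 * counts                                       # twice the workload

digits = [first_digit(c) for c in counts]
digits2 = [first_digit(c) for c in doubled]

# Counts clustered around 45 put most first-digit mass on 3, 4, and 5...
share_345 = sum(d in (3, 4, 5) for d in digits) / len(digits)
# ...while Benford would put most mass on 1 and 2 (about 30% and 18%),
# and the doubled office's digits cluster around 6-9 and 1 instead.
```

The normal counts are nowhere near Benford's frequencies, and the mismatch 
changes shape entirely when the workload doubles -- which is why a good normal 
fit argues *against* expecting Benford behavior here.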


billo
--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org