Re: [R] 2 D density plot interpretation and manipulating the data

David Winsemius Fri, 09 Oct 2020 17:34:34 -0700

I’m wondering whether you do any searching of the Web or use the help facilities
before asking questions. When I posed the question to Google’s search
facilities, I was immediately directed, unsurprisingly, to the help page text in
a webpage format:


https://ggplot2.tidyverse.org/reference/geom_density_2d.html

In many situations I find it very difficult to pry code out of ggplot
functions, but that help page says the estimate is computed with MASS::kde2d(),
and the MASS package is very well documented.
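
A minimal sketch of what that help page describes: estimate the density on a grid with MASS::kde2d(), then extract contour lines like the ones geom_density_2d() draws. The data are simulated stand-ins for SNP$mean and SNP$var, because the real data frame is not in the thread. The levels chosen by grDevices::contourLines() are an assumption and need not match the ones ggplot2 picks.

```r
library(MASS)  # kde2d(); MASS ships with R as a recommended package

set.seed(1)
# Simulated, right-skewed stand-ins for SNP$mean and SNP$var (not the real data)
m <- rgamma(500, shape = 2, rate = 60)
v <- rgamma(500, shape = 1.5, rate = 500)

# 2D kernel density estimate on a 100 x 100 grid; when h is omitted,
# kde2d() uses the normal reference bandwidths from bandwidth.nrd()
dens <- kde2d(m, v, n = 100)

# Each contour line traces a level set {(x, y) : density(x, y) = level};
# contourLines() picks its levels with pretty(), which may differ from ggplot2's
cl <- contourLines(dens$x, dens$y, dens$z)
levels_drawn <- unique(vapply(cl, function(line) line$level, numeric(1)))
print(levels_drawn)
```

The "ellipses" in the plot are these contour lines of the estimated density. They are not fitted ellipses.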
— 
David. 

> On Oct 9, 2020, at 4:23 PM, Ana Marija <sokovic.anamar...@gmail.com> wrote:
> 
> Hi Abby,
> 
> Thanks for getting back to me. Yes, I believe I did that by doing this:
> 
> SNP$density <- get_density(SNP$mean, SNP$var)
>> summary(SNP$density)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>      0     383     696     738    1170    1789
> 
> where get_density() is a function from here:
> https://slowkow.com/notes/ggplot2-color-by-density/
> 
> and kept only the entries with density > 400:
> 
> a <- SNP[SNP$density > 400, ]
> 
> and plotted it again:
> 
> p <- ggplot(a, mapping = aes(x = mean, y = var))
> p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")
> 
> and I could probably increase that threshold...
> 
> Any idea how to interpret the data points that remain contained within
> the ellipses?
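
For reference, here is a sketch of a get_density()-style helper, written along the lines of the blog post linked above rather than copied from it. Each point receives the kde2d() estimate at the grid node just below and to the left of it, found with findInterval(). A contour drawn at level c bounds the region where the estimated density exceeds c. So, up to grid resolution, keeping rows with density > c keeps the points inside that contour. The data frame and the 20th-percentile cut-off are hypothetical illustrations, not values from the thread.

```r
library(MASS)  # kde2d()

# Sketch of a get_density()-style helper: look up, for each point, the
# kde2d() estimate at the grid node just below/left of that point
get_density <- function(x, y, ...) {
  dens <- kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

set.seed(2)
# Simulated stand-ins for SNP$mean and SNP$var (hypothetical data)
snp <- data.frame(mean = rgamma(300, 2, 60), var = rgamma(300, 1.5, 500))
snp$density <- get_density(snp$mean, snp$var, n = 100)

# Keep points whose estimated density exceeds a chosen level; these are
# (approximately) the points inside the contour drawn at that level
level <- quantile(snp$density, 0.2)
inside <- snp[snp$density > level, ]
nrow(inside)
```

The raw density values depend on the units of mean and var. A fixed threshold such as 400 is therefore meaningful only for these particular data.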
> 
>> On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle <spurdl...@gmail.com> wrote:
>> 
>> You could assign a density value to each point.
>> Maybe you've done that already...?
>> 
>> Then trim the lowest n (number of) data points
>> Or trim the lowest p (proportion of) data points.
>> 
>> e.g.
>> Remove the data points with the 20 lowest density values.
>> Or remove the data points with the lowest 5% of density values.
>> 
>> I'll let you decide whether that is a good idea or a bad idea.
>> And if it's a good idea, then how much to trim.
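
Both trimming rules can be sketched in a few lines of base R. The density vector d is simulated and stands in hypothetically for SNP$density. Whether trimming is a good idea, and how much to trim, remains the judgement call described above.

```r
# Hypothetical density values, one per data point (e.g. from get_density())
set.seed(3)
d <- rexp(200)

# Rule 1: drop the n = 20 points with the lowest density values
n_trim <- 20
keep_n <- order(d)[-seq_len(n_trim)]   # indices of all but the 20 lowest

# Rule 2: drop the points in the lowest p = 5% of density values
p <- 0.05
keep_p <- which(d > quantile(d, p))

length(keep_n)  # 180
length(keep_p)  # 190
```

The retained indices can then subset the data frame, e.g. SNP[keep_p, ].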
>> 
>> 
>>> On Sat, Oct 10, 2020 at 5:47 AM Ana Marija <sokovic.anamar...@gmail.com> 
>>> wrote:
>>> 
>>> Hi Bert,
>>> 
>>> Another confrontational response from you...
>>> 
>>> You might have noticed that I use the word "outlier" carefully in this
>>> post and only in relation to the plotted ellipses. I do not know the
>>> underlying algorithm of geom_density_2d(), and therefore I am having
>>> trouble interpreting the plot. I was hoping someone here knows
>>> that and can help me.
>>> 
>>> Ana
>>> 
>>> On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>> 
>>>> I recommend that you consult with a local statistical expert. Much of what 
>>>> you say (outliers?!?) seems to make little sense, and your statistical 
>>>> knowledge seems minimal. Perhaps more to the point, none of your questions 
>>>> can be properly answered without subject matter context, which this list 
>>>> is not designed to provide. That's why I believe you need local expertise.
>>>> 
>>>> Bert Gunter
>>>> 
>>>> "The trouble with having an open mind is that people keep coming along and 
>>>> sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>>>> 
>>>> 
>>>> On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com> 
>>>> wrote:
>>>>> 
>>>>> Hi Abby,
>>>>> 
>>>>> Thank you for getting back to me and for this useful information.
>>>>> 
>>>>> I'm trying to detect the outliers in my distribution based on mean and
>>>>> variance. Can I see them in the plot I provided? Would outliers be
>>>>> outside of the ellipses? If so, how do I extract them from my data frame,
>>>>> and based on which parameter?
>>>>> 
>>>>> So I am trying to relate what the plot is showing:
>>>>> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
>>>>> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>>>>> 
>>>>> versus what is in the data:
>>>>> 
>>>>>> head(SNP)
>>>>>               mean      var     sd
>>>>> FQC.10090295 0.0327 0.002678 0.0517
>>>>> FQC.10119363 0.0220 0.000978 0.0313
>>>>> FQC.10132112 0.0275 0.002088 0.0457
>>>>> FQC.10201128 0.0169 0.000289 0.0170
>>>>> FQC.10208432 0.0443 0.004081 0.0639
>>>>> FQC.10218466 0.0116 0.000131 0.0115
>>>>> ...
>>>>> 
>>>>> The distribution is not normal; it is right-skewed.
>>>>> 
>>>>> Cheers,
>>>>> Ana
>>>>> 
>>>>> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
>>>>>> 
>>>>>>> My understanding is that this represents bivariate normal
>>>>>>> approximation of the data which uses the kernel density function to
>>>>>>> test for inclusion within a level set. (please correct me)
>>>>>> 
>>>>>> You can fit a bivariate normal distribution by computing five parameters.
>>>>>> Two means, two standard deviations (or two variances) and one
>>>>>> correlation (or covariance) coefficient.
>>>>>> The bivariate normal *has* elliptical contours.
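
The five-parameter fit can be made concrete with a sketch on simulated data, which is hypothetical and not the SNP data. Measuring each point's Mahalanobis distance from the fitted centre and comparing it with a chi-squared quantile is an illustrative choice, not something proposed in this thread. It describes the ellipses of the fitted normal, which are a different object from the kernel density contours that geom_density_2d() draws.

```r
set.seed(4)
# Simulated correlated data (hypothetical)
x <- rnorm(400, mean = 1, sd = 2)
y <- 0.5 * x + rnorm(400, sd = 1)
xy <- cbind(x, y)

# The five parameters of a fitted bivariate normal
mu <- colMeans(xy)   # two means
S  <- cov(xy)        # two variances plus one covariance
r  <- cor(x, y)      # correlation, equal to S[1, 2] / sqrt(S[1, 1] * S[2, 2])

# Contours of the fitted density are ellipses of constant Mahalanobis distance;
# the ellipse at d2 = qchisq(0.95, 2) encloses about 95% of the fitted distribution
d2 <- mahalanobis(xy, center = mu, cov = S)
outside <- d2 > qchisq(0.95, df = 2)
mean(outside)
```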
>>>>>> 
>>>>>> A kernel density estimate is usually regarded as an estimate of an
>>>>>> unknown density function.
>>>>>> Often they use a normal (or Gaussian) kernel, but I wouldn't describe
>>>>>> them as normal approximations.
>>>>>> In general, bivariate kernel density estimates do *not* have
>>>>>> elliptical contours.
>>>>>> But in saying that, if the data is close to normality, then contours
>>>>>> will be close to elliptical.
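
The difference becomes visible when a kernel density estimate is evaluated by hand. At any point it is the average of small normal bumps centred on the observations, with no single fitted mean or covariance, so its contours can take any shape. The sketch below uses simulated data and reproduces one grid value of MASS::kde2d() this way. Dividing the bandwidths by 4 reflects how kde2d() scales its h argument as I read its source. Treat that detail as an assumption to check against ?kde2d.

```r
library(MASS)  # kde2d(), bandwidth.nrd()

set.seed(5)
# Simulated skewed data (hypothetical)
x <- rgamma(200, shape = 2)
y <- rgamma(200, shape = 3)

# Kernel standard deviations: kde2d() divides its bandwidths by 4 internally
# (an assumption based on its source code)
h <- c(bandwidth.nrd(x), bandwidth.nrd(y)) / 4

# Hand-rolled Gaussian product-kernel estimate at a single point (x0, y0):
# the average of one normal bump centred on each observation
kde_at <- function(x0, y0) {
  mean(dnorm(x0, mean = x, sd = h[1]) * dnorm(y0, mean = y, sd = h[2]))
}

dens <- kde2d(x, y, n = 50)
i <- 20; j <- 30
c(manual = kde_at(dens$x[i], dens$y[j]), kde2d = dens$z[i, j])
```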
>>>>>> 
>>>>>> Kernel density estimates do not test for inclusion, as such.
>>>>>> (But technically, there are some exceptions to that).
>>>>>> 
>>>>>> I'm not sure what you're trying to achieve here.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
> <snps_red.pdf>