I’m wondering whether you search the Web or use the help facilities before asking questions. When I put the question to Google, I was immediately directed, unsurprisingly, to the help page in webpage format:
https://ggplot2.tidyverse.org/reference/geom_density_2d.html

In many situations I find it very difficult to pry code out of ggplot functions, but that help page says it’s a routine found in the MASS package which is very well documented.

David.

Sent from my iPhone

> On Oct 9, 2020, at 4:23 PM, Ana Marija <sokovic.anamar...@gmail.com> wrote:
>
> Hi Abby,
>
> Thanks for getting back to me, yes I believe I did that by doing this:
>
> SNP$density <- get_density(SNP$mean, SNP$var)
>> summary(SNP$density)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       0     383     696     738    1170    1789
>
> where get_density() is function from here:
> https://slowkow.com/notes/ggplot2-color-by-density/
>
> and keep only entries with density > 400
>
> a=SNP[SNP$density>400,]
>
> and plot it again:
>
> p <- ggplot(a, mapping = aes(x = mean, y = var))
> p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")
>
> and probably I can increase that threshold...
>
> Any idea how do I interpret data points that are left contained within
> the ellipses?
>
>> On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle <spurdl...@gmail.com> wrote:
>>
>> You could assign a density value to each point.
>> Maybe you've done that already...?
>>
>> Then trim the lowest n (number of) data points
>> Or trim the lowest p (proportion of) data points.
>>
>> e.g.
>> Remove the data points with the 20 lowest density values.
>> Or remove the data points with the lowest 5% of density values.
>>
>> I'll let you decide whether that is a good idea or a bad idea.
>> And if it's a good idea, then how much to trim.
>>
>>
>>> On Sat, Oct 10, 2020 at 5:47 AM Ana Marija <sokovic.anamar...@gmail.com>
>>> wrote:
>>>
>>> Hi Bert,
>>>
>>> Another confrontational response from you...
>>>
>>> You might have noticed that I use the word "outlier" carefully in this
>>> post and only in relation to the plotted ellipses. I do not know the
>>> underlying algorithm of geom_density_2d() and therefore I am having an
>>> issue of how to interpret the plot. I was hoping someone here knows
>>> that and can help me.
>>>
>>> Ana
>>>
>>> On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>
>>>> I recommend that you consult with a local statistical expert. Much of what
>>>> you say (outliers?!?) seems to make little sense, and your statistical
>>>> knowledge seems minimal. Perhaps more to the point, none of your questions
>>>> can be properly answered without subject matter context, which this list
>>>> is not designed to provide. That's why I believe you need local expertise.
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along and
>>>> sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>>
>>>> On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamar...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Abby,
>>>>>
>>>>> thank you for getting back to me and for this useful information.
>>>>>
>>>>> I'm trying to detect the outliers in my distribution based of mean and
>>>>> variance. Can I see that from the plot I provided? Would outliers be
>>>>> outside of ellipses? If so how do I extract those from my data frame,
>>>>> based on which parameter?
>>>>>
>>>>> So I am trying to connect outliers based on what the plot is showing:
>>>>> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
>>>>> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>>>>>
>>>>> versus what is in the data:
>>>>>
>>>>>> head(SNP)
>>>>>                mean      var     sd
>>>>> FQC.10090295 0.0327 0.002678 0.0517
>>>>> FQC.10119363 0.0220 0.000978 0.0313
>>>>> FQC.10132112 0.0275 0.002088 0.0457
>>>>> FQC.10201128 0.0169 0.000289 0.0170
>>>>> FQC.10208432 0.0443 0.004081 0.0639
>>>>> FQC.10218466 0.0116 0.000131 0.0115
>>>>> ...
>>>>>
>>>>> the distribution is not normal, it is right-skewed.
>>>>>
>>>>> Cheers,
>>>>> Ana
>>>>>
>>>>> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdl...@gmail.com> wrote:
>>>>>>
>>>>>>> My understanding is that this represents bivariate normal
>>>>>>> approximation of the data which uses the kernel density function to
>>>>>>> test for inclusion within a level set. (please correct me)
>>>>>>
>>>>>> You can fit a bivariate normal distribution by computing five parameters.
>>>>>> Two means, two standard deviations (or two variances) and one
>>>>>> correlation (or covariance) coefficient.
>>>>>> The bivariate normal *has* elliptical contours.
>>>>>>
>>>>>> A kernel density estimate is usually regarded as an estimate of an
>>>>>> unknown density function.
>>>>>> Often they use a normal (or Gaussian) kernel, but I wouldn't describe
>>>>>> them as normal approximations.
>>>>>> In general, bivariate kernel density estimates do *not* have
>>>>>> elliptical contours.
>>>>>> But in saying that, if the data is close to normality, then contours
>>>>>> will be close to elliptical.
>>>>>>
>>>>>> Kernel density estimates do not test for inclusion, as such.
>>>>>> (But technically, there are some exceptions to that).
>>>>>>
>>>>>> I'm not sure what you're trying to achieve here.
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
> <snps_red.pdf>
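For anyone following the thread: the help page linked at the top describes geom_density_2d() as a 2D kernel density estimate computed with MASS::kde2d(), so the contours can be reproduced and inspected without digging through ggplot2 internals. The sketch below is only an illustration under stated assumptions. The data frame `snp` is simulated right-skewed data standing in for the poster's SNP table (which is not available here), and `point_density()` is a hypothetical helper written for this note, similar in spirit to the get_density() function mentioned in the thread but not a copy of it. It assigns a density value to each point and then trims the lowest 5%, as suggested in the quoted replies.

```r
# Sketch only: simulated right-skewed data stands in for SNP$mean and SNP$var.
library(MASS)  # provides kde2d()

set.seed(1)
n <- 500
snp <- data.frame(mean = rexp(n, rate = 30))
snp$var <- snp$mean^2 * rlnorm(n, sdlog = 0.3)

# Hypothetical helper: evaluate the kernel density estimate on a grid,
# then look up the grid cell that each observation falls into.
point_density <- function(x, y, n_grid = 100) {
  kde <- MASS::kde2d(x, y, n = n_grid)  # list with grid vectors x, y and matrix z
  ix <- findInterval(x, kde$x)          # grid column index for each point
  iy <- findInterval(y, kde$y)          # grid row index for each point
  kde$z[cbind(ix, iy)]                  # density at each point's grid cell
}

snp$density <- point_density(snp$mean, snp$var)

# Trim the points whose density falls in the lowest 5% of density values.
cutoff <- quantile(snp$density, probs = 0.05)
kept <- snp[snp$density > cutoff, ]
```

The contour lines drawn by geom_density_2d() are level sets of this kind of estimate, so points left inside them are simply points in regions of higher estimated density. As noted in the quoted replies, that is not by itself a test for outliers, and how much to trim (if at all) remains a judgment call.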