Dear Jeff,

thank you for all your time and your very precious help.

With best regards,

-- bogdan

On Mon, Jul 9, 2018 at 1:41 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

> Thank you for making the effort... but most attachments get stripped on
> the mailing list. Using the reprex package as I suggested and putting the
> result into the email is by far the safest approach. Since I received your
> email directly, I did get the attachments. Below is my reproducible
> example... to serve as an example for how you can get help from everyone on
> the list rather than just the few you are responding to.
>
> My summary comment is that you have to decide whether the LENGTH values
> greater than 500 are relevant... and if they are not, you REALLY SHOULD
> create a data set that is limited in this fashion. Then you won't have to
> create "fake" axes, and you won't get ggplot warnings.
>
> Note: The reprex package allows you to confirm that the example is in fact
> reproducible, so technically it is not necessary to include the plot images
> in the question. However, reprex used to conveniently support putting the
> images on the imgur website, and for some reason it no longer does that, so
> just run the example interactively to see the graphs.
>
> #######
> ############################################################
> ############################################################
>
> library("ggplot2")
>
> # "file" is the name of a very fundamental function in base R. Re-using
> # that name for a data value is at best confusing to anyone reading your
> # code and at worst will prevent you from using that function.
> #file <- read.delim("LENGTH", sep="\t", header=T, stringsAsFactors=F)
>
> # Instead of giving us a file, keep the data within the example
> # DF <- read.delim("LENGTH", sep="\t", header=T, stringsAsFactors=F)
> # set.seed( 42 )
> # also shrink the size of the data for the example...
> # we almost never need all of it
> # dput( DF[ sample( seq.int( nrow( DF ) ), size = 200 ), , drop=FALSE ] )
> DF <- structure(list(LENGTH = c(6813L, 56035L, 123997L, 281L, 851L, 1072L,
> 72196L, 21L, 304L, 110L, 198L, 5922L, 283L, 199348L, 109L,
> 3317104L, 106L, 37642146L, 82641L, 20L, 125911L, 354L, 11625388L,
> 330L, 9811711L, 18L, 35L, 39897L, 27L, 277L, 79L, 2657L, 17L,
> 26L, 23L, 248L, 3634L, 21L, 324L, 206L, 328L, 42L, 286L,
> 6042409L,
> 24L, 36L, 2879L, 18L, 301L, 90684L, 4296636L, 43L, 1222L, 4536L,
> 3281L, 324L, 393L, 3754L, 98824541L, 459L, 18L, 1081L, 175L,
> 970L, 17L, 219L, 235558L, 1167315L, 25L, 623L, 2517515L, 32L,
> 217L, 29L, 17L, 1744L, 18L, 39L, 26L, 77L, 41L, 22L, 311L,
> 119015225L,
> 146413L, 22L, 19L, 301L, 373L, 2240L, 6439L, 128L, 18L, 257L,
> 783L, 5169L, 31608038L, 325L, 1533L, 25L, 69344L, 54L, 10651L,
> 31L, 335062L, 1854019L, 7153L, 38605567L, 51L, 23L, 16L, 301L,
> 79L, 313L, 18L, 29L, 39L, 22L, 17L, 306L, 67L, 280L, 324L, 158L,
> 93L, 2561L, 302L, 134578L, 328L, 9002L, 969051L, 34L, 20L, 309L,
> 355L, 28L, 9461327L, 18627013L, 305L, 64L, 18L, 2730L, 28L, 246L,
> 911L, 28L, 241483L, 154691L, 58891L, 55L, 456362L, 281L, 276L,
> 51L, 26L, 106821L, 313L, 78L, 29L, 400L, 61171382L, 200L, 101L,
> 220331L, 128L, 325L, 28L, 22L, 325L, 2330L, 5879L, 24L, 36L,
> 23L, 51L, 26L, 32584707L, 1672L, 13939L, 315L, 20L, 580785L,
> 42795L, 49193543L, 695L, 48568156L, 55634L, 207L, 318L, 22056L,
> 3670420L, 4815387L, 309L, 17L, 3143160L, 431L, 1164L, 33L, 5503L,
> 4166L)), .Names = "LENGTH", row.names = c(8283L, 8484L, 2591L,
> 7517L, 5808L, 4698L, 6665L, 1219L, 5944L, 6378L, 4140L, 6503L,
> 8452L, 2310L, 4180L, 8497L, 8842L, 1062L, 4293L, 5063L, 8168L,
> 1253L, 8932L, 8550L, 745L, 4643L, 3523L, 8177L, 4035L, 7545L,
> 6657L, 7319L, 3502L, 6181L, 36L, 7513L, 67L, 1873L, 8174L, 5516L,
> 3422L, 3928L, 338L, 8773L, 3891L, 8627L, 7997L, 5765L, 8745L,
> 5573L, 3003L, 3122L, 3588L, 7064L, 351L, 6739L, 6095L, 1541L,
> 2349L, 4628L, 6077L, 8839L, 6830L, 5094L, 7639L, 1704L, 2439L,
> 7443L, 6230L, 2162L, 387L, 1262L, 1944L, 4306L, 1773L, 6460L,
> 71L, 3371L, 4618L, 15L, 5220L, 1417L, 3222L, 5792L, 6960L, 5056L,
> 2096L, 807L, 768L, 2737L, 5983L, 3L, 1870L, 8361L, 8294L, 6577L,
> 2984L, 4614L, 6664L, 5545L, 5608L, 1945L, 1939L, 3482L, 8435L,
> 8615L, 6621L, 6561L, 4793L, 21L, 5447L, 7484L, 6721L, 4048L,
> 4790L, 4804L, 13L, 3179L, 5471L, 7407L, 3187L, 3669L, 5123L,
> 5267L, 6427L, 3527L, 8207L, 8593L, 2085L, 6467L, 8065L, 5385L,
> 5635L, 8363L, 7587L, 5172L, 7326L, 1015L, 6817L, 5560L, 1324L,
> 716L, 4136L, 6945L, 6536L, 7281L, 1516L, 8415L, 2616L, 1328L,
> 6406L, 2886L, 6933L, 3511L, 6040L, 6905L, 1672L, 259L, 1208L,
> 6051L, 8315L, 4896L, 5351L, 1752L, 4759L, 1597L, 4017L, 2818L,
> 1033L, 1654L, 6483L, 3659L, 3678L, 4266L, 3797L, 1212L, 7322L,
> 5258L, 7052L, 6826L, 8147L, 7655L, 2813L, 2300L, 6584L, 6629L,
> 8140L, 7034L, 1183L, 2551L, 1726L, 6950L, 1143L, 1144L, 641L,
> 471L, 4712L, 995L, 6582L, 6476L), class = "data.frame")
>
> ############################# display with PLOT FUNCTION:
>
> # saving files should be avoided in reproducible examples... especially files
> # that cannot be transmitted through the R-help mailing list such as pdf files
> #pdf("display.R.ecdf.LENGTH.pdf", width=10, height=6, paper='special')
>
> # Your original plot commands below create a fake impression of the data by
> # falsifying the axes. If you really are only interested in data points less
> # than 500, you should be explicit about creating a data set containing only
> # such constrained values before plotting them.
> plot(ecdf(DF$LENGTH), xlab="DEL SIZE",
>      ylab="fraction of DEL",
>      main="LENGTH of DEL",
>      xlim=c(0,500),
>      col = "dark red", axes = FALSE)
> ticks_y <- c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4)
> axis(2, at=ticks_y, labels=ticks_y, col.axis="red")
> ticks_x <- c(0, 100, 200, 400, 500, 600, 700, 800)
> axis(1, at=ticks_x, labels=ticks_x, col.axis="blue")
>
> #' ![](https://i.imgur.com/gYTLBzR.png)
>
> # my recommendation
> DF500 <- subset( DF, LENGTH < 500 )
> plot( ecdf( DF500$LENGTH )
>     , xlab = "DEL SIZE"
>     , ylab = "fraction of DEL"
>     , main = "LENGTH of DEL"
>     , col = "dark red"
>     )
>
> #' ![](https://i.imgur.com/0zWr95V.png)
>
> # alternatively
> plot( ecdf( DF$LENGTH )
>     , xlab = "DEL SIZE"
>     , ylab = "fraction of DEL"
>     , main = "LENGTH of DEL"
>     , col = "dark red"
>     , xlim=c( 1, 1e9 )
>     , log="x"
>     )
>
> #' ![](https://i.imgur.com/3Wm5rCj.png)
>
> #dev.off()
>
> ############################# display in GGPLOT2 :
>
> BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
>            1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
>
> barfill <- "#4271AE"
> barlines <- "#1F3552"
>
> #pdf("display.ggplot2.ecdf.LENGTH.pdf", width=10, height=6, paper='special')
>
> # ggplot's limits behavior is enabling your false representation of the
> # data, but it warns you of the data removal
> ggplot(DF, aes(LENGTH)) +
>   stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
>   scale_x_continuous(name = "LENGTH of DEL",
>                      breaks = BREAKS,
>                      limits=c(0, 500)
>                      ) +
>   scale_y_continuous(name = "FRACTION") +
>   ggtitle("ECDF of LENGTH") +
>   theme_bw() +
>   theme(legend.position = "bottom", legend.direction = "horizontal",
>         legend.box = "horizontal",
>         legend.key.size = unit(1, "cm"),
>         axis.title = element_text(size = 12),
>         legend.text = element_text(size = 9),
>         legend.title=element_text(face = "bold", size = 9))
> #> Warning: Removed 80 rows containing non-finite values (stat_ecdf).
> > #'  > > > # my recommendation > ggplot(DF500, aes(LENGTH)) + > stat_ecdf(geom = "point", colour = barlines, fill = barfill) + > scale_x_continuous(name = "LENGTH of DEL", > breaks = BREAKS ) + > scale_y_continuous(name = "FRACTION") + > ggtitle("ECDF of LENGTH") + > theme_bw() + > theme(legend.position = "bottom", legend.direction = "horizontal", > legend.box = "horizontal", > legend.key.size = unit(1, "cm"), > axis.title = element_text(size = 12), > legend.text = element_text(size = 9), > legend.title=element_text(face = "bold", size = 9)) > > #'  > > # or for the un-filtered data > ggplot(DF, aes(LENGTH)) + > stat_ecdf(geom = "point", colour = barlines, fill = barfill) + > scale_x_log10( name = "LENGTH of DEL") + > scale_y_continuous(name = "FRACTION") + > ggtitle("ECDF of LENGTH") + > theme_bw() + > theme(legend.position = "bottom", legend.direction = "horizontal", > legend.box = "horizontal", > legend.key.size = unit(1, "cm"), > axis.title = element_text(size = 12), > legend.text = element_text(size = 9), > legend.title=element_text(face = "bold", size = 9)) > > #'  > > > #dev.off() > > #' Created on 2018-07-09 by the [reprex package](http://reprex.tidyver > se.org) (v0.2.0). > ####### > > > On Sun, 8 Jul 2018, Bogdan Tanasa wrote: > > Dear Jeff, >> thank you for your email. >> >> Yes, in order to be more descriptive/comprehensive, please find attached >> to >> my email the following files (my apologies ... I am sending these as >> attachments, as I do not have a web server running at this moment) : >> >> -- the R script (R_script_display_ECDF.R) that reads the file "LENGTH" and >> outputs ECDF figure by using the standard R function or ggplot2. >> >> -- the display of ECDF by using standard R function >> ("display.R.ecdf.LENGTH.pdf") >> >> -- the display of ECDF by using ggplot2 ("display.ggplot2.ecdf.LENGTH. >> pdf") >> >> The ECDF over xlim(0,500) looks very different (contrasting plot(ecdf) vs >> ggplot2). Please would you advise why ? 
what shall I change in my ggplot2 >> code ? >> >> thanks a lot, >> >> - bogdan >> >> ps : the R code is also written below : >> >> library("ggplot2") >> >> >> file <- read.delim("LENGTH", sep="\t", header=T, >> stringsAsFactors=F) >> >> >> ############################# display with PLOT FUNCTION: >> >> >> pdf("display.R.ecdf.LENGTH.pdf", width=10, height=6, >> paper='special') >> >> >> plot(ecdf(file$LENGTH), xlab="DEL SIZE", >> ylab="fraction of DEL", >> main="LENGTH of DEL", >> xlim=c(0,500), >> col = "dark red", axes = FALSE) >> >> >> ticks_y <- c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4) >> >> >> axis(2, at=ticks_y, labels=ticks_y, col.axis="red") >> >> >> ticks_x <- c(0, 100, 200, 400, 500, 600, 700, 800) >> >> >> axis(1, at=ticks_x, labels=ticks_x, col.axis="blue") >> >> >> dev.off() >> >> >> ############################# display in GGPLOT2 : >> >> >> BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, >> 400, 500, >> 1000, 10000, 100000, 1000000, 10000000, 100000000, >> 1000000000) >> >> >> barfill <- "#4271AE" >> barlines <- "#1F3552" >> >> >> pdf("display.ggplot2.ecdf.LENGTH.pdf", width=10, height=6, >> paper='special') >> >> >> ggplot(file, aes(LENGTH)) + >> stat_ecdf(geom = "point", colour = barlines, fill = >> barfill) + >> scale_x_continuous(name = "LENGTH of DEL", >> breaks = BREAKS, >> limits=c(0, 500)) + >> scale_y_continuous(name = "FRACTION") + >> ggtitle("ECDF of LENGTH") + >> theme_bw() + >> theme(legend.position = "bottom", legend.direction = >> "horizontal", >> legend.box = "horizontal", >> legend.key.size = unit(1, "cm"), >> axis.title = element_text(size = 12), >> legend.text = element_text(size = 9), >> legend.title=element_text(face = "bold", size = >> 9)) >> >> >> dev.off() >> >> >> >> >> >> >> >> On Sat, Jul 7, 2018 at 9:47 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> >> wrote: >> It is a feature of ggplot that points excluded by limits raise >> warnings, while base graphics do not. 
>>
>>     You may find that using coord_cartesian with the xlim=c(0,500)
>>     argument works better with ggplot by showing the consequences of
>>     points out of the limits on lines within the viewport.
>>
>>     There are other possible problems with your data that your
>>     non-reproducible example does not show, and sending R code in
>>     HTML-formatted email usually corrupts it... so please follow the
>>     recommendations in the Posting Guide next time you post.
>>
>>     On July 6, 2018 4:32:41 PM PDT, Bogdan Tanasa <tan...@gmail.com> wrote:
>>     >Dear all,
>>     >
>>     >I would appreciate having your advice/suggestions/comments on the
>>     >following :
>>     >
>>     >1 -- starting from a vector that contains LENGTHS (numerically, the
>>     >values are from 1 to 10 000)
>>     >
>>     >2 -- shall I display the ECDF by using the R code and some "limits" :
>>     >
>>     >BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
>>     >           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
>>     >
>>     >ggplot(x, aes(LENGTH)) +
>>     >  stat_ecdf(geom = "point") +
>>     >  scale_x_continuous(name = "LENGTH of DEL",
>>     >                     breaks = BREAKS,
>>     >                     limits=c(0, 500))
>>     >
>>     >3 -- I am getting the following warning message : "Warning message:
>>     >Removed 109 rows containing non-finite values (stat_ecdf)."
>>     >
>>     >The question is : are these 109 values removed from VISUALIZATION as I
>>     >set up the "limits", or are these 109 values removed from statistical
>>     >CALCULATION?
>>     >
>>     >4 -- in contrast, shall I use the standard R functions plot(ecdf),
>>     >there is no "warning message"
>>     >
>>     >plot(ecdf(x$LENGTH), xlab="DEL LENGTH",
>>     >     ylab="Fraction of DEL", main="DEL", xlim=c(0,500),
>>     >     col = "dark red")
>>     >
>>     >Thanks a lot !
>>     >
>>     >-- bogdan
>>     >
>>     >       [[alternative HTML version deleted]]
>>     >
>>     >______________________________________________
>>     >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>     >https://stat.ethz.ch/mailman/listinfo/r-help
>>     >PLEASE do read the posting guide
>>     >http://www.R-project.org/posting-guide.html
>>     >and provide commented, minimal, self-contained, reproducible code.
>>
>>     --
>>     Sent from my phone. Please excuse my brevity.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
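
The VISUALIZATION versus CALCULATION question raised in this thread can be checked numerically. The following is a minimal base-R sketch and not code from either poster: the vector `toy_lengths` is hypothetical toy data, and the `<= 500` filter is only an assumed stand-in for what a scale limit of c(0, 500) does to the data before stat_ecdf() computes the ECDF.

```r
# Hypothetical toy data standing in for the LENGTH column
# (NOT the data from the thread).
toy_lengths <- c(20, 50, 100, 300, 450, 800, 5000, 100000)

# Base graphics: xlim in plot(ecdf(...)) only narrows the viewport, so the
# ECDF is still computed from all eight values.
p_full <- ecdf(toy_lengths)(500)

# Assumed stand-in for the ggplot2 scale limit: values outside the limits
# are dropped first, and the ECDF is computed from what remains.
p_cut <- ecdf(toy_lengths[toy_lengths <= 500])(500)

print(p_full)  # 0.625: 5 of the 8 values are <= 500
print(p_cut)   # 1: every remaining value is <= 500
```

If that reading is right, the "Removed ... rows" warning refers to the calculation and not just to the drawing, which would explain why the two plots over 0 to 500 look different. Using coord_cartesian(xlim = c(0, 500)), as suggested earlier in the thread, zooms the view without dropping rows and should therefore correspond to the first number.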