:) :)

On Thu, Feb 17, 2022 at 10:37 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> imo, with such simple data, a plot is mere chartjunk. A simple table(=
> the distribution) would suffice and be more informative:
>
> > table(bug)  ## bug is a vector. No data frame is needed
>
>   0   1   2   3   4   5   7    ## bug count
> 162  40   9   7   2   1   1    ## nmbr of cases with the given count
>
> You or others may disagree, of course.
>
> Bert Gunter
>
>
> On Thu, Feb 17, 2022 at 11:56 AM Neha gupta <neha.bologn...@gmail.com> wrote:
> >
> > Ebert and Rui, thank you for providing the tips (in fact, for providing the
> > answer I needed).
> >
> > Yes, you are right that boxplot of all zero values will not make sense.
> > Maybe histogram will work.
> >
> > I am providing a few details of my data here and the context of the
> > question I asked.
> >
> > My data is about bugs/defects in different classes of a large software
> > system. I have to predict which class will contain bugs and which will be
> > free of bugs (bug=0). I trained ML models and predict but my advisor asked
> > me to provide first the data distribution about bugs e.g details of how
> > many classes with bugs (bug > 0) and how many are free of bugs (bug=0).
> >
> > That is why I need to provide the data distribution of both types of values
> > (i.e. bug=0 and bug >0)
> >
> > Thank you again.
> >
> > On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >
> > > Hello,
> > >
> > > In your original post you read the same file "synapse.arff" twice,
> > > apparently to filter each of them by its own criterion. You don't need
> > > to do that, read once and filter that one by different criteria.
> > >
> > > As for the data as posted, I have read it in with the following code:
> > >
> > >
> > > x <- "
> > > 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0
> > > 4 1 0
> > > 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0
> > > 0 0 0
> > > 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7
> > > 0 0 1
> > > 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0
> > > 1 0 0
> > > 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0
> > > 0 0 1
> > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> > > "
> > > bug <- scan(text = x)
> > > data <- data.frame(bug)
> > >
> > >
> > > This is not the right way to post data, the posting guide asks to post
> > > the output of
> > >
> > >
> > > dput(data)
> > > structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
> > > 0, 0, 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0,
> > > 3, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
> > > 0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0,
> > > 1, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0,
> > > 1, 1, 0, 2, 0, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
> > > 0, 1, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 1, 0, 4, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)),
> > > class = "data.frame", row.names = c(NA, -222L))
> > >
> > >
> > > This can be copied into an R session and the data set recreated with
> > >
> > > data <- structure(etc)
> > >
> > >
> > > Now the boxplots.
> > >
> > > (Why would you want to plot a vector of all zeros, btw?)
> > >
> > >
> > > library(dplyr)
> > >
> > > boxplot(filter(data, bug == 0))  # nonsense
> > > boxplot(filter(data, bug > 0), range = 0)
> > >
> > > # Another way
> > > data %>%
> > >   filter(bug > 0) %>%
> > >   boxplot(range = 0)
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> > > At 19:03 on 17/02/2022, Neha gupta wrote:
> > > > That is all the code I have. How can I provide a reproducible code ?
> > > >
> > > > How can I save this result?
> > > >
> > > > On Thu, Feb 17, 2022 at 8:00 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
> > > >
> > > >> You pipe the filter but do not save the result. A reproducible example
> > > >> might help.
> > > >> Tim
> > > >>
> > > >> -----Original Message-----
> > > >> From: R-help <r-help-boun...@r-project.org> On Behalf Of Neha gupta
> > > >> Sent: Thursday, February 17, 2022 1:55 PM
> > > >> To: r-help mailing list <r-help@r-project.org>
> > > >> Subject: [R] Problem with data distribution
> > > >>
> > > >> [External Email]
> > > >>
> > > >> Hello everyone
> > > >>
> > > >> I have a dataset with output variable "bug" having the following values
> > > >> (at the bottom of this email). My advisor asked me to provide data
> > > >> distribution of bugs with 0 values and bugs with more than 0 values.
> > > >>
> > > >> data = readARFF("synapse.arff")
> > > >> data2 = readARFF("synapse.arff")
> > > >> data$bug
> > > >> library(tidyverse)
> > > >> data %>%
> > > >> filter(bug == 0)
> > > >> data2 %>%
> > > >> filter(bug >= 1)
> > > >> boxplot(data2$bug, data$bug, range=0)
> > > >>
> > > >> But both the graphs are exactly the same, how is it possible? Where I am
> > > >> doing wrong?
> > > >>
> > > >>
> > > >> data$bug
> > > >>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> > > >>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> > > >>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> > > >> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> > > >> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> > > >> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> > > >>
> > > >>         [[alternative HTML version deleted]]
> > > >>
> > > >> ______________________________________________
> > > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=NxfkBJHBnd8naYPQTd9Z8dZ2m-RCwh_lpGvHVQ8MwYQ&e=
> > > >> PLEASE do read the posting guide
> > > >> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=exznSElUW1tc6ajt0C8uw5cR8ZqwHRD6tUPAarFYdYo&e=
> > > >> and provide commented, minimal, self-contained, reproducible code.
> > > >>
> > > >
> > > >         [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
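
For reference, the advice in the thread (read the data once, assign each filtered subset to its own name, and summarise with table()) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the vector is rebuilt from the frequency table quoted above rather than read from "synapse.arff", base subset() is swapped in for dplyr's filter() so that no package is needed, and the names bug_free, buggy, counts and split_counts are hypothetical.

```r
# Assumption: rebuild the 222 bug counts from the frequency table quoted in
# the thread (the original order of the values is not preserved).
bug <- rep(c(0, 1, 2, 3, 4, 5, 7), times = c(162, 40, 9, 7, 2, 1, 1))
data <- data.frame(bug)

# Build (or read) the data once, then assign each filtered subset to its own
# name. A filter whose result is not assigned leaves `data` unchanged.
bug_free <- subset(data, bug == 0)   # classes with no bugs
buggy    <- subset(data, bug > 0)    # classes with at least one bug

# Distribution summaries: cases per bug count, and the two-group split.
counts <- table(data$bug)
split_counts <- table(ifelse(data$bug > 0, "bug > 0", "bug = 0"))

print(counts)
print(split_counts)

# Boxplot of the non-zero counts only; range = 0 extends the whiskers to the
# data extremes, as in the calls quoted above.
boxplot(buggy$bug, range = 0, main = "Bug counts where bug > 0")
```

With dplyr loaded, the two subset() calls could equally be written as filter(data, bug == 0) and filter(data, bug > 0); the point in either case is that the result has to be assigned before it is plotted.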