Thank you, David. What if I want to list the excluded rows? I used this:

(dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ])
It did not work. The desired output is:

Alex, 20, 13X
John, 3BC, 175
Jack3, 34, 140

Thank you,

On Sat, Jan 29, 2022 at 10:15 PM David Carlson <dcarl...@tamu.edu> wrote:

> It is possible that there would be errors on the same row for different
> columns. This does not happen in your example. If row 4 was "John6, 3BC,
> 175X" then row 4 would be included 3 times, but we only need to remove it
> once. Removing the duplicates is not necessary since R would not get
> confused, but length(unique(c(BadName, BadAge, BadWeight))) indicates how
> many lines are being removed.
>
> David
>
> On Sat, Jan 29, 2022 at 8:32 PM Val <valkr...@gmail.com> wrote:
>
>> Thank you David for your help.
>>
>> I just have one question on this. What is the purpose of using the
>> "unique" function on this?
>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>
>> I got the same result without using it.
>> (dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ])
>>
>> My concern is that when I apply this to a large data set, the
>> "unique" function may consume resources (time and memory).
>>
>> Thank you.
>>
>> On Sat, Jan 29, 2022 at 12:30 AM David Carlson <dcarl...@tamu.edu> wrote:
>>
>>> Given that you know which columns should be numeric and which should be
>>> character, finding characters in numeric columns or numbers in character
>>> columns is not difficult. Your data frame consists of three character
>>> columns so you can use regular expressions as Bert mentioned.
>>> First you should strip the whitespace out of your data:
>>>
>>> dat1 <- read.table(text="Name, Age, Weight
>>> Alex, 20, 13X
>>> Bob, 25, 142
>>> Carol, 24, 120
>>> John, 3BC, 175
>>> Katy, 35, 160
>>> Jack3, 34, 140",sep=",", header=TRUE, stringsAsFactors=FALSE,
>>> strip.white=TRUE)
>>>
>>> Now check to see if all of the fields are character as expected.
>>>
>>> sapply(dat1, typeof)
>>> #        Name         Age      Weight
>>> # "character" "character" "character"
>>>
>>> Now identify character variables containing numbers and numeric
>>> variables containing characters:
>>>
>>> BadName <- which(grepl("[[:digit:]]", dat1$Name))
>>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
>>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
>>>
>>> Next remove those rows:
>>>
>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>> #    Name Age Weight
>>> # 2   Bob  25    142
>>> # 3 Carol  24    120
>>> # 5  Katy  35    160
>>>
>>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
>>> as.numeric(dat2$Age).
>>>
>>> David Carlson
>>>
>>>
>>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4...@gmail.com>
>>> wrote:
>>>
>>>> As character 'polluted' entries will cause a column to be read in (via
>>>> read.table and relatives) as factor or character data, this sounds like a
>>>> job for regular expressions. If you are not familiar with this subject,
>>>> time to learn. And, yes, some heavy lifting will be required.
>>>> See ?regexp for a start maybe? Or the stringr package?
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>> On Fri, Jan 28, 2022, 7:08 PM Val <valkr...@gmail.com> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I want to remove rows that contain a character string in an integer
>>>> > column or a digit in a character column.
>>>> >
>>>> > Sample data
>>>> >
>>>> > dat1 <- read.table(text="Name, Age, Weight
>>>> > Alex, 20, 13X
>>>> > Bob, 25, 142
>>>> > Carol, 24, 120
>>>> > John, 3BC, 175
>>>> > Katy, 35, 160
>>>> > Jack3, 34, 140",sep=",",header=TRUE,stringsAsFactors=F)
>>>> >
>>>> > If the Age/Weight column contains any character(s), then remove that row;
>>>> > if the Name column contains a digit, then remove that row.
>>>> > Desired output
>>>> >
>>>> >    Name Age weight
>>>> > 1   Bob  25    142
>>>> > 2 Carol  24    120
>>>> > 3  Katy  35    160
>>>> >
>>>> > Thank you,
>>>> >
>>>> > ______________________________________________
>>>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVXhZB_0c$
>>>> > PLEASE do read the posting guide
>>>> > https://urldefense.com/v3/__http://www.R-project.org/posting-guide.html__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVRmZSfcI$
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
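
On the question at the top of the thread (listing the excluded rows): the thread does not say what went wrong, so the following is only a sketch under assumptions. With the sample data, BadName, BadAge and BadWeight are 6, 4 and 1, so indexing with unique(c(BadName, BadAge, BadWeight)) returns the rows in the order 6, 4, 1 rather than in the desired order. One way around this, which is not the approach David posted, is to build a single logical mask with grepl() and use it both ways. The names is_bad, dat2 and dat3 below are illustrative only.

```r
# Sample data from the thread; strip.white = TRUE drops the padding after commas.
dat1 <- read.table(text = "Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
                   strip.white = TRUE)

# One logical value per row: TRUE if any of the three checks fails.
# (Same regular expressions as in David's message; is_bad is a made-up name.)
is_bad <- grepl("[[:digit:]]", dat1$Name) |
          grepl("[[:alpha:]]", dat1$Age)  |
          grepl("[[:alpha:]]", dat1$Weight)

dat3 <- dat1[is_bad, ]    # excluded rows, in original order: Alex, John, Jack3
dat2 <- dat1[!is_bad, ]   # retained rows: Bob, Carol, Katy

# As David noted, the retained columns still need converting to numeric.
dat2$Age    <- as.numeric(dat2$Age)
dat2$Weight <- as.numeric(dat2$Weight)

print(dat3)
print(dat2)
```

A logical mask needs neither unique() nor sort(), since a row that fails several checks is still just one TRUE. It also sidesteps a pitfall of negative indexing: if no rows are bad, the index vector is empty and dat1[-integer(0), ] returns zero rows instead of the whole data frame, whereas dat1[!is_bad, ] returns every row.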