You need to add "-": `(dat3 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])` (the minus sign makes the index a NOT, so those rows are dropped instead of selected).
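A minimal sketch on the sample data from this thread, showing what the sign of the index does: the negative index drops the flagged rows, and the same index without the minus selects them. The names `bad`, `kept` and `excluded` and the `sort()` call are illustrative additions, not from the earlier messages; `sort()` is only there so the selected rows come out in data order.

```r
# Sample data and the three index vectors, as given earlier in the thread
dat1 <- read.table(text="Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140", sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)

BadName <- which(grepl("[[:digit:]]", dat1$Name))
BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))

# Hypothetical helper name: one sorted, de-duplicated vector of flagged rows
bad <- sort(unique(c(BadName, BadAge, BadWeight)))

kept <- dat1[-bad, ]      # negative index: every row except the flagged ones
excluded <- dat1[bad, ]   # positive index: only the flagged rows

print(kept)
print(excluded)
```

One caveat with the negative form: if no rows are flagged, `bad` is empty and `dat1[-bad, ]` returns zero rows rather than the whole data frame, so it may be worth checking `length(bad)` first.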
David

On Sun, Jan 30, 2022 at 11:00 AM Val <valkr...@gmail.com> wrote:

> Thank you David.
>
> What about if I want to list the excluded rows?
> I used this
> (dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ])
>
> It did not work. The desired output is,
> Alex, 20, 13X
> John, 3BC, 175
> Jack3, 34, 140
>
> Thank you,
>
> On Sat, Jan 29, 2022 at 10:15 PM David Carlson <dcarl...@tamu.edu> wrote:
>
>> It is possible that there would be errors on the same row for different
>> columns. This does not happen in your example. If row 4 was "John6, 3BC,
>> 175X" then row 4 would be included 3 times, but we only need to remove it
>> once. Removing the duplicates is not necessary since R would not get
>> confused, but length(unique(c(BadName, BadAge, BadWeight))) indicates how
>> many lines are being removed.
>>
>> David
>>
>> On Sat, Jan 29, 2022 at 8:32 PM Val <valkr...@gmail.com> wrote:
>>
>>> Thank you David for your help.
>>>
>>> I just have one question on this. What is the purpose of using the
>>> "unique" function on this?
>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>>
>>> I got the same result without using it.
>>> (dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ])
>>>
>>> My concern is when I am applying this for the large data set the
>>> "unique" function may consume resources(time and memory).
>>>
>>> Thank you.
>>>
>>> On Sat, Jan 29, 2022 at 12:30 AM David Carlson <dcarl...@tamu.edu>
>>> wrote:
>>>
>>>> Given that you know which columns should be numeric and which should be
>>>> character, finding characters in numeric columns or numbers in character
>>>> columns is not difficult. Your data frame consists of three character
>>>> columns so you can use regular expressions as Bert mentioned. First
>>>> you should strip the whitespace out of your data:
>>>>
>>>> dat1 <-read.table(text="Name, Age, Weight
>>>> Alex, 20, 13X
>>>> Bob, 25, 142
>>>> Carol, 24, 120
>>>> John, 3BC, 175
>>>> Katy, 35, 160
>>>> Jack3, 34, 140",sep=",", header=TRUE, stringsAsFactors=FALSE,
>>>> strip.white=TRUE)
>>>>
>>>> Now check to see if all of the fields are character as expected.
>>>>
>>>> sapply(dat1, typeof)
>>>> #        Name         Age      Weight
>>>> # "character" "character" "character"
>>>>
>>>> Now identify character variables containing numbers and numeric
>>>> variables containing characters:
>>>>
>>>> BadName <- which(grepl("[[:digit:]]", dat1$Name))
>>>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
>>>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
>>>>
>>>> Next remove those rows:
>>>>
>>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>>> #    Name Age Weight
>>>> # 2   Bob  25    142
>>>> # 3 Carol  24    120
>>>> # 5  Katy  35    160
>>>>
>>>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
>>>> as.numeric(dat2$Age).
>>>>
>>>> David Carlson
>>>>
>>>>
>>>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4...@gmail.com>
>>>> wrote:
>>>>
>>>>> As character 'polluted' entries will cause a column to be read in (via
>>>>> read.table and relatives) as factor or character data, this sounds like a
>>>>> job for regular expressions.
>>>>> If you are not familiar with this subject,
>>>>> time to learn. And, yes, some heavy lifting will be required.
>>>>> See ?regexp for a start maybe? Or the stringr package?
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>> On Fri, Jan 28, 2022, 7:08 PM Val <valkr...@gmail.com> wrote:
>>>>>
>>>>> > Hi All,
>>>>> >
>>>>> > I want to remove rows that contain a character string in an integer
>>>>> > column or a digit in a character column.
>>>>> >
>>>>> > Sample data
>>>>> >
>>>>> > dat1 <-read.table(text="Name, Age, Weight
>>>>> > Alex, 20, 13X
>>>>> > Bob, 25, 142
>>>>> > Carol, 24, 120
>>>>> > John, 3BC, 175
>>>>> > Katy, 35, 160
>>>>> > Jack3, 34, 140",sep=",",header=TRUE,stringsAsFactors=F)
>>>>> >
>>>>> > If the Age/Weight column contains any character(s) then remove that row;
>>>>> > if the Name column contains a digit then remove that row.
>>>>> > Desired output
>>>>> >
>>>>> >    Name Age weight
>>>>> > 1   Bob  25    142
>>>>> > 2 Carol  24    120
>>>>> > 3  Katy  35    160
>>>>> >
>>>>> > Thank you,
>>>>> >
>>>>> > ______________________________________________
>>>>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> > PLEASE do read the posting guide
>>>>> > http://www.R-project.org/posting-guide.html
>>>>> > and provide commented, minimal, self-contained, reproducible
>>>>> > code.
>>>>> >
>>>>>
>>>>> [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.