Rui:

You made my day! -- or at least considerably improved it. Your solution was clever and clear. IMHO, it is also a terrific example of why one should expend the effort to really learn the core features of the language before plunging into packages with alternative paradigms. (But lots of wise folks will disagree, so let's not debate that; just consider me a Luddite if you like.)
A minor tweak would be to add punctuation characters to the regexes:

> dig <- '[[:digit:][:punct:]]' ; nondig <- '[[:alpha:][:punct:]]'
> mapply(\(r,x)grepl(r,x),list(dig, nondig, nondig), dat1)

This of course would need to be modified for numeric columns with '.' or ',' as a decimal separator. Most examples I've seen were of contamination by a particular character or two (like ',') in numeric entries, which could be easily handled of course.

As usual, one of the virtues of a nice solution like yours is that it can easily be generalized, say to the case of a data frame with hundreds of columns. One just has to be a bit careful about details. A usual 'gotcha' will be to ensure that factor columns are read in as, or converted to, character. Another is that you need to first remove any non-character -- typically non-polluted numeric -- columns from the data frame. This can be done by something like:

dat <- dat[, sapply(dat, is.character), drop = FALSE]

Anyway, with those caveats and perhaps others that I either haven't thought of or may be data-specific, here is an example that illustrates how nicely your approach extends. I'll start from the OP's dat1 example.

dat1 <-read.table(text="Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140",sep=",",header=TRUE,stringsAsFactors=F)

## now enlarge the table and add a gender column which should contain only upper or lower case 'm','f', 'o'
## but which I have corrupted with some 'g's (typos)

set.seed(9901)
genderAbb <- c('M','F','O','m','f','o','g')
gender <- sample(genderAbb, 24, replace = TRUE)
dat1 <- cbind(dat1[rep(1:6,4),], Gender = gender )
head(dat1, 8)

     Name Age Weight Gender
1    Alex  20    13X      O
2     Bob  25    142      M
3   Carol  24    120      o
4    John 3BC    175      o
5    Katy  35    160      f
6   Jack3  34    140      g
1.1  Alex  20    13X      M
2.1   Bob  25    142      f

## Now create a list of the different target 'types' for columns.
## Note that these types are user-created categories, not R data types.
## So one can use whatever names one wants.
## Or could use numeric values -- but that obfuscates the meaning and increases the risk of error, imo.

type <- c('char', 'int', 'gend') ## obvious

## Now, using your idea, determine the regexes that identify bad entries for each type:

badpat <- list( char = '[[:punct:][:digit:]]', ## added stray punctuation
                int = '[[:punct:][:alpha:]]', ## ditto
                gend = '[^MFOmfo]' ) ## the only gender abbreviations that will be accepted.
## The initial '^' is the regex symbol for 'anything *but* these' in character classes

## Now identify what type of data each column should contain. This is the part that could be tedious
## for many columns, but I see no way of avoiding it. A smarter UI than I give would help!

target_type <- c('char','int','int','gend')

## and create the corresponding list of regex patterns to use for mapply()

target_pat <- badpat[target_type]

## Now do the Barradas trick

result <- mapply(\(pat,x)if(is.character(x))grepl(pat, x) else rep(FALSE, NROW(x)),
                 target_pat, dat1)
head(result, 8) ## it's a matrix, not a data frame of course

## ... and then proceed as you showed.

Cheers,
Bert

On Sat, Jan 29, 2022 at 12:46 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> Hello,
>
> Getting creative, here is another way with mapply.
>
>
> regex <- list("[[:digit:]]", "[[:alpha:]]", "[[:alpha:]]")
>
> i <- mapply(\(x, r) grepl(r, x), dat1, regex)
> dat1[rowSums(i) == 0L, ]
>
> #   Name Age Weight
> #2   Bob  25    142
> #3 Carol  24    120
> #5  Katy  35    160
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 06:30 on 29/01/2022, David Carlson via R-help wrote:
> > Given that you know which columns should be numeric and which should be
> > character, finding characters in numeric columns or numbers in character
> > columns is not difficult. Your data frame consists of three character
> > columns so you can use regular expressions as Bert mentioned.
> > First you should strip the whitespace out of your data:
> >
> > dat1 <-read.table(text="Name, Age, Weight
> > Alex, 20, 13X
> > Bob, 25, 142
> > Carol, 24, 120
> > John, 3BC, 175
> > Katy, 35, 160
> > Jack3, 34, 140",sep=",", header=TRUE, stringsAsFactors=FALSE,
> > strip.white=TRUE)
> >
> > Now check to see if all of the fields are character as expected.
> >
> > sapply(dat1, typeof)
> > #        Name         Age      Weight
> > # "character" "character" "character"
> >
> > Now identify character variables containing numbers and numeric variables
> > containing characters:
> >
> > BadName <- which(grepl("[[:digit:]]", dat1$Name))
> > BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
> > BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
> >
> > Next remove those rows:
> >
> > (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
> > #    Name Age Weight
> > # 2   Bob  25    142
> > # 3 Carol  24    120
> > # 5  Katy  35    160
> >
> > You still need to convert Age and Weight to numeric, e.g.
> > dat2$Age <- as.numeric(dat2$Age).
> >
> > David Carlson
> >
> >
> > On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> >
> >> As character 'polluted' entries will cause a column to be read in (via
> >> read.table and relatives) as factor or character data, this sounds like a
> >> job for regular expressions. If you are not familiar with this subject,
> >> time to learn. And, yes, some heavy lifting will be required.
> >> See ?regexp for a start maybe? Or the stringr package?
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> On Fri, Jan 28, 2022, 7:08 PM Val <valkr...@gmail.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I want to remove rows that contain a character string in an integer
> >>> column or a digit in a character column.
> >>>
> >>> Sample data
> >>>
> >>> dat1 <-read.table(text="Name, Age, Weight
> >>> Alex, 20, 13X
> >>> Bob, 25, 142
> >>> Carol, 24, 120
> >>> John, 3BC, 175
> >>> Katy, 35, 160
> >>> Jack3, 34, 140",sep=",",header=TRUE,stringsAsFactors=F)
> >>>
> >>> If the Age/Weight column contains any character(s) then remove
> >>> if the Name column contains a digit then remove that row
> >>> Desired output
> >>>
> >>>    Name Age weight
> >>> 1   Bob  25    142
> >>> 2 Carol  24    120
> >>> 3  Katy  35    160
> >>>
> >>> Thank you,
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVXhZB_0c$
> >>> PLEASE do read the posting guide
> >>> https://urldefense.com/v3/__http://www.R-project.org/posting-guide.html__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVRmZSfcI$
> >>> and provide commented, minimal, self-contained, reproducible code.
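
For anyone reading this thread in the archive, the pieces discussed above (one "bad entry" regex per column, a logical matrix of hits, dropping flagged rows, then converting to numeric) can be collected into one small function. What follows is only a sketch under assumptions, not code posted by anyone in the thread: the names drop_bad_rows, patterns, clean and num_cols are made up for illustration, and it assumes exactly one regex per column, supplied in column order. It uses the sample data with strip.white=TRUE as David suggested.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# dat:      a data frame
# patterns: one regex per column, matching entries that should NOT appear there
drop_bad_rows <- function(dat, patterns) {
  stopifnot(length(patterns) == ncol(dat))
  # One logical vector per column; non-character columns are never flagged
  flags <- Map(function(pat, x) {
    if (is.character(x)) grepl(pat, x) else rep(FALSE, length(x))
  }, patterns, dat)
  bad <- do.call(cbind, unname(flags))  # rows x columns logical matrix
  # Keep only the rows in which no column was flagged
  dat[rowSums(bad) == 0L, , drop = FALSE]
}

dat1 <- read.table(text = "Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
                   strip.white = TRUE)

# Patterns for Name, Age, Weight, with punctuation added as in Bert's tweak
patterns <- list("[[:digit:][:punct:]]",
                 "[[:alpha:][:punct:]]",
                 "[[:alpha:][:punct:]]")

clean <- drop_bad_rows(dat1, patterns)

# Convert the cleaned columns to numeric, as David noted is still needed
num_cols <- c("Age", "Weight")
clean[num_cols] <- lapply(clean[num_cols], as.numeric)
print(clean)
```

On the sample data this keeps the rows for Bob, Carol and Katy. Filtering with a logical mask (rowSums(bad) == 0L) rather than negative row indices also behaves sensibly when no row is flagged, whereas indexing with -integer(0) would return zero rows. Factor columns are not character and so would pass through unchecked, which is the 'gotcha' mentioned above.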