Hi,

On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> Hi all,
>
> I have a data frame with huge rows and columns.
>
> When I looked at the data, it has several garbage values need to be
> cleaned. For a sample I am showing you the frequency distribution
> of one variables
>
>     Var1 Freq
> 1      :    3
> 2      ]    6
> 3    MSN 1040
> 4    YYZ  300
> 5     \\    4
> 6      +    3
> 7     ?>   15

Please use dput() to provide your data. I made a guess at what you had
in R, but could be wrong.

> and continues.
>
> I want to keep those rows that contain only a valid variable value
>
> In this case MSN and YYZ. I tried the following
>
> *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>
> but I am not getting the desired result.

What are you getting? How does it differ from the desired result?

> I have
>
> Any help or idea?

I get:

> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

Which seems reasonable to me.

>
> [[alternative HTML version deleted]]

Please don't post in HTML either: it introduces all sorts of errors to
your message.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
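
P.S. If the same subset comes back empty (or full of NA rows) on your
real data, one guess is that Var1 carries stray whitespace or NA values
that the printed table hides. I can't tell without dput() output, so the
following is only a sketch under that assumption: the padded values are
something I added to my guessed data purely for illustration, and
`valid` is just a name I made up.

```r
# Guessed data from this thread, with hypothetical stray whitespace
# added to the two valid codes (an assumption, not your real data)
dat <- data.frame(
  X = 1:7,
  Var1 = c(":", "]", "MSN ", " YYZ", "\\\\", "+", "?>"),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
  stringsAsFactors = FALSE
)

valid <- c("MSN", "YYZ")  # the values to keep

# Exact matching misses the padded values
untrimmed <- dat[dat$Var1 %in% valid, ]
nrow(untrimmed)

# Convert to character (in case Var1 is a factor), trim whitespace,
# then keep only the rows whose value is in the valid set
test <- dat[trimws(as.character(dat$Var1)) %in% valid, ]
test
```

Unlike `==`, `%in%` returns FALSE rather than NA for missing values, so
it will not produce all-NA rows in the subset.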