Hi all, I want to remove rows from a data frame based on a condition on one of its variables. When we split this string, it should follow a 3-2-5 format (3 numeric digits, 2 characters and 5 numeric digits), i.e. area code, region, numeric. The max length of the area code should be 3, the max length of the region should be 2, followed by a max length of 5 numeric digits. The area code can be 1, 2 or 3 digits, but not more than three. So the max length of this variable is 10. Anything outside of this pattern should be excluded. For example:
dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

1 9F209      # keep
2 FL250      # remove, no area code
3 2F250      # keep
4 102250     # remove, no region code
5 102FL      # remove, no numeric after region code
6 102        # remove, no region code and numeric
7 1212FL250  # remove, area code is more than three digits
8 121FL50    # keep

The desired output should be

1 9F209
3 2F250
8 121FL50

How do I do this in an efficient way?

Thank you in advance
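
For reference, the rule described above could be written as a single anchored regular expression and applied with grepl(). This is only a sketch of one possible reading of the rule, not a definitive solution. It assumes that the region is 1 or 2 letters (rows 1 and 3 are marked "keep" with a single letter), that the trailing numeric part is 1 to 5 digits (row 8 has only 2), and that both upper- and lower-case letters are acceptable. The names `pattern` and `keep` are made up for illustration, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Same data as in the example, built directly so the block is self-contained.
dat <- data.frame(
  rown = 1:8,
  varx = c("9F209", "FL250", "2F250", "102250",
           "102FL", "102", "1212FL250", "121FL50"),
  stringsAsFactors = FALSE
)

# Assumed pattern (my reading of the rule, not confirmed):
#   ^[0-9]{1,3}    area code: 1 to 3 digits at the start
#   [A-Za-z]{1,2}  region: 1 or 2 letters
#   [0-9]{1,5}$    numeric part: 1 to 5 digits, up to the end of the string
pattern <- "^[0-9]{1,3}[A-Za-z]{1,2}[0-9]{1,5}$"

# grepl() returns one TRUE/FALSE per row; keep only the matching rows.
keep <- dat[grepl(pattern, dat$varx), ]
print(keep)
```

Because grepl() is vectorised over the whole column, no loop or string splitting is needed. If the region must be exactly 2 letters, {1,2} would become {2}, but rows 1 and 3 would then be dropped.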