> On Feb 28, 2017, at 8:36 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>
> For tasks like this, you will probably want to make sure to import the data
> as character data rather than as a factor. E.g.
>
> dat <- read.csv( "myfile.csv", header=FALSE, as.is=TRUE )
>
> You can check what you have with the str() function.

Jeff,

Narrowly, for this particular task, that is not relevant. gsub() and family use as.character() internally to coerce a factor to character, so they will work just fine:

text <- factor(c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC"))

> text
[1] BOEING CO          ENGMANTAYLOR CO    SAGINAW COUNTY INC
Levels: BOEING CO ENGMANTAYLOR CO SAGINAW COUNTY INC

> gsub(" CO$", "", text)
[1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"

Beyond this, using 'as.is' is more a matter of personal preference.

Regards,

Marc

> --
> Sent from my phone. Please excuse my brevity.
>
> On February 28, 2017 5:19:40 AM PST, Marc Schwartz <marc_schwa...@me.com> wrote:
>>
>>> On Feb 28, 2017, at 3:38 AM, Harshal Athawale <pgcim15.hars...@spjimr.org> wrote:
>>>
>>> I am new to R.
>>>
>>> I have a file that contains the names of companies:
>>>
>>> 'data.frame': 494 obs. of 1 variable:
>>>  $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143 399 122 447 398 384 ...
>>>
>>> Problem: I would like to remove "CO" (as it is the most frequent word). I
>>> would like "CO" to be removed from BOEING CO --> BOEING, but not from
>>> SAGINAW COUNTY INC.
>>>
>>>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
>>>
>>>> gsub(x = text, pattern = "CO", replacement = "")
>>>
>>> [1] "BOEING "          "ENGMANTAYLOR "    "SAGINAW UNTY INC"
>>>
>>> Thanks in advance.
>>>
>>> - Sam
>>
>>
>> Hi,
>>
>> See ?regex and ?grep for some details and examples on how to construct
>> the expression used for matching, as well as some of the references
>> therein.
>>
>> In this case, you want to use something along the lines of:
>>
>>> gsub(" CO$", "", text)
>> [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"
>>
>> where the "CO" is preceded by a space and followed by the "$", which is
>> a special character that indicates the end of the string to be matched.
>>
>> Regards,
>>
>> Marc Schwartz
>>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
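
For readers landing on this thread, here is a minimal, self-contained sketch that puts the points above side by side, using the three company names from the question. It shows that gsub() on a factor returns a plain character vector, why the unanchored "CO" pattern damages "COUNTY", and how the anchored " CO$" pattern avoids that. Two extras are not from the thread and are only illustrative assumptions: a whole-word variant using PCRE word boundaries (tested on a hypothetical "ACME CO INC" value, where "CO" is not at the end of the string), and editing levels() when you want to keep a factor.

```r
# Company names as read.csv() would return them with stringsAsFactors = TRUE
# (the default before R 4.0.0); values taken from the question.
text <- factor(c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC"))

# Anchored pattern from the thread: a space, then "CO", then end of string.
# gsub() coerces the factor via as.character(), so the result is character.
anchored <- gsub(" CO$", "", text)
print(anchored)
print(class(anchored))

# Unanchored pattern from the original question: also strips the "CO"
# inside "COUNTY" and leaves a trailing space on the other names.
naive <- gsub("CO", "", text)
print(naive)

# Illustrative alternative (not from the thread): drop "CO" only as a whole
# word, wherever it appears, together with any whitespace before it.
# "ACME CO INC" is a hypothetical value for demonstration.
whole_word <- gsub("\\s*\\bCO\\b", "",
                   c("BOEING CO", "ACME CO INC", "SAGINAW COUNTY INC"),
                   perl = TRUE)
print(whole_word)

# If a factor is still wanted, rewrite its levels rather than its values.
f <- text
levels(f) <- gsub(" CO$", "", levels(f))
print(f)
```

The anchored pattern is the narrower choice and matches the question exactly; the word-boundary version is only worth considering if "CO" can also appear mid-name.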