> On Sep 7, 2015, at 1:20 PM, Jon BR <jonsle...@gmail.com> wrote:
>
> Hi John,
> Thanks for the reply; I'm pasting here the output from dput, with a
> 'df <-' added in front:
>
> df <- structure(list(rowNum = c(1, 2, 3), first = structure(c(NA, 1L,
> 2L), .Label = c("AD=2;BA=8", "AD=9;BA=1"), class = "factor"),
>     second = structure(c(2L, 1L, NA), .Label = c("AD=1;BA=2",
>     "AD=13;BA=49"), class = "factor")), .Names = c("rowNum",
> "first", "second"), row.names = c(NA, -3L), class = "data.frame")
>
> To add more specifics, about what I would like; each value to be adjusted
> has the following general format:
>
> "AD=X;BA=Y"
>
> I would like to extract the values of X and Y and format them as a string
> as such:
>
> "X_X-Y"
>
> Here's how I would handle a specific instance using awk in a shell script:
>
> echo "AD=X;BA=Y" | awk '{split($1,a,"AD="); split(a[2],b,";");
> split(b[2],c,"BA="); print b[1]"_"b[1]"-"c[2]}'
> X_X-Y
>
> I'd like this to apply for all the entries that aren't NA to the right of
> column 1.
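
For a single value, the awk split() steps above have a fairly direct
counterpart in strsplit(). The following is only a sketch: the name
split_one is made up here for illustration, it assumes the string really
has the "AD=X;BA=Y" shape, and it is neither vectorized nor NA-aware, so
it only illustrates the idea on one string.

```r
# Hypothetical helper (not from the thread): split one "AD=X;BA=Y" string
# on "=" and ";" and paste the pieces back together as "X_X-Y".
split_one <- function(x) {
  parts <- strsplit(x, "[=;]")[[1]]   # c("AD", X, "BA", Y)
  paste0(parts[2], "_", parts[2], "-", parts[4])
}

split_one("AD=13;BA=49")   # "13_13-49"
```
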
df[2:3] <- lapply(df[2:3], sub, pattern = "(AD\\=)(.+)(;BA\\=)(.+)",
                  replacement = "\\2_\\2-\\4")

> df
  rowNum first   second
1      1  <NA> 13_13-49
2      2 2_2-8    1_1-2
3      3 9_9-1     <NA>

>
> Hoping this adds clarity for any others who also didn't follow my example.
>
> Thanks in advance for any tips-
>
> Best,
> Jonathan
>
> On Mon, Sep 7, 2015 at 3:48 PM, John Kane <jrkrid...@inbox.com> wrote:
>
>> I'm not making a lot of sense of the data, it looks like you want more
>> recodes than you have mentioned but in any case you might want to look at
>> the recode function in the car package. It "should" do what you want
>> though there may be faster ways to do it.
>>
>> BTW, for supplying sample data have a look at ?dput . Using dput() means
>> that we see exactly the same data as you do.
>>
>> Sorry not to be of more help
>> John Kane
>> Kingston ON Canada
>>
>>
>>> -----Original Message-----
>>> From: jonsle...@gmail.com
>>> Sent: Mon, 7 Sep 2015 15:27:05 -0400
>>> To: r-help@r-project.org
>>> Subject: [R] Reformatting text inside a data frame
>>>
>>> Hi all,
>>>    I've read in a large data frame that has formatting similar to the
>>> one in the small example below:
>>>
>>> df <- data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),
>>>                  c("AD=13;BA=49","AD=1;BA=2",NA));
>>> names(df) <- c("rowNum","first","second")
>>>
>>>> df
>>>   rowNum     first      second
>>> 1      1      <NA> AD=13;BA=49
>>> 2      2 AD=2;BA=8   AD=1;BA=2
>>> 3      3 AD=9;BA=1        <NA>
>>>
>>>
>>> I'd like to reformat all of the non-NA entries in df from "first" and
>>> "second" and so-on such that "AD=13;BA=49" will be replaced by the
>>> following string: "13_13-49".
>>>
>>> So applied to df, the output would be the following:
>>>
>>>   rowNum first   second
>>> 1      1  <NA> 13_13-49
>>> 2      2 2_2-8    1_1-2
>>> 3      3 9_9-1     <NA>
>>>
>>>
>>> I'm generally a big proponent of shell scripting with awk, but I'd prefer
>>> an all-R solution if one exists (and also to learn how to do this more
>>> generally).
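
On doing this more generally: a rough sketch, assuming every column to
the right of column 1 holds either NA or "AD=X;BA=Y" with X and Y as runs
of digits (true for the sample data, not verified for the full file). The
name reformat_cols is hypothetical. It converts factor columns to
character first, leaves NA as NA, and returns unchanged any value that
does not match the pattern.

```r
# Hypothetical helper (not from the thread): apply the rewrite to every
# column to the right of column 1, however many there are.
reformat_cols <- function(d) {
  cols <- seq_along(d)[-1]                      # all columns except the first
  d[cols] <- lapply(d[cols], function(col) {
    # Anchored pattern: group 1 is X, group 2 is Y
    sub("^AD=([0-9]+);BA=([0-9]+)$", "\\1_\\1-\\2", as.character(col))
  })
  d
}

df <- data.frame(rowNum = c(1, 2, 3),
                 first  = c(NA, "AD=2;BA=8", "AD=9;BA=1"),
                 second = c("AD=13;BA=49", "AD=1;BA=2", NA))
out <- reformat_cols(df)
```

Selecting the columns with seq_along(d)[-1] rather than a fixed 2:3 is
just one way to avoid hard-coding the column count.
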
>>>
>>> Could someone point out an appropriate paradigm or otherwise point me in
>>> the right direction?
>>>
>>> Best,
>>> Jonathan
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.