Thanks to you too. It also works perfectly. I am not sure I understand all of the code, though; I will have to look into it!

David Biau

>________________________________
> From: arun <smartpink...@yahoo.com>
> To: Biau David <djmb...@yahoo.fr>
> Cc: R help <r-help@r-project.org>; Uwe Ligges <lig...@statistik.tu-dortmund.de>
> Sent: Sunday, January 13, 2013 18:36
> Subject: Re: [R] extracting character values
>
>Hi,
>This should also work:
>do.call(data.frame,lapply(netw,function(x) gsub("^ *(\\D+) \\w+$","\\1",x)))
>A.K.
>
>________________________________
>From: Biau David <djmb...@yahoo.fr>
>To: arun <smartpink...@yahoo.com>; r help list <r-help@r-project.org>
>Sent: Sunday, January 13, 2013 12:02 PM
>Subject: Re: [R] extracting character values
>
>OK,
>
>here is a minimal working example:
>
>au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
>         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
>au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc',
>         'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
>au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
>         'marmor s', 'bhumbra r', 'pansuriya tc', NA)
>
>netw <- data.frame(au1, au2, au3)
>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>
>for (i in 1:dim(netw)[2])
>{
>  wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>  res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>}
>
>The problem is with the author "van den hoofs j", who is only retrieved as 'van'.
>
>thanks,
>
>David Biau
>
>>________________________________
>> From: arun <smartpink...@yahoo.com>
>> To: Biau David <djmb...@yahoo.fr>
>> Sent: Sunday, January 13, 2013 17:38
>> Subject: Re: [R] extracting character values
>>
>>Hi,
>>
>>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) :
>>#  object 'netw' not found
>>Can you provide an example dataset of netw?
>>Thanks.
>>A.K.
>>
>>----- Original Message -----
>>From: Biau David <djmb...@yahoo.fr>
>>To: r help list <r-help@r-project.org>
>>Cc:
>>Sent: Sunday, January 13, 2013 3:53 AM
>>Subject: [R] extracting character values
>>
>>Dear all,
>>
>>I have a dataframe of names (netw), with each cell containing the last name
>>and initials of an author; some cells are NA. I would like to extract only
>>the last name from each cell; this new dataframe is called 'res'.
>>
>>Here is what I do:
>>
>>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>
>>for (i in 1:x)
>>{
>>  wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>  res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>>}
>>
>>The problem is that I cannot manage to extract 'complex' names properly,
>>such as ' van der hoops bf ': here I only get 'van', whereas the real last
>>name is 'van der hoops' and 'bf' are the initials. Basically, the last name
>>always has a minimum of 3 consecutive letters, but may consist of several
>>groups of 3 or more letters separated by one or more spaces; the cell may
>>start with a space too; initials never have more than 2 letters.
>>
>>Would someone have a good idea for this? Thanks,
>>
>>David
>>
>>    [[alternative HTML version deleted]]
>>
>>______________________________________________
>>R-help@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
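
For anyone else reading the thread, the gsub() one-liner suggested earlier can be unpacked roughly as follows. This is only a sketch run on the example data posted in the thread. The variable name `pattern`, the use of as.data.frame() in place of do.call(data.frame, ...), and stringsAsFactors = FALSE are additions for illustration and were not part of the original suggestion.

```r
# Example data as posted in the thread.
au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc',
         'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
         'marmor s', 'bhumbra r', 'pansuriya tc', NA)

# Assumption: keep the columns as character rather than factor.
netw <- data.frame(au1, au2, au3, stringsAsFactors = FALSE)

# The pattern, piece by piece:
#   ^ *       optional leading spaces (not captured)
#   (\\D+)    group 1: one or more non-digit characters, i.e. the last name,
#             which may itself contain spaces ('van den hoofs')
#   ' \\w+$'  a space followed by a final run of word characters (the initials)
pattern <- "^ *(\\D+) \\w+$"

# Replace each whole cell by group 1 only; NA cells stay NA.
res <- as.data.frame(lapply(netw, function(x) gsub(pattern, "\\1", x)),
                     stringsAsFactors = FALSE)

print(res)
```

Because (\\D+) is greedy, it takes as much as it can while still leaving one last space-separated word for the initials, which is why multi-word last names come out whole. A cell with no trailing initials would not match the pattern and would be returned unchanged.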

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.