Works great, thanks. And you cut my code down a lot and removed the loop.
David Biau

> ________________________________
> From: Uwe Ligges <lig...@statistik.tu-dortmund.de>
> To: Biau David <djmb...@yahoo.fr>
> Cc: arun <smartpink...@yahoo.com>; r help list <r-help@r-project.org>
> Sent: Sunday, 13 January 2013, 18:22
> Subject: Re: [R] extracting character values
>
> On 13.01.2013 18:02, Biau David wrote:
>> OK,
>>
>> here is a minimal working example:
>>
>> au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
>>          'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
>> au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc',
>>          'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
>> au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
>>          'marmor s', 'bhumbra r', 'pansuriya tc', NA)
>>
>> netw <- data.frame(au1, au2, au3)
>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>
>> for (i in 1:dim(netw)[2])
>> {
>>   wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>   res[i] <- substring(as.character(netw[,i]), wh,
>>                       wh + attr(wh,'match.length')-1)
>> }
>
> There may be an easier solution, but this should do:
>
> res <- data.frame(lapply(netw,
>     function(x)
>         gsub("^ *([[:alpha:] ]*) +[[:alpha:]]+$", "\\1", x)))
>
> Uwe Ligges
>
>> The problem is with author "van den hoofs j", who is only retrieved as 'van'.
>>
>> thanks,
>>
>> David Biau
>>
>>> ________________________________
>>> From: arun <smartpink...@yahoo.com>
>>> To: Biau David <djmb...@yahoo.fr>
>>> Sent: Sunday, 13 January 2013, 17:38
>>> Subject: Re: [R] extracting character values
>>>
>>> Hi,
>>>
>>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>> #Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) :
>>> #  object 'netw' not found
>>> Can you provide an example dataset of netw?
>>> Thanks.
>>> A.K.
>>>
>>> ----- Original Message -----
>>> From: Biau David <djmb...@yahoo.fr>
>>> To: r help list <r-help@r-project.org>
>>> Cc:
>>> Sent: Sunday, January 13, 2013 3:53 AM
>>> Subject: [R] extracting character values
>>>
>>> Dear all,
>>>
>>> I have a data frame of names (netw), with each cell containing the last name
>>> and initials of an author; some cells are NA. I would like to extract only
>>> the last name from each cell; the new data frame is called 'res'.
>>>
>>> Here is what I do:
>>>
>>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>>
>>> for (i in 1:x)
>>> {
>>>   wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>>   res[i] <- substring(as.character(netw[,i]), wh,
>>>                       wh + attr(wh,'match.length')-1)
>>> }
>>>
>>> The problem is that I cannot manage to extract 'complex' names properly,
>>> such as ' van der hoops bf ': here I only get 'van', whereas the real last
>>> name is 'van der hoops' and 'bf' are the initials. Basically, the last name
>>> always has a minimum of 3 consecutive letters, but may consist of several
>>> groups of 3 or more letters separated by one or more spaces; the cell may
>>> start with a space too; initials never have more than 2 letters.
>>>
>>> Would someone have a nice idea for that? Thanks,
>>>
>>> David
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
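
For anyone reading this thread later, here is a small sketch comparing the two approaches quoted above on the example data. The helper names first_run and strip_initials are made up for illustration and appear nowhere in the thread; stringsAsFactors = FALSE is added only so the columns stay character in older R versions. The trimws() step at the end is my own assumption about how one might handle a trailing space as in ' van der hoops bf '; it is not part of the posted solution.

```r
# Example data from the minimal working example in the thread.
au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc',
         'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
         'marmor s', 'bhumbra r', 'pansuriya tc', NA)
netw <- data.frame(au1, au2, au3, stringsAsFactors = FALSE)

# Original approach (hypothetical helper name): regexpr() reports only the
# FIRST run of 3 or more lowercase letters, so a multi-word name is cut
# at its first space.
first_run <- function(x) {
  x <- as.character(x)
  wh <- regexpr('[a-z]{3,}', x)
  substring(x, wh, wh + attr(wh, 'match.length') - 1)
}

# Pattern from Uwe Ligges's reply (hypothetical helper name): skip leading
# spaces, capture letters and spaces greedily, then drop the final
# space-separated group of letters (the initials).
strip_initials <- function(x) {
  gsub("^ *([[:alpha:] ]*) +[[:alpha:]]+$", "\\1", x)
}

first_run('van den hoofs j')       # "van"
strip_initials('van den hoofs j')  # "van den hoofs"

# Applied column by column, without an explicit loop; NA cells stay NA.
res <- data.frame(lapply(netw, strip_initials), stringsAsFactors = FALSE)

# Assumption, not from the thread: the pattern needs a letter at the very end
# of the string, so a trailing space prevents a match. Trimming first avoids that.
strip_initials(trimws(' van der hoops bf '))
```

Note that the pattern treats whatever comes after the last space as initials, so it relies on every cell ending in initials, which holds for the example data.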