On 13.01.2013 09:53, Biau David wrote:
Dear all,

I have a data frame of names (netw), with each cell containing the last name and initials of an author; some cells are NA. I would like to extract only the last name from each cell into a new data frame called 'res'. Here is what I do:

    res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
    for (i in 1:x) {
      wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
      res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
    }

The problem is that I cannot manage to extract 'complex' names properly, such as ' van der hoops bf ': here I only get 'van', whereas the real last name is 'van der hoops' and 'bf' are the initials. Basically, the last name always has a minimum of 3 consecutive letters, but may consist of several groups of 3 or more letters separated by one or more spaces; the cell may start with a space too; initials never have more than 2 letters.

Would someone have a nice idea for that?

Thanks,
Maybe some people will, but an example of your data would actually help them to help you.
Your code is not reproducible without the netw object.

Best,
Uwe Ligges
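For illustration, here is a minimal sketch of what a reproducible version of the question could look like. The sample `netw` below is made up (the real data were never posted), and `extract_last_name` is a hypothetical helper, not something from the original message. Instead of matching the name itself, it assumes the initials are always the trailing token(s) of 1 or 2 letters and strips them, which keeps multi-word names such as 'van der hoops' intact.

```r
# Hypothetical sample data standing in for the unposted 'netw' object.
netw <- data.frame(
  au1 = c(" van der hoops bf ", "smith j", NA),
  au2 = c("biau d", NA, "de la cruz jm"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim surrounding whitespace, then drop any trailing
# run of 1-2 letter tokens (assumed to be the initials). NA stays NA.
extract_last_name <- function(x) {
  x <- trimws(as.character(x))
  sub("([[:space:]]+[[:alpha:]]{1,2})+$", "", x)
}

# Apply column by column and rebuild a data frame of the same shape.
res <- as.data.frame(lapply(netw, extract_last_name),
                     stringsAsFactors = FALSE)
print(res)
```

Because only trailing short tokens are removed, a short particle in the middle of a name ('de la cruz') is kept. A last name shorter than 3 letters at the end of a cell would be stripped by mistake, but the original post states that this case does not occur.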
David

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.