On 13.01.2013 09:53, Biau David wrote:
Dear all,

I have a data frame of names (netw) in which each cell contains the last name and
initials of an author; some cells are NA. I would like to extract only the
last name from each cell; the new data frame is called 'res'.


Here is what I do:

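# Empty result with the same dimensions as netw
res <- data.frame(matrix(NA, nrow = nrow(netw), ncol = ncol(netw)))

# Loop over the columns of netw and keep the first run of
# 3 or more lowercase letters in each cell
for (i in seq_len(ncol(netw)))
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}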


The problem is that I cannot extract 'complex' names properly, such as
' van der hoops bf  ': here I only get 'van', but the real last name is 'van der
hoops' and 'bf' are the initials. Basically, the last name always has at least
3 consecutive letters, but may consist of several groups of 3 or more letters
separated by one or more spaces; the cell may also start with a space; initials
never have more than 2 letters.
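A minimal sketch of one possible approach, assuming the rules above hold for every cell: instead of matching a single word of 3+ letters, match a run of such words separated by spaces, so that trailing initials of 1 or 2 letters can never be part of the match. The sample data and the helper name extract_last_name are hypothetical. Using [[:alpha:]] instead of [a-z] so that capitalized names also match is an assumption about the data.

```r
# Hypothetical sample data; the real 'netw' object was not posted.
netw <- data.frame(
  V1 = c(" van der hoops bf  ", "smith j", NA),
  V2 = c("dupont jp", NA, "martin a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first run of words of 3 or more letters
# separated by one or more spaces. Initials (at most 2 letters) cannot
# extend the match, and leading/trailing spaces are not included.
extract_last_name <- function(x) {
  x <- as.character(x)
  m <- regexpr("[[:alpha:]]{3,}( +[[:alpha:]]{3,})*", x)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0        # NA cells and cells with no match stay NA
  out[hit] <- regmatches(x, m)    # regmatches() returns only the matched elements
  out
}

# Apply column by column to build the result data frame
res <- as.data.frame(lapply(netw, extract_last_name), stringsAsFactors = FALSE)
```

With this sample data, res$V1 is "van der hoops", "smith", NA. Note that under the stated rule a name part shorter than 3 letters (for example a 2-letter particle) ends the run, so such names would need a different pattern.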

Does anyone have a good idea for this? Thanks,


Maybe some people will, but an example of your data would actually help them to help you.

Your code is not reproducible without providing the netw object.
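One common way to make such a question reproducible is to build a small stand-in for the data and post the output of dput(), which prints R code that recreates the object exactly. The sample values below are hypothetical and only illustrate the cases described in the question.

```r
# Hypothetical stand-in for 'netw', including an NA cell and a
# multi-word last name, the cases the question is about.
netw <- data.frame(
  V1 = c(" van der hoops bf  ", "smith j", NA),
  V2 = c("dupont jp", NA, "martin a"),
  stringsAsFactors = FALSE
)

# Print R code that recreates 'netw'; this output can be pasted into the message.
dput(netw)
```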

Best,
Uwe Ligges



David




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

