On 13.01.2013 18:02, Biau David wrote:
OK,

here is a minimal working example:

au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj', 
'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc',
         'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj', 'marmor s',
         'bhumbra r', 'pansuriya tc', NA)

netw <- data.frame(au1, au2, au3)
res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:dim(netw)[2])
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}


There may be an easier solution, but this should do:

res <- data.frame(lapply(netw,
     function(x)
       gsub("^ *([[:alpha:] ]*) +[[:alpha:]]+$", "\\1", x)))
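
One caveat: the pattern above is anchored on a final block of letters, so a cell with trailing blanks such as ' van der hoops bf  ' would not match and would come back unchanged. A minimal sketch of a variant that trims first, assuming trimws() is available (R >= 3.2.0); last_name is a hypothetical helper name, not something from this thread:

```r
# Hypothetical helper (not from the thread): drop the trailing initials.
# Assumes initials are a final block of 1-2 letters, as described in the question.
last_name <- function(x) {
  x <- trimws(as.character(x))           # remove leading/trailing blanks; NA stays NA
  sub("\\s+[[:alpha:]]{1,2}$", "", x)    # strip the final 1-2 letter block, if any
}

last_name(c(" van der hoops bf  ", "biau dj", NA))
```

It could be applied column by column with data.frame(lapply(netw, last_name)).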

Uwe Ligges




The problem is with the author 'van den hoofs j', who is retrieved only as 'van'.

thanks,


David Biau


________________________________
From: arun <smartpink...@yahoo.com>
To: Biau David <djmb...@yahoo.fr>
Sent: Sunday, 13 January 2013, 17:38
Subject: Re: [R] extracting character values

Hi,


  res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) :
  # object 'netw' not found
Can you provide an example dataset of netw?
Thanks.
A.K.



----- Original Message -----
From: Biau David <djmb...@yahoo.fr>
To: r help list <r-help@r-project.org>
Cc:
Sent: Sunday, January 13, 2013 3:53 AM
Subject: [R] extracting character values

Dear all,

I have a data frame of names (netw), with each cell containing the last name and 
initials of an author; some cells are NA. I would like to extract only the 
last name from each cell; the new data frame is called 'res'.


Here is what I do:

res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:dim(netw)[2])   # one pass per column of netw
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))   # first run of 3+ lowercase letters only
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}


The problem is that I cannot manage to extract 'complex' names properly, such as 
' van der hoops bf  ': here I only get 'van', whereas the real last name is 
'van der hoops' and 'bf' are the initials. Basically, the last name always has a 
minimum of 3 consecutive letters, but may consist of several words of 3 or more 
letters separated by one or more spaces; the cell may start with a space too; 
initials never have more than 2 letters.
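
The rules above (last-name words have at least 3 letters, initials at most 2) can be sketched directly by keeping every run of 3 or more letters instead of only the first one. This is only an illustration under those stated assumptions; keep_long_words is a hypothetical name, and a two-letter particle (e.g. 'de') would be dropped along with the initials:

```r
# Hypothetical helper: keep every run of 3+ letters, joined by single spaces.
keep_long_words <- function(x) {
  x <- as.character(x)
  out <- rep(NA_character_, length(x))   # NA cells stay NA
  ok <- !is.na(x)
  # gregexpr() finds all matches per cell; regexpr() would stop at the first
  words <- regmatches(x[ok], gregexpr("[[:alpha:]]{3,}", x[ok]))
  out[ok] <- vapply(words, paste, character(1), collapse = " ")
  out
}

keep_long_words(c(" van der hoops bf  ", "biau dj", NA))
```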

Does anyone have a good idea for this? Thanks,


David



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



