Thanks too. It also works perfectly. I am not sure I understand all of the code, 
though: I will have to look into it!


 
David Biau


>________________________________
> From: arun <smartpink...@yahoo.com>
>To: Biau David <djmb...@yahoo.fr> 
>Cc: R help <r-help@r-project.org>; Uwe Ligges 
><lig...@statistik.tu-dortmund.de> 
>Sent: Sunday, January 13, 2013 18:36
>Subject: Re: [R] extracting character values
> 
>Hi,
>This should also work:
>do.call(data.frame,lapply(netw,function(x) gsub("^ *(\\D+) \\w+$","\\1",x)))
>A.K.
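
One way to read the pattern in the one-liner above, as a minimal sketch: the vector x is a hypothetical sample built from strings that appear in this thread, not part of the original message.

```r
# Hypothetical sample vector (strings taken from the thread's example data)
x <- c("biau dj", " biau dj", "van den hoofs j", NA)

# ^ *      optional leading spaces, matched but not captured
# (\\D+)   greedy run of non-digit characters: the last name, inner spaces included
#  \\w+$   a single space, then the trailing initials up to the end of the string
last <- gsub("^ *(\\D+) \\w+$", "\\1", x)

print(last)
```

Because the whole string is matched and replaced by the first capture group, only the last name survives; NA cells stay NA. A cell with trailing spaces, such as ' van der hoops bf  ', would not match because of the '$' anchor, so gsub would return it unchanged unless it is trimmed first.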
>
>
>
>
>
>________________________________
>From: Biau David <djmb...@yahoo.fr>
>To: arun <smartpink...@yahoo.com>; r help list <r-help@r-project.org> 
>Sent: Sunday, January 13, 2013 12:02 PM
>Subject: Re: [R] extracting character values
>
>
>OK,
>
>here is a minimal working example:
>
>au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
>         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
>au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r',
>         'ferguson pc', 'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
>au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
>         'marmor s', 'bhumbra r', 'pansuriya tc', NA)
>
>netw <- data.frame(au1, au2, au3)
>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>
>for (i in seq_len(ncol(netw)))
>{
>  # locate the first run of three or more lowercase letters in each cell
>  wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>  res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>}
>
>The problem is with the author 'van den hoofs j', for whom only 'van' is retrieved.
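
A possible adjustment to the regexpr approach, sketched under the rule stated in this thread (every part of the last name has at least 3 letters, initials have at most 2). The extended pattern and the sample vector au are assumptions for illustration, not code from the original messages.

```r
# Hypothetical sample vector (strings taken from the thread's example data)
au <- c("biau dj", "van den hoofs j", " greidanus nv", NA)

# One word of 3+ lowercase letters, followed by any number of further
# words of 3+ letters, each preceded by one or more spaces.
wh <- regexpr("[a-z]{3,}( +[a-z]{3,})*", au)

# Same extraction step as in the loop: cut out the matched span
res <- substring(au, wh, wh + attr(wh, "match.length") - 1)

print(res)
```

The initials are never absorbed because they are shorter than 3 letters. For the same reason, a name containing a short particle (for example a 2-letter 'de') would not be captured in full by this pattern.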
>
>thanks,
>
>
>David Biau
>
>
>>________________________________
>> From: arun <smartpink...@yahoo.com>
>>To: Biau David <djmb...@yahoo.fr> 
>>Sent: Sunday, January 13, 2013 17:38
>>Subject: Re: [R] extracting character values
>> 
>>Hi,
>>
>>
>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) : 
>> # object 'netw' not found
>>Can you provide an example dataset of netw?
>>Thanks.
>>A.K.
>>
>>
>>
>>----- Original Message -----
>>From: Biau David <djmb...@yahoo.fr>
>>To: r help list <r-help@r-project.org>
>>Cc: 
>>Sent: Sunday, January 13, 2013 3:53 AM
>>Subject: [R] extracting character values
>>
>>Dear all,
>>
>>I have a data frame of names (netw), with each cell containing the last name and 
>>initials of an author; some cells are NA. I would like to extract only the 
>>last name from each cell; this new data frame is called 'res'.
>>
>>
>>Here is what I do:
>>
>>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>
>>for (i in seq_len(ncol(netw)))
>>{
>>wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>>}
>>
>> 
>>The problem is that I cannot manage to extract 'complex' names properly, such 
>>as ' van der hoops bf  ': here I only get 'van', whereas the real last name is 
>>'van der hoops' and 'bf' are the initials. Basically, the last name always has 
>>a minimum of 3 consecutive letters, but may consist of several words of 3 or 
>>more letters separated by one or more spaces; the cell may start with a space 
>>too; initials never have more than 2 letters.
>>
>>Would someone have a nice idea for that? Thanks,
>>
>>
>>David
>>
>>    [[alternative HTML version deleted]]
>>
>>
>>______________________________________________
>>R-help@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.