Re: [R] Character (1a, 1b) to numeric

Fox, John Fri, 10 Jul 2020 13:05:46 -0700

Hi,

We've had several solutions, and I was curious about their relative efficiency. 
Here's a test with a moderately large data vector:


> library("microbenchmark")
> set.seed(123) # for reproducibility
> x <- sample(xc, 1e4, replace=TRUE) # "data"
> microbenchmark(John = John <- xn[x], 
+                Rich = Rich <- xn[match(x, xc)], 
+                Jeff = Jeff <- {
+                 n <- as.integer( sub( "[a-i]$", "", x ) )
+                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                 d[ is.na( d ) ] <- 0
+                 n + d / 10
+                 },
+                David = David <- as.numeric(gsub("a", ".3", 
+                                      gsub("b", ".5", 
+                                           gsub("c", ".7", x)))),
+                times=1000L
+                )
Unit: microseconds
  expr       min        lq       mean     median         uq       max neval cld
  John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a  
  Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a  
  Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b 
 David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"
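
The "names for target but not for current" messages arise because indexing the named vector xn by name returns a named result, whereas the regex-based results carry no names. As a minimal sketch (restating xc and xn from earlier in the thread, with a small hypothetical sample in place of the benchmark data), the values alone can be compared like this:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc

x <- c("2b", "1", "1c", "2a")  # small illustrative sample (an assumption)

John  <- xn[x]                 # named numeric vector
David <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))

# Compare values only by dropping the names first
same_values <- isTRUE(all.equal(unname(John), David))

# Alternatively, ask all.equal() to skip attributes (including names)
same_values2 <- isTRUE(all.equal(John, David, check.attributes = FALSE))
```

Either form isolates genuine numeric disagreement, such as the mean relative difference reported for Jeff's result, from a mere difference in names.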

Of course, efficiency isn't the only consideration, and aesthetically (and no 
doubt subjectively) I prefer Rich Heiberger's solution. OTOH, Jeff's solution 
is more general in that it generates the correspondence between letters and 
numbers. The argument for Jeff's solution would, however, be stronger if it 
gave the desired answer.
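
For illustration only, here is one way the more general approach might be adapted to reproduce the requested values. The function name stage_to_numeric and its offsets argument are hypothetical, the offsets 0.3, 0.5 and 0.7 are taken from the xn in the original question, and the sketch has not been benchmarked against the solutions above:

```r
# Hypothetical helper: split each stage into its integer part and an
# optional single-letter suffix, then look the suffix up in an offset table.
stage_to_numeric <- function(x, offsets = c(a = 0.3, b = 0.5, c = 0.7)) {
  n <- as.numeric(sub("[a-z]$", "", x))  # leading integer part
  s <- sub("^[0-9]+", "", x)             # letter suffix, "" when absent
  d <- unname(offsets[s])                # NA for "" and for unknown letters
  d[s == ""] <- 0                        # no suffix means no offset
  n + d                                  # unknown suffixes remain NA
}

xc  <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
res <- stage_to_numeric(xc)
```

Because the offsets are an argument, a different classification can be supplied without changing the code, and a suffix missing from the table yields NA rather than a silently wrong number.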

Best,
 John

> On Jul 10, 2020, at 3:28 PM, David Carlson <dcarl...@tamu.edu> wrote:
> 
> Here is a different approach:
> 
> xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> xn
> # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> 
> David L Carlson
> Professor Emeritus of Anthropology
> Texas A&M University
> 
> On Fri, Jul 10, 2020 at 1:10 PM Fox, John <j...@mcmaster.ca> wrote:
> Dear Jean-Louis,
> 
> There must be many ways to do this. Here's one simple way (with no claim of 
> optimality!):
> 
> > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > 
> > set.seed(123) # for reproducibility
> > x <- sample(xc, 20, replace=TRUE) # "data"
> > 
> > names(xn) <- xc
> > z <- xn[x]
> > 
> > data.frame(z, x)
>      z  x
> 1  2.5 2b
> 2  2.5 2b
> 3  1.5 1b
> 4  2.3 2a
> 5  1.5 1b
> 6  1.3 1a
> 7  1.3 1a
> 8  2.3 2a
> 9  1.5 1b
> 10 2.0  2
> 11 1.7 1c
> 12 2.3 2a
> 13 2.3 2a
> 14 1.0  1
> 15 1.3 1a
> 16 1.5 1b
> 17 2.7 2c
> 18 2.0  2
> 19 1.5 1b
> 20 1.5 1b
> 
> I hope this helps,
>  John
> 
>   -----------------------------
>   John Fox, Professor Emeritus
>   McMaster University
>   Hamilton, Ontario, Canada
>   Web: http://socserv.mcmaster.ca/jfox
> 
> > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abit...@sent.com> wrote:
> > 
> > Dear All
> > 
> > I have a character vector,  representing histology stages, such as for 
> > example:
> > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > 
> > and this goes on to 3, 3a etc in various order for each patient. I do have 
> > of course a pre-established  classification available which does change 
> > according to the histology criteria under assessment.
> > 
> > I would want to convert xc, for plotting reasons, to a numeric vector such 
> > as
> > 
> > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > 
> > Unfortunately I have no clue on how to do that.
> > 
> > Thanks for any help and apologies if I am missing the obvious way to do it.
> > 
> > JL
> > -- 
> > Verif30042020
> > 
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!KwNVnqRv!V7p9rtNSgBWmF3KJ3U_01fR7vP_I7y-OnWHiTFxwRZ6bVJ3-emOwkBtcU3rSW6I$
> >  
> > PLEASE do read the posting guide 
> > https://urldefense.com/v3/__http://www.R-project.org/posting-guide.html__;!!KwNVnqRv!V7p9rtNSgBWmF3KJ3U_01fR7vP_I7y-OnWHiTFxwRZ6bVJ3-emOwkBtcg7nzsmk$
> >  
> > and provide commented, minimal, self-contained, reproducible code.
> 

