Re: [R] Locating the starting position of the first number in a string

Jeff Newmiller Mon, 02 Nov 2015 13:36:13 -0800

Also not answering your question directly, but this may provide some useful ideas or results:

library( gsubfn )

DF <- setNames( data.frame( t( strapply( ID
                                       , "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"
                                       , c
                                       , simplify=TRUE
                                       )
                             )
                          , stringsAsFactors = FALSE
                          )
              , c( "Type", "Group", "Number" )
              )

str( DF )

'data.frame':   100 obs. of  3 variables:
 $ Type  : chr  "MSM" "MSM" "MSM" "MSM" ...
 $ Group : chr  "HN" "HN" "HN" "HN" ...
 $ Number: chr  "01209" "01210" "01211" "10212" ...
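
For readers without gsubfn installed, the same pattern can probably be applied with base R alone. The following is only a sketch on a three-element subset of ID; the names ids, pattern, m, parts and DF2 are introduced here for illustration and are not part of the original post.

```r
# Three IDs taken from the vector in the question (subset for illustration)
ids <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# Same regular expression as in the strapply() call above
pattern <- "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"

# regexec() records the capture groups; regmatches() extracts them.
# Each list element holds the full match followed by the three groups.
m <- regmatches(ids, regexec(pattern, ids))

# Drop the full match and bind the groups into a character matrix
parts <- do.call(rbind, lapply(m, function(x) x[-1]))

DF2 <- setNames(data.frame(parts, stringsAsFactors = FALSE),
                c("Type", "Group", "Number"))
DF2
```

Keeping Number as character preserves leading zeros such as "01209"; converting it with as.integer() would drop them.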

On Tue, 3 Nov 2015, Peter Alspach wrote:

Tena koe Jen

Not answering your question: if you are after these locations in order to split 
the IDs into columns, then you might like to consider strsplit; e.g.,

t(sapply(strsplit(ID, '_'), rbind))

You could then split the last column.  You state that there is a 5-digit number at the 
end.  If this is correct, then use this feature (i.e., nchar(ID)-4) as you'd want 
"IBBS3_MSM_HN104213" (the fifth element in ID) to split to IBBS3, MSM, HN1 and 
04213.  However, if it isn't always 5 digits then split at the first number (i.e., HN and 
104213).
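
The two splitting rules described above can be sketched as follows for the fifth element of ID. This is an illustration only; the variable names (id, last, n, p, fixed_split, digit_split) are assumptions made for the example.

```r
# The fifth element of ID, which ends in six digits rather than five
id <- "IBBS3_MSM_HN104213"

# Last underscore-separated field: "HN104213"
last <- strsplit(id, "_", fixed = TRUE)[[1]][3]
n <- nchar(last)

# Rule 1: the number is always the last 5 characters
fixed_split <- c(substr(last, 1, n - 5), substr(last, n - 4, n))
fixed_split   # "HN1" "04213"

# Rule 2: the number starts at the first digit of the field
p <- as.integer(regexpr("[0-9]", last))
digit_split <- c(substr(last, 1, p - 1), substr(last, p, n))
digit_split   # "HN" "104213"
```

Which rule is right depends on whether the trailing number really is always 5 digits long.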

HTH .....

Peter Alspach

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jennifer Sabatier
Sent: Tuesday, 3 November 2015 7:39 a.m.
To: r-help@r-project.org
Subject: [R] Locating the starting position of the first number in a string

Hi,


So, I've got a vector of strings that look like this:
ID <- c("IBBS3_MSM_HN01209","IBBS3_MSM_HN01210","IBBS3_MSM_HN01211",
"IBBS3_MSM_HN10212","IBBS3_MSM_HN104213","IBBS3_MSM_HN10214",
"IBBS3_MSM_HN44215","IBBS3_MSM_HN44216","IBBS3_MSM_HN44217",
"IBBS3_MSM_HN44218","IBBS3_MSM_HN44219","IBBS3_MSM_HN44220",
"IBBS3_MSM_HN44221","IBBS3_MSM_HN44222","IBBS3_MSM_HN44223",
"IBBS3_MSM_HN44224","IBBS3_MSM_HN44225","IBBS3_MSM_HN44226",
"IBBS3_MSM_HN44227","IBBS3_MSM_HN12228","IBBS3_MSM_HN12229",
"IBBS3_MSM_HN12230","IBBS3_MSM_HN12231","IBBS3_MSM_HN12232",
"IBBS3_MSM_HN12233","IBBS3_MSM_HN12234","IBBS3_MSM_HN12235",
"IBBS3_MSM_HN12236","IBBS3_MSM_HN12237","IBBS3_MSM_HN12238",
"IBBS3_MSM_HN12239","IBBS3_MSM_HN12240","IBBS3_MSM_HN12241",
"IBBS3_MSM_HN12242","IBBS3_MSM_HN12243","IBBS3_MSM_HN12244",
"IBBS3_MSM_HN12245","IBBS3_MSM_HN12246","IBBS3_MSM_HN12247",
"IBBS3_MSM_HN12248","IBBS3_MSM_HN12249","IBBS3_MSM_HN12250",
"IBBS3_MSM_HN12251","IBBS3_MSM_HN12252","IBBS3_MSM_HN12253",
"IBBS3_MSM_HN12254","IBBS3_MSM_HN12255","IBBS3_MSM_HN25256",
"IBBS3_MSM_HN25257","IBBS3_MSM_HN25258","IBBS3_MSM_HN25259",
"IBBS3_MSM_HN25260","IBBS3_MSM_HN25261","IBBS3_MSM_HN25262",
"IBBS3_MSM_HN25263","IBBS3_MSM_HN25264","IBBS3_MSM_HN25265",
"IBBS3_MSM_HN25266","IBBS3_MSM_HN25267","IBBS3_MSM_HN25268",
"IBBS3_MSM_HN25269","IBBS3_MSM_HN25270","IBBS3_MSM_HN25271",
"IBBS3_MSM_HN25272","IBBS3_MSM_HN25273","IBBS3_MSM_HN25274",
"IBBS3_MSM_HN25275","IBBS3_MSM_HN25276", "IBBS3_MSM_HN25277", 
"IBBS3_MSM_HN25278","IBBS3_MSM_HN25279","IBBS3_MSM_HN25280",
"IBBS3_MSM_HN25281","IBBS3_MSM_HN25282","IBBS3_MSM_HN25283",
"IBBS3_MSM_HN25284","IBBS3_MSM_HMC44285",  "IBBS3_MSM_HMC44286", 
"IBBS3_MSM_HMC44287","IBBS3_MSM_HMC44288","IBBS3_MSM_HMC44289",
"IBBS3_MSM_HMC44290","IBBS3_MSM_HMC44291","IBBS3_MSM_HMC44292",
"IBBS3_MSM_HMC44293","IBBS3_MSM_HMC44294","IBBS3_MSM_HMC44295",
"IBBS3_MSM_HMC44296","IBBS3_MSM_HMC44297","IBBS3_MSM_HMC44298",
"IBBS3_MSM_HMC44299","IBBS3_MSM_HMC44300","IBBS3_MSM_HMC44301",
"IBBS3_MSM_HMC44302","IBBS3_MSM_HMC44303","IBBS3_MSM_HMC44304",
"IBBS3_MSM_HMC44305","IBBS3_MSM_HMC44306","IBBS3_MSM_HMC44307",
"IBBS3_MSM_HMC44309")




This is an ID that is in the following format:  IBBS3_Type_Group#####


What I want to do is locate the starting position of Type, which is anywhere 
from 3 to 4 letters long (in this example it's either MSM or PWID), the 
starting position of Group which is 2-3 letters long (either HN or HMC), and 
finally the starting position of the 5-digit number.


I'm able to get Type and Group using the following:


TYPE_s <- sapply(c("MSM", "PWID"), regexpr, ID, ignore.case=T)

GROUP_s <- (sapply(c("HN", "HMC"), regexpr, ID, ignore.case=T))
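
The two calls above need every possible Type and Group code to be listed in advance. As a hedged alternative, assuming each ID contains exactly two underscores as in the stated format, the start positions could be derived from the underscore positions instead. The second ID below is hypothetical (PWID is mentioned as a Type but does not occur in the sample data), and the names ids, us, type_s and group_s are illustrative.

```r
# One real ID from the question and one hypothetical PWID example
ids <- c("IBBS3_MSM_HN01209", "IBBS3_PWID_HMC44285")

# Positions of every underscore in each ID (one integer vector per ID)
us <- gregexpr("_", ids, fixed = TRUE)

# Type starts right after the first underscore,
# Group right after the second one
type_s  <- sapply(us, function(p) p[1] + 1L)
group_s <- sapply(us, function(p) p[2] + 1L)

type_s    # 7 7
group_s   # 11 12
```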


What I am having trouble with is getting the starting position of the 5-digit 
number.


I am trying:


DIGITS_s <- sapply("([0:9])", regexpr, ID, ignore.case=T)


But that just seems to look for the position of the first 0:

DIGITS_s
      ([0:9])
 [1,]      13
 [2,]      13
 [3,]      13
 [4,]      14
 [5,]      14
 [6,]      14
 [7,]      -1
 [8,]      -1
 [9,]      -1
[10,]      -1
[11,]      17
[12,]      17
[13,]      -1
[14,]      -1
[15,]      -1
[16,]      -1
[17,]      -1
[18,]      -1
[19,]      -1
[20,]      -1
[21,]      17
[22,]      17
[23,]      -1
[24,]      -1
[25,]      -1
[26,]      -1
[27,]      -1
[28,]      -1
[29,]      -1
[30,]      -1
[31,]      17
[32,]      17
[33,]      -1
[34,]      -1
[35,]      -1
[36,]      -1
[37,]      -1
[38,]      -1
[39,]      -1
[40,]      -1
[41,]      17
[42,]      17
[43,]      -1
[44,]      -1
[45,]      -1
[46,]      -1
[47,]      -1
[48,]      -1
[49,]      -1
[50,]      -1
[51,]      17
[52,]      17
[53,]      -1
[54,]      -1
[55,]      -1
[56,]      -1
[57,]      -1
[58,]      -1
[59,]      -1
[60,]      -1
[61,]      17
[62,]      17
[63,]      -1
[64,]      -1
[65,]      -1
[66,]      -1
[67,]      -1
[68,]      -1
[69,]      -1
[70,]      -1
[71,]      17
[72,]      17
[73,]      -1
[74,]      -1
[75,]      -1
[76,]      -1
[77,]      -1
[78,]      -1
[79,]      -1
[80,]      -1
[81,]      18
[82,]      17
[83,]      17
[84,]      17
[85,]      17
[86,]      17
[87,]      17
[88,]      17
[89,]      17
[90,]      17
[91,]      17
[92,]      17
[93,]      17
[94,]      17
[95,]      17
[96,]      17
[97,]      17
[98,]      17
[99,]      17
[100,]      17


So, clearly, this is wrong.  I just would like to find the starting position of 
the first digit, no matter what it is.

It's probably easy, isn't it?
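
A minimal sketch of what appears to be going on, using three elements of ID: inside a bracket expression, "[0:9]" lists the three characters 0, : and 9, whereas "[0-9]" is a range covering every digit. Note also that the IBBS3 prefix itself contains a digit, so a plain "[0-9]" search stops at position 5; anchoring the run of digits to the end of the string with "$" is one way to reach the trailing number. regexpr() is vectorised over its text argument, so no sapply() wrapper is needed. The names ids, set_pos, first_any, digits_s and digits_len are illustrative only.

```r
# Three elements of the ID vector from the question
ids <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN44215", "IBBS3_MSM_HMC44289")

# "[0:9]" is a set of three characters: '0', ':' and '9'
set_pos <- as.integer(regexpr("[0:9]", ids))
set_pos      # 13 -1 18

# "[0-9]" matches any digit, but the first digit is the 3 in "IBBS3"
first_any <- as.integer(regexpr("[0-9]", ids))
first_any    # 5 5 5

# Anchor the run of digits to the end of the string to find the number
m <- regexpr("[0-9]+$", ids)
digits_s   <- as.integer(m)              # start of the trailing number
digits_len <- attr(m, "match.length")    # how many digits it has
digits_s     # 13 13 14
digits_len   # 5 5 5
```

If the number must be exactly five digits, "[0-9]{5}$" would locate the last five digits instead.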

Best,

Jen

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
