> On Oct 14, 2016, at 6:53 PM, Joe Ceradini <joecerad...@gmail.com> wrote:
> 
> Hopefully this looks better. I did not realize gmail default was html.
> 
> I have a dataframe with a column that has many fields smashed together.
> I need to split the strings in the column into separate columns based
> on patterns.
> 
> Example of a string that needs to be split:
> 
> ugly <- c("Water temp:14: F Waterbody type:Permanent Lake/Pond: Water
> pH:Unkwn: Conductivity:Unkwn: Water color: Clear: Water turbidity:
> clear: Manmade:no  Permanence:permanent:  Max water depth: <3: Primary
> substrate: Silt/Mud: Evidence of cattle grazing: none: Shoreline
> Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no
> amphibians observed")
> ugly
> 
> Far as I can tell, there is not a single pattern that would work for
> splitting. Splitting on ":" is close, but not quite right. Each of the
> below attributes should be in a separate column, and are present in
> the string (above) that needs to be split:
> 
> attributes <- c("Water temp", "Waterbody type", "Water pH",
> "Conductivity", "Water color", "Water turbidity", "Manmade",
> "Permanence", "Max water depth", "Primary substrate", "Evidence of
> cattle grazing", "Shoreline Emergent Veg(%)", "Fish present", "Fish
> species")
> 
> Conceptually, I want to use the vector of attributes to split the
> string. However, strsplit only uses the 1st value of the attributes
> object:
> 
> strsplit(ugly, attributes).

I tried this:

strsplit( ugly, split=paste0(attributes, collapse="|")  )

I noticed that some of the attributes were not actually splitting, so I went 
back and re-entered the data after making sure there were no "\n"'s in the 
middle of attribute names:

dput(attributes)
c("Water temp", "Waterbody type", "Water pH", "Conductivity", 
"Water color", "Water turbidity", "Manmade", "Permanence", "Max water depth", 
"Primary substrate", "Evidence of cattle grazing", "Shoreline Emergent Veg(%)", 
"Fish present", "Fish species")

strsplit( ugly, split=paste0(attributes, collapse="|")  )
[[1]]
 [1] ""
 [2] ":14: F "
 [3] ":Permanent Lake/Pond: Water\npH:Unkwn: "
 [4] ":Unkwn: "
 [5] ": Clear: "
 [6] ":\nclear: "
 [7] ":no  "
 [8] ":permanent:  "
 [9] ": <3: Primary\nsubstrate: Silt/Mud: Evidence of cattle grazing: none: Shoreline\nEmergent Veg(%): 1-25: "
[10] ": yes: Fish species: unkwn: no\namphibians observed"
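
The pieces that still fail to split seem to have two causes: line breaks 
embedded in the pasted string (e.g. "Water\npH"), and the parentheses in 
"Shoreline Emergent Veg(%)", which are regex metacharacters, so the unescaped 
pattern looks for "Veg%" rather than "Veg(%)". Here is a minimal sketch, not a 
tested general solution, that collapses whitespace and escapes the names 
before splitting. The helper name escape_regex is made up for illustration, and 
the naming step assumes the attributes occur in the string in the same order 
as in the vector:

```r
# The example string, including the line breaks it picked up when pasted.
ugly <- paste(c(
  "Water temp:14: F Waterbody type:Permanent Lake/Pond: Water",
  "pH:Unkwn: Conductivity:Unkwn: Water color: Clear: Water turbidity:",
  "clear: Manmade:no  Permanence:permanent:  Max water depth: <3: Primary",
  "substrate: Silt/Mud: Evidence of cattle grazing: none: Shoreline",
  "Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no",
  "amphibians observed"), collapse = "\n")

attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
  "Water color", "Water turbidity", "Manmade", "Permanence", "Max water depth",
  "Primary substrate", "Evidence of cattle grazing",
  "Shoreline Emergent Veg(%)", "Fish present", "Fish species")

# Collapse newlines and runs of spaces into single spaces.
ugly1 <- gsub("[[:space:]]+", " ", ugly)

# Hypothetical helper: backslash-escape regex metacharacters in a string.
escape_regex <- function(x) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)

# One alternation pattern built from the escaped attribute names.
pattern <- paste0(escape_regex(attributes), collapse = "|")
pieces  <- strsplit(ugly1, pattern)[[1]]

# Drop the leading "" and strip the surrounding colons and spaces.
values <- gsub("^[: ]+|[: ]+$", "", pieces[-1])

# Assumption: attributes appear in the string in the order listed above.
names(values) <- attributes
values
```

Note that the last value still carries the trailing free-text remark ("no 
amphibians observed"), since nothing in the string separates it from the 
"Fish species" entry.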

> 
> Should I loop through the values of "attributes"?
> Is there an argument in strsplit I'm missing that will do what I want?

I don't think strsplit has such an argument. There may be packages that 
support this. Perhaps the gsubfn package?
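
On the looping question: a base-R loop over the attribute names also looks 
workable. The sketch below is only an illustration under some assumptions: 
every attribute occurs exactly once in the string, no attribute name shows up 
inside an earlier value, and whitespace has already been collapsed. It locates 
each name literally with fixed = TRUE, so the "(%)" needs no escaping, and it 
sorts by position, so the order of `attributes` does not matter:

```r
# Example string with embedded line breaks collapsed to single spaces.
ugly <- gsub("[[:space:]]+", " ", paste(c(
  "Water temp:14: F Waterbody type:Permanent Lake/Pond: Water",
  "pH:Unkwn: Conductivity:Unkwn: Water color: Clear: Water turbidity:",
  "clear: Manmade:no  Permanence:permanent:  Max water depth: <3: Primary",
  "substrate: Silt/Mud: Evidence of cattle grazing: none: Shoreline",
  "Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no",
  "amphibians observed"), collapse = "\n"))

attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
  "Water color", "Water turbidity", "Manmade", "Permanence", "Max water depth",
  "Primary substrate", "Evidence of cattle grazing",
  "Shoreline Emergent Veg(%)", "Fish present", "Fish species")

# Position of the first literal occurrence of each attribute name (-1 if absent).
starts <- vapply(attributes,
                 function(a) as.integer(regexpr(a, ugly, fixed = TRUE)),
                 integer(1))
stopifnot(all(starts > 0))  # assumption: every attribute is present

# Sort by position; each value runs from the end of its name
# to just before the next name (or to the end of the string).
starts       <- sort(starts)
value_starts <- starts + nchar(names(starts))
value_ends   <- c(starts[-1] - 1L, nchar(ugly))

raw    <- substring(ugly, value_starts, value_ends)
values <- gsub("^[: ]+|[: ]+$", "", raw)   # strip colons and spaces at the ends
names(values) <- names(starts)

# One row, one column per attribute; check.names = FALSE keeps the names as-is.
result <- data.frame(as.list(values), check.names = FALSE,
                     stringsAsFactors = FALSE)
```

For a whole column of such strings, the same steps could be wrapped in a 
function and applied row by row, then the one-row results bound together with 
rbind.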


> Different approach altogether?
> 
> Thanks! Happy Friday.
> Joe
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

