Hi All,

The most difficult challenge I face in “learning R” is data munging. I have
reviewed Hadley’s Advanced R programming guide and familiarized myself with
data structures, subsetting, plyr, dplyr, tidyr, the lapply() family of
functions, basic string manipulation and grepping, SQL, etc. I’ve also written a
few dozen functions that do basic data munging tasks. Further, I’ve already
reviewed resources such as the Coursera course “Computing for Data Analysis”
(https://www.coursera.org/course/compdata) and DataCamp's data.table course.

However, the tools mentioned above seem to be applied mainly to datasets with
fairly well-structured variables that need to be transformed and subsetted in
various ways, and those tasks are often not so difficult.

Much of my work involves querying APIs or SQL databases, or scraping websites,
and then assembling lists of various things that can be transformed into
social networks, timestamped sequences of events, etc. Solutions to many
tricky problems in this area still seem to require creative leaps of
imagination: I can understand them once I see them, but I have trouble seeing
how I could ever come up with them independently.

So my question is: what do I need to learn to become better at solving tricky
data munging problems?

I realize a common answer may be: solve many data munging problems. I
understand that practice clearly matters; however, I’m trying to figure out
whether there is more tangible guidance.

* Is there something like “design patterns” for data munging? 
* Would taking a course in algorithms help? I’ve reviewed parts of "Guide to
Programming and Algorithms Using R"
(http://www.springer.com/computer/swe/book/978-1-4471-5327-6), but many of its
problems are mathematical and seem far removed from the kinds of problems I’m
trying to solve.
* Is there something like SelectorGadget (http://selectorgadget.com/) for R 
objects?
* Could something like OpenRefine (http://openrefine.org/) make these tasks 
easier?

Best,
Aron

-- 
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.