Vince Buffalo has covered this nicely in his book "Bioinformatics Data Skills". The original data should be treated as immutable. Vince then suggests keeping a text file in your data directory that explains where the data came from, which scripts you used to create a modified version, when you did so, and so on.
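As a rough sketch of that idea in base R: the file names, the directory layout and the log_derived_data() helper below are made up for illustration and are not taken from the book. The raw file is written once and marked read-only, the derived version goes to a separate file, and each derivation appends an entry to a README in the data directory.

```r
# Hypothetical helper: append one provenance entry to a plain-text README.
log_derived_data <- function(readme, raw_file, script, output_file, note = "") {
  entry <- c(
    paste("Date:  ", format(Sys.Date(), "%Y-%m-%d")),
    paste("Input: ", raw_file),
    paste("Script:", script),
    paste("Output:", output_file),
    paste("Note:  ", note),
    ""  # blank line separates entries
  )
  cat(entry, file = readme, sep = "\n", append = TRUE)
  invisible(entry)
}

# Assumed layout: a throwaway data directory so the sketch is self-contained.
data_dir <- tempfile("data_")
dir.create(data_dir)

# Stand-in for the raw data; in practice this file already exists.
raw_file <- file.path(data_dir, "raw_measurements.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          raw_file, row.names = FALSE)
# Mark the raw file read-only (on Windows this has only a limited effect).
Sys.chmod(raw_file, mode = "0444")

# Derive a new version and write it to a separate file, never over the raw one.
raw <- read.csv(raw_file)
derived <- transform(raw, value_squared = value^2)
derived_file <- file.path(data_dir, "derived_measurements.csv")
write.csv(derived, derived_file, row.names = FALSE)

# Record where the derived file came from; the script name is hypothetical.
readme <- file.path(data_dir, "README.txt")
log_derived_data(readme,
                 raw_file    = basename(raw_file),
                 script      = "clean_measurements.R",
                 output_file = basename(derived_file),
                 note        = "Added value_squared = value^2")
```

The README stays plain text, so it can be read without R and kept under version control next to the scripts.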
I find roxygen comments and knitr extremely useful for keeping track of what I intend to do and why, because they let me export all the reasoning, summary tables and plots to a format I can share with collaborators who don't care about the R code that got me there.

HTH
Ulrik

On Thu, 30 Jun 2016 at 17:30 Pito Salas <pitosa...@brandeis.edu> wrote:

> I am studying statistics and using R in doing it. I come from software
> development where we document everything we do.
>
> As I “massage” my data, adding columns to a frame, computing on other
> data, perhaps cleaning, I feel the need to document in detail what the
> meaning, or background, or calculations, or whatever of the data is. After
> all it is now derived from my raw data (which may have been well
> documented) but it is “new.”
>
> Is this a real problem? Is there a “best practice” to address this?
>
> Thanks!
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.