Private, since this is a trivial comment. Also, just my opinion, so feel free to ignore.
Capture it, yes, but not necessarily as a function; a script might do just as well, and the tools mentioned can do this. As others have said, your instincts are good, and you should just choose the methods that work best for you.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Jun 30, 2016 at 8:46 AM, Pito Salas <pitosa...@brandeis.edu> wrote:
> Thanks to you both. I think you’re saying/implying that once I “test drive” a
> particular bit of cleaning I should capture it in a function which does it
> reproducibly against the raw data, and that becomes the best documentation
> for it. That makes sense.
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
>
>> On Jun 30, 2016, at 11:44 AM, Robert Baer <rb...@atsu.edu> wrote:
>>
>> You might look at:
>>
>> http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets
>>
>> You might also try File | Compile Notebook from within R-Studio
>> (https://www.rstudio.com/) on your well-documented R-scripts to get a nice
>> reproducible recording/report of data analysis workflow. Similar
>> functionality is available from basic R, but involves more work. There are
>> many other approaches, but the best choice depends on your precise needs.
>>
>> And, as a programmer, you are probably already familiar with things like:
>> https://google.github.io/styleguide/Rguide.xml
>>
>> On 6/30/2016 9:51 AM, Pito Salas wrote:
>>> I am studying statistics and using R in doing it. I come from software
>>> development where we document everything we do.
>>>
>>> As I “massage” my data, adding columns to a frame, computing on other data,
>>> perhaps cleaning, I feel the need to document in detail what the meaning,
>>> or background, or calculations, or whatever of the data is. After all it is
>>> now derived from my raw data (which may have been well documented) but it
>>> is “new.”
>>>
>>> Is this a real problem? Is there a “best practice” to address this?
>>>
>>> Thanks!
>>>
>>> Pito Salas
>>> Brandeis Computer Science
>>> Feldberg 131
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.