Hi Bert, hi readers,

I did not know much about attributes in R and how to use them. If it is that flexible, you are right and I have learnt something.
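
For other readers who are new to attributes, here is a minimal sketch of how the "doc" attribute idea from Bert's reply might be applied to the columns of a data frame. The data frame `survey`, its columns and the helper `show_docs()` are made up for illustration; they are not part of base R or of any package.

```r
# Hypothetical example data; all names are illustrative only.
survey <- data.frame(age = c(23, 35, 41), score = c(2, 5, 4))

# Attach a free-text "doc" attribute to each column.
attr(survey$age, "doc") <- "Age of respondent in years (raw data)"
attr(survey$score, "doc") <- "Agreement score, 1 = strongly agree to 5 = strongly disagree"

# Hypothetical helper: collect the "doc" attribute of every column
# into a small lookup table (NA where a column has no "doc").
show_docs <- function(df) {
  docs <- vapply(df, function(col) {
    d <- attr(col, "doc", exact = TRUE)
    if (is.null(d)) NA_character_ else d
  }, character(1))
  data.frame(variable = names(df), doc = unname(docs),
             stringsAsFactors = FALSE)
}

docs <- show_docs(survey)
print(docs)
```

One caveat with this approach: many operations, for example subsetting a vector with `[`, drop custom attributes, so the documentation may have to be re-attached after such steps or protected by a class with its own methods.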
Kind regards

Georg

> Sent: Thursday, 30 June 2016 at 20:06
> From: "Bert Gunter" <[email protected]>
> To: [email protected]
> Cc: "Pito Salas" <[email protected]>, "R Help" <[email protected]>
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
>
> -- Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> "...
>
> > Within R there are some limitations for storing the information about what
> > a variable or a value within a variable means.
>
> That is FALSE. There are no limitations. For example, just attach a
> "doc" attribute to your data that says whatever you wish to about
> them. e.g.
>
> > somedata <- runif(10)
> > attr(somedata,"doc") <- "Anything you want to say about the data"
>
> > attr(somedata,"doc")
> [1] "Anything you want to say about the data"
>
>
> You can go as crazy as you want to with this, e.g. creating an (S3 or
> S4) class "documented" with appropriate methods for printing it from
> classes that inherit from data frames, lists, etc. See also the
> roxygen2 package for data documentation and R's ?promptData function
> for a data documentation file in Rd format.
>
> R is Turing complete -- so it can do anything any other programming
> language can do. You could program SAS in R if you wanted. The
> difference is that SAS has pre-programmed some capabilities that R
> leaves for users, including contributed packages -- like Sweave,
> knitr, etc. You may or may not like this extra flexibility (and extra
> work, depending on whether someone else has already done the work for
> you), and efficiency may or may not be an issue; but to say that R has
> "limitations" is a gross misrepresentation, imho.
>
>
> > Other software packages like SAS or SPSS offer much broader facilities
> > for storing this information.
> > In R you can work with meaningful variable names and the data
> > type/class factor, which can store mappings between values and value
> > descriptions.
> >
> > Example
> > -- cut --
> > var1 <- c(rep(1:5, 3))
> > ds_example <- data.frame(var1)
> >
> > var1_labels <- c("1 = Strongly Agree",
> >                  "2 = Agree",
> >                  "3 = Neither agree/nor disagree",
> >                  "4 = Disagree",
> >                  "5 = Strongly disagree")
> >
> > ds_example[["var1"]] <- factor(ds_example[["var1"]],
> >                                levels = c(1, 2, 3, 4, 5),
> >                                labels = var1_labels)
> >
> > summary(ds_example["var1"])
> > -- cut --
> >
> > In addition you find methods to work with variable labels and value labels
> > in the packages Hmisc and memisc. They can also produce a thing called a
> > codebook, which contains all variable names, variable labels, values, value
> > labels and summaries of the distribution of values within the variables.
> >
> > 3. In addition to this you could structure your script in a modular way
> > according to the analysis process, e. g.
> > importing, cleaning, preparation for analysis, analysis, reporting. Another
> > structure may be more suitable in your case. These modules could have a
> > number in the file name indicating in which sequence the scripts should be
> > run.
> >
> > 4. I find it valuable to use a software repository like Github, Sourceforge
> > or others to keep the revisions safe and secure in case you would like to
> > go back to a version with code you deleted before and figure out that you
> > need it now again. The RStudio IDE has an interface to git if you like to
> > go with that. Good commit messages can help you track what has changed.
> > Commits also help you to prepare precise steps when developing your scripts.
> >
> > 5. I have no experience with Sweave or knitr but you could also compile
> > simple documentation by copying comments to an Excel sheet using
> > R-2-Excel libraries like excel.link or others.
> >
> > Example
> > install.packages("excel.link")
> > library(excel.link)
> > xlc["A1"] <- "Project Documentation"
> > xlc["A2"] <- "Step XY"
> > xlc["A3"] <- "Some explanation about step xy"
> >
> > This way you have the documentation in your code and in an external source.
> >
> > Which approach you choose depends on your experience with R and its
> > libraries as well as the size of your project and the need for
> > documentation.
> >
> > 6. It can be helpful to store interim results in a format that can be read
> > by non-R-users, e. g. Excel.
> >
> > 7. Documenting code can be done using roxygen2.
> >
> > If there are different opinions to my suggestions please say so.
> >
> > Kind regards
> >
> > Georg
> >
> >
> >> Sent: Thursday, 30 June 2016 at 16:51
> >> From: "Pito Salas" <[email protected]>
> >> To: [email protected]
> >> Subject: [R] Documenting data
> >>
> >> I am studying statistics and using R in doing it. I come from software
> >> development where we document everything we do.
> >>
> >> As I “massage” my data, adding columns to a frame, computing on other
> >> data, perhaps cleaning, I feel the need to document in detail what the
> >> meaning, or background, or calculations, or whatever of the data is. After
> >> all it is now derived from my raw data (which may have been well
> >> documented) but it is “new.”
> >>
> >> Is this a real problem? Is there a “best practice” to address this?
> >>
> >> Thanks!
> >>
> >> Pito Salas
> >> Brandeis Computer Science
> >> Feldberg 131
> >>
> >> ______________________________________________
> >> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.

