Hi Bert,
Hi Readers,

I did not know much about attributes in R or how to use them. If R is that 
flexible, you are right, and I have learnt something.

Kind regards

Georg

> Sent: Thursday, 30 June 2016 at 20:06
> From: "Bert Gunter" <[email protected]>
> To: [email protected]
> Cc: "Pito Salas" <[email protected]>, "R Help" <[email protected]>
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> "...
> 
> > Within R there are some limitations for storing the information about what 
> > a variable or a value within a variable means.
> 
> That is FALSE. There are no limitations. For example, just attach a
> "doc" attribute to your data that says whatever you wish to about
> them. e.g.
> 
> > somedata <- runif(10)
> > attr(somedata,"doc") <- "Anything you want to say about the data"
> 
> > attr(somedata,"doc")
> [1] "Anything you want to say about the data"
> 
> 
> You can go as crazy as you want to with this, e.g. creating an (S3 or
> S4) class "documented" with appropriate methods for printing it from
> classes that inherit from data frames, lists, etc. See also the
> roxygen2 package for data documentation and R's ?promptData function
> for creating a data documentation file in Rd format.
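
A minimal sketch of the S3 idea described above, using base R only. The constructor name documented(), the "doc" attribute layout, and the example text are assumptions made for illustration, not an existing package API.

```r
# Hypothetical constructor: attach a "doc" attribute and prepend the
# S3 class "documented", keeping the original class for inheritance.
documented <- function(x, doc) {
  stopifnot(is.character(doc), length(doc) == 1L)
  attr(x, "doc") <- doc
  class(x) <- c("documented", class(x))
  x
}

# Print the note first, then fall back to the underlying print method.
print.documented <- function(x, ...) {
  cat("Doc:", attr(x, "doc"), "\n")
  y <- x
  attr(y, "doc") <- NULL
  class(y) <- setdiff(class(y), "documented")
  print(y, ...)
  invisible(x)
}

df <- documented(data.frame(var1 = rep(1:5, 3)),
                 "Survey item 1, coded 1 (agree) to 5 (disagree)")
print(df)
```

Many operations, such as subsetting a plain vector, drop custom attributes, so a class like this would likely need further methods before it is robust.
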
> 
> R is Turing complete -- so it can do anything any other programming
> language can do. You could program SAS in R if you wanted. The
> difference is that SAS has pre-programmed some capabilities that R
> leaves for users, including contributed packages -- like Sweave,
> knitr, etc.  You may or may not like this extra flexibility (and extra
> work, depending on whether someone else has already done the work for
> you), and efficiency may or may not be an issue; but to say that R has
> "limitations" is a gross misrepresentation, imho.
> 
> 
> 
> > Facilities for storing this information are much more broadly
> > implemented in other software packages such as SAS or SPSS. In R you can
> > work with meaningful variable names and with the factor class, which can
> > store mappings between values and value descriptions.
> >
> > Example
> > -- cut --
> > var1 <- c(rep(1:5, 3))
> > ds_example <- data.frame(var1)
> >
> > var1_labels <- c("1 = Strongly Agree",
> >                 "2 = Agree",
> >                 "3 = Neither agree/nor disagree",
> >                 "4 = Disagree",
> >                 "5 = Strongly disagree")
> >
> > ds_example[["var1"]] <- factor(ds_example[["var1"]],
> >                                levels = c(1, 2, 3, 4, 5),
> >                                labels = var1_labels)
> >
> > summary(ds_example["var1"])
> > -- cut --
> >
> > In addition, you will find methods for working with variable labels and 
> > value labels in the packages Hmisc and memisc. They can also produce a 
> > codebook, which contains all variable names, variable labels, values, value 
> > labels and summaries of the distribution of values within the variables.
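
To illustrate what a codebook holds, here is a rough base R sketch. It does not use or reproduce the Hmisc or memisc interfaces; the "label" attribute, the helper name codebook_sketch() and the label text are assumptions made for this example.

```r
# Example data modelled on the message above, with short value labels.
ds_example <- data.frame(
  var1 = factor(rep(1:5, 3), levels = 1:5,
                labels = c("Strongly agree", "Agree",
                           "Neither agree nor disagree",
                           "Disagree", "Strongly disagree"))
)

# Assumption: the variable label lives in a plain "label" attribute.
attr(ds_example$var1, "label") <- "Agreement with statement 1"

# Hypothetical helper: one row per variable with its name, label, class,
# value labels (for factors) and number of missing values.
codebook_sketch <- function(df) {
  rows <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    lab <- attr(x, "label")
    data.frame(
      variable = nm,
      label    = if (is.null(lab)) NA_character_ else lab,
      class    = class(x)[1],
      values   = if (is.factor(x)) paste(levels(x), collapse = "; ") else NA_character_,
      n_miss   = sum(is.na(x)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

cb <- codebook_sketch(ds_example)
print(cb)
```
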
> >
> > 3. In addition to this, you could structure your scripts in a modular way 
> > according to the analysis process, e.g.
> > importing, cleaning, preparation for analysis, analysis, reporting. Another 
> > structure may be more suitable in your case. These modules could have a 
> > number in the file name indicating the sequence in which the scripts should 
> > be run.
> >
> > 4. I find it valuable to use a software repository like GitHub, SourceForge 
> > or others to keep the revisions safe and secure in case you would like to 
> > go back to a version with code you deleted before and then find that you 
> > need it again. The RStudio IDE has an interface to git if you would like to 
> > go with that. Good commit messages can help you track what has changed. 
> > Commits also help you to prepare precise steps when developing your scripts.
> >
> > 5. I have no experience with Sweave or knitr, but you could also compile 
> > simple documentation by copying comments to an Excel sheet using 
> > R-to-Excel libraries like excel.link or others.
> >
> > Example
> > install.packages("excel.link")
> > library(excel.link)
> > xlc["A1"] <- "Project Documentation"
> > xlc["A2"] <- "Step XY"
> > xlc["A3"] <- "Some explanation about step xy"
> >
> > This way you have the documentation in your code and in an external source.
> >
> > Which approach you choose depends on your experience with R and its 
> > libraries as well as the size of your project and the need for 
> > documentation.
> >
> > 6. It can be helpful to store interim results in a format that can be read 
> > by non-R-users, e. g. Excel.
> >
> > 7. Documenting code can be done using roxygen2.
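
Point 3 above (numbered scripts run in sequence) could be sketched roughly as follows. The helper name run_pipeline(), the file naming scheme (01_import.R, 02_clean.R) and the demo scripts are assumptions made for this example only.

```r
# Hypothetical helper: source every script whose file name starts with a
# number, in ascending numeric order, into one shared environment.
run_pipeline <- function(dir = ".", envir = new.env(parent = globalenv())) {
  files <- list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE)
  # Order by the leading number so that 2_clean.R runs before 10_report.R.
  step <- as.integer(sub("_.*$", "", basename(files)))
  files <- files[order(step)]
  for (f in files) {
    message("Running ", basename(f))
    source(f, local = envir)
  }
  invisible(envir)
}

# Demonstration in a temporary directory with two tiny scripts.
demo_dir <- tempfile("pipeline_demo")
dir.create(demo_dir)
writeLines("raw <- c(3, NA, 5)", file.path(demo_dir, "01_import.R"))
writeLines("clean <- raw[!is.na(raw)]", file.path(demo_dir, "02_clean.R"))

env <- run_pipeline(demo_dir)
print(env$clean)
```

Sourcing into a dedicated environment keeps the objects created by each step out of the global workspace; whether that is preferable to sourcing into the workspace directly depends on how you work interactively.
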
> >
> > If anyone disagrees with my suggestions, please say so.
> >
> > Kind regards
> >
> > Georg
> >
> >
> >> Sent: Thursday, 30 June 2016 at 16:51
> >> From: "Pito Salas" <[email protected]>
> >> To: [email protected]
> >> Subject: [R] Documenting data
> >>
> >> I am studying statistics and using R in doing it. I come from software 
> >> development where we document everything we do.
> >>
> >> As I “massage” my data, adding columns to a frame, computing on other 
> >> data, perhaps cleaning, I feel the need to document in detail what the 
> >> meaning, or background, or calculations, or whatever of the data is. After 
> >> all it is now derived from my raw data (which may have been well 
> >> documented) but it is “new.”
> >>
> >> Is this a real problem? Is there a “best practice” to address this?
> >>
> >> Thanks!
> >>
> >> Pito Salas
> >> Brandeis Computer Science
> >> Feldberg 131
> >>
> >> ______________________________________________
> >> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
>
