Re: [R] Archive format

2017-04-10 Thread Joe Gain

Hi Georg,


On 08.04.2017 09:04, g.maub...@gmx.de wrote:

Hi Joe,

I have read your question with great interest. I am a little astonished to 
read about your project. There is a big national institute in Germany called 
GESIS 
(https://de.wikipedia.org/wiki/GESIS_%E2%80%93_Leibniz-Institut_f%C3%BCr_Sozialwissenschaften)
which has been doing the job you are trying to set up since 1986. You could 
try to exchange ideas with them.


We've already had some contact with GESIS. I agree that it would be a 
good idea to communicate and cooperate more with GESIS, although that is 
not always easy, because there are many interesting organisations that 
are all doing their own thing.


We organised a conference in Heidelberg, "The E-Science Tage", and I was 
at the GESIS presentation, which was very good.



Your subject is very complex with regard to reproducible research. You might 
want to have a look at



(1) https://cran.r-project.org/web/views/ReproducibleResearch.html
(2) Gandrud, Christopher: Reproducible Research with R and R Studio 
(https://www.amazon.com/Reproducible-Research-Studio-Second-Chapman/dp/1498715370)


Thanks for the useful links. (There's a whole book about R and 
reproducible research!)


The general goal of the web platform is to raise researchers' awareness 
of research data management.


The topic _is_ very complicated, and it's difficult to write a general 
approach, especially when you consider the different research 
disciplines etc. Nevertheless, that is what we are trying to do. Where 
possible, and when the information becomes too specific, we will 
include links to further resources (such as those you have recommended 
above). Also, the project is to some extent dependent on feedback from 
users, especially when they are able to provide us with information 
that improves the content of the web platform.




Kind regards

Georg



Thanks for taking the time to reply to my question.

All the best,
Joe


Sent: Wednesday, 29 March 2017, 10:44
From: "Joe Gain" 
To: R-help@r-project.org
Cc: bwfdm-i...@lists.kit.edu
Subject: [R] Archive format

Hello,

We are collecting information on the subject of research data management
in German on the web platform:

www.forschungsdaten.info

One of the topics that we are writing about is how to *archive* data.
Unfortunately, none of us in the project is an expert in R, so I would
like to ask the list what they recommend. A related question concerns the
sharing of data. We have already asked some academics, who have basically
replied that they don't really know, other than to strongly recommend a
plain text format.

We would also like to know whether members of the list recommend converting
formats from commercial software such as S-Plus, Terr, SPSS etc. to an
R-compatible format for long-term archiving. Are there any general
rules and best practices when it comes to archiving (and sharing)
statistical data and statistical programs?

Any comments would be much appreciated!
Joe

--
B 1003
Kommunikations-, Informations-, Medienzentrum (KIM)
Universitaet Konstanz

t: ++49-7531-883234
e: joe.g...@uni-konstanz.de

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





[R] Archive format

2017-03-29 Thread Joe Gain

Hello,

We are collecting information on the subject of research data management 
in German on the web platform:


www.forschungsdaten.info

One of the topics that we are writing about is how to *archive* data. 
Unfortunately, none of us in the project is an expert in R, so I would 
like to ask the list what they recommend. A related question concerns the 
sharing of data. We have already asked some academics, who have basically 
replied that they don't really know, other than to strongly recommend a 
plain text format.


We would also like to know whether members of the list recommend converting 
formats from commercial software such as S-Plus, Terr, SPSS etc. to an 
R-compatible format for long-term archiving. Are there any general 
rules and best practices when it comes to archiving (and sharing) 
statistical data and statistical programs?


Any comments would be much appreciated!
Joe




Re: [R] Archive format

2017-03-30 Thread Joe Gain

On 29.03.2017 17:36, Jeff Newmiller wrote:

The relevance to R (and therefore R-help) of this question is marginal at best. 
R might not be the language of choice when you go retrieve the data.

Also, this question seems dangerously close to a troll, because the obvious 
answer is that the data should be in an open format. But if you are not 
currently working with data in an open format, then you increase the cost of 
archiving and risk losing information up front by extracting it from a 
proprietary format, and balancing those concerns is more political than 
technical.

Note that there exist open binary formats, and the goals of your archiving task 
and nature of the data would have to be considered in deciding which of the 
many to use. My own experience has been that plain text survives time best, but 
YMMV.



Well, I didn't mean to troll the list. We have a small section on R, and 
in response to a question that we got from a user, we thought it would 
be a good idea to check with some actual R-users.


I think the responses are pretty much in line with what we expected. 
Unsurprisingly, there's no simple solution. A text format is advantageous 
because of the many options that a user has for working with text data. 
Your point about the format of the source data is valid; it can be a 
clear constraint (other constraints are, for example, of a legal 
nature). I'm not trying to advocate for open formats per se, just trying 
to gather information so as to be able to make a recommendation.


I think we need to restructure the information on our web platform to 
clearly differentiate between data and the source code, scripts etc. 
which are used to process the data ("algorithms").


There is a big problem with data that has been archived when nobody 
knows what it is or was for. Archiving, sharing and reproducibility are 
important subjects, and we are interested in the experience of 
statisticians in dealing with these problems.
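
To illustrate the problem of archived data that nobody can interpret: one 
option is to store each table as plain-text CSV together with a small 
codebook file that records what the data is for and what each column 
means. The following is only a minimal sketch under stated assumptions. It 
is written in Python rather than R, since R might not be the language in 
use when the data is retrieved. The function name archive_table, the 
file-naming scheme and the codebook fields are hypothetical choices made 
for this example, not a recommendation from the list.

```python
import csv
import json
import tempfile
from pathlib import Path


def archive_table(rows, columns, out_dir, name, description):
    """Write `rows` to <name>.csv and a codebook to <name>.codebook.json.

    `columns` maps each column name to a human-readable explanation.
    All names and the codebook layout are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The data itself: plain-text CSV, UTF-8, with a header row.
    data_path = out_dir / f"{name}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # The codebook: what the data is for and what each column means.
    codebook = {
        "description": description,
        "data_file": data_path.name,
        "encoding": "utf-8",
        "columns": columns,
    }
    codebook_path = out_dir / f"{name}.codebook.json"
    codebook_path.write_text(
        json.dumps(codebook, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, codebook_path


if __name__ == "__main__":
    # Example with made-up data, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data, codebook = archive_table(
            rows=[{"id": 1, "score": 2.5}, {"id": 2, "score": 3.0}],
            columns={"id": "respondent identifier", "score": "test score"},
            out_dir=tmp,
            name="survey",
            description="Made-up example survey",
        )
        print(data.read_text(encoding="utf-8"))
        print(codebook.read_text(encoding="utf-8"))
```

Keeping the codebook in a separate plain-text file means the description 
survives even if the software that produced the data does not. The scripts 
used to process the data would be archived separately, in line with the 
distinction between data and "algorithms" above.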


Thanks for the replies!
Joe

