Hi Davide,
On 03/01/2016 02:25 PM, davide risso wrote:
Dear Bioc developers,
I recently downloaded three publicly available single-cell RNA-seq datasets
from the NCBI GEO/SRA repository and created an R package with some
gene-level summaries (read counts and FPKMs).
I'm currently using the package locally for my own tests, but I think it
may be a useful resource for the community, and I'm considering sharing
it on GitHub and eventually submitting it to Bioconductor.
I was not involved in any way with the original studies, and I'm wondering
what is the best practice in terms of license / data sharing. Since there
are many experimental data packages in Bioconductor, I'm guessing that I'm
not the first person wondering about this.
From the NCBI website, I read (quote from
https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
Databases of molecular data on the NCBI Web site include such examples as
nucleotide sequences (GenBank), protein sequences, macromolecular
structures, molecular variation, gene expression, and mapping data. They
are designed to provide and encourage access within the scientific
community to sources of current and comprehensive information. Therefore,
NCBI itself places no restrictions on the use or distribution of the data
contained therein. Nor do we accept data when the submitter has requested
restrictions on reuse or redistribution. However, some submitters of the
original data (or the country of origin of such data) may claim patent,
copyright, or other intellectual property rights in all or a portion of the
data (that has been submitted). NCBI is not in a position to assess the
validity of such claims and since there is no transfer of rights from
submitters to NCBI, NCBI has no rights to transfer to a third party.
Therefore, NCBI cannot provide comment or unrestricted permission
concerning the use, copying, or distribution of the information contained
in the molecular databases.
Should I contact the original authors for permission? Or is the fact that
the data were publicly shared enough to grant me permission to redistribute?
In that case, is there a standard license that I should use?
Thanks for any feedback / thoughts!
I don't have much to offer. AFAIK we don't really have guidelines or
recommendations for what license to use for experimental data packages,
except for the usual "make sure you use an appropriate license" advice.
So far it has really been up to each author/maintainer to make sure
they pick a license that is compatible with the original
license/copyright/patent of the data they are packaging
and with its redistribution through the Bioconductor channel.
FWIW here is a summary of the licenses used by the 276 experimental
data packages currently in BioC devel:
  License        Nb of packages
  ------------   --------------
  GPL                       135
  Artistic-2.0               96
  LGPL                       41
  other                       4
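
For anyone wanting to produce this kind of tally for their own set of
packages, here is a minimal Python sketch. It assumes the License field
values have already been collected as plain strings; the grouping into
families, the function names, and the example values are illustrative
assumptions, not the actual BioC metadata or the method used for the
summary above.

```python
from collections import Counter


def license_family(license_field):
    """Map a 'License:' value to a coarse family (hypothetical grouping)."""
    text = license_field.strip()
    if text.startswith("LGPL"):
        return "LGPL"
    if text.startswith("GPL"):
        return "GPL"
    if text.startswith("Artistic-2.0"):
        return "Artistic-2.0"
    # Anything not matched above is lumped together.
    return "other"


def tally_licenses(license_fields):
    """Count how many License values fall into each family."""
    return Counter(license_family(field) for field in license_fields)


# Illustrative values only, not real package metadata.
example_fields = [
    "GPL (>= 2)",
    "Artistic-2.0",
    "LGPL-3",
    "GPL-2",
    "MIT + file LICENSE",
]

counts = tally_licenses(example_fields)
for family, n in counts.most_common():
    print(f"{family:<12} {n:>4}")
```

A prefix match like this is crude: it says nothing about license versions
or about whether a given license suits redistributed data, so it is only
useful for a rough overview.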
Would be interesting to hear from other developers about this. For
example, how do people choose between GPL and Artistic-2.0? Is one
license typically more appropriate for packaging and redistributing
data that is already publicly available?
H.
Best,
davide
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319