Web services are only part of the problem. In essence, there are at least two facets:
1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it only becomes really useful when both are included. I think #2 is the harder problem. Software can usually handle #1 through a useful abstraction layer. #2 requires that data have consistent names and meanings, and that in turn requires people to agree on common definitions and a common naming convention.
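A minimal sketch of the two facets, written in Python for illustration and assuming a hypothetical per-source adapter. `SourceAdapter`, `fetch`, `field_map`, the source names and the field values are all made up. `fetch` stands in for facet #1 (whatever protocol a source speaks). `field_map` stands in for facet #2 (renaming source-specific fields to agreed-upon common names):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


# Hypothetical adapter, one per data source.
@dataclass
class SourceAdapter:
    name: str
    # Facet #1: hides the download protocol and returns raw rows.
    fetch: Callable[[], List[Row]]
    # Facet #2: source field name -> agreed-upon common field name.
    field_map: Dict[str, str]

    def to_common(self) -> List[Row]:
        """Download raw rows, then keep and rename only the mapped fields."""
        return [
            {common: row[src] for src, common in self.field_map.items() if src in row}
            for row in self.fetch()
        ]


# Two made-up sources describing the same quantity with different field names.
source_a = SourceAdapter(
    name="source_a",
    fetch=lambda: [{"ctry": "FR", "yr": 2010, "val": 1.5}],
    field_map={"ctry": "country", "yr": "year", "val": "value"},
)
source_b = SourceAdapter(
    name="source_b",
    fetch=lambda: [{"CountryCode": "DE", "Year": 2010, "Value": 2.5, "Notes": "est."}],
    field_map={"CountryCode": "country", "Year": "year", "Value": "value"},
)

# After mapping, rows from both sources share one schema and can be combined.
combined = source_a.to_common() + source_b.to_common()
for row in combined:
    print(row)
```

The code only shows where each facet lives. Writing `fetch` once per protocol is ordinary software work. Every `field_map` is only as good as the common names it targets, and agreeing on those names is the harder part.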

RDF (Resource Description Framework) and its related technologies (SPARQL, OWL, etc.) are among the many attempts to address this. While this effort would benefit R, I think it's best if it's part of a larger effort.

Services such as DBpedia and Freebase are trying to unify many data sets using RDF.

The task view and package ideas are great. I'm just adding another perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
Hi Benjamin:

What would make this easier is if these sites used standardized web services,
so a client would only need to be written once.  data.gov is the worst example;
they spun their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package.  My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and MATLAB) scripts that automate the extraction for certain cases,
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that
provides a GUI from within R (and ArcGIS, MATLAB and Excel) that allows you to
subset data served by OPeNDAP, ERDDAP, and certain Sensor Observation
Service (SOS) servers, and have it read directly into R.  It is freely
available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized (OPeNDAP,
SOS) or is easy to implement (ERDDAP).
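Because ERDDAP is RESTful, a request is just a URL, so a client written once can be reused for any dataset on the server. The following is a minimal Python sketch that assumes ERDDAP's tabledap form `<base>/tabledap/<datasetID>.<fileType>?<variables>&<constraints>`. The function name `tabledap_url`, the dataset ID and the variable names are hypothetical. The server's own documentation is the authority on the exact query syntax:

```python
from typing import Sequence
from urllib.parse import quote


def tabledap_url(base: str, dataset_id: str, variables: Sequence[str],
                 constraints: Sequence[str] = (), file_type: str = "csv") -> str:
    """Build an ERDDAP tabledap request URL (hypothetical helper, a sketch)."""
    # Comma-separated list of the variables to return.
    query = ",".join(variables)
    # Each constraint (e.g. "time>=2012-01-01") is appended with '&';
    # operators such as '>' and '=' are percent-encoded for safe transport.
    for constraint in constraints:
        query += "&" + quote(constraint, safe="")
    return f"{base.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


url = tabledap_url(
    "http://upwell.pfeg.noaa.gov/erddap",
    "exampleDatasetID",            # hypothetical dataset ID
    ["time", "sea_surface_temp"],  # hypothetical variable names
    ["time>=2012-01-01"],
)
print(url)
```

Any HTTP client in any language can fetch the resulting URL. That is the point Roy makes: a standardized or easy-to-implement service means the access code only has to be written once.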

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:

Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work with your data. But R is not equipped to use all the public
databases efficiently.
I have found that the most tedious part of working with R is searching for and
downloading data from public databases and putting it into the right format. I
could not find a package on CRAN that offers exactly this fundamental
capability.
Imagine R as the unified interface to access (and analyze) all public
data in the easiest way possible. That would have a real impact,
would move R a big leap forward and would enable us to see the world
with different eyes.

There is no direct connection to the APIs of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- World Bank
- UN
- FAO
- data.gov
- ...

Ease of access to the data is the key to information processing with R.

How can we handle the flood of information noise? R has to answer
that with an extensive API to public databases.

I would love your comments and ideas as contributions to a vital discussion.

Benjamin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
**********************
"The contents of this message do not reflect any position of the U.S. Government or 
NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: roy.mendelss...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
