I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).
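
To make the dates example concrete, here is a minimal sketch of what agreeing on one representation could look like. It assumes Python's standard library only; the helper name to_iso8601_utc and the sample timestamp are made up for illustration and do not come from any of the databases discussed in this thread.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601_utc(dt: datetime) -> str:
    """Render a timezone-aware datetime as an ISO 8601 string in UTC.

    Hypothetical helper for illustration only.
    """
    if dt.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it instead of guessing.
        raise ValueError("need a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative timestamp: 14 Jan 2012, 10:02 at UTC-08:00.
local = datetime(2012, 1, 14, 10, 2, tzinfo=timezone(timedelta(hours=-8)))
stamp = to_iso8601_utc(local)
print(stamp)

# Round trip: parsing the string back recovers the same instant.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
assert parsed == local
```

The point of the sketch is only that a publisher and a user who both accept this one string format no longer need to negotiate time zones or field order separately.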

Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weber <m...@bwe.im> wrote:
> Mike
>
> We see that the publishers are aware of the problem. They don't think
> that the raw data is usable for the user. Consequently they
> recognize this fact with the proprietary formats. Yes, they resign
> themselves to the information overload. That's pathetic.
>
> It is not a question of *which* data format, it is a question about
> the general concept. Where do publisher and user meet? There has to be
> one *defined* point which all parties agree on. I disagree with your
> statement that the publisher should just publish csv or cook his own
> API. That leads to fragmentation and inaccessibility of data. We want
> data to be accessible.
>
> A more pragmatic approach is needed to revolutionize the way we go
> about raw data.
>
> Benjamin
>
> On 14 January 2012 22:17, Mike Marchywka <marchy...@hotmail.com> wrote:
>>
>> LOL, I remember posting about this in the past. The US gov agencies vary but
>> most are quite good. The big problem appears to be people who push
>> proprietary or commercial "standards" for which only one effective source
>> exists. Some formats, like Excel and PDF, come to mind and there is a
>> disturbing trend towards their adoption in some places where raw data is
>> needed by many. The best thing to do is contact the information provider and
>> let them know you want raw data, not images or stuff that works in limited
>> commercial software packages. Often data sources are valuable and the revenue
>> model impacts availability.
>>
>> If you are just arguing over different open formats, it is usually easy for
>> someone to write some conversion code and publish it; CSV to JSON would not
>> be a problem, for example. Data of course are quite variable and there is
>> nothing wrong with giving the provider his choice.
>>
>> ----------------------------------------
>>> Date: Sat, 14 Jan 2012 10:21:23 -0500
>>> From: ja...@rampaginggeek.com
>>> To: r-help@r-project.org
>>> Subject: Re: [R] The Future of R | API to Public Databases
>>>
>>> Web services are only part of the problem. In essence, there are at
>>> least two facets:
>>> 1. downloading the data using some protocol
>>> 2. mapping the data to a common model
>>>
>>> Having #1 makes the import/download easier, but it really becomes useful
>>> when both are included. I think #2 is the harder problem to address.
>>> Software can usually be written to handle #1 by making a useful
>>> abstraction layer. #2 means that data has consistent names and meanings,
>>> and this requires people to agree on common definitions and a common
>>> naming convention.
>>>
>>> RDF (Resource Description Framework) and its related technologies
>>> (SPARQL, OWL, etc) are one of the many attempts to try to address this.
>>> While this effort would benefit R, I think it's best if it's part of a
>>> larger effort.
>>>
>>> Services such as DBpedia and Freebase are trying to unify many data sets
>>> using RDF.
>>>
>>> The task view and package ideas are great ideas. I'm just adding another
>>> perspective.
>>>
>>> Jason
>>>
>>> On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
>>> > Hi Benjamin:
>>> >
>>> > What would make this easier is if these sites used standardized web
>>> > services, so it would only require writing once. data.gov is the worst
>>> > example, they spun their own, weak service.
>>> >
>>> > There is a lot of environmental data available through OPeNDAP, and that
>>> > is supported in the ncdf4 package. My own group has a service called
>>> > ERDDAP that is entirely RESTful, see:
>>> >
>>> > http://coastwatch.pfel.noaa.gov/erddap
>>> >
>>> > and
>>> >
>>> > http://upwell.pfeg.noaa.gov/erddap
>>> >
>>> > We provide R (and Matlab) scripts that automate the extraction for certain
>>> > cases, see:
>>> >
>>> > http://coastwatch.pfeg.noaa.gov/xtracto/
>>> >
>>> > We also have a tool called the Environmental Data Connector (EDC) that
>>> > provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you
>>> > to subset data that is served by OPeNDAP, ERDDAP, and certain Sensor
>>> > Observation Service (SOS) servers, and have it read directly into R. It
>>> > is freely available at:
>>> >
>>> > http://www.pfeg.noaa.gov/products/EDC/
>>> >
>>> > We can write such tools because the service is either standardized 
>>> > (OPeNDAP, SOS) or is easy to implement (ERDDAP).
>>> >
>>> > -Roy
>>> >
>>> >
>>> > On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
>>> >
>>> >> Dear R Users -
>>> >>
>>> >> R is a wonderful software package. CRAN provides a variety of tools to
>>> >> work on your data. But R is not apt to utilize all the public
>>> >> databases in an efficient manner.
>>> >> I observed the most tedious part with R is searching and downloading
>>> >> the data from public databases and putting it into the right format. I
>>> >> could not find a package on CRAN which offers exactly this fundamental
>>> >> capability.
>>> >> Imagine R is the unified interface to access (and analyze) all public
>>> >> data in the easiest way possible. That would create a real impact,
>>> >> would put R a big leap forward and would enable us to see the world
>>> >> with different eyes.
>>> >>
>>> >> There is a lack of a direct connection to the API of these databases,
>>> >> to name a few:
>>> >>
>>> >> - Eurostat
>>> >> - OECD
>>> >> - IMF
>>> >> - Worldbank
>>> >> - UN
>>> >> - FAO
>>> >> - data.gov
>>> >> - ...
>>> >>
>>> >> The ease of access to the data is the key to information processing with
>>> >> R.
>>> >>
>>> >> How can we handle the flow of information noise? R has to give an
>>> >> answer to that with an extensive API to public databases.
>>> >>
>>> >> I would love your comments and ideas as a contribution in a vital 
>>> >> discussion.
>>> >>
>>> >> Benjamin
>>> >>
>>> >> ______________________________________________
>>> >> R-help@r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide 
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> > **********************
>>> > "The contents of this message do not reflect any position of the U.S. 
>>> > Government or NOAA."
>>> > **********************
>>> > Roy Mendelssohn
>>> > Supervisory Operations Research Analyst
>>> > NOAA/NMFS
>>> > Environmental Research Division
>>> > Southwest Fisheries Science Center
>>> > 1352 Lighthouse Avenue
>>> > Pacific Grove, CA 93950-2097
>>> >
>>> > e-mail: roy.mendelss...@noaa.gov (Note new e-mail address)
>>> > voice: (831)-648-9029
>>> > fax: (831)-648-8440
>>> > www: http://www.pfeg.noaa.gov/
>>> >
>>> > "Old age and treachery will overcome youth and skill."
>>> > "From those who have been given much, much will be expected"
>>> > "the arc of the moral universe is long, but it bends toward justice" -MLK 
>>> > Jr.
>>> >



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
