Hi, Gábor:

On 11/9/2014 7:58 PM, Gábor Csárdi wrote:
A few more details about the metacran search, to show how it (imo)
solves a different problem than sos, rseek, RSiteSearch, or
rdocumentation.org.

1. The most important difference is that it searches for _packages_.
The results are packages, not functions, vignettes, etc. E.g. if you
want to find all packages that interact with google apis, you can just
say (https://github.com/metacran/seer is the CLI version):

library(seer)
see("google")
SAW "google" -------------------------------- 25 packages in 0.013 seconds ---
  #  # Title     # Package
  1  RgoogleMaps Overlays on Google map tiles in R
  2  ggmap       A package for spatial visualization with Google Maps and Ope...
  3  RGA         A Google Analytics API client for R
  4  plotKML     Visualization of spatial and spatio-temporal objects in Goog...
  5  googleVis   Interface between R and Google Charts
  6  scholar     Analyse citation data from Google Scholar
  7  translateR  Bindings for the Google and Microsoft Translation APIs
  8  plusser     A Google+ Interface for R
  9  gooJSON     Google JSON Data Interpreter for R
  10 translate   Bindings for the Google Translate API v2
more()
SAW "google" -------------------------------- 25 packages in 0.012 seconds ---
  #  # Title          # Package
  11 ngramr           Retrieve and plot Google n-gram data
  12 RGoogleAnalytics R Wrapper for the Google Analytics API
  13 R2G2             Converting R CRAN outputs into Google Earth.
  14 plotGoogleMaps   Plot spatial or spatio-temporal data over Google Maps
  15 googlePublicData An R library to build Google's Public Data Explorer DSP...
  16 RWeather         R wrapper around the Yahoo! Weather, Google Weather and...
  17 sysfonts         Loading system fonts into R
  18 hashFunction     A collection of non-cryptographic hash functions
  19 rgauges          R wrapper to Gaug.es API
  20 splitstackshape  Stack and Reshape Datasets After Splitting Concatenated...

2. The second difference is that metacran ranks the search results
based on (among other things) the package dependency graph, so if you
search for 'graphics', lattice and ggplot2 come first.

3. Another difference is that metacran exposes a full search API of
the underlying ElasticSearch engine, so if someone wants to rank
results differently, or make more complex queries, they can.

4. It does not search code and docs. I think rdocumentation.org does a
good job with docs, and http://github.com/cran is great for code, e.g.
if you want packages that call SET_SLOT in C:
https://github.com/search?l=c&q=SET_SLOT+user%3Acran&ref=searchresults&type=Code&utf8=%E2%9C%93


      Thanks for the explanation of metacran/seer.


"sos" is also designed to identify packages, but it does it based on the
number and rank of help pages matching the search term. I often do "a|b"
to obtain the union of two different searches then use "writeFindFn2xls"
to output the result to an MS Excel file with 3 sheets: (1) a package
summary, (2) the raw search results of help pages sorted by package, and
(3) info on the search terms used. "findFn" has a "sortBy" that allows a
user to change the default sort order, but I've never used it.

Part of the information from the package summary is taken from installed
packages and is missing for packages that are not installed. "sos"
includes "installPackages" to install the highest ranking packages, but
that's a poor solution to the problem. I'd be happy to work with others
who can potentially improve the selection of information to present and
get it all without installing the packages first.

Spencer


Gabor

On Sun, Nov 9, 2014 at 7:18 PM, Spencer Graves
<spencer.gra...@prodsyse.com> wrote:
       Might it be appropriate to add "http://metacran.github.io/search" and
the "sos" package to the official list of R search capabilities at
"www.r-project.org/search.html"?  [Disclaimer:  I'm the lead author of
"sos".]


       Best Wishes,
       Spencer Graves


On 11/9/2014 11:06 AM, Gábor Csárdi wrote:
Hi,

I think much of this is simply impossible to do. CRAN packages are
written and maintained by thousands of people, how are you planning to
convince them to reorganize their packages? Or even just rename them?
This obviously won't happen.

Btw. did you see 'CRAN Task Views'? That is one organization of
packages into topics.

Personally, I don't think organization is the solution here. It is too
costly (i.e. too much work) to maintain, impossible to enforce. I
think, however, that a good search engine would definitely help.

FWIW there is a simple search engine here:
http://metacran.github.io/search/
This ranks packages according to the number of reverse dependencies
(among other things), i.e. packages more often used by other packages
will be higher up in the list.
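
To make the idea concrete, here is a minimal sketch of ranking search
hits by reverse-dependency count. It is only an illustration, not
metacran's actual scoring (which also uses other signals). The
dependency table is made-up toy data, and the names `deps`,
`rev_dep_count` and `rank_hits` are hypothetical helpers, not part of
any package.

```r
# Toy dependency table (made-up data, NOT real CRAN metadata):
# each element lists the packages that the named package depends on.
deps <- list(
  ggmap       = c("ggplot2", "RgoogleMaps"),
  plotKML     = c("lattice"),
  pkgA        = c("ggplot2"),
  pkgB        = c("ggplot2", "lattice"),
  ggplot2     = character(0),
  lattice     = character(0),
  RgoogleMaps = character(0)
)

# Count, for every package, how many other packages depend on it
# (its number of reverse dependencies).
rev_dep_count <- function(deps) {
  counts <- setNames(integer(length(deps)), names(deps))
  tab <- table(unlist(deps, use.names = FALSE))
  known <- intersect(names(tab), names(counts))
  counts[known] <- as.integer(tab[known])
  counts
}

# Order a set of search hits so that the packages with the most
# reverse dependencies come first (ties broken by package name).
rank_hits <- function(hits, deps) {
  counts <- rev_dep_count(deps)[hits]
  hits[order(-counts, hits)]
}

rank_hits(c("lattice", "ggmap", "ggplot2", "RgoogleMaps"), deps)
```

With real data, the same counts could presumably be derived from the
CRAN package metadata instead of a hand-written list.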

Ranking them according to downloads is also possible, but AFAIK only
one CRAN mirror gives out statistics about downloads, so you don't
really have the complete numbers there.

Disclaimer: I built the search engine above. There are obviously other
alternatives as well, e.g. http://rdocumentation.org, and
http://mran.revolutionanalytics.com/packages/ are the two I know.

Gabor

On Sun, Nov 9, 2014 at 11:24 AM, Steven Sagaert
<steven.saga...@gmail.com> wrote:
Hi,
I’ve been using R on and off for a couple of years. I think R is pretty
great but one thing I’d like to see improved is the way packages are
organised. Instead of CRAN being a long list of packages having a short &
usually unintelligible name I’d like to see packages organised in a
hierarchical way with that path acting as a hierarchical namespace just like
you have in many other languages like Java, C#, Scala, … The names of the
(sub)packages should also be clear and unambiguous & packages should be
organised according to their functionality and not just for example be code
for a whole book thrown together and given a cryptic name.

Next to that it would be nice to have extra metadata in the packages to
allow for another, looser, flat multi-class classification like in blog
tagging systems & other metadata to allow for automatically generating
something like task views.

Due to the large number of packages it’s hard to see the forest for the
trees so a recommendation system for CRAN based on popularity (download
statistics), ratings & other data like related packages from package
metadata would be most welcome.

Finally the number of packages in CRAN is exponentially growing but there
is also a large partial overlap in functionality between packages & so many
packages make it hard to find what you are looking for. So maybe here less
is more and there should be a system of removing hardly used/low quality
packages on a regular basis.
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel