On Mon, Jan 17, 2011 at 3:57 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:
> For me, a major strength of R is the package development process. I've
> found this so valuable that I created a Wikipedia entry by that name and
> made additions to a Wikipedia entry on "software repository", noting that
> this process encourages good software development practices that I have
> not seen standardized for other languages. I encourage people to review
> this material and make additions or corrections as they like (or send me
> suggestions and I will make the appropriate changes).

I agree that the package development process is a major strength. Other
factors include the high level of user support, hand-holding, feedback, and
prompt bug fixes. It is not uncommon to see support at levels far exceeding
what you would expect from a for-profit business. Newbie questions are
answered in seconds in some cases!

On the package development process: if C/C++ development does become more
popular with the help of packages like Rcpp, then extensions that check this
part of a package for consistency, documentation, etc. might be helpful.
This might exploit features of Doxygen, for example.

Dominick
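To make the Doxygen/Rcpp point concrete, here is a minimal sketch of what
documented compiled code might look like; it assumes Rcpp with its attributes
interface (sourceCpp) available, and the function name cumsumC and the
comment block are purely illustrative, not part of any existing check:

    library(Rcpp)

    sourceCpp(code = '
      #include <Rcpp.h>
      using namespace Rcpp;

      /**
       * @brief Cumulative sum of a numeric vector (toy example).
       * @param x numeric input vector
       * @return numeric vector of running totals
       */
      // [[Rcpp::export]]
      NumericVector cumsumC(NumericVector x) {
          int n = x.size();
          NumericVector out(n);
          double total = 0;
          for (int i = 0; i < n; ++i) {
              total += x[i];      // running total, as base cumsum() computes
              out[i] = total;
          }
          return out;
      }
    ')

    cumsumC(c(1, 2, 3))   # 1 3 6, the same as cumsum(c(1, 2, 3))

A hypothetical check could, for example, warn when an exported compiled
routine lacks a Doxygen block like the one above.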
> While R has other capabilities for unit and regression testing, I often
> include unit tests in the "examples" section of documentation files. To
> keep from cluttering the examples with unnecessary material, I often
> include something like the following:
>
> A1 <- myfunc()  # to test myfunc
> A0 <- ("manual generation of the correct answer for A1")
> \dontshow{stopifnot(}  # so the user doesn't see "stopifnot("
> all.equal(A1, A0)  # compare myfunc output with the correct answer
> \dontshow{)}  # close paren on "stopifnot("
>
> This may not be as good in some ways as a full suite of unit tests, which
> could be provided separately. However, it has the distinct advantage of
> including unit tests with the documentation in a way that should help
> users understand "myfunc". (Unit tests too detailed to show users could
> be completely enclosed in "\dontshow".)
>
> Spencer
>
> On 1/17/2011 11:38 AM, Dominick Samperi wrote:
>> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
>> <spencer.gra...@structuremonitoring.com> wrote:
>>> Another point I have not yet seen mentioned: if your code is painfully
>>> slow, that can often be fixed without leaving R by experimenting with
>>> different ways of doing the same thing -- often after profiling your
>>> code to find the slowest parts, as described in chapter 3 of "Writing R
>>> Extensions".
>>>
>>> If I'm given code already written in C (or some other language), unless
>>> it's really simple, I may link to it rather than recode it in R.
>>> However, the problems with portability, maintainability, transparency
>>> to others who may not be very facile with C, etc., all suggest that it
>>> is well worth some effort experimenting with alternate ways of doing
>>> the same thing in R before jumping to C or something else.
>>>
>>> Hope this helps.
>>> Spencer
>>>
>>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>>> I think we're also forgetting something, namely testing. If you write
>>>> your routine in C, you have placed an additional burden upon yourself
>>>> to test your C code through unit tests, etc. If you write your code in
>>>> R, you still need the unit tests, but you can rely on the well-tested
>>>> nature of R to reduce the number of tests of your algorithm. I
>>>> routinely tell people at Sage Bionetworks, where I am working now,
>>>> that your new C code needs to show at least an order-of-magnitude
>>>> increase in performance to warrant the effort of moving from R to C.
>>>>
>>>> But, then again, I am working with scientists who are not primarily,
>>>> or even secondarily, coders...
>>>>
>>>> Dave H
>>
>> This makes sense, but I have seen some very transparent algorithms turned
>> into vectorized R code that is difficult to read (and thus to maintain or
>> change). These chunks of optimized R code are like embedded assembly, in
>> the sense that nobody is likely to want to mess with them. This could be
>> addressed by including pseudo-code for the original (more transparent)
>> algorithm as a comment, but I have never seen this done in practice
>> (perhaps it could be enforced by R CMD check?!).
>>
>> On the other hand, in principle a well-documented piece of C/C++ code
>> could be much easier to understand, without paying a performance
>> penalty... but "coders" are not likely to place this high on their list
>> of priorities.
>>
>> The bottom line is that R is an adaptor ("glue") language, like Lisp,
>> that makes it easy to mix and match functions (using classes and generic
>> functions), many of which are written in C (or C++ or Fortran) for
>> performance reasons. Like any object-based system there can be a lot of
>> object copying, and like any functional programming system there can be
>> a lot of function calls, resulting in poor performance for some
>> applications.
>>
>> If you can vectorize your R code then you have effectively found a way
>> to benefit from somebody else's C code, thus saving yourself some time.
>> For operations other than pure vector calculations you will have to do
>> the C/C++ programming yourself (or call a library that somebody else has
>> written).
>>
>> Dominick
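To illustrate both of the points quoted above (keeping the transparent
algorithm visible as a pseudo-code comment, and vectorization as reuse of
somebody else's C code), here is a small made-up example; the moving-sum
task and the function names are my own, not from the thread:

    ## Transparent version: moving sum of width k, written as a plain loop.
    movingSumLoop <- function(x, k) {
        n <- length(x) - k + 1
        out <- numeric(n)
        for (j in seq_len(n)) out[j] <- sum(x[j:(j + k - 1)])
        out
    }

    ## Vectorized version. Pseudo-code of the transparent algorithm it
    ## replaces:
    ##   for each window start j: out[j] = x[j] + x[j+1] + ... + x[j+k-1]
    ## cumsum() and diff() are implemented in C, so the work is done by
    ## somebody else's compiled code rather than an R-level loop.
    movingSumVec <- function(x, k) {
        cs <- cumsum(x)
        c(cs[k], diff(cs, lag = k))
    }

    x <- c(1, 2, 3, 4, 5, 6)
    movingSumLoop(x, 3)   # 6 9 12 15
    movingSumVec(x, 3)    # 6 9 12 15
    stopifnot(all.equal(movingSumLoop(x, 3), movingSumVec(x, 3)))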
>>>> ----- Original Message ----
>>>> From: Dirk Eddelbuettel <e...@debian.org>
>>>> To: Patrick Leyshock <ngkbr...@gmail.com>
>>>> Cc: r-devel@r-project.org
>>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>>> Subject: Re: [Rd] R vs. C
>>>>
>>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>>> | A question, please, about development of R packages:
>>>> |
>>>> | Are there any guidelines or best practices for deciding when and why
>>>> | to implement an operation in R, vs. implementing it in C? The
>>>> | "Writing R Extensions" manual recommends "working in interpreted R
>>>> | code . . . this is normally the best option." But we do write C
>>>> | functions and access them in R - the question is, when/why is this
>>>> | justified, and when/why is it NOT justified?
>>>> |
>>>> | While I have identified helpful documents on R coding standards, I
>>>> | have not seen notes/discussions on when/why to implement in R, vs.
>>>> | when to implement in C.
>>>>
>>>> The (still fairly recent) book 'Software for Data Analysis: Programming
>>>> with R' by John Chambers (Springer, 2008) has a lot to say about this.
>>>> John also gave a talk in November which stressed 'multilanguage'
>>>> approaches; see e.g.
>>>>
>>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>>
>>>> In short, it all depends, and it is unlikely that you will get a
>>>> coherent answer that is valid for all circumstances. We all love R for
>>>> how expressive and powerful it is, yet there are times when something
>>>> else is called for. Exactly when that time is depends on a great many
>>>> things, and you have not mentioned a single metric in your question.
>>>> So I'd start with John's book.
>>>>
>>>> Hope this helps, Dirk
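Coming back to Spencer's profiling advice and Dirk's remark about metrics:
before moving anything to C it is worth collecting an actual number. A
minimal sketch using only base R follows; the toy function rowMeansLoop()
and the matrix size are my own illustration, not from the thread:

    ## A deliberately loop-heavy stand-in for code suspected of being slow.
    rowMeansLoop <- function(m) {
        out <- numeric(nrow(m))
        for (i in seq_len(nrow(m))) out[i] <- mean(m[i, ])
        out
    }
    m <- matrix(rnorm(1e7), nrow = 1e5)   # 100000 x 100 matrix

    Rprof("profile.out")                  # start collecting profiling samples
    invisible(rowMeansLoop(m))            # run the code to be profiled
    Rprof(NULL)                           # stop profiling
    summaryRprof("profile.out")$by.self   # which functions dominate run time

    ## Wall-clock time gives one concrete metric for the R-vs-C question:
    system.time(rowMeansLoop(m))          # explicit loop over rows
    system.time(rowMeans(m))              # base function, implemented in C

If the timings differ by less than the order of magnitude David mentions,
staying in R (or just vectorizing) is probably the better trade.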