On Wed, Aug 01, 2012 at 05:13:01PM -0500, R. Michael Weylandt wrote:
> On Wed, Aug 1, 2012 at 4:19 PM, Ramiro Barrantes
> <ram...@precisionbioassay.com> wrote:
> > Hello,
> >
> > I come from using different programming languages (C++, Mathematica, Perl) 
> > but have been using R extensively for several months.  I see the data frame 
> > as a key piece of the language and wanted to ask about people's experience 
> > with its use.
> >
> > Say you have a data frame D
> >
> > D <- data.frame(some columns)
> >
> > and you define a function that needs the information from this data frame 
> > and is supposed to return a calculation based on some columns of that data 
> > frame D.
> >
> > func <- function(d) {}
> > #EFFECT: Does calculation X from some columns of d
> >
> > QUESTION: Would you consider it better practice to return the same 
> > data.frame, expanded with the new columns, or to return a small data frame 
> > that consists only of the newly computed columns?
> 
> I'd say return what you need, no more, no less. If you want to
> reattach it to the input data, do that at the caller level, but don't
> make it required: orthogonality and minimality and all that jazz....

One thing I consider an element of good practice is: Whatever type of
object you return, make sure it contains meaningful names.

So if you return an extended data frame in the example above, make
sure the new columns have reasonable names. If you return a new data
frame with rows corresponding to the data frame passed in, then it's
definitely part of the job of func to do something like

    rownames(returnedFrame) <- rownames(d)

rather than leaving it to its clients to do that.

This is rather basic, but I've seen it neglected often enough to
warrant this post...
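
To make the naming point concrete, here is a minimal sketch. The data
frame, its column names (dose, response), its row names and the
calculation itself are all made up for illustration; only func, d and
returnedFrame come from the discussion above.

```r
# Hypothetical input: column names and row names are invented for this sketch.
d <- data.frame(dose = c(1, 2, 4),
                response = c(0.5, 1.1, 1.9),
                row.names = c("s1", "s2", "s3"))

# Return only the computed column, under a descriptive column name, and
# carry the row names of the input over to the result.
func <- function(d) {
  returnedFrame <- data.frame(responsePerDose = d$response / d$dose)
  rownames(returnedFrame) <- rownames(d)
  returnedFrame
}

result <- func(d)
```

With this, clients can look up result["s2", "responsePerDose"] without
having to know how the rows of d were ordered.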

Regarding whether to return an extended data frame vs just the computed
columns, I'd tend to go with the latter. But if I find that clients of
func all merge the columns from the result into the frame passed in,
then doing that might be a responsibility of func as well.
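
As a rough sketch of both options, assuming the same made-up data and
calculation as placeholders (funcExtended is a hypothetical name, not
something from the original question):

```r
# Hypothetical data and function, standing in for the real calculation.
d <- data.frame(dose = c(1, 2, 4),
                response = c(0.5, 1.1, 1.9),
                row.names = c("s1", "s2", "s3"))

func <- function(d) {
  out <- data.frame(responsePerDose = d$response / d$dose)
  rownames(out) <- rownames(d)
  out
}

# Caller-level extension: check that the rows line up, then bind the columns.
computed <- func(d)
stopifnot(identical(rownames(computed), rownames(d)))
extended <- cbind(d, computed)

# If every client ends up doing this, a thin wrapper can take over the job
# while func itself stays minimal.
funcExtended <- function(d) {
  cbind(d, func(d))
}
```

The row name check is only possible because func sets the row names in
the first place, which is one more reason not to leave that to clients.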

Best regards, Jan

> As Bert points out, note that returning a data.frame is by no means
> necessary -- they aren't "primitive" data structures like (atomic)
> vectors and lists [we are in a Scheme dialect after all!], but they
> are helpful and well supported. Use them liberally but no more than
> necessary ;-)
> 
> Best,
> Michael
> 
> >
> > Some might say, either way, personal preference.  But after using and 
> > seeing others' code for some time, I am thinking that returning a result 
> > that consists of ONLY the relevant columns is a better practice as it 
> > defines the function as only returning what it was intended to return, and 
> > leaves it up to the user of the function to do whatever they were intending 
> > to do with it (including naming of the new columns, adding them to a data 
> > frame, etc.).  This might be a question for a computer programming theory 
> > group, but if anybody has any insight from their experience please share.
> >
> > Thanks in advance,
> >
> > Ramiro
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jtt...@gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
