Mike,

On time classes specifically, the lubridate package, documented in

 Garrett Grolemund, Hadley Wickham (2011).
 Dates and Times Made Easy with lubridate.
 Journal of Statistical Software, 40(3), 1-25.
 http://www.jstatsoft.org/v40/i03/

clears up many common sources of confusion.  Does it handle the problems
you are reporting?

Rich



On Thu, Nov 3, 2011 at 7:49 PM, Mike Williamson <this.is....@gmail.com> wrote:

> Hi Joshua,
>
>    Thank you for the input!
>
>    I agree that it is non-trivial to solve the cases you & I have posed.
>  However, I would wholeheartedly support having an error spit back for any
> function that does not explicitly support a class.  In this case, if I
> attempt to do   sapply(x, class), and 'x' is of class "difftime", then I
> should receive an error "sapply cannot function upon class 'difftime' ".
>  Why do I take this stance?  There are at least 2 strong reasons:
>
>   - Most importantly, an incorrect answer is far more dangerous than no
>   answer.  E.g., if I ask "what is 3 + 3?", I would far prefer to receive
>   "I don't know" than "5".  The former lets me know I need to choose
>   another path; the latter mistakenly makes me think I have an answer
>   when I do not, and I continue with analyses on the assumption that the
>   answer is correct.  In the case of dates, this happens often.  E.g., is
>   the numeric that is returned from sapply the # of seconds since
>   1970-01-01, or the number of days since 1970-01-01?  This depends upon
>   how 'R' internally attempts to fix any incongruities.
>   - But also very significantly, an error will get me in the habit of
>   avoiding any marginalized class types.  I keep thinking, for instance,
>   that I can use the "Date" class, since 'R' says that it supports it.
>   But if I got into the habit of converting all dates into numerics
>   myself beforehand (maybe counting the number of seconds from
>   1970-01-01, since that seems a magic date), then I would be guaranteed
>   that a function will either (a) cause an error (e.g., if I try a
>   character function on it), or (b) function properly.  However, since I
>   don't overtly receive errors when attempting to use dates (or
>   difftimes, or factors, or whatever), I keep using them, instead of
>   relying solely upon the true & trusted classes.
>      - The trickiest here is really factors.  Factors are, by most
>      accounts, considered a core class.  In some cases, you can only use
>      factors, e.g., when you want some sort of ordinal categorical
>      variable.  Therefore, the fact that factors also barf similarly to
>      other classes like difftime (albeit much more rarely) is especially
>      dangerous.
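A minimal base R sketch of the fail-loudly behaviour requested above. The names `strict_sapply` and `allowed` are hypothetical, made up for this illustration; nothing like them ships with R. The wrapper simply refuses any input carrying a class attribute that it was not explicitly told to accept.

```r
# Hypothetical helper (not part of R): refuse classed input unless allowed.
strict_sapply <- function(x, fun, ..., allowed = character(0)) {
  # is.object() is TRUE when x carries a class attribute
  # (difftime, Date, factor, ...).
  if (is.object(x) && !inherits(x, allowed)) {
    stop("strict_sapply cannot function upon class '",
         paste(class(x), collapse = "/"), "'", call. = FALSE)
  }
  sapply(x, fun, ...)
}

test <- as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")

# Plain numerics pass straight through to sapply().
print(strict_sapply(c(1, 4, 9), sqrt))

# A difftime is rejected with an explicit error instead of being converted.
res <- tryCatch(strict_sapply(test, class),
                error = function(e) conditionMessage(e))
print(res)
```

Passing `allowed = "difftime"` opts back in to the default sapply() behaviour, so the caller has to state that choice rather than get it silently.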
>
>    There are, of course, habits that we can create to make ourselves
> better programmers, and I will recognize that I can improve.  However, this
> issue of functions generating "wrong" answers is such a *huge* problem with
> 'R', and other languages are catching up to 'R' so quickly, as far as their
> capability to handle higher level math, that this issue is making 'R' a
> less desirable language to use, as time progresses.  I don't mean to claim
> that my opinion is the end-all-be-all, but I would like to hear others
> chime in, whether this is a large concern, or whether there is a very small
> minority of folks impacted by it.
>
>                                                  Regards,
>                                                         Mike
>
> ---
> XKCD <http://www.xkcd.com>
>
>
>
> On Thu, Nov 3, 2011 at 2:51 PM, Joshua Wiley <jwiley.ps...@gmail.com>
> wrote:
>
> > Hi Mike,
> >
> > This isn't really an answer to your question, but perhaps will serve
> > to continue discussion.  I think that there are some fundamental
> > issues when working with special classes.  As a thought example, suppose I
> > wrote a class, "posreal", which inherits from the numeric class.  It
> > is only valid for positive, real numbers.  I use it in a package, but
> > do not develop methods for it.  A user comes along and creates a
> > vector, x that is a posreal.  Then tries: mean(x * -3).  Since I never
> > bothered to write a special method for mean for my class, R falls back
> > to the inherited numeric, but gives a value that is clearly not valid
> > for posreal.  What should happen?  S3 methods do not really have
> > validation, so in principle, one could write a function like:
> >
> > f <- function(x) {
> >  vclass <- class(x)
> >  res <- mean(x)
> >  class(res) <- vclass
> >  return(res)
> > }
> >
> > which "retains" the appropriate class, but in name only.  R core
> > cannot possibly know or imagine all classes that may be written that
> > inherit from more basic types but with possible special aspects and
> > requirements.  I think the inherited type is considered to be more
> > generic, and that is what is returned.  It is usually up to the user to
> > ensure that the function (whose methods were written for the inherited
> > type, not for the special class) is valid for that class; the user can
> > then manually convert the result back:
> >
> > res <- as.posreal(res)
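A minimal sketch of the "posreal" thought example above, assuming a hand-written validating constructor. `posreal` and `as.posreal` are the illustrative names used in this thread, not functions that exist in base R; the validation rule (strictly positive, no NA) is my own assumption about what such a class would enforce.

```r
# Illustrative S3 class: a numeric vector that must be strictly positive.
as.posreal <- function(x) {
  x <- as.numeric(unclass(x))
  if (anyNA(x) || any(x <= 0)) {
    stop("posreal values must be positive real numbers", call. = FALSE)
  }
  structure(x, class = c("posreal", "numeric"))
}

x <- as.posreal(c(1, 2, 3))

# No mean() method exists for posreal, so dispatch falls through to
# mean.default(), which happily averages the negative values.
res <- mean(x * -3)
print(res)

# Converting back by hand is the point where the invalid value is caught.
msg <- tryCatch(as.posreal(res), error = function(e) conditionMessage(e))
print(msg)
```

The design choice here is that validity is checked only in the constructor, so any function without a posreal method can still produce values the class would reject.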
> >
> > What about lapply and sapply?  Neither is generic or has methods for
> > difftime, and so they do some unexpected/undesirable things.  Again,
> > without methods defined for a particular class, they cannot know what is
> > special about it or the appropriate way to handle it.  They use defaults,
> > which sometimes work but may give unexpected or undesirable results, but
> > what else can be done?  (Okay, they could just throw an error.)  If a
> > function is naive about a class, it does not seem right to operate on
> > it using unknown methods and then pretend to be returning the same
> > type of data.  As it stands, they convert to a data type they know and
> > return that.
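A small base R sketch of the point above, reusing the difftime vector from the original post. What `sapply(test, class)` prints may depend on the R version, so no output is claimed for it here; the `sum()` check at the end is my own addition, included only to contrast a function that does have a difftime method.

```r
test <- as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")

# The vector itself, and any subset of it, keeps the class.
print(class(test))
print(class(test[1]))

# sapply() has no knowledge of difftime; inspect what it hands to the
# function rather than assuming the class survives.
print(sapply(test, class))

# A function with a difftime method keeps both the class and the units.
total <- sum(test)
print(class(total))
print(units(total))
```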
> >
> > Now, you mention that for loops are slow in R, and this is true to a
> > degree.  However, the *apply functions are basically just internal
> > loops, so they do not really save you (they are certainly not
> > vectorized!), though they are more elegant than explicit loops IMO.
> > One way to use them while retaining class would be like:
> >
> > sapply(seq_along(test), function(i) class(test[i]))
> >
> > This is less efficient than sapply(test, class), but the relative
> > overhead shrinks considerably once the function does nontrivial
> > calculations.
> > Finally, I find the (relatively) new compiler package really shines at
> > making functions that are just wrappers for for loops more efficient.
> > Take a look at the examples from:
> >
> > require(compiler)
> > ?cmpfun
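A minimal sketch of byte-compiling a loop wrapper with the compiler package mentioned above. `running_total` is a hypothetical function written only for this illustration. No timings are shown: in recent R versions the JIT byte-compiler is enabled by default, so the gain from calling cmpfun() by hand may be small.

```r
library(compiler)

# Hypothetical loop wrapper: a running total written as an explicit for loop.
running_total <- function(x) {
  out <- numeric(length(x))
  acc <- 0
  for (i in seq_along(x)) {
    acc <- acc + x[i]
    out[i] <- acc
  }
  out
}

# cmpfun() returns a byte-compiled copy with the same behaviour.
running_total_c <- cmpfun(running_total)

x <- as.numeric(1:10)
# Both versions must agree; only the execution speed may differ.
stopifnot(identical(running_total(x), running_total_c(x)))
print(running_total_c(x))
```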
> >
> > I am not familiar with NumPy, so I do not know how it handles new
> > classes, but with some tweaks to my workflow, I do not find myself
> > running into problems with how R handles them.  I definitely
> > appreciate your position because I have been there.  As I have become
> > more familiar with R, classes, and methods, I find I work in a way that
> > avoids passing objects to functions that do not know how to handle
> > them properly.
> >
> > Cheers,
> >
> > Josh
> >
> >
> > On Thu, Nov 3, 2011 at 11:08 AM, Mike Williamson <this.is....@gmail.com>
> > wrote:
> > > Hi All,
> > >
> > >    I don't have a "I need help" question, so much as a query into any
> > > update whether 'R' has made any progress with some of the core
> > > functions retaining classes.  As an example, because it's one of the
> > > cases that most egregiously impacts me & my work and keeps pushing me
> > > away from 'R' and into other numerical languages (such as NumPy in
> > > python), I will use sapply / lapply to demonstrate, but this behavior
> > > is ubiquitous throughout 'R'.
> > >
> > >    Let's say I have a class which is theoretically supported, but not
> > > one of the core "numeric" or "character" classes (and, to some degree,
> > > "factor" classes).  Many of the basic functions will convert my desired
> > > class into either numeric or character, so that my returned answer is
> > > gibberish.
> > >
> > > E.g.:
> > >
> > > ## create a small array of time differences
> > > test <- as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")
> > > class(test)     ## this will return the proper class, "difftime"
> > > class(test[1])  ## this will also return the proper class, "difftime"
> > > ## this will return *numerics* for all of the classes.  Ack!!
> > > sapply(test, class)
> > >
> > >    In the example I give above, the impact might seem small, but the
> > > implications are *huge*.  This means that I am, in effect, not allowed
> > > to use *any* of the vectoring functions in 'R', which avoid performing
> > > loops thereby speeding up process time extraordinarily.  Many can
> > > sympathize that 'R' is ridiculously slow with "for" loops, compared to
> > > other languages.  But that's theoretically OK, a good statistician or
> > > data analyst should be able to work comfortably with matrices and
> > > vectors.  However, *'R' cannot work comfortably* with matrices or
> > > vectors, *unless* they are using the numeric or character classes.
> > > Many of the classes suffer the problem I just described, although I
> > > only used "difftime" in the example.  Factors seem a bit more
> > > "comfortable", and can be handled most of the time, but not as well as
> > > numerics, and at times functions working on factors can return the
> > > numerical representation of the factor instead of the original factor.
> > >
> > >    Is there any progress in guaranteeing that all core functions either
> > > (a) ideally return exactly the classes, and hierarchy of classes, that
> > > they received (e.g., a list of data frames with difftimes & dates &
> > > characters would return a list of data frames with difftimes & dates &
> > > characters), or (b) barring that, the function should at least error
> > > out with a clear error explaining that sapply, for example, cannot
> > > vectorize on the class being used?  Returning incorrect answers is far
> > > worse than returning an error, from a perspective of stability.
> > >
> > >    This is, by far, the largest Achilles' heel to 'R'.  Personally, as
> > > my career advances and I work on more technical things, I am finding
> > > that I have to leave 'R' by the wayside and use other languages for
> > > robust numerical calculations and programming.  This saddens me,
> > > because there are so many wonderful packages developed by the
> > > community.  The example above came up because I am using the
> > > "forecast" library to great effect in predicting how long our product
> > > cycle time will be.  However, I spend much of my time fighting all
> > > these class & typing bugs in 'R' (and we have to start recognizing that
> > > they are bugs, otherwise they may never get resolved), such that many
> > > of the improvements in my productivity due to all the wonderful
> > > computational packages are entirely offset by the time I spend
> > > fighting this issue of poor classes.
> > >
> > >                                     Thanks & Regards!
> > >                                              Mike
> > >
> > > ---
> > > XKCD <http://www.xkcd.com>
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > r-h...@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Joshua Wiley
> > Ph.D. Student, Health Psychology
> > Programmer Analyst II, ATS Statistical Consulting Group
> > University of California, Los Angeles
> > https://joshuawiley.com/
> >
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
