Re: [R] indexing question

Thomas Lumley Tue, 13 Jan 2009 23:25:57 -0800


There are real examples; they are all fairly obscure.  It can't be a big 
problem because the standard formal argument name for a data frame in modelling 
and graphics functions is 'data'.  That's actually a more serious problem than 
the function called data() -- having local and global variables with the same 
name won't confuse R, but it can easily confuse you.
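A minimal sketch of the naming clash described above. It assumes only base R and the stats package. A global data frame called `data` is passed to the `data` formal argument of lm(). The toy columns `x` and `y` and the helper `n_rows()` are hypothetical, made up for illustration.

```r
# Hypothetical toy data: y is an exact linear function of x
data <- data.frame(x = 1:10, y = 2 * (1:10) + 1)

# In 'data = data' the left side is lm()'s formal argument and the right
# side is the global data frame; R keeps them apart, a reader may not
fit <- lm(y ~ x, data = data)
coef(fit)  # intercept 1, slope 2 (up to floating-point error)

# Inside a function, a formal argument named 'data' is a local variable
# that hides the global object of the same name
n_rows <- function(data) nrow(data)
n_rows(data)  # 10
```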


Possibilities for R getting confused include
 1. The functions for environment access by name, e.g. exists() and get(), don't
    by default check the type of the object they find (their 'mode' argument
    defaults to "any").
 2. bquote() and substitute() substitute before evaluating, and could get
    confused.
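Both points can be illustrated in base R with a short sketch. The data frame and the replacement symbol `mydf` are hypothetical. The `mode` argument of get() and exists(), and the list form of substitute(), are standard base R.

```r
# A global data frame that shares its name with utils::data()
data <- data.frame(x = 1:3)

# Point 1: with the default mode = "any", get() returns the first binding
# it finds, which here is the data frame in the global environment
any_mode <- get("data")
is.data.frame(any_mode)                  # TRUE

# Asking for mode = "function" skips non-function bindings
fun_mode <- get("data", mode = "function")
identical(fun_mode, utils::data)         # TRUE

# Point 2: substitute() replaces every occurrence of a symbol, whether it
# is used as a function or as a variable
swapped <- substitute(data(ToothGrowth) + nrow(data), list(data = quote(mydf)))
swapped                                  # mydf(ToothGrowth) + nrow(mydf)

rm(data)  # remove the masking data frame again
```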

There used to be real problems in S when certain function names were used as 
data names.  Then there was a period of aversive conditioning by irritating 
warnings. As a result, I still avoid 'c' and 't' as variable names.

You could call your data frames 'df' -- many of the people who complain about 
'data' don't realise that df() is the density function of the F distribution :)
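A sketch of why a data frame named `df` is usually harmless in current R. When a name is used in call position, R's lookup skips bindings that are not functions, so stats::df() is still found. The column `a` is arbitrary.

```r
# A data frame that shares its name with stats::df()
df <- data.frame(a = 1:3)

# Used as a variable, 'df' is the data frame
nrow(df)  # 3

# Used as a function, R skips the data frame and finds stats::df(),
# the density of the F distribution
dens <- df(1, df1 = 2, df2 = 3)
identical(dens, stats::df(1, df1 = 2, df2 = 3))  # TRUE

rm(df)
```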

     -thomas



On Tue, 13 Jan 2009, Ista Zahn wrote:

On Tue, Jan 13, 2009 at 10:23 AM, jim holtman <jholt...@gmail.com> wrote:

How about this:

data(ToothGrowth)
ls()

[1] "ToothGrowth"

data <- function(x){invisible(NULL)}
data(ToothGrowth)
ls()

[1] "data"
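The example above breaks data() because the new binding is itself a function, so the call-position lookup stops there. Below is a minimal sketch of two ways out. It assumes a fresh session in which ToothGrowth has not yet been copied into the global environment. One fix calls the original through its namespace with utils::data(). The other removes the masking function. The variable names `loaded_masked` and `loaded_qualified` are just for illustration.

```r
# A user-defined function that masks utils::data()
data <- function(x) invisible(NULL)

# inherits = FALSE because ToothGrowth is also visible through the
# attached datasets package; we only want the global environment
data(ToothGrowth)  # calls the local function, so nothing is copied
loaded_masked <- exists("ToothGrowth", envir = globalenv(), inherits = FALSE)

# Option 1: qualify the call with the namespace
utils::data(ToothGrowth)
loaded_qualified <- exists("ToothGrowth", envir = globalenv(), inherits = FALSE)

# Option 2: remove the mask so a plain data() call reaches utils::data()
rm(data)
c(loaded_masked, loaded_qualified)  # FALSE TRUE in a fresh session
```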

Yep, that sure does cause a problem alright. Is it the case that problems
arise when you give a function the same name as an existing function? Or are
there cases where naming data.frames, vectors, matrices, etc. can also cause
problems?

I hope I'm not being annoying -- I'm just trying to determine if I need to
break my habit of naming data.frames "data".

Thanks,
Ista



On Tue, Jan 13, 2009 at 9:53 AM, Ista Zahn <istaz...@gmail.com> wrote:

From: baptiste auguie <ba...@exeter.ac.uk>
To: Dimitris Rizopoulos <d.rizopou...@erasmusmc.nl>
Date: Tue, 13 Jan 2009 09:38:09 +0000
Subject: Re: [R] indexing question

you can also look at subset,


       my.data.frame <- data.frame(a = rnorm(10),
                                   b = factor(sample(letters[1:4], 10, replace = TRUE)))
       str(my.data.frame)
       my.data.frame[my.data.frame$b == "a", ]
       subset(my.data.frame, b == "a")
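Here is a sketch comparing the two idioms above. The seed is an addition, so the random data are reproducible. One documented difference is worth knowing: subset() treats a missing (NA) condition as FALSE, while logical indexing with `[` returns a row of NAs for it. The small `with_na` data frame is hypothetical.

```r
set.seed(1)  # added for reproducibility
my.data.frame <- data.frame(a = rnorm(10),
                            b = factor(sample(letters[1:4], 10, replace = TRUE)))

# Without missing values the two idioms select the same rows
by_index <- my.data.frame[my.data.frame$b == "a", ]
by_subset <- subset(my.data.frame, b == "a")
all.equal(by_index, by_subset)  # TRUE

# With a missing value in 'b' they differ
with_na <- data.frame(a = 1:3, b = c("a", NA, "b"))
nrow(with_na[with_na$b == "a", ])  # 2: the NA condition yields an all-NA row
nrow(subset(with_na, b == "a"))    # 1: subset() drops it
```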


by the way, it is probably safer not to use "data" as a variable name as it
is also a function.


I've often wondered about this. The thing is, I've never run into a problem
with this. For example:

ls()

character(0)

data(ToothGrowth)
ls()

[1] "ToothGrowth"

rm(ToothGrowth)
ls()

character(0)

data <- data.frame(1:10, 101:110)
data(ToothGrowth) #works just the same
ls()

[1] "data"        "ToothGrowth"


In this example the data command works just the same the second time, even
though I have a data.frame named data. Can someone give an example where
this causes a problem?

Thanks,
Ista


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle
