This is a recurring problem and from previous correspondence it seems unlikely 
that "sample" itself will ever be changed (and having myself been on the wrong 
end of a number of non-back-compatible changes in R, that's fine with me!).

To forestall future confusion, my suggestion is to add a function "rsample" 
defined as below, which has first argument "n" (number of values to return) 
consistent with the other "r..." random-generating functions. 

rsample <- function( n=length(pop), pop, replace=FALSE, prob=NULL) 
  pop[ sample( seq_along( pop)-1, size=n, replace=replace, prob=prob)+1]

The default for n is not necessary, but handy in case one is just trying to 
reorder a "pop" argument that is defined on-the-fly (as in Wacek's example). 
The -1 & +1 in the body prevent 'sample' from getting confused.
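
As a rough illustration of the above (a sketch only; the variable names "one", 
"from_sample", "from_rsample" and "shuffled" are made up for this example, and 
"rsample" is just the proposal above, not an existing R function):

```r
# 'rsample' as proposed above; not part of base R.
rsample <- function(n = length(pop), pop, replace = FALSE, prob = NULL)
  pop[sample(seq_along(pop) - 1, size = n, replace = replace, prob = prob) + 1]

one <- 15   # a "population" that happens to have a single numeric element

# sample() treats a length-1 numeric x >= 1 as shorthand for 1:x,
# so this draws from 1..15 rather than returning 15 itself.
from_sample <- sample(one, 1)

# rsample() samples positions rather than values, so the value 15 is never
# handed to sample(). The shifted index vector here is just 0, which fails
# the 'x >= 1' test, so sample() returns it as is and the +1 recovers index 1.
from_rsample <- rsample(1, one)

# With the default n, rsample() returns a permutation of 'pop'.
shuffled <- rsample(pop = c("a", "b", "c"))
```

Here "from_rsample" is always 15, whereas "from_sample" can be any integer 
from 1 to 15.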

Perhaps this should be patched up to cope with the case n==length(pop)==0 that 
Duncan mentions:

rsample <- function( n=length(pop), pop, replace=FALSE, prob=NULL) {
  if( n>0)
    pop[ sample( seq_along( pop)-1, size=n, replace=replace, prob=prob)+1]
  else if( n==0) pop[0]
  else stop( "invalid 'n' argument")
}
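
For what it's worth, the edge cases could be exercised along these lines (again 
only a sketch; "empty", "res_empty", "res_zero" and "res_neg" are names 
invented for the example, and the function is the patched proposal above):

```r
# Patched 'rsample' as proposed above; not part of base R.
rsample <- function(n = length(pop), pop, replace = FALSE, prob = NULL) {
  if (n > 0) {
    pop[sample(seq_along(pop) - 1, size = n, replace = replace, prob = prob) + 1]
  } else if (n == 0) {
    pop[0]
  } else {
    stop("invalid 'n' argument")
  }
}

empty <- character(0)

# n defaults to length(pop) == 0, so the result is an empty character vector.
res_empty <- rsample(pop = empty)

# Asking for zero values from a non-empty population also gives length 0.
res_zero <- rsample(0, 1:10)

# A negative n falls through to the final branch and raises the error;
# tryCatch() is used here only to capture the message.
res_neg <- tryCatch(rsample(-1, 1:10), error = function(e) conditionMessage(e))
```

Returning pop[0] keeps the type of "pop", so an empty character population 
gives back character(0) rather than some other kind of empty vector.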

Mark Bravington
________________________________________
From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On Behalf Of Wacek Kusnierczyk [waclaw.marcin.kusnierc...@idi.ntnu.no]
Sent: 03 January 2009 06:54
To: r-devel@r-project.org
Subject: [Rd] [Fwd: Re: [R] Randomly remove condition-selected rows from a matrix]

Following Duncan's suggestion, I forward the below to R-devel.

vQ

-------- Original Message --------
Subject:        Re: [R] Randomly remove condition-selected rows from a matrix
Date:   Fri, 02 Jan 2009 10:34:52 -0500
From:   Duncan Murdoch <murd...@stats.uwo.ca>
To:     Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
CC:     R help <r-h...@stat.math.ethz.ch>
References:     <79cafbdd-4bb8-4c9d-a0e9-54e280458...@gmail.com>
<8b356f880812300920o19d18aeo47dc31f087c3...@mail.gmail.com>
<da6ecc19-c786-4c02-b246-4b613726b...@gmail.com>
<8b356f880812311042la28aef3t81ad09a3b14c...@mail.gmail.com>
<495e2d95.9040...@idi.ntnu.no>



On 02/01/2009 10:07 AM, Wacek Kusnierczyk wrote:
> Stavros Macrakis wrote:
>> On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
>> <carnivorescie...@gmail.com> wrote:
>>
>>>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>>>
>>> Supposing I sample only one row among the ones matching my criteria. Then
>>> consider the case where there is just one row matching these criteria. Sure,
>>> there is no need to sample, but the instruction would still be executed.
>>> Then if this row index is 15, my instruction becomes sample(15,1), and this
>>> can give me any row from 1 to 15, which is not correct. I have to add a
>>> condition in case there is only one row matching the criteria.
>>>
>> Yes, this is a (documented!) design flaw in 'sample' -- see the man page.
>>
>> For some reason, the designers of R have chosen to document the flaw
>> and leave it up to individual users to work around it rather than fix
>> it definitively.  A related case is sample(c(),0), which gives an
>> error rather than giving an empty vector, though in general R deals
>> with empty vectors correctly (e.g. sum(c()) => 0).
>>
>>
>
> interestingly, ?sample says:
>
> "
>      'sample' takes a sample of the specified size from the elements of
>      'x' using either with or without replacement.
>
>        x: Either a (numeric, complex, character or logical) vector of
>           more than one element from which to choose, or a positive
>           integer.
>
>      If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>      'x >= 1', sampling takes place from '1:x'.  _Note_ that this
>      convenience feature may lead to undesired behaviour when 'x' is of
>      varying length 'sample(x)'.  See the 'resample()' example below.
>
> "
>
> yet the following works, even though x has length 1 and is *not* numeric:
>
> x = "foolme"
> is.numeric(x)
> sample(x, 1)
> sample(x)
>
> x = NA
> is.numeric(NA)
> sample(x, 1)
> sample(x)
>
> is this a bug in the code, or a bug in the documentation?
>
>
>
>> To my mind, it is bizarre to have an important basic function which
>> works for some argument lengths but not others.  The convenience of
>> being able to write sample(5,2) for sample(1:5,2) hardly seems worth
>> inflicting inconsistency on all users -- but perhaps one of the
>> designers of R/S can enlighten us on the design rationale here.
>>
>>
>
> hopefully.

This is more of an R-devel sort of question.  My guess is that this is
in the S blue book, but I don't have a copy here to check.

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
