Re: [R] readBin documentation error

2016-09-03 Thread peter dalgaard

> On 03 Sep 2016, at 03:24 , Yucheng Song via R-help  
> wrote:
> 
> Thanks for the reply. What I meant was that there is no int(), if you do a 
> ?readBin, you will find it there. 

Not as far as I can tell:

what: Either an object whose mode will give the mode of the vector
  to be read, or a character vector of length one describing
  the mode: one of ‘"numeric"’, ‘"double"’, ‘"integer"’,
  ‘"int"’, ‘"logical"’, ‘"complex"’, ‘"character"’, ‘"raw"’.

Note: Either...or...

I.e., you can use a character string (==vector of length one) 

readBin(zz, "int", 8, size = 1) 

and you can use an object of the desired mode

readBin(zz, integer(), ...) or equivalently readBin(zz, 0L, ...)

but there is no implication that each of the possible character strings has a 
corresponding function. It is not clear why we allow both "int" and "integer" 
here, but there is no reason to expect int() to exist.
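
To make the either/or concrete, here is a minimal sketch that is not part of the original reply. It assumes a raw vector produced by writeBin() as the data source in place of the connection zz, and the object names (bytes, by_int, and so on) are made up for illustration.

```r
# Three integers serialised to a raw vector; this stands in for the
# connection 'zz' used in the thread (an assumption for illustration).
bytes <- writeBin(1:3, raw())

# 'what' given as a character string describing the mode
by_int     <- readBin(bytes, "int", n = 3)
by_integer <- readBin(bytes, "integer", n = 3)

# 'what' given as an object whose mode is the desired one
by_object  <- readBin(bytes, integer(), n = 3)
by_literal <- readBin(bytes, 0L, n = 3)

# All four calls should return the same integer vector
identical(by_int, by_integer) &&
  identical(by_integer, by_object) &&
  identical(by_object, by_literal)

# "int" is only a label that readBin() recognises; this checks whether
# a function of that name happens to be visible in the session.
exists("int", mode = "function")
```

The string form and the object form reach the same result, which is all the help page promises.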

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] readBin documentation error

2016-09-03 Thread Duncan Murdoch

On 03/09/2016 3:56 AM, peter dalgaard wrote:
>
>> On 03 Sep 2016, at 03:24 , Yucheng Song via R-help  wrote:
>>
>> Thanks for the reply. What I meant was that there is no int(), if you do a
>> ?readBin, you will find it there.
>
> Not as far as I can tell:
>
> what: Either an object whose mode will give the mode of the vector
>   to be read, or a character vector of length one describing
>   the mode: one of ‘"numeric"’, ‘"double"’, ‘"integer"’,
>   ‘"int"’, ‘"logical"’, ‘"complex"’, ‘"character"’, ‘"raw"’.
>
> Note: Either...or...
>
> I.e., you can use a character string (==vector of length one)
>
> readBin(zz, "int", 8, size = 1)
>
> and you can use an object of the desired mode
>
> readBin(zz, integer(), ...) or equivalently readBin(zz, 0L, ...)
>
> but there is no implication that each of the possible character strings has a
> corresponding function.
>
> It is not clear why we allow both "int" and "integer" here, but there is no
> reason to expect int() to exist.

Partial matching isn't allowed on the names (because a length one 
character vector implies 'what = "character"' unless it happens to 
contain one of those strings), so this is a way to allow a common 
readable abbreviation.
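
A small sketch of that point, not taken from the thread: it again assumes a raw vector as the source, and the names bytes, full and part are illustrative. A string that is not one of the listed names is treated as an object of mode character, so an abbreviation such as "in" is not matched to "integer".

```r
# Raw bytes holding three integers (illustrative stand-in for a connection)
bytes <- writeBin(1:3, raw())

# An exact name from the documented list selects integer mode
full <- readBin(bytes, "int", n = 1)

# "in" is not in the list, so readBin() falls back on the mode of the
# argument itself, which is character
part <- readBin(bytes, "in", n = 1)

typeof(full)
typeof(part)
```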


Duncan Murdoch


Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-03 Thread Bert Gunter
Chuck et al.:

As I said previously, my intuition about the relative efficiency of
tapply() and duplicated() in the context of this thread was wrong. But
I wondered exactly how and to what extent. So I've fooled around a bit
more and think I understand. Using the example I gave, the key is to
replace the duplicated.data.frame method and the inner data.frame
subscripting with the duplicated.default method via with() and the
interaction() function (paste() -ing instead takes extra time):

> system.time(z <-with(df,df[!duplicated(interaction(f,g),fromLast = TRUE),]))
   user  system elapsed
  0.039   0.006   0.045
>
> system.time(
+   {ix <- seq_len(nrow(df));
+    z <- with(df,df[tapply(ix,list(f,g),function(x)x[length(x)]),])
+   })
   user  system elapsed
  0.025   0.005   0.029


tapply() still appears slightly more efficient (which is still
surprising to me), but only slightly.
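
For readers who want to check that the two approaches pick the same rows, here is a small sketch that is not part of the original message. It assumes a 1000-row data frame built like the one in the thread (the size and the names keep_dup and keep_tap are illustrative). Note that tapply() returns the indices arranged by factor level, so the comparison sorts both sets first.

```r
set.seed(1001)
df <- data.frame(f = factor(sample(LETTERS[1:4], 1000, replace = TRUE)),
                 g = factor(sample(letters[1:6], 1000, replace = TRUE)),
                 y = runif(1000))

# Last row of each (f, g) combination via duplicated() on the interaction
keep_dup <- with(df, which(!duplicated(interaction(f, g), fromLast = TRUE)))

# Last row of each (f, g) combination via tapply() on the row indices
ix <- seq_len(nrow(df))
keep_tap <- with(df, tapply(ix, list(f, g), function(x) x[length(x)]))

# Same set of row indices, once the ordering is made comparable
identical(sort(keep_dup), sort(as.vector(keep_tap)))
```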


Hope this is informative.


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Sep 2, 2016 at 1:48 PM, Bert Gunter  wrote:
> Chuck:
>
> I think this is quite clever. But note that the which() is
> unnecessary: logical indexing suffices, e.g.
>
> df[!duplicated(df[,c("f","g")],fromLast = TRUE),]
>
> I thought that your approach would be faster because it moves
> comparisons from the tapply() to C code. But I was wrong. e.g. for 1e6
> rows:
>
> > set.seed(1001)
> > df <- data.frame(f = factor(sample(LETTERS[1:4],1e6,rep=TRUE)),
> +                  g = factor(sample(letters[1:6],1e6,rep=TRUE)),
> +                  y = runif(1e6))
>
> ## Using duplicated()
> > system.time(z <- df[!duplicated(df[,c("f","g")],fromLast = TRUE),])
>    user  system elapsed
>   0.175   0.008   0.183
>
> ## Using tapply()
> > system.time(
> + {ix <- seq_len(nrow(df));
> + z <- df[with(df,tapply(ix,list(f,g),function(x)x[length(x)])),]
> + })
>    user  system elapsed
>   0.025   0.003   0.028
>
>
> This illustrates the faultiness of my "intuition."  A guess would be
> that the subscripting to get the factor combinations and
> duplicated.data.frame method takes the extra time.
>
> Anyway...
>
> Best,
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Sep 2, 2016 at 11:50 AM, Charles C. Berry  wrote:
>> On Fri, 2 Sep 2016, Bert Gunter wrote:
>> [snip]
>>>
>>>
>>> The "trick" is to use tapply() to select the necessary row indices of
>>> your data frame and forget about all the do.call and rbind stuff. e.g.
>>>
>>
>> I agree the way to go is "select the necessary row indices" but I get there
>> a different way. See below.
>>
>>> > set.seed(1001)
>>> > df <- data.frame(f = factor(sample(LETTERS[1:4],100,rep=TRUE)),
>>> +  g = factor(sample(letters[1:6],100,rep=TRUE)),
>>> +  y = runif(100))
>>>
>>> > ix <- seq_len(nrow(df))
>>> > ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)]))
>>> > ix
>>>
>>>   a  b   c  d  e  f
>>> A 94 69 100 59 80 87
>>> B 89 57  65 90 75 88
>>> C 85 92  86 95 97 62
>>> D 47 73  72 74 99 96
>>
>>
>>
>>   jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE ))
>>
>>   xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix'
>>
>>    g
>> f     a   b   c   d   e   f
>>   A  94  69 100  59  80  87
>>   B  89  57  65  90  75  88
>>   C  85  92  86  95  97  62
>>   D  47  73  72  74  99  96
>>
>>
>> Chuck
>>
>>



Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-03 Thread Charles C. Berry

On Sat, 3 Sep 2016, Bert Gunter wrote:

> Chuck et al.:
>
> As I said previously, my intuition about the relative efficiency of
> tapply() and duplicated() in the context of this thread was wrong.

My `intuition' was wrong, too.

But tapply() uses split() which runs quite fast. So not a big surprise, 
but if you look thru tapply() you'll notice it is well crafted in other 
ways. In particular, the way the `f' arg of split is constructed makes a 
big difference in timing (using a for loop to build up a mixed radix 
number). In fact interaction(f,g) needs about 3 times the time of 
tapply(f,list(f,g)) for just building an index.
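
The mixed radix idea can be sketched as follows. This is an illustration under stated assumptions, not the actual tapply() source: the factors f and g are generated as in the thread, and the names group and ngroup are chosen for readability. Each factor contributes one "digit", and the result is a single integer code per (f, g) combination that split() can use directly.

```r
set.seed(1001)
f <- factor(sample(LETTERS[1:4], 100, replace = TRUE))
g <- factor(sample(letters[1:6], 100, replace = TRUE))

# Build one integer group code per row with a loop over the factors:
# the first factor varies fastest, each later factor is scaled by the
# number of combinations seen so far.
group  <- rep.int(1L, length(f))
ngroup <- 1L
for (index in list(f, g)) {
  group  <- group + ngroup * (as.integer(index) - 1L)
  ngroup <- ngroup * nlevels(index)
}

# The codes agree with the integer codes of interaction(f, g),
# without constructing the pasted level labels
identical(group, as.integer(interaction(f, g)))
```

Skipping the construction of level labels such as "A.a" is one plausible reason the plain integer index is cheaper to build.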


Thanks for following up.

Best,

Chuck



Re: [R] svytable: How do i create a table informing a third variable?

2016-09-03 Thread David Winsemius

> On Sep 2, 2016, at 6:08 PM, Juan Ceccarelli Arias  wrote:
> 
> Thanks a lot. Your code does the trick.
> One last question:
> The resulting table shows every cross in a single column.
> I mean, it lists the regions in order for sex == 1, and then the
> regions again for sex == 2.
> Can I present it like this instead:
>          sex1   sex2
> region1  323.   3434..
> ...
> regionN  123..  432..
> 
> and ignore the remaining info (standard errors, or se in this case)?
> Again, thanks Anthony. Really.
> 
(Anthony's probably asleep.)

This doesn't ignore the se's but that could be easily done by omitting that 
column from the data argument:

From the examples on the help page for svymean:

> svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )
      stype comp.imp mobility        se
E.No      E       No 19.71875  1.347583
H.No      H       No 13.14286  0.740017
M.No      M       No 14.81818  2.960618
E.Yes     E      Yes 17.28571  1.536158
H.Yes     H      Yes 35.14286 16.570001
M.Yes     M      Yes 13.71429  2.628573

apimeans1 <- svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )

> reshape(apimeans1, idvar='stype', direction="wide", timevar="comp.imp")
     stype mobility.No    se.No mobility.Yes    se.Yes
E.No     E    19.71875 1.347583     17.28571  1.536158
H.No     H    13.14286 0.740017     35.14286 16.570001
M.No     M    14.81818 2.960618     13.71429  2.628573
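
Dropping the standard errors, as suggested above, might look like the sketch below. To stay self-contained it does not call the survey package: apimeans_df is a hand-built plain data frame (a hypothetical stand-in) holding the values printed above, and only the columns needed for the wide table are passed to reshape().

```r
# Plain data frame mimicking the svyby() result printed above
# (hypothetical stand-in; the real object comes from survey::svyby)
apimeans_df <- data.frame(
  stype    = c("E", "H", "M", "E", "H", "M"),
  comp.imp = c("No", "No", "No", "Yes", "Yes", "Yes"),
  mobility = c(19.71875, 13.14286, 14.81818, 17.28571, 35.14286, 13.71429),
  se       = c(1.347583, 0.740017, 2.960618, 1.536158, 16.570001, 2.628573)
)

# Omit the 'se' column from the data argument so only the means are spread
wide <- reshape(apimeans_df[, c("stype", "comp.imp", "mobility")],
                idvar = "stype", timevar = "comp.imp", direction = "wide")
wide
```

The result has one row per stype and one mobility column per level of comp.imp.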

-- 
David.

> 
> 
> 
> 
> On Fri, Sep 2, 2016 at 8:24 PM, Anthony Damico  wrote:
> 
>> # mean
>> svymean( ~ income_variable , NN )
>> svyby( ~ income_variable , ~ age + sex , NN , svymean )
>> 
>> # median
>> svyquantile( ~ income_variable , NN , 0.5 )
>> svyby( ~ income_variable , ~ age + sex , NN , svyquantile , 0.5 )
>> 
>> 
>> 
>> 
>> On Fri, Sep 2, 2016 at 3:04 PM, Juan Ceccarelli Arias 
>> wrote:
>> 
>>> Hello
>>> I'm analyzing a survey and I need to obtain some statistics per group.
>>> I'm able to create a table with sex and age. However, if I want to know how
>>> much income the population earns by sex and age, I can't.
>>> I'm loading the dataset as described in the line below:
>>> NN <- svydesign(ids = ~1, data = encuesta, weights = fact)
>>> I can create a simple table:
>>> table(svytable(~age+sex,design=NN))
>>> But I'm not able to produce the same tabulation for an income variable,
>>> in this case, wage.
>>> Can you help me?
>>> Thanks for your replies and time.



Re: [R] svytable: How do i create a table informing a third variable?

2016-09-03 Thread Juan Ceccarelli Arias
Your help was everything I needed.
Please, declare this topic as solved.
And thanks again.

