Re: [R] readBin documentation error
> On 03 Sep 2016, at 03:24, Yucheng Song via R-help wrote:
>
> Thanks for the reply. What I meant was that there is no int(), if you do a
> ?readBin, you will find it there.

Not as far as I can tell:

    what: Either an object whose mode will give the mode of the vector
          to be read, or a character vector of length one describing
          the mode: one of ‘"numeric"’, ‘"double"’, ‘"integer"’,
          ‘"int"’, ‘"logical"’, ‘"complex"’, ‘"character"’, ‘"raw"’.

Note: Either...or...

I.e., you can use a character string (== vector of length one)

    readBin(zz, "int", 8, size = 1)

and you can use an object of the desired mode

    readBin(zz, integer(), ...)

or equivalently

    readBin(zz, 0L, ...)

but there is no implication that each of the possible character strings has a corresponding function. It is not clear why we allow both "int" and "integer" here, but there is no reason to expect int() to exist.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
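The three call forms listed above can be checked against each other with a small sketch. It assumes a made-up raw vector `bytes` standing in for the connection `zz`; only the readBin() calls themselves come from the message.

```r
# Hypothetical stand-in for the connection `zz`: eight one-byte integers
# held in a raw vector (readBin() also accepts a raw vector as `con`).
bytes <- writeBin(1:8, raw(), size = 1)

by_string  <- readBin(bytes, "int", 8, size = 1)      # character string
by_object  <- readBin(bytes, integer(), 8, size = 1)  # object of the desired mode
by_literal <- readBin(bytes, 0L, 8, size = 1)         # equivalent object

# All three forms should read the same integer vector.
identical(by_string, by_object) && identical(by_object, by_literal)
```

Note that none of these forms calls a function named int(); the string "int" is only a label that readBin() recognises.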
Re: [R] readBin documentation error
On 03/09/2016 3:56 AM, peter dalgaard wrote:
> On 03 Sep 2016, at 03:24, Yucheng Song via R-help wrote:
>> Thanks for the reply. What I meant was that there is no int(), if you do a
>> ?readBin, you will find it there.
>
> Not as far as I can tell:
>
>     what: Either an object whose mode will give the mode of the vector
>           to be read, or a character vector of length one describing
>           the mode: one of ‘"numeric"’, ‘"double"’, ‘"integer"’,
>           ‘"int"’, ‘"logical"’, ‘"complex"’, ‘"character"’, ‘"raw"’.
>
> Note: Either...or...
>
> I.e., you can use a character string (== vector of length one)
>
>     readBin(zz, "int", 8, size = 1)
>
> and you can use an object of the desired mode
>
>     readBin(zz, integer(), ...)
>
> or equivalently
>
>     readBin(zz, 0L, ...)
>
> but there is no implication that each of the possible character strings
> has a corresponding function. It is not clear why we allow both "int" and
> "integer" here, but there is no reason to expect int() to exist.

Partial matching isn't allowed on the names (because a length one character vector implies 'what = "character"' unless it happens to contain one of those strings), so this is a way to allow a common readable abbreviation.

Duncan Murdoch
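The point about partial matching can be illustrated with a short sketch. The byte string below is made-up test data, and "inte" is just an arbitrary abbreviation chosen for the example.

```r
# A NUL-terminated string stored as raw bytes (made-up test data).
bytes <- c(charToRaw("hi"), as.raw(0))

# "inte" is not one of the recognised strings, so it is not expanded to
# "integer"; it is treated as an object of mode character instead.
partial  <- readBin(bytes, "inte", 1)
explicit <- readBin(bytes, "character", 1)

# Both calls read the bytes as a character string.
identical(partial, explicit)
```

This is why an explicit short form such as "int" has to be listed among the accepted strings rather than being resolved by abbreviation.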
Re: [R] Improve code efficient with do.call, rbind and split contruction
Chuck et al.:

As I said previously, my intuition about the relative efficiency of tapply() and duplicated() in the context of this thread was wrong. But I wondered exactly how and to what extent. So I've fooled around a bit more and think I understand.

Using the example I gave, the key is to replace the duplicated.data.frame method and the inner data.frame subscripting with the duplicated.default method via with() and the interaction() function (paste()-ing instead takes extra time):

> system.time(z <- with(df,df[!duplicated(interaction(f,g),fromLast = TRUE),]))
   user  system elapsed
  0.039   0.006   0.045
>
> system.time(
+ {ix <- seq_len(nrow(df));
+ z <- with(df,df[tapply(ix,list(f,g),function(x)x[length(x)]),])
+ })
   user  system elapsed
  0.025   0.005   0.029

tapply() still appears slightly more efficient (which is still surprising to me), but only slightly.

Hope this is informative.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 2, 2016 at 1:48 PM, Bert Gunter wrote:
> Chuck:
>
> I think this is quite clever. But note that the which() is
> unnecessary: logical indexing suffices, e.g.
>
> df[!duplicated(df[,c("f","g")],fromLast = TRUE),]
>
> I thought that your approach would be faster because it moves
> comparisons from the tapply() to C code. But I was wrong. e.g. for 1e6
> rows:
>
> > set.seed(1001)
> > df <- data.frame(f = factor(sample(LETTERS[1:4],1e6,rep=TRUE)),
> +                  g = factor(sample(letters[1:6],1e6,rep=TRUE)),
> +                  y = runif(1e6))
>
> ## Using duplicated()
>
> > system.time(z <- df[!duplicated(df[,c("f","g")],fromLast = TRUE),])
>    user  system elapsed
>   0.175   0.008   0.183
>
> ## Using tapply()
>
> > system.time(
> + {ix <- seq_len(nrow(df));
> + z <- df[with(df,tapply(ix,list(f,g),function(x)x[length(x)])),]
> + })
>    user  system elapsed
>   0.025   0.003   0.028
>
> This illustrates the faultiness of my "intuition." A guess would be
> that the subscripting to get the factor combinations and the
> duplicated.data.frame method take the extra time.
>
> Anyway...
>
> Best,
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Sep 2, 2016 at 11:50 AM, Charles C. Berry wrote:
>> On Fri, 2 Sep 2016, Bert Gunter wrote:
>> [snip]
>>>
>>> The "trick" is to use tapply() to select the necessary row indices of
>>> your data frame and forget about all the do.call and rbind stuff. e.g.
>>
>> I agree the way to go is "select the necessary row indices" but I get there
>> a different way. See below.
>>
>>> > set.seed(1001)
>>> > df <- data.frame(f = factor(sample(LETTERS[1:4],100,rep=TRUE)),
>>> +                  g = factor(sample(letters[1:6],100,rep=TRUE)),
>>> +                  y = runif(100))
>>> > ix <- seq_len(nrow(df))
>>> > ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)]))
>>> > ix
>>>    a  b   c  d  e  f
>>> A 94 69 100 59 80 87
>>> B 89 57  65 90 75 88
>>> C 85 92  86 95 97 62
>>> D 47 73  72 74 99 96
>>
>> jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE ))
>>
>> xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix'
>>
>>    g
>> f     a   b   c   d   e   f
>>   A  94  69 100  59  80  87
>>   B  89  57  65  90  75  88
>>   C  85  92  86  95  97  62
>>   D  47  73  72  74  99  96
>>
>> Chuck
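As a quick check that the approaches compared in this thread pick the same rows, here is a minimal sketch on a smaller data frame. The size of 200 rows and the names `rows_tapply`, `rows_dup_df` and `rows_dup_int` are choices made for this illustration only; timings are deliberately left out because they depend on the machine.

```r
set.seed(1001)
df <- data.frame(f = factor(sample(LETTERS[1:4], 200, replace = TRUE)),
                 g = factor(sample(letters[1:6], 200, replace = TRUE)),
                 y = runif(200))
ix <- seq_len(nrow(df))

# Last row index per (f, g) cell via tapply(); an empty cell would give NA,
# which sort() drops by default.
rows_tapply <- sort(as.vector(
  with(df, tapply(ix, list(f, g), function(x) x[length(x)]))
))

# Same rows via the data frame method of duplicated().
rows_dup_df <- which(!duplicated(df[, c("f", "g")], fromLast = TRUE))

# Same rows via the default method applied to a single combined factor.
rows_dup_int <- with(df, which(!duplicated(interaction(f, g), fromLast = TRUE)))

# The three index vectors should agree.
all.equal(rows_tapply, rows_dup_df)
all.equal(rows_dup_df, rows_dup_int)
```

Wrapping each of the three expressions in system.time() on a larger data frame reproduces the kind of comparison reported above.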
Re: [R] Improve code efficient with do.call, rbind and split contruction
On Sat, 3 Sep 2016, Bert Gunter wrote:

> Chuck et. al.:
>
> As I said previously, my intuition about the relative efficiency of
> tapply() and duplicated() in the context of this thread was wrong.

My `intuition' was wrong, too.

But tapply() uses split() which runs quite fast. So not a big surprise, but if you look thru tapply() you'll notice it is well crafted in other ways. In particular, the way the `f' arg of split is constructed makes a big difference in timing (using a for loop to build up a mixed radix number).

In fact interaction(f,g) needs about 3 times the time of tapply(f,list(f,g)) for just building an index.

Thanks for following up.

Best,

Chuck
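The "mixed radix number" idea can be sketched for two factors as follows. This is only an illustration of the technique under the assumption of exactly two grouping factors, not a copy of tapply()'s source; the names `group` and `last_rows` are made up for the example.

```r
set.seed(1001)
f <- factor(sample(LETTERS[1:4], 100, replace = TRUE))
g <- factor(sample(letters[1:6], 100, replace = TRUE))

# Treat (f, g) as the digits of a mixed radix number: f is the low-order
# digit with radix nlevels(f), g is the next digit.
group <- as.integer(f) + nlevels(f) * (as.integer(g) - 1L)

# The integer index can be handed straight to split(); here we take the
# last row of each (f, g) cell.
last_rows <- vapply(split(seq_along(f), group),
                    function(x) x[length(x)], integer(1))
```

Because `group` is built with plain integer arithmetic, no character labels have to be pasted together for the combined levels, which is one plausible reason the index is cheap to build.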
Re: [R] svytable: How do i create a table informing a third variable?
> On Sep 2, 2016, at 6:08 PM, Juan Ceccarelli Arias wrote:
>
> Thanks a lot. Your code does the trick.
> One last question:
> The table produced shows every cross in just one column.
> I mean, it presents the region by order and sex==1, and then again the
> region but by sex==2.
> Can I list or present it like this:
>           sex1    sex2
> region1   323.    3434..
> ...
> regionN   123..   432..
>
> and ignore the remaining info (standard errors, or se in this case)?
> Again, thanks Anthony. Really.

(Anthony's probably asleep.)

This doesn't ignore the se's but that could be easily done by omitting that column from the data argument:

From the examples on the help page for svymean:

> svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )
      stype comp.imp mobility        se
E.No      E       No 19.71875  1.347583
H.No      H       No 13.14286  0.740017
M.No      M       No 14.81818  2.960618
E.Yes     E      Yes 17.28571  1.536158
H.Yes     H      Yes 35.14286 16.570001
M.Yes     M      Yes 13.71429  2.628573

> apimeans1 <- svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )
> reshape(apimeans1, idvar='stype', direction="wide", timevar="comp.imp")
     stype mobility.No    se.No mobility.Yes    se.Yes
E.No     E    19.71875 1.347583     17.28571  1.536158
H.No     H    13.14286 0.740017     35.14286 16.570001
M.No     M    14.81818 2.960618     13.71429  2.628573

--
David.

> On Fri, Sep 2, 2016 at 8:24 PM, Anthony Damico wrote:
>
>> # mean
>> svymean( ~ income_variable , NN )
>> svyby( ~ income_variable , ~ age + sex , NN , svymean )
>>
>> # median
>> svyquantile( ~ income_variable , NN )
>> svyby( ~ income_variable , ~ age + sex , NN , svyquantile , 0.5 )
>>
>> On Fri, Sep 2, 2016 at 3:04 PM, Juan Ceccarelli Arias wrote:
>>
>>> Hello
>>> I'm analyzing a survey and I need to obtain some statistics per group.
>>> I'm able to create a table with sex and age. However, if I want to know
>>> how much income the population earns by sex and age, I can't.
>>> I'm loading the dataset as described in the line below
>>> NN <- svydesign(ids = ~1, data = encuesta, weights = fact)
>>> Some simple table I can create
>>> table(svytable(~age+sex,design=NN))
>>> But I'm not able to produce the same table referencing an income
>>> variable, in this case, wage.
>>> Can you help me?
>>> Thanks for your replies and time.
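The suggestion to omit the se column before reshaping might look like the sketch below. To keep it runnable without the survey package, the svyby() result is replaced by a plain data frame typed in from the output shown in the message; that stand-in, and the assumption that column subsetting behaves the same way on a real svyby object, are mine and not part of the original reply.

```r
# Stand-in for the svyby() result, entered by hand as a plain data frame.
apimeans1 <- data.frame(
  stype    = c("E", "H", "M", "E", "H", "M"),
  comp.imp = c("No", "No", "No", "Yes", "Yes", "Yes"),
  mobility = c(19.71875, 13.14286, 14.81818, 17.28571, 35.14286, 13.71429),
  se       = c(1.347583, 0.740017, 2.960618, 1.536158, 16.570001, 2.628573)
)

# Keep only the id, the "time" variable and the estimate, then go wide:
# one row per stype, one mobility column per level of comp.imp.
wide <- reshape(apimeans1[, c("stype", "comp.imp", "mobility")],
                idvar = "stype", timevar = "comp.imp", direction = "wide")
wide
```

For the survey in the question, the analogous layout would presumably use region as `idvar` and sex as `timevar`.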
Re: [R] svytable: How do i create a table informing a third variable?
Your help was everything I needed. Please declare this topic as solved.
And thanks again.

On Sat, Sep 3, 2016 at 10:06 PM, David Winsemius wrote:

> > On Sep 2, 2016, at 6:08 PM, Juan Ceccarelli Arias wrote:
> >
> > Thanks a lot. Your code does the trick.
> > One last question:
> > The tabulate produced is showing every cross in just one column.
> > I mean, it presents the region by order and sex=1, and then again the
> > region but by sex==2.
> > Can i list or present as this:
> >           sex1    sex2
> > region1   323.    3434..
> > ...
> > regionN   123..   432..
> >
> > and ignoring the remaining info (standar errors or se in this case)?
> > Again, thanks Anthony. Really.
>
> (Anthony's probably asleep.)
>
> This doesn't ignore the se's but that could be easily done by omitting
> that column from the data argument:
>
> From the examples on the help page for svymean:
>
> > svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )
>       stype comp.imp mobility        se
> E.No      E       No 19.71875  1.347583
> H.No      H       No 13.14286  0.740017
> M.No      M       No 14.81818  2.960618
> E.Yes     E      Yes 17.28571  1.536158
> H.Yes     H      Yes 35.14286 16.570001
> M.Yes     M      Yes 13.71429  2.628573
>
> > apimeans1 <- svyby( ~ mobility , ~ stype + comp.imp , dclus1 , svymean )
> > reshape(apimeans1, idvar='stype', direction="wide", timevar="comp.imp")
>      stype mobility.No    se.No mobility.Yes    se.Yes
> E.No     E    19.71875 1.347583     17.28571  1.536158
> H.No     H    13.14286 0.740017     35.14286 16.570001
> M.No     M    14.81818 2.960618     13.71429  2.628573
>
> --
> David.