Re: [R] Creating Enumerated Variables

2010-07-16 Thread Hadley Wickham
On Thu, Jul 15, 2010 at 11:08 PM, Dennis Murphy wrote: > Hi: > > I sincerely hope there's an easier way, but one method to get this is as > follows, > with d as the data frame name of your test data: > > d <- d[order(with(d, Age, School, rev(Grade))), ] > d$Count <- do.call(c, mapply(seq, 1, as.ve

Re: [R] NA preserved in logical call - I don't understand this behavior because NA is not equal to 0

2010-07-18 Thread Hadley Wickham
> The problem is in data.frame[ and any NA in a logical vector will return a > row of NA's. This can be avoided by wrapping which() around the logical vector > which seems entirely wasteful or using subset(). The basic philosophy that causes this behaviour is sensible in my opinion: missing values m
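The behaviour discussed in this thread can be reproduced with a small sketch. The data frame `d` and its columns are made up for illustration and are not from the original post.

```r
# Hypothetical data; column names are illustrative only.
d <- data.frame(id = 1:4, x = c(0, NA, 2, 0))

keep <- d$x == 0                 # TRUE, NA, FALSE, TRUE: comparing with NA gives NA

by_logical <- d[keep, ]          # the NA index yields an extra all-NA row
by_which   <- d[which(keep), ]   # which() drops both FALSE and NA
by_subset  <- subset(d, x == 0)  # subset() treats NA as FALSE
```

Both `which()` and `subset()` return only the rows where the condition is known to be true.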

Re: [R] best way to apply a list of functions to a dataset ?

2010-07-21 Thread Hadley Wickham
> ddply(ma, .(variable), summarise, mean = mean(value), sd = sd(value), >       skewness = skewness(value), median = median(value), >       mean.gt.med = mean.gt.med(value)) In principle, you should be able to do: ddply(ma, .(variable), colwise(each(mean, sd, skewness, median, mean.gt.med))) but
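As a rough base-R counterpart to the plyr calls quoted above, a named list of functions can be applied to each group. This is only a sketch: the data frame `ma` here is a small stand-in, and `skewness` and `mean.gt.med` from the thread are left out because their definitions are not shown.

```r
# Stand-in data; the real `ma` from the thread is not shown.
ma <- data.frame(variable = rep(c("a", "b"), each = 3),
                 value    = c(1, 2, 3, 4, 6, 8))

# Named list of summary functions to apply to every group.
funs <- list(mean = mean, sd = sd, median = median)

# One column per group, one row per summary function.
by_var <- sapply(split(ma$value, ma$variable), function(v) {
  vapply(funs, function(f) f(v), numeric(1))
})
```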

Re: [R] using "sample()" for a vector of length 1

2010-07-22 Thread Hadley Wickham
Did you look at the examples in sample? # sample()'s surprise -- example x <- 1:10 sample(x[x > 8]) # length 2 sample(x[x > 9]) # oops -- length 10! sample(x[x > 10]) # length 0 ## For R >= 2.11.0 only resample <- function(x, ...) x[sample.int(length(x), ...)] resample(x[x > 8]) #

Re: [R] p-VALUE calculation

2010-07-22 Thread Hadley Wickham
What is your null hypothesis? What is your alternate hypothesis? What is the test statistic? Why do you want a p-value? Hadley On Thu, Jul 22, 2010 at 5:40 PM, jd6688 wrote: > > Here is my dataframe with 1000 rows: > > employee_id         weigth       p-value > > 100                     150 > 1

Re: [R] union data in column

2010-07-24 Thread Hadley Wickham
On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller wrote: > Fahim Md wrote: >> >> Is there any function/way to merge/unite the following data >> >>  GENEID      col1          col2             col3                col4 >>  G234064         1             0                  0                   0 >>  G2340

[R] [R-pkgs] plyr version 1.1

2010-07-26 Thread Hadley Wickham
plyr is a set of tools for a common set of problems: you need to break down a big data structure into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to: * fit the same model to subsets of a data frame * quickly calculate summary

Re: [R] adding list to data.frame iteratively

2010-09-08 Thread Hadley Wickham
Why don't you read the answers to your stackoverflow question? http://stackoverflow.com/questions/3665885/adding-a-list-of-vectors-to-a-data-frame-in-r/3667753 Hadley On Wed, Sep 8, 2010 at 1:17 AM, raje...@cse.iitm.ac.in wrote: > > Hi, > > I have a preallocated dataframe to which I have to add

Re: [R] Strange output daply with empty strata

2010-09-09 Thread hadley wickham
> daply(data.test, .(municipality, employed), function(d){mean(d$age)} ) >     employed > municipality   no  yes >    A 41.58759 44.67463 >    B 55.57407 43.82545 >    C 43.59330   NA > > The .drop argument has a different meaning in daply. Some R functio

[R] [R-pkgs] plyr: version 1.2

2010-09-10 Thread Hadley Wickham
plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model each patient subsets of a data f

[R] [R-pkgs] reshape2: a reboot of the reshape package

2010-09-10 Thread Hadley Wickham
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much mo

Re: [R] Data.frames : difference between x$a and x[, "a"] ? - How set new values on x$a with a as variable ?

2010-09-10 Thread Hadley Wickham
>>> I'm having trouble parsing this. What exactly do you want to do? >> 1 - Put a list as an element of a data.frame. That's quite convenient for my >> pricing function. > > I think this is a really bad idea. data.frames are not meant to be > used in this way. Why not use a list of lists? It can

Re: [R] Data.frames : difference between x$a and x[, "a"] ? - How set new values on x$a with a as variable ?

2010-09-10 Thread Hadley Wickham
>>> I think this is a really bad idea. data.frames are not meant to be >>> used in this way. Why not use a list of lists? >> >> It can be very convenient, but I suspect the original poster is >> confused about the different between vectors and lists. > > I wouldn't be surprised if someone were conf

Re: [R] Which language is faster for numerical computation?

2010-09-10 Thread Hadley Wickham
On Fri, Sep 10, 2010 at 10:23 AM, Henrik Bengtsson wrote: > Don't underestimate the importance of the choice of the algorithm you > use.  That often makes a huge difference.  Also, vectorization is key > in R, and when you use that you're really up there among the top > performing languages.  Her

Re: [R] Problems with reshape2 on Mac

2010-09-13 Thread Hadley Wickham
Hi Uwe, The problem is most likely because the original poster doesn't have the latest version of plyr. I correctly declare this dependency in the DESCRIPTION (http://cran.r-project.org/web/packages/reshape2/index.html), but unfortunately R doesn't seem to use this information at run time, genera

Re: [R] post

2010-09-13 Thread Hadley Wickham
Have a look at: "Computing Thousands of Test Statistics Simultaneously in R" by Holger Schwender and Tina Müller, in http://stat-computing.org/newsletter/issues/scgn-18-1.pdf Hadley On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush wrote: > Hello, > > I have a question regarding how to speed up the t

Re: [R] parallel computation with plyr 1.2.1

2010-09-16 Thread Hadley Wickham
Yes, this was a little bug that will be fixed in the next release. Hadley On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette wrote: > Hi, > > I have been trying to use the new .parallel argument with the most recent > version of plyr [1] to speed up some tasks. I can run the example in the NEWS > f

Re: [R] Problem with ggplot2 - Boxplot

2010-09-22 Thread Hadley Wickham
That implies you need to update your version of plyr. Hadley On Wed, Sep 22, 2010 at 4:10 AM, RaoulD wrote: > > Hi, > > I am using ggplot2 to create a boxplot that summarizes a continuous > variable. This code works fine for me on one PC however when I use it on > another it doesnt. > > The struc

Re: [R] Script auto-detecting its own path

2010-09-29 Thread Hadley Wickham
> > Forgive me if this question has been addressed, but I was unable to find > anything in the r-help list or in cyberspace. My question is this: is there a > function, or set of functions, that will enable a script to detect its own > path? I have tried file.path() but that was not what I was l

Re: [R] function which can apply a function by a grouping variable and also hand over an additional variable, e.g. a weight

2010-10-01 Thread Hadley Wickham
You might want to check out the plyr package. Hadley On Fri, Oct 1, 2010 at 6:05 AM, Werner W. wrote: > Hi, > > I was wondering if there is an easy way to accomplish the following in R: > Often I want to apply a function, e.g. weighted.quantile from the Hmisc > package > to grouped subsets of a

Re: [R] plyr: a*ply with functions that return matrices-- possible bug in aaply?

2010-10-04 Thread hadley wickham
>  That is, I want to define something like the > following using an a*ply method, but aaply gives a result in which the > applied .margin(s) do not appear last in the > result, contrary to the documentation for ?aaply.  I think this is a bug, > either in the function or the documentation, > but pe

Re: [R] Issue with match.call

2010-10-04 Thread Hadley Wickham
> RFF<-function(qtype, qOpt,...){} > i.e., I have two args that are compulsary and the rest are optional. Now when > my user passes the function call, I need to see what optional args are > defined and process accordingly...what I have so far is.. > > RFF<-function(qtype, qOpt,...){ >        mc <
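One common way to see which optional arguments were supplied is to capture `...` as a list. The sketch below reuses the name `RFF` from the question, but the body and the argument names `limit` and `verbose` are assumptions for illustration, not the poster's code or the reply given in the thread.

```r
RFF <- function(qtype, qOpt, ...) {
  opts <- list(...)        # evaluated optional arguments, keyed by name
  if (length(opts) == 0) {
    return(character(0))   # nothing optional was passed
  }
  names(opts)
}

supplied <- RFF("a", "b", limit = 10, verbose = TRUE)
none     <- RFF("a", "b")
```

If the unevaluated expressions are needed instead, `match.call(expand.dots = FALSE)$...` returns them as a list of calls and symbols.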

Re: [R] Script auto-detecting its own path

2010-10-04 Thread Hadley Wickham
> I'm not sure this will solve the issue because if I move the script, I would > still have to go into the script and edit the "/path/to/my/script.r", or do > I misunderstand your workaround? > I'm looking for something like: > file.path.is.here("myscript.r") > and which would return something like

Re: [R] R: Tools for thinking about data analysis and graphics

2010-10-06 Thread Hadley Wickham
On Wed, Oct 6, 2010 at 4:05 PM, Michael Friendly wrote: >  I'm giving a talk about some aspects of language and conceptual tools for > thinking about how > to solve problems in several programming languages for statistical computing > and graphics. I'm particularly > interested in language feature

Re: [R] Looking for a book/tutorial with the following context:

2010-10-08 Thread Hadley Wickham
> Do you also know more references about variables? Unfortunately this was a > little bit short so I do not feel 100% sure I completely got it. Try here: http://github.com/hadley/devtools/wiki/Scoping It's a work in progress. Hadley -- Assistant Professor / Dobelman Family Junior Chair Departm

Re: [R] can't find and install reshape2??

2010-10-12 Thread Hadley Wickham
> My guess is you are using an outdated R version for which the rather new > reshape2 package has not been compiled. I wonder if install.packages() could detect this case (e.g. by also checking if the source version is not available), and offer a more informative error message. Hadley -- Assist

Re: [R] Query on save.image()

2010-10-14 Thread Hadley Wickham
On Thu, Oct 14, 2010 at 11:56 AM, Joshua Wiley wrote: > Hi, > > I do not believe you can use the save.image() function in this case. > save.image() is a wrapper for save() with defaults for the global > environment (your workspace).  Try this instead, I believe it does > what you are after: > > my

Re: [R] Find index of a string inside a string?

2010-10-25 Thread Hadley Wickham
Or str_locate: library(stringr) str_locate("aabcd", "bcd") Hadley On Mon, Oct 25, 2010 at 5:53 AM, jim holtman wrote: > I think what you want is 'regexpr': > >> regexpr("bcd", "aabcd") > [1] 3 > attr(,"match.length") > [1] 3 >> > > > On Mon, Oct 25, 2010 at 7:27 AM, yoav baranan wrote: >> >> H
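For reference, the base-R `regexpr()` route mentioned in the quoted reply can be turned into start and end positions as follows. This is a small sketch that uses `fixed = TRUE` so the pattern is matched literally.

```r
pos   <- regexpr("bcd", "aabcd", fixed = TRUE)
start <- as.integer(pos)                         # -1 would mean "no match"
end   <- start + attr(pos, "match.length") - 1L  # last matched character

missing <- as.integer(regexpr("xyz", "aabcd", fixed = TRUE))
```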

Re: [R] Which version control system to learn for managing R projects?

2010-10-26 Thread Hadley Wickham
> git is where the world is headed.  This video is a little old: > http://www.youtube.com/watch?v=4XpnKHJAok8, but does a good job > getting the point across. And lots of R users are using github already: http://github.com/languages/R/created Hadley -- Assistant Professor / Dobelman Family Juni

Re: [R] Forcing results from lm into datframe

2010-10-26 Thread Hadley Wickham
On Tue, Oct 26, 2010 at 11:55 AM, Dennis Murphy wrote: > Hi: > > When it comes to split, apply, combine, think plyr. > > library(plyr) > ldply(split(afvtprelvefs, afvtprelvefs$basestudy), >         function(x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std))) Or do it in two steps: models <- d
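The two-step idea (fit one model per group, then extract the coefficients) can also be sketched in base R. The data below are invented so that each group has an exact linear relationship; the thread's `afvtprelvefs` data and its weights are not reproduced.

```r
# Invented data: group "a" follows y = 1 + 2x, group "b" follows y = 3 - x.
d <- data.frame(g = rep(c("a", "b"), each = 4), x = rep(1:4, 2))
d$y <- ifelse(d$g == "a", 1 + 2 * d$x, 3 - d$x)

# Step 1: one fitted model per group.
models <- lapply(split(d, d$g), function(piece) lm(y ~ x, data = piece))

# Step 2: collect the coefficients into a matrix, one row per group.
coefs <- do.call(rbind, lapply(models, coef))
```

Keeping `models` around makes it possible to pull out residuals or predictions later without refitting.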

Re: [R] Which version control system to learn for managing R projects?

2010-10-26 Thread Hadley Wickham
> 1. What is everyone else using?  The network effect is important since > you want people to be able to access your repository and you want to > leverage your knowledge of the version control system for other > projects' repositories.  To that extent Subversion is the clear choice > since its used

Re: [R] overloading the generic primitive functions "+" and "["

2010-10-28 Thread Hadley Wickham
> Note how S3 methods are dispatched only by reference to the first > argument (on the left of the operator). I think S4 beats this by > having signatures that can dispatch depending on both arguments. That's somewhat of a simplification for primitive binary operators. R actually looks up the meth
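A small sketch shows that an S3 method for a binary operator is found even when the classed object is the right-hand operand. The class name `money` is made up for illustration.

```r
# Hypothetical class used only for this illustration.
money <- function(x) structure(x, class = "money")

# Method for "+": strip the class, add, and re-wrap the result.
"+.money" <- function(e1, e2) money(unclass(e1) + unclass(e2))

left  <- money(2) + 1   # classed object on the left
right <- 1 + money(2)   # classed object on the right: method is still used
```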

Re: [R] avoiding too many loops - reshaping data

2010-11-04 Thread Hadley Wickham
> Beware of facile comparisons of this sort -- they may be apples and nematodes. And they also imply that the main time sink is the computation. In my experience, figuring out how to solve the problem using takes considerably more time than 18 / 1000 seconds, and so investing your energy in learn

Re: [R] Heatmap construction problems

2010-11-07 Thread Hadley Wickham
It's hard to know without a minimal reproducible example, but you probably want scale_fill_gradient or scale_fill_gradientn. Hadley On Thu, Oct 28, 2010 at 9:42 AM, Struchtemeyer, Chris wrote: > I am very new to R and don't have any computer program experience > whatsoever.  I am trying to gener

Re: [R] ggplot2: facet_grid with only one level does not display the graph with the facet_grid level in title

2010-11-07 Thread Hadley Wickham
This is on my to do list: https://github.com/hadley/ggplot2/issues/labels/facet#issue/107 Hadley On Thu, Oct 28, 2010 at 11:51 AM, Matthew Pettis wrote: > Hi All, > > Here is the code that I'll be referring to: > > p <- ggplot(wastran.data, aes(PER_KEY, EVENTS)) > (p <- p + >    facet_grid( pool

[R] How to detect if a vector is FP constant?

2010-11-08 Thread Hadley Wickham
Hi all, What's the equivalent to length(unique(x)) == 1 if I want to ignore small floating point differences? Should I look at diff(range(x)) or sd(x) or something else? What cut off should I use? If it helps to be explicit, I'm interested in detecting when a vector is constant for the purpose of

Re: [R] How to detect if a vector is FP constant?

2010-11-08 Thread Hadley Wickham
> I think this does what you want (borrowing from all.equal.numeric): > > all(abs((x - mean(x))) < .Machine$double.eps^0.5) > > with a vector of length 1 million, it took .076 seconds on a fairly old > system. Hmmm, maybe I want: all.equal(min(x), max(x)) ? Hadley -- Assistant Professor / Do
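The `all.equal(min(x), max(x))` idea floated above can be wrapped in a helper. This is a sketch: the function name `is_constant` and the default tolerance are assumptions, and missing values are not handled.

```r
# Hypothetical helper; the tolerance default mirrors all.equal()'s own default.
is_constant <- function(x, tol = sqrt(.Machine$double.eps)) {
  isTRUE(all.equal(min(x), max(x), tolerance = tol))
}

x <- c(0.1 + 0.2, 0.3, 0.3)     # 0.1 + 0.2 is not exactly 0.3 in floating point

exact_test <- length(unique(x)) == 1   # FALSE because of rounding error
fuzzy_test <- is_constant(x)           # TRUE within tolerance
clearly_not <- is_constant(c(1, 2))    # FALSE
```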

Re: [R] Extending the accuracy of exp(1) in R

2010-11-09 Thread Hadley Wickham
> Where the value of exp(1) as computed by R is concerned, you have > been deceived by what R displays ("prints") on screen. The default > is to display any number to 7 digits of accuracy, but that is not > the accuracy of the number held internally by R: > >  exp(1) >  # [1] 2.718282 >  exp(1) - 2
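The point about displayed versus stored precision can be checked directly. This is a minimal sketch that uses `format()` with an explicit `digits` argument.

```r
e <- exp(1)

shown <- format(e, digits = 7)    # what the default print width shows
full  <- format(e, digits = 15)   # more of the stored double

gap <- e - 2.718282               # non-zero: the printed value was rounded
```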

Re: [R] sum in vector

2010-11-17 Thread Hadley Wickham
> rowsum(value, paste(factor1, factor2, factor3)) That is dangerous in general, and always inefficient. Imagine factor1 is c("a", "a b") and factor2 is c("b c", "c"). Use interaction with drop = TRUE. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice Uni
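The collision described in this message is easy to reproduce. In the sketch below the `value` vector is invented for illustration.

```r
factor1 <- c("a", "a b")
factor2 <- c("b c", "c")
value   <- c(10, 20)   # illustrative values

pasted <- paste(factor1, factor2)                        # both become "a b c"
groups <- interaction(factor1, factor2, drop = TRUE)     # two distinct levels

collapsed <- rowsum(value, pasted)   # one row: the two groups were merged
separate  <- rowsum(value, groups)   # two rows: groups kept apart
```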

Re: [R] Help on running regression by grouping firms

2010-11-25 Thread Hadley Wickham
> res <- function(x) resid(x) > ds_test$u <- do.call(c, llply(mods, res)) I'd be a little careful with this, because there's no guarantee the results will be ordered in the same way as the input (and I'd also prefer ds_test$u <- unlist(llply(mods, res)) or ds_test$u <- laply(mods, res)) > In your

Re: [R] Go (back) from Rd to roxygen

2010-11-25 Thread Hadley Wickham
> Since roxygen is a great help to document R packages, I am wondering > if there exists an approach to go back from the raw Rd files to > roxygen-documentation? E.g. turn "\author{Somebody}" into "@author > Somebody". This sounds ridiculous, but I believe it helps in the long > term for me to main

Re: [R] ggplot2 histograms

2010-11-30 Thread Hadley Wickham
You may find it easier to use a frequency polygon, geom = "freqpoly". Hadley On Tue, Nov 30, 2010 at 2:36 PM, Small Sandy (NHS Greater Glasgow & Clyde) wrote: > Hi > > With ggplot2 I can very easily create beautiful histograms but I would like > to put two histograms on the same plot. The histo

Re: [R] ggplot2 histograms

2010-12-01 Thread Hadley Wickham
> However if you do: > ggplot(data=dafr, aes(x = d1, fill=d2)) + geom_histogram(binwidth = 1, > position = position_dodge(width=0.99)) > > The position of first bin which goes from 0-2 appears to start at about 0.2 > (I accept that there is some "white space" to the left of this) while the > pos

Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

2010-12-06 Thread Hadley Wickham
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava wrote: > Dear R-Helpers: > > I am using trying to use *ddply* to extract min and max of a particular > column in a data.frame. I am using two different forms of the function: > > > ## var_name_to_split is a string -- something like "var1" which is t

Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

2010-12-06 Thread Hadley Wickham
ot. I was > just trying to know my mistake. I am sorry if it is a basic question. > Thank you and others for your reply. > Best Regards, > S. > > On Mon, Dec 6, 2010 at 5:28 PM, Hadley Wickham wrote: >> >> On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava >> wro

Re: [R] ggplot2 histograms

2010-12-13 Thread Hadley Wickham
ndy > > Sandy Small > Clinical Physicist > NHS Forth Valley > (Tel: 01324567002) > and > NHS Greater Glasgow and Clyde > (Tel: 01412114592) > > From: h.wick...@gmail.com [h.wick...@gmail.com] On Behalf Of Hadley Wickham &g

Re: [R] Coding a new variable based on criteria in a dataset

2010-12-22 Thread Hadley Wickham
>  It isn't quite convenient to read the data posted below into R > (if it was originally tab-separated, that formatting got lost) but > ddply from the plyr package is good for this: something like (untested) > >  d <- with(data,ddply(data,interaction(UniqueID,Reason), >                    function

Re: [R] How to change the default location of x-axis in ggplot2?

2010-12-22 Thread Hadley Wickham
> In ggplot2, by default the x-axis is in the bottom of the graph and > y-axis is in the left of the graph. I wonder if it is possible to: > > 1. put the x axis in the top, or put the y axis in the right? > 2. display x axis in both the top and bottom? These are on the to do list. > 3. display x

Re: [R] Writing a single output file

2010-12-23 Thread Hadley Wickham
>> input <- do.call(rbind, lapply(fileNames, function(.name){ > +     .data <- read.table(.name, header = TRUE, as.is = TRUE) > +     # add file name to the data > +     .data$file <- .name > +     .data > + })) You can simplify this a little with plyr: fileNames <- list.files(pattern = "file.*.c
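A self-contained base-R version of the quoted pattern is sketched below. It writes two small CSV files to a temporary directory first so that it can run anywhere; the file names and contents are invented.

```r
# Create two throwaway CSV files to read back in.
dir <- tempfile("files")
dir.create(dir)
write.csv(data.frame(x = 1:2), file.path(dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(dir, "file2.csv"), row.names = FALSE)

fileNames <- list.files(dir, pattern = "^file.*\\.csv$", full.names = TRUE)

# Read each file, tag its rows with the source file name, then stack them.
input <- do.call(rbind, lapply(fileNames, function(path) {
  piece <- read.csv(path)
  piece$file <- basename(path)
  piece
}))
```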

[R] [R-pkgs] ggplot2 0.8.9 - Merry Christmas version

2010-12-24 Thread Hadley Wickham
ggplot2 ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid bad parts. It takes care of many of the fiddly details that make plotting a hassle (l

Re: [R] Writing a single output file

2010-12-30 Thread Hadley Wickham
It looks like you have csv files, so use read.csv instead of read.table. Hadley On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano wrote: > Dear sir, > > At the outset I sincerely apologize for reverting back bit late as I was out > of office. I thank you for your guidance extended by you in response

Re: [R] pdf() Export Problem: Circles Interpreted as Fonts from ggplot2 Graphics

2010-12-30 Thread Hadley Wickham
> The Inkscape user asked if there was any way that R could be coerced to use > actual circles or paths for the points. I am not aware of a way to do this so > any input from anyone here would be greatly appreciated. pdf(..., useDingbats = F) Hadley -- Assistant Professor / Dobelman Family Juni
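In context, the suggestion amounts to something like the following sketch, which uses a temporary output file and a base plot instead of the ggplot2 graphic from the thread.

```r
out <- tempfile(fileext = ".pdf")

pdf(out, useDingbats = FALSE)   # draw circles as paths, not Dingbats glyphs
plot(1:10, pch = 1)
invisible(dev.off())            # close the device so the file is written
```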

Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
Hi Frank, I think you mean packagename::functionname? The three colon form is for accessing non-exported objects. Otherwise, I think using :: vs importFrom is functionally identical - either approach delays package loading until necessary. Hadley On Mon, Jan 3, 2011 at 9:45 PM, Frank Harrell

Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
>> I think you mean packagename::functionname?  The three colon form is >> for accessing non-exported objects. > > Normally two colons suffice, but within a package you need three to > access exported but un-imported objects :) Are you sure? Note that it is typically a design mistake to use

Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
> Correct.  I'm doing this because of non-exported functions in other packages, > so I need ::: But you really really shouldn't be doing that. Is there a reason that the package authors won't export the functions? > I'd still appreciate any insight about whether importFrom in NAMESPACE > defers

[R] [R-pkgs] plyr 1.4

2011-01-04 Thread Hadley Wickham
# plyr plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model each patient subsets of

[R] [R-pkgs] reshape2 1.1

2011-01-04 Thread Hadley Wickham
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much mo

Re: [R] Help with Data Transformation

2011-01-11 Thread Hadley Wickham
> The data is initially extracted from an SQL database into Excel, then saved > as a tab-delimited text file for use in R. You might also want to look at the SQL packages for R so you can skip this manual step. I'd recommend starting with http://cran.r-project.org/doc/manuals/R-data.html#Relation

Re: [R] median by geometric mean

2011-01-15 Thread Hadley Wickham
exp(median(log(x))) ? Hadley On Sat, Jan 15, 2011 at 10:26 AM, Skull Crossbones wrote: > Hi All, > > I need to calculate the median for even number of data points. However > instead of calculating > the arithmetic mean of the two middle values, I need to calculate their > geometric mean. > > Though
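A quick check of why this works: for an even number of points, `median()` averages the two middle logs, and exponentiating that average gives the geometric mean of the two middle values. The sample vector is invented, and the approach assumes all values are positive.

```r
x <- c(1, 2, 8, 100)                 # middle values are 2 and 8

geo_median <- exp(median(log(x)))    # exp((log 2 + log 8) / 2)
by_hand    <- sqrt(2 * 8)            # geometric mean of the middle pair
```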

Re: [R] rootogram for normal distributions

2011-01-16 Thread Hadley Wickham
> The normal distribution is a continuous distribution, i.e., the frequency > for each observed value will essentially be 1/n and not converge to the > density function. Hence, you would need to look at histogram or smoothed > densities. Rootograms, on the other hand, are intended for discrete > di

Re: [R] data prep question

2011-01-16 Thread Hadley Wickham
On Sun, Jan 16, 2011 at 5:48 AM, wrote: > Here is one way > > Here is one way: > >> con <- textConnection(" > + ID              TIME    OBS > + 001             2200    23 > + 001             2400    11 > + 001             3200    10 > + 001             4500    22 > + 003             3900     45 >

Re: [R] how to cut a multidimensional array along a chosen dimension and store each piece into a list

2011-01-17 Thread Hadley Wickham
On Mon, Jan 17, 2011 at 2:20 PM, Sean Zhang wrote: > Dear R-Helpers, > > I wonder whether there is a function which cuts a multiple dimensional array > along a chosen dimension and then store each piece (still an array of one > dimension less) into a list. > For example, > > arr <- array(seq(1*2*3
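One base-R way to do what the question asks is sketched below for the third dimension of a three-dimensional array. The array here is invented, since the poster's example is cut off, and this is not necessarily the answer given in the thread.

```r
arr <- array(1:24, dim = c(2, 3, 4))   # illustrative array

# One 2 x 3 matrix per index along the third dimension.
pieces <- lapply(seq_len(dim(arr)[3]), function(i) arr[, , i])
```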

Re: [R] Summing data frame columns on identical data

2011-01-17 Thread Hadley Wickham
> library(plyr) > # Function to sum y by A-B combinations for a generic data frame > dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y)) See count in plyr 1.4 for a much much faster way of doing this. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statist

Re: [R] dataframe: string operations on columns

2011-01-18 Thread Hadley Wickham
> how can I perform a string operation like strsplit(x," ")  on a column of a > dataframe, and put the first or the second item of the split into a new > dataframe column? > (so that on each row it is consistent) Have a look at str_split_fixed in the stringr package. Hadley -- Assistant Profes
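A base-R sketch of the task in the question follows, using `strsplit()` plus `[` to pick the first and second pieces. The data frame and column names are invented; `str_split_fixed()` from stringr, as suggested in the reply, returns the pieces as a matrix instead.

```r
df <- data.frame(name = c("Ada Lovelace", "Alan Turing"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$name, " ", fixed = TRUE)

# Extract the 1st and 2nd piece of every row into new columns.
df$first  <- vapply(parts, `[`, character(1), 1)
df$second <- vapply(parts, `[`, character(1), 2)
```

`vapply()` will stop with an error if a row has fewer pieces than requested, which makes inconsistent rows visible early.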

Re: [R] ggplot2, geom_hline and facet_grid

2011-01-19 Thread Hadley Wickham
Hi Sandy, It's difficult to know what's going wrong without a small reproducible example (https://github.com/hadley/devtools/wiki/Reproducibility) - could you please provide one? You might also have better luck with an email directly to the ggplot2 mailing list. Hadley On Wed, Jan 19, 2011 at 2

Re: [R] How to reshape wide format data.frame to long format?

2011-01-20 Thread Hadley Wickham
>> I think I should be able to do this using the "reshape" function, but >> I cannot get it to work. I think I need some help to understand >> this... >> >> >> (If I could split the "variable" into three separate columns splitting >> by ".", that would be even better.) > > Use strsplit and "[" Or

Re: [R] ggplot2, geom_hline and facet_grid

2011-01-20 Thread Hadley Wickham
nhs.net> wrote: > >     Hi >     Still  having problems in that when I use geom_hline and facet_grid >     together I get two extra empty panels >     A reproducible example can be found at: >     [2]https://gist.github.com/786894 >     Sandy Small >     ___

Re: [R] Counting number of rows with two criteria in dataframe

2011-01-26 Thread Hadley Wickham
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy wrote: > Hi: > > Here are two more candidates, using the plyr and data.table packages: > > library(plyr) > ddply(X, .(x, y), function(d) length(unique(d$z))) >  x y V1 > 1 1 1  2 > 2 1 2  2 > 3 2 3  2 > 4 2 4  2 > 5 3 5  2 > 6 3 6  2 > > The function

Re: [R] looking for setdiff equivalent on dataset

2010-07-29 Thread Hadley Wickham
Here's one way, using a function from the plyr package: TheLittleOne<-data.frame(cbind(c(2,3),c(2,3))) TheBigOne<-data.frame(cbind(c(1,1,2),c(1,1,2))) keys <- plyr:::join.keys(TheBigOne, TheLittleOne) !(keys$x %in% keys$y) TheBigOne[!(keys$x %in% keys$y), ] Hadley On Thu, Jul 29, 2010 at 1:38

Re: [R] looking for setdiff equivalent on dataset

2010-07-29 Thread Hadley Wickham
> Well, here's one way that "might" work (explanation below): > > The ideas is to turn each row into a character vector and then work with the > two character vectors. > >> bigs <- do.call(paste,TheBigOne) >> ix <-  which(bigs %in% setdiff(bigs,do.call(paste,TheLittleOne))) >> TheBigOne[ix,] > > Ho
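The row-key idea quoted above can be sketched with the two data frames from the earlier message in this thread. The helper name `key` and the choice of separator are assumptions; a separator that could occur in the data would make distinct rows collide.

```r
TheLittleOne <- data.frame(X1 = c(2, 3), X2 = c(2, 3))
TheBigOne    <- data.frame(X1 = c(1, 1, 2), X2 = c(1, 1, 2))

# Hypothetical helper: one string key per row, joined by an unlikely separator.
key <- function(df) do.call(paste, c(df, sep = "\r"))

# Rows of TheBigOne that have no matching row in TheLittleOne.
only_big <- TheBigOne[!(key(TheBigOne) %in% key(TheLittleOne)), ]
```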

Re: [R] image plot but data not on grid.

2010-08-06 Thread Hadley Wickham
On Fri, Aug 6, 2010 at 9:24 AM, W Eryk Wolski wrote: > Hi, > > Would like to make an image > however the values in z are not on an uniform grid. > > Have a dataset with > length(x) == length(y) == length(z) > x[1],y[1] gives the position of z[1] > > and would like to encode value of z by a color.

Re: [R] image plot but data not on grid.

2010-08-07 Thread Hadley Wickham
On Sat, Aug 7, 2010 at 2:54 AM, Michael Bedward wrote: > On 7 August 2010 06:26, Hadley Wickham wrote: > >> library(ggplot2) >> qplot(x, y, fill = z, data = df, geom = "tile") > > Hi Hadley, > > I read the original question as being about irregularly spac

Re: [R] image plot but data not on grid.

2010-08-09 Thread Hadley Wickham
With sweave, you need to explicitly print() the output of ggplot2 and lattice plots. Hadley On Mon, Aug 9, 2010 at 6:32 AM, W Eryk Wolski wrote: > qplot does (?) what I was looking for! > At least it plots what I want to plot in the interactive modus. > However, it seems not to work with Sweave.

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 9:29 AM, David Winsemius wrote: > If you look at the output (as I did)  you should see that despite whatever > expectations you have developed regarding plyr, that it did not produce a > grouping variable: > >> ldply(dl, function(x) coef(summary(x)) ) >   fac    Estimate Std

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
> There is one further improvement to consider. When I tried using dlply to > tackle a problem on which I had been bashing my head for the last three days > and it gave just the results I had been looking for, I also noticed that the > dlply function returns the grouping variable levels in an attri

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
>> That's exactly what dlply does - so you should never have to do that >> yourself. > > I'm unclear what you are saying. Are you saying that the plyr function > _should_ have examined the objects in that list and determined that there > were 4 rows and properly labeled the rows to indicate which l

Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 4:30 PM, Matthew Dowle wrote: > > > Another option for consideration : > > library(data.table) > mydt = as.data.table(mydf) > > mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac] >     fac X.Intercept.       x1       x2        x3 > [1,]   0  -0.16247059 1.130220 2.988769 -19.14719

Re: [R] ggplot2 histograms... a subtle error found

2010-08-09 Thread hadley wickham
> When ggplot2 verifies the widths before stacking (the default position for > histograms), it computes the widths from the minimum and maximum values for > each bin.  However, because the width of the bins (0.28) is much smaller > than the scale of the edges (6.8e+09), there is some underflow and

Re: [R] drawing dot plots with size, shape affecting dot characteristics

2010-08-12 Thread Hadley Wickham
On Wed, Aug 11, 2010 at 10:14 PM, Brian Tsai wrote: > Hi all, > > I'm interested in doing a dot plot where *both* the size and color (more > specifically, shade of grey) change with the associated value. > > I've found examples online for ggplot2 where you can scale the size of the > dot with a va

Re: [R] problems with merge() - the output has many repeated lines

2010-08-21 Thread Hadley Wickham
You may find a close reading of ?merge helpful, particularly this sentence: "If there is more than one match, all possible matches contribute one row each" (so check that you don't have multiple matches). Hadley On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo wrote: > Hi everyone, > > I have be
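The quoted sentence from `?merge` can be seen in a small example. The data frames are invented for illustration.

```r
x <- data.frame(id = c(1, 1, 2), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(1, 1, 3), b = c("y1", "y2", "y3"))

m <- merge(x, y, by = "id")     # id 1 matches 2 x 2 = 4 times

dup_in_x <- anyDuplicated(x$id) > 0   # quick check for repeated keys
dup_in_y <- anyDuplicated(y$id) > 0
```

Checking both inputs for duplicated keys before merging is a cheap way to predict whether the output will contain repeated lines.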

[R] Recyclable

2010-08-23 Thread Hadley Wickham
Hi all, Is there a function to determine whether a set of vectors is cleanly recyclable? i.e. is there a common function for detecting the error/warnings that underlie the following two function calls? > 1:3 + 1:2 [1] 2 4 4 Warning message: In 1:3 + 1:2 : longer object length is not a multiple

Re: [R] Recyclable

2010-08-23 Thread Hadley Wickham
I should note that I realise this function is pretty trivial to write (see below), I just want to avoid reinventing the wheel. recyclable <- function(...) { lengths <- vapply(list(...), length, 1) all(max(lengths) %% lengths == 0) } Hadley On Mon, Aug 23, 2010 at 10:33 AM, Hadley W
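Laid out on separate lines, with a couple of calls added, the function from this message looks like the sketch below. The only change is spelling the `vapply()` template as `numeric(1)`; zero-length inputs are not handled.

```r
recyclable <- function(...) {
  lengths <- vapply(list(...), length, numeric(1))
  # Every length must divide the longest length evenly.
  all(max(lengths) %% lengths == 0)
}

bad  <- recyclable(1:3, 1:2)        # 3 is not a multiple of 2
good <- recyclable(1:6, 1:2, 1:3)   # 6 is a multiple of 2 and 3
```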

[R] Comparing/diffing strings

2010-08-24 Thread Hadley Wickham
Hi all, all.equal is generally very useful when you want to find the differences between two objects. It breaks down however, when you have two long strings to compare: > all.equal(a, b) [1] "1 string mismatch" Does any one know of any good text diffing tools implemented in R? Thanks, Hadley

Re: [R] change order of plot panels in faceted ggplot/qplot

2010-08-24 Thread Hadley Wickham
On Mon, Aug 23, 2010 at 1:02 PM, Alison Macalady wrote: > Hi, > > I have a 5-paneled figure that i made using the facet function in qplot > (ggplot).  I've managed to arrange the panels into two rows/three columns, > but for the sake of easy visual comparisons between panels in my particular > dat

Re: [R] Comparing/diffing strings

2010-08-24 Thread Hadley Wickham
On Tue, Aug 24, 2010 at 11:25 AM, Martin Morgan wrote: > On 08/24/2010 07:27 AM, Doran, Harold wrote: >> There is the stringMatch function in the MiscPsycho package. >> >>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no') >>

Re: [R] Plot bar lines like excel

2010-08-25 Thread Hadley Wickham
On Wed, Aug 25, 2010 at 6:05 AM, abotaha wrote: > > Woow, it is amazing, > thank you very much. > yes i forget to attach the dates, however, the dates in my case is every 16 > days. > so how i can use 16 day interval instead of month in by option. Here's one way using the lubridate package: libr
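The lubridate code in this reply is cut off in the archive. As a separate base R alternative (not the approach from the message), seq() on Date objects accepts a step such as "16 days"; the start date below is an arbitrary assumption:

```r
# Hypothetical start date; five dates spaced 16 days apart
start <- as.Date("2010-01-01")
dates <- seq(start, by = "16 days", length.out = 5)

# Gaps between consecutive dates, in days
gaps <- as.numeric(diff(dates))
```

The resulting vector can be used wherever a by = "month" sequence was used before.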

[R] [R-pkgs] stringr: version 0.4

2010-08-25 Thread Hadley Wickham
Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
It's not just that counts might be zero, but also that the base of each bar starts at zero. I really don't see how logging the y/axis of a histogram makes sense. Hadley On Sunday, August 29, 2010, Joshua Wiley wrote: > Hi Derek, > > Here is an option using the package ggplot2: > > library(ggplot

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
> I have counts ranging over 4-6 orders of magnitude with peaks > occurring at various 'magic' values.  Using a log scale for the > y-axis enables the smaller peaks, which would otherwise > be almost invisible bumps along the x-axis, to be seen That doesn't justify the use of a _histogram_ - and

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
>> That doesn't justify the use of a _histogram_  - and regardless of > > The usage highlights meaningful characteristics of the data. > What better justification for any method of analysis and display is > there? That you're displaying something that is mathematically well founded and meaningful

Re: [R] how to replace NA with a specific score that is dependant on another indicator variable

2010-09-01 Thread Hadley Wickham
> first ddply result did I see that some sort of misregistration had occurred; > Better with: > > res <-ddply(egraw2, .(category), .fun=function(df) { >               sapply(df, >                    function(x) {mnx <- mean(x, na.rm=TRUE); >                                 sapply(x, function(z) if

[R] [R-pkgs] testthat: version 0.3

2010-09-01 Thread Hadley Wickham
# testthat Testing your code is normally painful and boring. `testthat` tries to make testing as fun as possible, so that you get a visceral satisfaction from writing tests. Testing should be fun, not a drag, so you do it all the time. To make that happen, `testthat`: * Provides functions that ma

Re: [R] Please explain "do.call" in this context, or critique to "stack this list faster"

2010-09-04 Thread Hadley Wickham
> One common way around this is to pre-allocate memory and then to > populate the object using a loop, but a somewhat easier solution here > turns out to be ldply() in the plyr package. The following is the same > idea as do.call(rbind, l), only faster: > >> system.time(u3 <- ldply(l, rbind)) >   u
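Since the thread asks what do.call does here, a small base R sketch may help: do.call(rbind, l) calls rbind() with the elements of l as its arguments. The list l below is made-up data, and plyr is deliberately left out so the example runs without extra packages:

```r
# Hypothetical list of small data frames with matching columns
l <- list(
  data.frame(a = 1:2, b = c("x", "y")),
  data.frame(a = 3:4, b = c("z", "w"))
)

# Equivalent to rbind(l[[1]], l[[2]]), but works for any list length
stacked <- do.call(rbind, l)
by_hand <- rbind(l[[1]], l[[2]])
```

The do.call() form matters when the number of list elements is not known in advance.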

Re: [R] Aggregating data from two data frames

2010-09-08 Thread Hadley Wickham
Have a look at match and merge. Hadley On Wednesday, September 8, 2010, Michael Haenlein wrote: > Dear all, > > I'm working with two data frames. > > The first frame (agg_data) consists of two columns. agg_data[,1] is a unique > ID for each row and agg_data[,2] contains a continuous variable. > >

Re: [R] Empty Data Frame

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 4:58 AM, Dennis Murphy wrote: > Hi: > > You could try something like > > df <- data.frame( expand.grid( Week = 1:52, Year = 2002:2011 )) expand.grid already returns a data frame... You might want KEEP.OUT.ATTRS = F though. Even if it feels like you are yelling at R. Hadley
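A short check of both points, using the same Week and Year ranges as the quoted code (the variable name grid is my own):

```r
# expand.grid() returns a data frame directly; no data.frame() wrapper needed
grid <- expand.grid(Week = 1:52, Year = 2002:2011, KEEP.OUT.ATTRS = FALSE)

is_df     <- is.data.frame(grid)
n_rows    <- nrow(grid)                       # 52 weeks times 10 years
has_attrs <- !is.null(attr(grid, "out.attrs"))
```

With KEEP.OUT.ATTRS = FALSE the "out.attrs" attribute is not attached to the result.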

Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
> This has the side effect of ignoring errors > and even hiding the error messages.  If you > are concerned about multiple calls to on.exit() > in one function you could define a new function > like >  withOptions <- function(optionList, expr) { >   oldOpts <- options(optionList) >   on.exit(option
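The quoted definition is cut off in the archive, so the following is only a sketch of the same idea (set options, restore them with on.exit(), then evaluate the expression). The name with_options_sketch and its exact body are assumptions, not the code from the message:

```r
# Hypothetical helper: temporarily set options while evaluating expr
with_options_sketch <- function(option_list, expr) {
  old <- options(option_list)        # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restored even if expr fails
  expr                               # lazy evaluation: expr is forced here
}

before <- getOption("digits")
inside <- with_options_sketch(list(digits = 3), getOption("digits"))
after_ok <- getOption("digits")

# The old value also comes back after an error inside expr
try(with_options_sketch(list(digits = 3), stop("boom")), silent = TRUE)
after_err <- getOption("digits")
```

Because the restore happens in on.exit(), errors are not swallowed: they propagate to the caller after the options are reset.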

Re: [R] MASS fitdistr with plyr or data.table?

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 3:55 PM, Justin Haynes wrote: > I am trying to extract the shape and scale parameters of a wind speed > distribution for different sites.  I can do this in a clunky way, but > I was hoping to find a way using data.table or plyr.  However, when I > try I am met with the foll

Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
> Put together a list and we can see what might make sense.  If we did > take this on it would be good to think about providing a reasonable > mechanism for addressing the small flaw in this function as it is > defined here. In devtools, I have: #' Evaluate code in specified locale. with_locale <

Re: [R] Simple loop

2011-05-07 Thread Hadley Wickham
>> Using paste(Site,Prof) when calling ave() is ugly, in that it >> forces you to consider implementation details that you expect >> ave() to take care of (how does paste convert various types >> to strings?).  It also courts errors  since paste("A B", "C") >> and paste("A", "B C") give the same re
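The collision described in the quote can be shown directly. The data frame d below is made up for illustration; passing the grouping columns to ave() separately avoids building keys with paste():

```r
# Two different (Site, Prof) pairs that paste() maps to the same string
same_key <- identical(paste("A B", "C"), paste("A", "B C"))

# Hypothetical data with exactly that collision
d <- data.frame(
  Site = c("A B", "A", "A B", "A"),
  Prof = c("C", "B C", "C", "B C"),
  x    = c(1, 2, 3, 4)
)

# Pasted key: all four rows fall into one group
by_paste <- ave(d$x, paste(d$Site, d$Prof))

# Separate grouping variables: the two pairs stay distinct
by_factors <- ave(d$x, d$Site, d$Prof)
```

Here by_paste is the overall mean repeated four times, while by_factors holds the per-pair means.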

Re: [R] ddply with mean and max...

2011-05-11 Thread Hadley Wickham
> That's the ticket!  So mean is already set up to operate on columns but max and > min are not?  I guess it's not too important now I know ... but what's going on > in > the background that makes that happen? Basically, this: > mean.data.frame function (x, ...) sapply(x, mean, ...) > min.data.fra
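The difference can be reproduced without relying on mean.data.frame (which later R releases removed) by calling sapply() explicitly, as in the method body shown above. The data frame df is made up:

```r
df <- data.frame(a = c(1, 5), b = c(7, 3))

# Column-wise summaries: one value per column
col_means <- sapply(df, mean)
col_maxes <- sapply(df, max)

# max() on an all-numeric data frame collapses to a single overall value
overall_max <- max(df)
```

So the explicit sapply() form gives per-column results for any summary function, regardless of which methods exist for data frames.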
