On Thu, Jul 15, 2010 at 11:08 PM, Dennis Murphy wrote:
> Hi:
>
> I sincerely hope there's an easier way, but one method to get this is as
> follows,
> with d as the data frame name of your test data:
>
> d <- d[with(d, order(Age, School, rev(Grade))), ]
> d$Count <- do.call(c, mapply(seq, 1, as.ve
> The problem is in data.frame[ and any NA in a logical vector will return a
> row of NA's. This can be avoided by wrapping which() around the logical vector
> which seems entirely wasteful or using subset().
The basic philosophy that causes this behaviour is sensible in my
opinion: missing values m
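A minimal sketch of the behaviour discussed above, using a small made-up data frame (`df` and its columns are illustrative assumptions, not data from the thread). An NA in a logical row index yields a row of NAs, while which() or subset() leave it out:

```r
# Illustrative data; not from the original thread
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

idx <- df$x > 1                   # FALSE, NA, TRUE

with_na    <- df[idx, ]           # the NA index produces an all-NA row
via_which  <- df[which(idx), ]    # which() keeps only the TRUE positions
via_subset <- subset(df, x > 1)   # subset() treats NA in the condition as FALSE

nrow(with_na)
nrow(via_which)
nrow(via_subset)
```

The which() form costs one extra pass over the index, which is the "wasteful" part mentioned in the quoted message; subset() hides the same step.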
> ddply(ma, .(variable), summarise, mean = mean(value), sd = sd(value),
> skewness = skewness(value), median = median(value),
> mean.gt.med = mean.gt.med(value))
In principle, you should be able to do:
ddply(ma, .(variable), colwise(each(mean, sd, skewness, median, mean.gt.med)))
but
Did you look at the examples in sample?
# sample()'s surprise -- example
x <- 1:10
sample(x[x > 8]) # length 2
sample(x[x > 9]) # oops -- length 10!
sample(x[x > 10]) # length 0
## For R >= 2.11.0 only
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x[x > 8]) #
What is your null hypothesis? What is your alternate hypothesis? What
is the test statistic? Why do you want a p-value?
Hadley
On Thu, Jul 22, 2010 at 5:40 PM, jd6688 wrote:
>
> Here is my dataframe with 1000 rows:
>
> employee_id weigth p-value
>
> 100 150
> 1
On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller
wrote:
> Fahim Md wrote:
>>
>> Is there any function/way to merge/unite the following data
>>
>> GENEID col1 col2 col3 col4
>> G234064 1 0 0 0
>> G2340
plyr is a set of tools for a common set of problems: you need to break
down a big data structure into manageable pieces, operate on each
piece and then put all the pieces back together. For example, you
might want to:
* fit the same model to subsets of a data frame
* quickly calculate summary
Why don't you read the answers to your stackoverflow question?
http://stackoverflow.com/questions/3665885/adding-a-list-of-vectors-to-a-data-frame-in-r/3667753
Hadley
On Wed, Sep 8, 2010 at 1:17 AM, raje...@cse.iitm.ac.in
wrote:
>
> Hi,
>
> I have a preallocated dataframe to which I have to add
> daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
> employed
> municipality no yes
> A 41.58759 44.67463
> B 55.57407 43.82545
> C 43.59330 NA
>
> The .drop argument has a different meaning in daply. Some R functio
plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:
* fit the same model to each patient subset of a data f
Reshape2 is a reboot of the reshape package. It's been over five years
since the first release of the package, and in that time I've learned
a tremendous amount about R programming, and how to work with data in
R. Reshape2 uses that knowledge to make a new package for reshaping
data that is much mo
>>> I'm having trouble parsing this. What exactly do you want to do?
>> 1 - Put a list as an element of a data.frame. That's quite convenient for my
>> pricing function.
>
> I think this is a really bad idea. data.frames are not meant to be
> used in this way. Why not use a list of lists?
It can
>>> I think this is a really bad idea. data.frames are not meant to be
>>> used in this way. Why not use a list of lists?
>>
>> It can be very convenient, but I suspect the original poster is
>> confused about the difference between vectors and lists.
>
> I wouldn't be surprised if someone were conf
On Fri, Sep 10, 2010 at 10:23 AM, Henrik Bengtsson
wrote:
> Don't underestimate the importance of the choice of the algorithm you
> use. That often makes a huge difference. Also, vectorization is key
> in R, and when you use that you're really up there among the top
> performing languages. Her
Hi Uwe,
The problem is most likely because the original poster doesn't have
the latest version of plyr. I correctly declare this dependency in
the DESCRIPTION
(http://cran.r-project.org/web/packages/reshape2/index.html), but
unfortunately R doesn't seem to use this information at run time,
genera
Have a look at:
"Computing Thousands of Test Statistics Simultaneously in R" by Holger
Schwender and Tina Müller, in
http://stat-computing.org/newsletter/issues/scgn-18-1.pdf
Hadley
On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush wrote:
> Hello,
>
> I have a question regarding how to speed up the t
Yes, this was a little bug that will be fixed in the next release.
Hadley
On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette
wrote:
> Hi,
>
> I have been trying to use the new .parallel argument with the most recent
> version of plyr [1] to speed up some tasks. I can run the example in the NEWS
> f
That implies you need to update your version of plyr.
Hadley
On Wed, Sep 22, 2010 at 4:10 AM, RaoulD wrote:
>
> Hi,
>
> I am using ggplot2 to create a boxplot that summarizes a continuous
> variable. This code works fine for me on one PC however when I use it on
> another it doesn't.
>
> The struc
>
> Forgive me if this question has been addressed, but I was unable to find
> anything in the r-help list or in cyberspace. My question is this: is there a
> function, or set of functions, that will enable a script to detect its own
> path? I have tried file.path() but that was not what I was l
You might want to check out the plyr package.
Hadley
On Fri, Oct 1, 2010 at 6:05 AM, Werner W. wrote:
> Hi,
>
> I was wondering if there is an easy way to accomplish the following in R:
> Often I want to apply a function, e.g. weighted.quantile from the Hmisc
> package
> to grouped subsets of a
> That is, I want to define something like the
> following using an a*ply method, but aaply gives a result in which the
> applied .margin(s) do not appear last in the
> result, contrary to the documentation for ?aaply. I think this is a bug,
> either in the function or the documentation,
> but pe
> RFF<-function(qtype, qOpt,...){}
> i.e., I have two args that are compulsory and the rest are optional. Now when
> my user passes the function call, I need to see what optional args are
> defined and process accordingly...what I have so far is..
>
> RFF<-function(qtype, qOpt,...){
> mc <
> I'm not sure this will solve the issue because if I move the script, I would
> still have to go into the script and edit the "/path/to/my/script.r", or do
> I misunderstand your workaround?
> I'm looking for something like:
> file.path.is.here("myscript.r")
> and which would return something like
On Wed, Oct 6, 2010 at 4:05 PM, Michael Friendly wrote:
> I'm giving a talk about some aspects of language and conceptual tools for
> thinking about how
> to solve problems in several programming languages for statistical computing
> and graphics. I'm particularly
> interested in language feature
> Do you also know more references about variables? Unfortunately this was a
> little bit short so I do not feel 100% sure I completely got it.
Try here:
http://github.com/hadley/devtools/wiki/Scoping
It's a work in progress.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Departm
> My guess is you are using an outdated R version for which the rather new
> reshape2 package has not been compiled.
I wonder if install.packages() could detect this case (e.g. by also
checking if the source version is not available), and offer a more
informative error message.
Hadley
--
Assist
On Thu, Oct 14, 2010 at 11:56 AM, Joshua Wiley wrote:
> Hi,
>
> I do not believe you can use the save.image() function in this case.
> save.image() is a wrapper for save() with defaults for the global
> environment (your workspace). Try this instead, I believe it does
> what you are after:
>
> my
Or str_locate:
library(stringr)
str_locate("aabcd", "bcd")
Hadley
On Mon, Oct 25, 2010 at 5:53 AM, jim holtman wrote:
> I think what you want is 'regexpr':
>
>> regexpr("bcd", "aabcd")
> [1] 3
> attr(,"match.length")
> [1] 3
>>
>
>
> On Mon, Oct 25, 2010 at 7:27 AM, yoav baranan wrote:
>>
>> H
> git is where the world is headed. This video is a little old:
> http://www.youtube.com/watch?v=4XpnKHJAok8, but does a good job
> getting the point across.
And lots of R users are using github already:
http://github.com/languages/R/created
Hadley
--
Assistant Professor / Dobelman Family Juni
On Tue, Oct 26, 2010 at 11:55 AM, Dennis Murphy wrote:
> Hi:
>
> When it comes to split, apply, combine, think plyr.
>
> library(plyr)
> ldply(split(afvtprelvefs, afvtprelvefs$basestudy),
> function(x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))
Or do it in two steps:
models <- d
> 1. What is everyone else using? The network effect is important since
> you want people to be able to access your repository and you want to
> leverage your knowledge of the version control system for other
> projects' repositories. To that extent Subversion is the clear choice
> since it's used
> Note how S3 methods are dispatched only by reference to the first
> argument (on the left of the operator). I think S4 beats this by
> having signatures that can dispatch depending on both arguments.
That's somewhat of a simplification for primitive binary operators. R
actually looks up the meth
> Beware of facile comparisons of this sort -- they may be apples and nematodes.
And they also imply that the main time sink is the computation. In my
experience, figuring out how to solve the problem takes
considerably more time than 18 / 1000 seconds, and so investing your
energy in learn
It's hard to know without a minimal reproducible example, but you
probably want scale_fill_gradient or scale_fill_gradientn.
Hadley
On Thu, Oct 28, 2010 at 9:42 AM, Struchtemeyer, Chris
wrote:
> I am very new to R and don't have any computer program experience
> whatsoever. I am trying to gener
This is on my to do list:
https://github.com/hadley/ggplot2/issues/labels/facet#issue/107
Hadley
On Thu, Oct 28, 2010 at 11:51 AM, Matthew Pettis
wrote:
> Hi All,
>
> Here is the code that I'll be referring to:
>
> p <- ggplot(wastran.data, aes(PER_KEY, EVENTS))
> (p <- p +
> facet_grid( pool
Hi all,
What's the equivalent to length(unique(x)) == 1 if you want to ignore
small floating point differences? Should I look at diff(range(x)) or
sd(x) or something else? What cut off should I use?
If it helps to be explicit, I'm interested in detecting when a vector
is constant for the purpose of
> I think this does what you want (borrowing from all.equal.numeric):
>
> all(abs((x - mean(x))) < .Machine$double.eps^0.5)
>
> with a vector of length 1 million, it took .076 seconds on a fairly old
> system.
Hmmm, maybe I want:
all.equal(min(x), max(x))
?
Hadley
--
Assistant Professor / Do
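A minimal sketch of the all.equal(min(x), max(x)) idea from the message above. The helper name `is_constant` and the default tolerance are assumptions chosen for illustration, and the sketch assumes `x` is numeric with no missing values:

```r
# Hypothetical helper; the name and default tolerance are illustrative
is_constant <- function(x, tol = .Machine$double.eps^0.5) {
  # all.equal() returns TRUE or a character description, so wrap in isTRUE()
  isTRUE(all.equal(min(x), max(x), tolerance = tol))
}

x <- c(0.1 + 0.2, 0.3, 0.3)   # differs from 0.3 only by rounding error

exact  <- length(unique(x)) == 1   # exact comparison sees two values
approx <- is_constant(x)           # tolerant comparison sees one
varied <- is_constant(c(1, 2, 3))
```

Comparing only the minimum and maximum needs a single pass for each, and avoids choosing a scale for sd(x) or diff(range(x)) by hand.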
> Where the value of exp(1) as computed by R is concerned, you have
> been deceived by what R displays ("prints") on screen. The default
> is to display any number to 7 digits of accuracy, but that is not
> the accuracy of the number held internally by R:
>
> exp(1)
> # [1] 2.718282
> exp(1) - 2
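The point in the quoted message, that the printed value is a rounded view of the stored number, can be sketched as follows. The variable names are illustrative assumptions:

```r
e <- exp(1)

shown_default <- format(e, digits = 7)    # "2.718282", the usual 7-digit display
shown_more    <- format(e, digits = 15)   # same stored number, more digits shown

# The stored value is not equal to the rounded display
differs <- e != as.numeric(shown_default)
```

Changing `digits` only changes how the number is displayed; the value held in `e` is the same in both calls.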
> rowsum(value, paste(factor1, factor2, factor3))
That is dangerous in general, and always inefficient. Imagine factor1
is c("a", "a b") and factor2 is c("b c", "c"). Use interaction with
drop = TRUE.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice Uni
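The collision described above can be sketched with the two example vectors from the message (the result names are illustrative assumptions):

```r
factor1 <- c("a", "a b")
factor2 <- c("b c", "c")

# paste() joins with a space, so two different pairs collapse to one key
pasted <- paste(factor1, factor2)

# interaction() keeps the pairs apart; drop = TRUE discards unused combinations
combined <- interaction(factor1, factor2, drop = TRUE)

n_paste_groups       <- length(unique(pasted))
n_interaction_groups <- nlevels(combined)
```

With paste() both rows end up in the same group, so a grouped sum would silently merge them; interaction() keeps two groups.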
> res <- function(x) resid(x)
> ds_test$u <- do.call(c, llply(mods, res))
I'd be a little careful with this, because there's no guarantee the
results will be ordered in the same way as the input (and I'd also
prefer ds_test$u <- unlist(llply(mods, res)) or ds_test$u <-
laply(mods, res))
> In your
> Since roxygen is a great help to document R packages, I am wondering
> if there exists an approach to go back from the raw Rd files to
> roxygen-documentation? E.g. turn "\author{Somebody}" into "@author
> Somebody". This sounds ridiculous, but I believe it helps in the long
> term for me to main
You may find it easier to use a frequency polygon, geom = "freqpoly".
Hadley
On Tue, Nov 30, 2010 at 2:36 PM, Small Sandy (NHS Greater Glasgow &
Clyde) wrote:
> Hi
>
> With ggplot2 I can very easily create beautiful histograms but I would like
> to put two histograms on the same plot. The histo
> However if you do:
> ggplot(data=dafr, aes(x = d1, fill=d2)) + geom_histogram(binwidth = 1,
> position = position_dodge(width=0.99))
>
> The position of first bin which goes from 0-2 appears to start at about 0.2
> (I accept that there is some "white space" to the left of this) while the
> pos
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava
wrote:
> Dear R-Helpers:
>
> I am trying to use *ddply* to extract min and max of a particular
> column in a data.frame. I am using two different forms of the function:
>
>
> ## var_name_to_split is a string -- something like "var1" which is t
ot. I was
> just trying to know my mistake. I am sorry if it is a basic question.
> Thank you and others for your reply.
> Best Regards,
> S.
>
> On Mon, Dec 6, 2010 at 5:28 PM, Hadley Wickham wrote:
>>
>> On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava
>> wro
ndy
>
> Sandy Small
> Clinical Physicist
> NHS Forth Valley
> (Tel: 01324567002)
> and
> NHS Greater Glasgow and Clyde
> (Tel: 01412114592)
>
> From: h.wick...@gmail.com [h.wick...@gmail.com] On Behalf Of Hadley Wickham
>
> It isn't quite convenient to read the data posted below into R
> (if it was originally tab-separated, that formatting got lost) but
> ddply from the plyr package is good for this: something like (untested)
>
> d <- with(data,ddply(data,interaction(UniqueID,Reason),
> function
> In ggplot2, by default the x-axis is in the bottom of the graph and
> y-axis is in the left of the graph. I wonder if it is possible to:
>
> 1. put the x axis in the top, or put the y axis in the right?
> 2. display x axis in both the top and bottom?
These are on the to do list.
> 3. display x
>> input <- do.call(rbind, lapply(fileNames, function(.name){
> + .data <- read.table(.name, header = TRUE, as.is = TRUE)
> + # add file name to the data
> + .data$file <- .name
> + .data
> + }))
You can simplify this a little with plyr:
fileNames <- list.files(pattern = "file.*.c
ggplot2
ggplot2 is a plotting system for R, based on the grammar of graphics,
which tries to take the good parts of base and lattice graphics and
avoid bad parts. It takes care of many of the fiddly details
that make plotting a hassle (l
It looks like you have csv files, so use read.csv instead of read.table.
Hadley
On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano wrote:
> Dear sir,
>
> At the outset I sincerely apologize for reverting back bit late as I was out
> of office. I thank you for your guidance extended by you in response
> The Inkscape user asked if there was any way that R could be coerced to use
> actual circles or paths for the points. I am not aware of a way to do this so
> any input from anyone here would be greatly appreciated.
pdf(..., useDingbats = FALSE)
Hadley
--
Assistant Professor / Dobelman Family Juni
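A minimal, self-contained sketch of the pdf() suggestion above. The temporary file and the plot are illustrative assumptions; the only part taken from the message is `useDingbats = FALSE`, which asks the pdf device not to draw small circles with the Dingbats font:

```r
# Write to a temporary file so the sketch leaves nothing behind
out <- tempfile(fileext = ".pdf")

pdf(out, useDingbats = FALSE)
plot(1:10, pch = 1)    # open circles, the case the question was about
invisible(dev.off())   # close the device so the file is written

written <- file.exists(out) && file.size(out) > 0
```

Whether the resulting points are easy to edit in Inkscape still depends on the plot, so this is a starting point rather than a guarantee.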
Hi Frank,
I think you mean packagename::functionname? The three colon form is
for accessing non-exported objects. Otherwise, I think using :: vs
importFrom is functionally identical - either approach delays package
loading until necessary.
Hadley
On Mon, Jan 3, 2011 at 9:45 PM, Frank Harrell
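The distinction between exported and non-exported objects can be inspected directly. This is a sketch using the stats package as an arbitrary example; the variable names are assumptions for illustration:

```r
# Names a package exports are reachable with pkg::name
exported <- getNamespaceExports("stats")

# Everything defined in the namespace, exported or not
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that would need pkg:::name (or getFromNamespace()) to reach
internal <- setdiff(all_objects, exported)

sd_is_exported <- "sd" %in% exported
result <- stats::sd(c(1, 2, 3))   # two colons are enough for an exported function
```

Anything listed in `internal` is not part of the package's public interface and may change without notice, which is the reason for the caution about the three colon form.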
>> I think you mean packagename::functionname? The three colon form is
>> for accessing non-exported objects.
>
> Normally two colons suffice, but within a package you need three to
> access exported but un-imported objects :)
Are you sure?
Note that it is typically a design mistake to use
> Correct. I'm doing this because of non-exported functions in other packages,
> so I need :::
But you really really shouldn't be doing that. Is there a reason that
the package authors won't export the functions?
> I'd still appreciate any insight about whether importFrom in NAMESPACE
> defers
# plyr
plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:
* fit the same model to each patient subset of
Reshape2 is a reboot of the reshape package. It's been over five years
since the first release of the package, and in that time I've learned
a tremendous amount about R programming, and how to work with data in
R. Reshape2 uses that knowledge to make a new package for reshaping
data that is much mo
> The data is initially extracted from an SQL database into Excel, then saved
> as a tab-delimited text file for use in R.
You might also want to look at the SQL packages for R so you can skip
this manual step. I'd recommend starting with
http://cran.r-project.org/doc/manuals/R-data.html#Relation
exp(median(log(x))) ?
Hadley
On Sat, Jan 15, 2011 at 10:26 AM, Skull Crossbones
wrote:
> Hi All,
>
> I need to calculate the median for even number of data points.However
> instead of calculating
> the arithmetic mean of the two middle values,I need to calculate their
> geometric mean.
>
> Though
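A short sketch of why the one-liner above gives the geometric mean of the two middle values. The helper name `geo_median` and the sample data are assumptions, and the approach requires all values to be positive:

```r
# Hypothetical helper; assumes every element of x is > 0
geo_median <- function(x) exp(median(log(x)))

x <- c(1, 2, 8, 16)        # even length: middle values are 2 and 8

arithmetic <- median(x)     # (2 + 8) / 2
geometric  <- geo_median(x) # exp((log(2) + log(8)) / 2) = sqrt(2 * 8)
```

For an even number of points, median() averages the two middle logs, and exponentiating that average is the geometric mean; for an odd number of points the result is the ordinary median.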
> The normal distribution is a continuous distribution, i.e., the frequency
> for each observed value will essentially be 1/n and not converge to the
> density function. Hence, you would need to look at histogram or smoothed
> densities. Rootograms, on the other hand, are intended for discrete
> di
On Sun, Jan 16, 2011 at 5:48 AM, wrote:
> Here is one way
>
> Here is one way:
>
>> con <- textConnection("
> + ID TIME OBS
> + 001 2200 23
> + 001 2400 11
> + 001 3200 10
> + 001 4500 22
> + 003 3900 45
>
On Mon, Jan 17, 2011 at 2:20 PM, Sean Zhang wrote:
> Dear R-Helpers,
>
> I wonder whether there is a function which cuts a multiple dimensional array
> along a chosen dimension and then store each piece (still an array of one
> dimension less) into a list.
> For example,
>
> arr <- array(seq(1*2*3
> library(plyr)
> # Function to sum y by A-B combinations for a generic data frame
> dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y))
See count in plyr 1.4 for a much much faster way of doing this.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statist
> how can I perform a string operation like strsplit(x," ") on a column of a
> dataframe, and put the first or the second item of the split into a new
> dataframe column?
> (so that on each row it is consistent)
Have a look at str_split_fixed in the stringr package.
Hadley
--
Assistant Profes
Hi Sandy,
It's difficult to know what's going wrong without a small reproducible
example (https://github.com/hadley/devtools/wiki/Reproducibility) -
could you please provide one? You might also have better luck with an
email directly to the ggplot2 mailing list.
Hadley
On Wed, Jan 19, 2011 at 2
>> I think I should be able to do this using the "reshape" function, but
>> I cannot get it to work. I think I need some help to understand
>> this...
>>
>>
>> (If I could split the "variable" into three separate columns splitting
>> by ".", that would be even better.)
>
> Use strsplit and "["
Or
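The strsplit and "[" suggestion quoted above can be sketched in base R like this. The example strings and variable names are assumptions for illustration:

```r
variable <- c("height.left.pre", "weight.right.post")

# fixed = TRUE treats "." literally instead of as a regular expression
parts <- strsplit(variable, ".", fixed = TRUE)

# "[" used as a function picks the i-th piece from every element
first_piece  <- sapply(parts, `[`, 1)
second_piece <- sapply(parts, `[`, 2)

# If every string has the same number of pieces, bind them into columns
as_columns <- do.call(rbind, parts)
```

The do.call(rbind, ...) step only gives a clean matrix when each string splits into the same number of pieces.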
nhs.net> wrote:
>
> Hi
> Still having problems in that when I use geom_hline and facet_grid
> together I get two extra empty panels
> A reproducible example can be found at:
> [2]https://gist.github.com/786894
> Sandy Small
> ___
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy wrote:
> Hi:
>
> Here are two more candidates, using the plyr and data.table packages:
>
> library(plyr)
> ddply(X, .(x, y), function(d) length(unique(d$z)))
> x y V1
> 1 1 1 2
> 2 1 2 2
> 3 2 3 2
> 4 2 4 2
> 5 3 5 2
> 6 3 6 2
>
> The function
Here's one way, using a function from the plyr package:
TheLittleOne<-data.frame(cbind(c(2,3),c(2,3)))
TheBigOne<-data.frame(cbind(c(1,1,2),c(1,1,2)))
keys <- plyr:::join.keys(TheBigOne, TheLittleOne)
!(keys$x %in% keys$y)
TheBigOne[!(keys$x %in% keys$y), ]
Hadley
On Thu, Jul 29, 2010 at 1:38
> Well, here's one way that "might" work (explanation below):
>
> The idea is to turn each row into a character vector and then work with the
> two character vectors.
>
>> bigs <- do.call(paste,TheBigOne)
>> ix <- which(bigs %in% setdiff(bigs,do.call(paste,TheLittleOne)))
>> TheBigOne[ix,]
>
> Ho
On Fri, Aug 6, 2010 at 9:24 AM, W Eryk Wolski wrote:
> Hi,
>
> Would like to make an image
> however the values in z are not on a uniform grid.
>
> Have a dataset with
> length(x) == length(y) == length(z)
> x[1],y[1] gives the position of z[1]
>
> and would like to encode value of z by a color.
On Sat, Aug 7, 2010 at 2:54 AM, Michael Bedward
wrote:
> On 7 August 2010 06:26, Hadley Wickham wrote:
>
>> library(ggplot2)
>> qplot(x, y, fill = z, data = df, geom = "tile")
>
> Hi Hadley,
>
> I read the original question as being about irregularly spac
With sweave, you need to explicitly print() the output of ggplot2 and
lattice plots.
Hadley
On Mon, Aug 9, 2010 at 6:32 AM, W Eryk Wolski wrote:
> qplot does (?) what I was looking for!
> At least it plots what I want to plot in the interactive mode.
> However, it seems not to work with Sweave.
On Mon, Aug 9, 2010 at 9:29 AM, David Winsemius wrote:
> If you look at the output (as I did) you should see that despite whatever
> expectations you have developed regarding plyr, that it did not produce a
> grouping variable:
>
>> ldply(dl, function(x) coef(summary(x)) )
> fac Estimate Std
> There is one further improvement to consider. When I tried using dlply to
> tackle a problem on which I had been bashing my head for the last three days
> and it gave just the results I had been looking for, I also noticed that the
> dlply function returns the grouping variable levels in an attri
>> That's exactly what dlply does - so you should never have to do that
>> yourself.
>
> I'm unclear what you are saying. Are you saying that the plyr function
> _should_ have examined the objects in that list and determined that there
> were 4 rows and properly labeled the rows to indicate which l
On Mon, Aug 9, 2010 at 4:30 PM, Matthew Dowle wrote:
>
>
> Another option for consideration :
>
> library(data.table)
> mydt = as.data.table(mydf)
>
> mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
> fac X.Intercept. x1 x2 x3
> [1,] 0 -0.16247059 1.130220 2.988769 -19.14719
> When ggplot2 verifies the widths before stacking (the default position for
> histograms), it computes the widths from the minimum and maximum values for
> each bin. However, because the width of the bins (0.28) is much smaller
> than the scale of the edges (6.8e+09), there is some underflow and
On Wed, Aug 11, 2010 at 10:14 PM, Brian Tsai wrote:
> Hi all,
>
> I'm interested in doing a dot plot where *both* the size and color (more
> specifically, shade of grey) change with the associated value.
>
> I've found examples online for ggplot2 where you can scale the size of the
> dot with a va
You may find a close reading of ?merge helpful, particularly this
sentence: "If there is more than one match, all possible
matches contribute one row each" (so check that you don't have
multiple matches).
Hadley
On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo wrote:
> Hi everyone,
>
> I have be
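The sentence quoted from ?merge can be illustrated with two small made-up data frames (all names here are assumptions, not the poster's data):

```r
left  <- data.frame(id = c(1, 2), x = c("a", "b"))
right <- data.frame(id = c(1, 1, 2), y = c("p", "q", "r"))

# id 1 matches twice in `right`, so it contributes two rows to the result
merged <- merge(left, right, by = "id")

# A quick check for duplicated keys before merging
has_multiple_matches <- anyDuplicated(right$id) > 0
```

If the merged result has more rows than expected, checking each input with anyDuplicated() on the key columns is a reasonable first step.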
Hi all,
Is there a function to determine whether a set of vectors is cleanly
recyclable? i.e. is there a common function for detecting the
error/warnings that underlie the following two function calls?
> 1:3 + 1:2
[1] 2 4 4
Warning message:
In 1:3 + 1:2 :
longer object length is not a multiple
I should note that I realise this function is pretty trivial to write
(see below), I just want to avoid reinventing the wheel.
recyclable <- function(...) {
lengths <- vapply(list(...), length, 1)
all(max(lengths) %% lengths == 0)
}
Hadley
On Mon, Aug 23, 2010 at 10:33 AM, Hadley W
Hi all,
all.equal is generally very useful when you want to find the
differences between two objects. It breaks down however, when you
have two long strings to compare:
> all.equal(a, b)
[1] "1 string mismatch"
Does anyone know of any good text diffing tools implemented in R?
Thanks,
Hadley
On Mon, Aug 23, 2010 at 1:02 PM, Alison Macalady wrote:
> Hi,
>
> I have a 5-paneled figure that I made using the facet function in qplot
> (ggplot). I've managed to arrange the panels into two rows/three columns,
> but for the sake of easy visual comparisons between panels in my particular
> dat
On Tue, Aug 24, 2010 at 11:25 AM, Martin Morgan wrote:
> On 08/24/2010 07:27 AM, Doran, Harold wrote:
>> There is the stringMatch function in the MiscPsycho package.
>>
>>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no')
>>
On Wed, Aug 25, 2010 at 6:05 AM, abotaha wrote:
>
> Woow, it is amazing,
> thank you very much.
> yes i forget to attach the dates, however, the dates in my case is every 16
> days.
> so how i can use 16 day interval instead of month in by option.
Here's one way using the lubridate package:
libr
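The reply above uses lubridate, but its code is cut off in this listing. As a separate base R sketch of a 16-day interval (this is not the lubridate approach from the message, and the start date is an assumption):

```r
start <- as.Date("2010-01-01")   # illustrative start date

# seq() on Date objects accepts "<n> days" as the step
dates <- seq(start, by = "16 days", length.out = 5)

gaps <- as.numeric(diff(dates))  # days between consecutive dates
```

The same `by = "16 days"` string can be combined with a `to =` end date instead of `length.out`.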
Strings are not glamorous, high-profile components of R, but they do
play a big role in many data cleaning and preparation tasks. R
provides a solid set of string operations, but because they have grown
organically over time, they can be inconsistent and a little hard to
learn. Additionally, they
It's not just that counts might be zero, but also that the base of
each bar starts at zero. I really don't see how logging the y-axis of
a histogram makes sense.
Hadley
On Sunday, August 29, 2010, Joshua Wiley wrote:
> Hi Derek,
>
> Here is an option using the package ggplot2:
>
> library(ggplot
> I have counts ranging over 4-6 orders of magnitude with peaks
> occurring at various 'magic' values. Using a log scale for the
> y-axis enables the smaller peaks, which would otherwise
> be almost invisible bumps along the x-axis, to be seen
That doesn't justify the use of a _histogram_ - and
>> That doesn't justify the use of a _histogram_ - and regardless of
>
> The usage highlights meaningful characteristics of the data.
> What better justification for any method of analysis and display is
> there?
That you're displaying something that is mathematically well founded
and meaningful
> first ddply result did I see that some sort of misregistration had occurred;
> Better with:
>
> res <-ddply(egraw2, .(category), .fun=function(df) {
> sapply(df,
> function(x) {mnx <- mean(x, na.rm=TRUE);
> sapply(x, function(z) if
# testthat
Testing your code is normally painful and boring. `testthat` tries to
make testing as fun as possible, so that you get a visceral
satisfaction from writing tests. Testing should be fun, not a drag, so
you do it all the time. To make that happen, `testthat`:
* Provides functions that ma
> One common way around this is to pre-allocate memory and then to
> populate the object using a loop, but a somewhat easier solution here
> turns out to be ldply() in the plyr package. The following is the same
> idea as do.call(rbind, l), only faster:
>
>> system.time(u3 <- ldply(l, rbind))
> u
Have a look at match and merge.
Hadley
On Wednesday, September 8, 2010, Michael Haenlein
wrote:
> Dear all,
>
> I'm working with two data frames.
>
> The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
> ID for each row and agg_data[,2] contains a continuous variable.
>
>
On Wed, Apr 27, 2011 at 4:58 AM, Dennis Murphy wrote:
> Hi:
>
> You could try something like
>
> df <- data.frame( expand.grid( Week = 1:52, Year = 2002:2011 ))
expand.grid already returns a data frame... You might want
KEEP.OUT.ATTRS = FALSE though. Even if it feels like you are yelling at R.
Hadley
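A small sketch of both points, using the Week and Year ranges from the quoted code (the result names are assumptions):

```r
grid <- expand.grid(Week = 1:52, Year = 2002:2011, KEEP.OUT.ATTRS = FALSE)

already_df <- is.data.frame(grid)               # no data.frame() wrapper needed
n_rows     <- nrow(grid)                        # 52 weeks x 10 years
no_attrs   <- is.null(attr(grid, "out.attrs"))  # extra attribute was not added
```

Without `KEEP.OUT.ATTRS = FALSE` the result carries an "out.attrs" attribute describing the grid, which is rarely needed.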
> This has the side effect of ignoring errors
> and even hiding the error messages. If you
> are concerned about multiple calls to on.exit()
> in one function you could define a new function
> like
> withOptions <- function(optionList, expr) {
> oldOpts <- options(optionList)
> on.exit(option
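The quoted function is cut off in this listing. The general pattern it describes (set options, restore them on exit, then evaluate the expression) can be sketched as below. The name `with_options` and its body are assumptions for illustration, not the quoted author's code:

```r
# Hypothetical helper illustrating the set-then-restore pattern
with_options <- function(new_options, code) {
  old <- options(new_options)   # options() returns the previous values
  on.exit(options(old))         # restored even if `code` throws an error
  force(code)                   # lazy argument is evaluated here, under new options
}

before <- getOption("digits")
short  <- with_options(list(digits = 3), format(pi))
after  <- getOption("digits")
```

Because `code` is a lazily evaluated argument, it runs inside the function after the options have been changed, and errors propagate normally instead of being hidden.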
On Wed, Apr 27, 2011 at 3:55 PM, Justin Haynes wrote:
> I am trying to extract the shape and scale parameters of a wind speed
> distribution for different sites. I can do this in a clunky way, but
> I was hoping to find a way using data.table or plyr. However, when I
> try I am met with the foll
> Put together a list and we can see what might make sense. If we did
> take this on it would be good to think about providing a reasonable
> mechanism for addressing the small flaw in this function as it is
> defined here.
In devtools, I have:
#' Evaluate code in specified locale.
with_locale <
>> Using paste(Site,Prof) when calling ave() is ugly, in that it
>> forces you to consider implementation details that you expect
>> ave() to take care of (how does paste convert various types
>> to strings?). It also courts errors since paste("A B", "C")
>> and paste("A", "B C") give the same re
> That's the ticket! So mean is already set up to operate on columns but max and
> min are not? I guess it's not too important now I know ... but what's going on
> in
> the background that makes that happen?
Basically, this:
> mean.data.frame
function (x, ...)
sapply(x, mean, ...)
> min.data.fra
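A sketch of the column-wise pattern shown in the method body above, written so that it does not depend on a data frame method for mean existing in the installed R version. The data frame is a made-up example:

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 9))   # illustrative data

# Apply a summary function to each column explicitly
col_means <- sapply(df, mean)
col_max   <- sapply(df, max)

# Called on the whole data frame, max() gives one overall value instead
overall_max <- max(df)
```

Using sapply() explicitly makes the per-column intent visible and behaves the same way for mean, max and min.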