Re: [R] Version control (git, mercurial) for R packages

2012-02-10 Thread Rainer M Krug

CCd to r-devel as suggested by Peter.

On 09/02/12 19:28, Hadley Wickham wrote:
>>> I'm exploring using a version control system to keep better
>>> track of changes to the packages I maintain. I'm leaning
>>> towards git (although mercurial also looks good) but am not
>>> sure what is the best way to set up the repository. It seems I
>>> can't set the repository directly within the R package main
>>> directory, since it will be incompatible with the standard
>>> package structure. I can set the repository in the directory
>>> that contains my R packages, but then I have a single
>>> repository for all of my packages, which is not ideal.
>> 
>> Git is nice - but if you are looking for simplicity in usage for
>> R packages, I guess r-forge and rforge are the easiest to use.
> 
> I think git + github is substantially easier to use, especially if
> you want to incorporate patches from the community.
> 
>> But I would be interested in the workflow when using github as
>> the main VCS.
> 
> The thing that has made this really successful for me is the 
> install_github function in devtools.  Then it's easy for people to
> try out development versions:
> 
> install.packages("devtools")
> library(devtools)
> install_github("myrepo", "myusername")

Thanks Hadley - I should definitely take a closer look at the devtools
package - there really seem to be some gems in there.
One example is this function - it sounds perfect for small-scale use
and for developers.

But what I was thinking about (in my other post as well) is tying
github (or any other git repository provider) into r-forge for the
automatic building of packages.

So is there an easy way to push a certain revision up to r-forge to
trigger the automatic builds?
One could do it manually, but automatic would be so much nicer.

> 
> This requires a working R development environment but thanks to
> the hard work of Simon Urbanek, Duncan Murdoch, Brian Ripley and
> others, this is a one-install process on all major platforms.

True - no problem for developers.

Cheers,

Rainer

> 
> Hadley
> 
> 


-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing a CSV file

2012-02-10 Thread peter dalgaard

On Feb 10, 2012, at 04:20 , sezgin ozcan wrote:

> I use mac.
> I tried this command also
> a<-read.table("clipboard",sep=”\t”,row.names=1,header=T)

Something is wrong with those quotes around \t... 

Also, last I needed this, using file="clipboard" to read from the clipboard 
didn't work on Mac, but file=pipe("pbpaste") did.
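
For example, something along these lines (a minimal, untested sketch of that
approach) should read a tab-separated clipboard selection on the Mac:

a <- read.table(pipe("pbpaste"), sep="\t", header=TRUE, row.names=1)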

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Yang Zhang
Actually, is there any way to get at additional information beyond the
classProbs?  In particular, is there any way to find out the
associated weights, or otherwise the row indices into the original
model matrix corresponding to the tested instances?

On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang  wrote:
> Oops, found trainControl's classProbs right after I sent!
>
> On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang  wrote:
>> I'm dealing with classification problems, and I'm trying to specify a
>> custom scoring metric (recall@p, ROC, etc.) that depends on not just
>> the class output but the probability estimates, so that caret::train
>> can choose the optimal tuning parameters based on this metric.
>>
>> However, when I supply a trainControl summaryFunction, the data given
>> to it contains only class predictions, so the only metrics possible
>> are things like accuracy, kappa, etc.
>>
>> Is there any way to do this that I'm overlooking?  If not, could I put
>> this in as a feature request?  Thanks!
>>
>> --
>> Yang Zhang
>> http://yz.mit.edu/
>
>
>
> --
> Yang Zhang
> http://yz.mit.edu/



-- 
Yang Zhang
http://yz.mit.edu/



Re: [R] Finding all the coefficients for a logit model

2012-02-10 Thread peter dalgaard

On Feb 10, 2012, at 06:08 , R. Michael Weylandt wrote:

> Add -1 to your glm to remove the intercept term -- that will force
> Friday to have its own coefficient.


That's one answer. 

Another one is that the coefficient (in the model with an intercept term) for 
Friday is zero with an s.e. of zero. Namely, by definition, the log-odds for 
Friday minus the log-odds for Friday.

-pd

> 
> E.g.,
> 
> dg <- data.frame(value = rnorm(28), day = letters[1:7])
> lm(value ~ day, data = dg)
> lm(value ~ day - 1, data = dg)
> 
> Michael
> 
> On Thu, Feb 9, 2012 at 6:58 PM, Abraham Mathew  wrote:
>> Let's say I have a variable, day, which is saved as a factor with 7 levels,
>> and I use it in a
>> logistic regression model. I ran the model using the car package in R and
>> printed out the
>> results.
>> 
>> mod1 = glm(factor(status1) ~ factor(day), data=mydat,
>> family=binomial(link="logit"))
>> print(summary(mod1))
>> 
>> The result I get is:
>> 
>> Coefficients:
>>                      Estimate Std. Error z value Pr(>|z|)
>> (Intercept)           -0.4350     0.0379  -11.48   <2e-16 ***
>> factor(day)Monday     -0.6072     0.0479  -12.69   <2e-16 ***
>> factor(day)Saturday    0.5964     0.0559   10.67   <2e-16 ***
>> factor(day)Sunday      1.1140     0.0627   17.78   <2e-16 ***
>> factor(day)Thursday   -0.4492     0.0516   -8.71   <2e-16 ***
>> factor(day)Tuesday    -0.9331     0.0496  -18.82   <2e-16 ***
>> factor(day)Wednesday  -0.8575     0.0486  -17.63   <2e-16 ***
>> 
>> 
>> It seems that Friday is being used as the baseline, but I want to know
>> 
>> how I can acquire the coefficient for the baseline (friday)?
>> 
>> 
>> I ran mod1$coefficients, but that didn't do the trick.
>> 
>> 
>> Can anyone help.
>> 
>> 
>> Thank you,
>> 
>> Abraham M.
>> 
>>[[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] Recall@p plot using ROCR?

2012-02-10 Thread Yang Zhang
Is it possible to use ROCR to plot a simple recall@p plot?  I.e., a
plot where the x-axis is the position in the ranked test set, and
the y-axis is the recall, so you can see what the recall is in the top
10% of the ranked results.

I searched through the performance() manual but found nothing.

(Not a cutoff-vs-recall graph, since the cutoff is the probability
estimate returned by the model.)
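
For concreteness, a minimal sketch of the curve I am after, computed by hand
(scores and labels are placeholders for the model's probability estimates and
the true classes):

ord    <- order(scores, decreasing = TRUE)
recall <- cumsum(labels[ord] == 1) / sum(labels == 1)
plot(seq_along(recall) / length(recall), recall, type = "l",
     xlab = "Fraction of ranked test set", ylab = "Recall")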

Thanks.



Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)

2012-02-10 Thread Hadley Wickham
Hi Aidan,

str is your friend:

> str(g)
'data.frame':   9 obs. of  3 variables:
 $ Date: chr  "2011-12-23" "2011-12-30" "2012-01-06" "2011-12-23" ...
 $ variable: Factor w/ 3 levels "Price","Yield",..: 1 1 1 2 2 2 3 3 3
 $ value   : num  86.78 86.04 86.44 9.74 9.54 ...

You haven't turned the Date variable into an actual date yet.  Try:

g$Date <- as.Date(g$Date)

Hadley

On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran
 wrote:
> Dear all,
>
> ggplot gives me an error when trying to plot time series data using a
> date variable as the x axis.
>
> g<-structure(list(Date = c("2011-12-23", "2011-12-30", "2012-01-06",
> "2011-12-23", "2011-12-30", "2012-01-06", "2011-12-23", "2011-12-30",
> "2012-01-06"), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
> 3L, 3L, 3L), .Label = c("Price", "Yield", "CDS Spread"), class = "factor"),
>    value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132,
>    576.866, 573.564)), .Names = c("Date", "variable", "value"
> ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L,
> 306L), class = "data.frame")
>
>> g
>          Date   variable value
> 100 2011-12-23      Price  86.8
> 101 2011-12-30      Price  86.0
> 102 2012-01-06      Price  86.4
> 202 2011-12-23      Yield   9.7
> 203 2011-12-30      Yield   9.5
> 204 2012-01-06      Yield   9.7
> 304 2011-12-23 CDS Spread 580.1
> 305 2011-12-30 CDS Spread 576.9
> 306 2012-01-06 CDS Spread 573.6
>> gp<-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line()
>> gp
>> #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I 
>> WOULD LIKE TO USE SCALE_X_DATE)
>> gp<-gp+   scale_x_date()
>> gp
> Error in seq.int(r1$year, to$year, by) : 'from' must be finite
> In addition: Warning messages:
> 1: In get(x, envir = this, inherits = inh)(this, ...) :
>  NAs introduced by coercion
> 2: In min(x) : no non-missing arguments to min; returning Inf
> 3: In max(x) : no non-missing arguments to max; returning -Inf
> 4: In min(x) : no non-missing arguments to min; returning Inf
> 5: In max(x) : no non-missing arguments to max; returning -Inf
>>
>
> In previous help requests, the workaround of specifying the unit was 
> suggested:
> gp<-gp+   scale_x_date(major="years")
> but this doesn't work for me (same error).
>
> Any help is very much appreciated! Thanks in advance.
>
> Aidan
>
> R version 2.14.1 (2011-12-22)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
> LC_MONETARY=English_Ireland.1252
> [4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252
>
> attached base packages:
>  [1] datasets  tools     grDevices grid      splines   graphics  stats
>    tcltk     utils
> [10] methods   base
>
> other attached packages:
>  [1] micEcon_0.6-6      miscTools_0.6-12   np_0.40-11
> cubature_1.0       boot_1.3-3
>  [6] RODBC_1.3-3        sqldf_0.4-6.1      chron_2.3-41
> gsubfn_0.5-7       DBI_0.2-5
> [11] Haver_1.0          xtable_1.5-6       plm_1.2-7
> sandwich_2.2-7     MASS_7.3-16
> [16] Formula_1.0-1      nlme_3.1-102       bdsmatrix_1.0
> RBloomberg_0.4-150 rJava_0.9-1
> [21] gtools_2.6.2       gdata_2.8.2        ggplot2_0.8.9
> proto_0.3-9.2      zoo_1.7-4
> [26] reshape_0.8.4      plyr_1.6           svSocket_0.9-52
> TinnR_1.0.3        R2HTML_2.2
> [31] Hmisc_3.8-3        survival_2.36-10
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1        digest_0.5.0          lattice_0.20-0
> RSQLite_0.10.0
> [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] GGplot controlling point size across range

2012-02-10 Thread Hadley Wickham
On Thu, Jan 12, 2012 at 5:27 PM, Darran King  wrote:
> Hi all
>
> New to R and GGplot2 but loving the potential. I am trying to plot four
> separate point plots by looping over the data and plotting a different
> subset each time.
>
> When I plot the data as a point plot, the size of the points is determined
> by the data values used as below
>
> qplot(accum_rain, accum_g_radn, data = clim_sub[[i]], size = avgyld, colour
> = avgyld)
>
> The problem is that I want all four plots to be comparable, so a point size
> representing avgyld = 2000 should be the same on all four plots. However, as
> the data for some plots have a smaller range than others, the plots are
> automatically scaling to the range of data in each plot, and the largest
> point is always assigned to the largest value, so a plot with a top value of say
> 5000 will be represented with the same size point as a plot with a top value
> of 7000.
>
> Any tips on how to scale the point sizes to a defined range of classes and
> still plot the actual data to those classes?

Specify limits to scale_area:  + scale_area(limits = c(0, 1000))

If you're still having problems, you might want to try the ggplot2
mailing list, as Jeff suggested.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] pretty(range(data$z),10) Error

2012-02-10 Thread Hadley Wickham
Hi Mario,

If you're still having problems, I'd suggest sending a small
reproducible example
(https://github.com/hadley/devtools/wiki/Reproducibility) to the
ggplot2 mailing list.

Hadley

On Tue, Jan 17, 2012 at 12:32 PM, Mario Giesel  wrote:
> Hello, R-List,
> I'm getting error messages when adding geom_density2d() [package ggplot2]:
> 
> Error in pretty(range(data$z), 10) :
>   NA/NaN/Inf in foreign function call (arg 1)
> In addition: Warning messages:
> 1: Removed 1 rows containing missing values (stat_contour).
> 2: In min(x) : no non-missing arguments to min; returning Inf
> 3: In max(x) : no non-missing arguments to max; returning -Inf
> 4: In min(x) : no non-missing arguments to min; returning Inf
> 5: In max(x) : no non-missing arguments to max; returning -Inf
> 
>
> I installed R 2.11.1 Patched in the hope of overcoming this obstacle, but in vain.
> I tried reading spss and csv format - no effect.
>
> While this one works:
> ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3)
> This one fails:
> ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ 
> segment,ncol=3) + geom_density2d()
>
> But it only fails for 2 out of 10 x-variables, although they do not differ in 
> structure.
> Data structure looks like this:
>
> segment Qq01 Qq02 Qq02a Qq04 Qq05 Qq07 Qq08 Qq10 Qq11 Qq15 Qq17a
> A 7 5 5 5 5 4 5 5 3 7 7
> A 5 4 4 3 4 3 4 4 4 5 6
> B 3 3 3 3 2 3 4 2 4 3 2
> B 6 4 5 3 3 3 4 4 3 6 6
> C 3 3 3 3 3 4 2 3 3 4 2
> C 2 1 4 3 1 4 1 1 3 4 2
> D 5 3 4 3 3 3 4 3 3 6 3
> D 5 3 4 2 2 4 4 3 3 4 4
> E 3 3 3 2 3 4 4 3 2 3 4
> E 7 5 5 5 5 4 4 5 1 7 7
> F 6 5 4 3 3 3 4 3 3 6 6
> F 5 3 4 3 3 4 4 3 3 4 4
> G 4 3 3 3 2 1 3 3 3 4 4
> G 6 5 5 5 5 4 5 5 3 7 7
> H 6 5 5 4 4 4 5 4 3 5 6
> H 7 5 5 5 4 4 5 5 3 7 7
> I 6 5 5 4 4 4 5 4 3 6 6
> I 6 3 4 2 3 4 4 3 4 6 4
> J 5 3 4 1 3 4 4 3 3 5 5
> J 4 2 4 3 2 2 3 2 3 4 4
> K 4 4 4 2 3 4 4 4 3 5 5
> K 6 4 3 3 2 3 4 3 3 6 5
> L 6 4 5 3 3 4 5 4 3 6 6
> L 3 1 2 2 1 3 3 1 3 4 4
> ...
> About 1,800 cases (more than 100 per segment).
> Did anybody experience similar problems?
>
> Thanks for all comments,
>  Mario
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Plotting bar graph over a geographical map

2012-02-10 Thread Hadley Wickham
Hi Simon,

You might want to try sending a small reproducible example
(https://github.com/hadley/devtools/wiki/Reproducibility) to the
ggplot2 mailing list.

Hadley

On Tue, Jan 31, 2012 at 10:53 PM, sjlabrie  wrote:
> Hi,
>
> I am looking for a way to plot bars on a map instead of the standard points.
> I have been using ggplot2 and maps libraries.
> The points are added with the function geom_point. I know that there is a
> function
> geom_bar but I can't figure out how to use it.
>
> Thank you for your help,
>
> Simon
>
> ### R-code
> library(ggplot2)
> library(maps)
>
> measurements <- read.csv("all_podo.count.csv", header=T)
> allworld <- map_data("world")
>
> pdf("map.pdf")
> ggplot(measurements, aes(long, lat)) +
>  geom_polygon(data = allworld, aes(x = long, y = lat, group = group),
>  colour = "grey70", fill = "grey70") +
>  geom_point(aes(size = ref)) +
>  opts(axis.title.x = theme_blank(),
>  axis.title.y = theme_blank()) +
>  geom_bar(aes(y = normcount))
> dev.off()
> ###
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Plotting-bar-graph-over-a-geographical-map-tp4346925p4346925.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] ggplot2 geom_polygon fill

2012-02-10 Thread Hadley Wickham
Hi Raimund,

To increase your chances of getting help, I'd recommend using the
ggplot2 mailing list, and reducing your example down to the essence of
the problem.  For example, the theme components don't affect the
problem, but make the code longer, and so harder to understand.

Hadley

On Mon, Feb 6, 2012 at 10:50 AM, raimund  wrote:
> Hi everyone,
>
> i've been trying to make a special plot with ggplot2, but I can't get it to
> fill the polygon I'd like to see filled so very very much.
>
> I want to display the difference or change in the distribution of the
> modified Rankin Scale between two groups. mRS is a scale for disability or
> daily activities competence.
>
> It looks like this.
>
> http://r.789695.n4.nabble.com/file/n4361919/rankinplot.png
>
> I just wish the polygons in between the bars would fill in the same colors
> as the bar segments do.
> Interestingly, in the example provided by the geom_polygon help page, there
> is a fill, which leads me to suspect that something with my data frame might
> be wrong.
>
> If I supply a "colour" argument, I get borders, but not always in the color
> I defined. The attached image has such borders, to make clear what polygons
> I am talking about. In the final plot I don't desire such borders, only the
> as of yet unattainable fill.
>
>
> Here's my code:
>
> library(ggplot2)
> library(plyr)
>
>
> # define good looks
>
> no_margins <- opts(
>  axis.line =         theme_blank(),
>  axis.text.x =       theme_blank(),
>  axis.ticks =        theme_blank(),
>  axis.title.x = theme_text(size = 12, vjust = 0.15),
>  axis.title.y = theme_text(angle = 90, size = 12, vjust = 0.2),
>  axis.ticks.length = unit(0, "cm"),
>  axis.ticks.margin = unit(0, "cm"),
>  panel.background =  theme_blank(),
>  panel.grid.major =  theme_blank(),
>  panel.grid.minor =  theme_blank(),
>  plot.background =   theme_blank(),
>  plot.title =        theme_blank(),
>  plot.margin =       unit(c(1, 1, 1, 1.5), "lines")
>  )
>
> sfm = scale_fill_manual("mRS", c("0"="darkgreen",
>                                 "1"="forestgreen",
>                                 "2"="mediumseagreen",
>                                 "3"="coral",
>                                 "4"="red",
>                                 "5"="darkred",
>                                 "6"="black"))
>
> barwidth = 0.6
>
> # good looks defined
>
> smalldummy = data.frame(
>  mRS = as.factor(rep(0:6,2)),
>  rsfreq = sample(0:6,14,replace=T),
>  treatment = factor(rep(c("one","two"),each=7))
>  )
>
> smalldummy = ddply(smalldummy, .(treatment), transform,
>                   textpos = cumsum(rsfreq/sum(rsfreq)) -
> rsfreq/sum(rsfreq)/2, # center within segment
>                   lineposx = cumsum(rsfreq/sum(rsfreq)))
> # segment borders without 0
>
>
>
> # make the xs of the polygon
> polylo = 1 + barwidth/2
> polyhi = 2 - barwidth/2
> xs = rep(c(polylo,polyhi,polyhi,polylo), 7)
>
> # make the ys of the polygon
> tmp1 = c(0, smalldummy$lineposx[1:7])
> tmp2 = c(0, smalldummy$lineposx[8:14])
> ys = c()
> for(i in 1:7) {
>  nu = c(tmp1[i], tmp2[i], tmp2[i+1], tmp1[i+1])
>  ys = c(ys, nu)
> }
> m = as.factor(rep(0:6, each=4))
> tmpdf = data.frame(xs, ys, mRS = m)
>
>
> bigdummy = merge(smalldummy, tmpdf, by = "mRS")
>
> ggplot(data = bigdummy, aes(x = treatment, y = rsfreq, fill = mRS)) +
>  geom_bar(aes(width = barwidth),stat="identity",position="fill") +
>  geom_text(aes(label=as.character(mRS),
>    y = ifelse(rsfreq != 0, textpos, NA)),
>    size = 8, colour = "white") +
>  geom_polygon(aes(x = xs, y = ys, group = mRS, fill = mRS)) +
>  ylab("Modified Rankin Scale") + xlab("Treatment") +
>  coord_flip() + theme_bw() + opts(legend.position = "none") + no_margins +
> sfm
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/ggplot2-geom-polygon-fill-tp4361919p4361919.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] Multiple comparisons of lme model - interactions?

2012-02-10 Thread Fredrik Karlsson
Dear list,

Please excuse my ignorance, but I'm trying to model some data using the lme
package. vot is a numeric response, and condition, location and obs are all
categories.
This works:

> anova(vot.lme <- lme(vot ~ condition * location *
obs,data=mergedCodesL,random= ~1 |patient))
                       numDF denDF  F-value p-value
(Intercept)                1  1898 462.7519  <.0001
condition                  2  1898   8.4126  0.0002
location                   1    12   0.0272  0.8718
obs                        2  1898 472.5526  <.0001
condition:location         2  1898   1.0467  0.3513
condition:obs              4  1898   1.0683  0.3706
location:obs               2  1898   9.7067  0.0001
condition:location:obs     4  1898   4.6143  0.0010


If I then would like to do post-hoc testing, I found in the email archives
that I could use the glht function of multcomp - something like

summary(glht(vot.lme, linfct=mcp(obs = "Tukey")))

However, what do I do if I would like to investigate the
condition:location:obs interaction?
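
One approach I have seen suggested elsewhere (a minimal sketch, assuming the
data layout above; the combined factor "clo" and the refit are my own
construction, not tested) is to collapse the three factors into a single
factor and run Tukey contrasts on that:

library(nlme)
library(multcomp)
mergedCodesL$clo <- interaction(mergedCodesL$condition,
                                mergedCodesL$location,
                                mergedCodesL$obs, drop = TRUE)
vot.lme2 <- lme(vot ~ clo, data = mergedCodesL, random = ~1 | patient)
summary(glht(vot.lme2, linfct = mcp(clo = "Tukey")))

Would that be a sensible way to look at the three-way interaction?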


Best!

/Fredrik
-- 
"Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it."

[[alternative HTML version deleted]]



Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Max Kuhn
I think you need to read the man pages and the four vignettes. A lot
of your questions have answers there.

If you don't specify the resampling indices, the ones generated for
you are saved in the train object:

> data(iris)
> TrainData <- iris[,1:4]
> TrainClasses <- iris[,5]
>
> knnFit1 <- train(TrainData, TrainClasses,
+  method = "knn",
+  preProcess = c("center", "scale"),
+  tuneLength = 10,
+  trControl = trainControl(method = "cv"))
Loading required package: class

Attaching package: ‘class’

The following object(s) are masked from ‘package:reshape’:

condense

Warning message:
executing %dopar% sequentially: no parallel backend registered
> str(knnFit1$control$index)
List of 10
 $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
 $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
 $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
 $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
 $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
 $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
 $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
 $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...

There is also a savePredictions argument that gives you the hold-out results.

I'm not sure which weights you are referring to.

On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang  wrote:
> Actually, is there any way to get at additional information beyond the
> classProbs?  In particular, is there any way to find out the
> associated weights, or otherwise the row indices into the original
> model matrix corresponding to the tested instances?
>
> On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang  wrote:
>> Oops, found trainControl's classProbs right after I sent!
>>
>> On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang  wrote:
>>> I'm dealing with classification problems, and I'm trying to specify a
>>> custom scoring metric (recall@p, ROC, etc.) that depends on not just
>>> the class output but the probability estimates, so that caret::train
>>> can choose the optimal tuning parameters based on this metric.
>>>
>>> However, when I supply a trainControl summaryFunction, the data given
>>> to it contains only class predictions, so the only metrics possible
>>> are things like accuracy, kappa, etc.
>>>
>>> Is there any way to do this that I'm overlooking?  If not, could I put
>>> this in as a feature request?  Thanks!
>>>
>>> --
>>> Yang Zhang
>>> http://yz.mit.edu/
>>
>>
>>
>> --
>> Yang Zhang
>> http://yz.mit.edu/
>
>
>
> --
> Yang Zhang
> http://yz.mit.edu/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Max



[R] R-question about html writer: hwriter; how to create scroll for matrices?

2012-02-10 Thread Bornhauser Hanspeter
 
Dear All

This is the first time I use this help forum. My apologies in case I did
not follow a predefined format.

My question: 

I use the package "hwriter".

How can I produce a matrix (html) that allows scrolling (up and down,
left and right) or, alternatively, produces a fixed row for the headers
and a fixed column for the row names (so that the names do not disappear
if I go to the right or downward on that webpage)?

I did try quite a few things (also via CSS) but I seem to have problems
referring to the entire matrix rather than a single item of it.

For example: assume we have a correlation matrix of some dimension
(assume 100x100). What would be an efficient way to do the above?
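
For concreteness, here is a minimal sketch of the direction I have been trying
(it relies only on hwrite() returning the table as an HTML string; the div/CSS
wrapper is the part I am unsure about):

library(hwriter)
m <- matrix(round(rnorm(100 * 100), 2), 100,
            dimnames = list(paste("r", 1:100, sep = ""),
                            paste("c", 1:100, sep = "")))
tbl  <- hwrite(m)                 # the matrix as an HTML <table> string
html <- paste("<html><body>",
              "<div style='width:600px; height:400px; overflow:auto;'>",
              tbl, "</div></body></html>", sep = "")
writeLines(html, "matrix.html")   # should scroll in both directions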

Thank you to anyone who has a go at this. 

Kind Regards

Hanspeter


This message may contain confidential information and is...{{dropped:10}}



Re: [R] making multiple lines using qqplot

2012-02-10 Thread ilai
Melissa,
par(new=T) works as many times as you use it. You don't provide data,
but (assuming it is not NULL) more likely your n=500 qqplot was just
obscuring the points of the n=50 plot.

Reverse the order (i.e. qqplot n=500 first, 50, 5 last) and see if all
three are there (as there are more points in the n=500 sample you should
still see green at the extremes).

Second, you say you want "lines" but used pch=20. replace with
type='l' to get lines (may also help with your main problem). If you
want to stick with points and your device supports it, you can
consider semi-transparent colors.
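
A minimal sketch of an alternative that avoids par(new=TRUE) altogether
(using the object names from your script): compute the quantile pairs with
plot.it=FALSE and overlay them with points() in one coordinate system.

q500 <- qqplot(nq.n500, tq.n500, plot.it = FALSE)
q50  <- qqplot(nq.n50,  tq.n50,  plot.it = FALSE)
q5   <- qqplot(nq.n5,   tq.n5,   plot.it = FALSE)
plot(q500, col = "green", pch = 20,
     xlim = range(q5$x, q50$x, q500$x),
     ylim = range(q5$y, q50$y, q500$y),
     xlab = "Normal Distribution Quantiles",
     ylab = "t Distribution Quantiles")
points(q50, col = "blue", pch = 20)
points(q5,  col = "red",  pch = 20)
legend("topleft", c("n=5", "n=50", "n=500"),
       fill = c("red", "blue", "green"))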

Cheers

On Thu, Feb 9, 2012 at 9:00 PM, Melrose2012
 wrote:
> Hi Everyone,
>
> I want to make 3 lines on the same graph (not as subplots, all in the same
> window, one on top of each other) and I want them to be quantile-quantile
> plots (qqplot).  Essentially, I am looking for the equivalent of Matlab's
> "hold on" command to use with qqplot.  I know I can use 'points' or 'lines',
> but these do not give me a qqplot (only appear to work as scatter plots).  I
> found the syntax 'par(new=TRUE)' but that only seems to work for two lines,
> not for three.
>
> My script currently looks like:
>
> qqplot(nq.n5,tq.n5,col="red",xlab="Normal Distribution Quantiles",ylab="t
> Distribution Quantiles",main="Quantile-Quantile Plot of Normal vs
> t-Distribution for Various Sample Sizes",pch=20)
> par(new=TRUE)
> qqplot(nq.n50,tq.n50,col="blue",xlab="",ylab="",pch=20).
> par(new=TRUE)
> qqplot(nq.n500,tq.n500,col="green",xlab="",ylab="",pch=20)
> legend("topleft",c("n=5","n=50","n=500"),fill=c("red","blue","green"))
>
> I realize that this only plots the first and the third qqplot because by
> doing par(new=TRUE) again, it gets rid of the middle one.  I don't know how
> to get around this and get all 3 lines on the same plot.
>
> Can anyone please help me with this syntax?
>
> Thank you very much for your time and advice!
>
> Cheers,
> Melissa
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/making-multiple-lines-using-qqplot-tp4375273p4375273.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)

2012-02-10 Thread Aidan Corcoran
Thanks Hadley, of course I should have spotted that.

On Fri, Feb 10, 2012 at 12:46 PM, Hadley Wickham  wrote:
> Hi Aidan,
>
> str is your friend:
>
>> str(g)
> 'data.frame':   9 obs. of  3 variables:
>  $ Date    : chr  "2011-12-23" "2011-12-30" "2012-01-06" "2011-12-23" ...
>  $ variable: Factor w/ 3 levels "Price","Yield",..: 1 1 1 2 2 2 3 3 3
>  $ value   : num  86.78 86.04 86.44 9.74 9.54 ...
>
> You haven't turned the Date variable into an actual date yet.  Try:
>
> g$Date <- as.Date(g$Date)
>
> Hadley
>
> On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran
>  wrote:
>> Dear all,
>>
>> ggplot gives me an error when trying to plot time series data using a
>> date variable as the x axis.
>>
>> g<-structure(list(Date = c("2011-12-23", "2011-12-30", "2012-01-06",
>> "2011-12-23", "2011-12-30", "2012-01-06", "2011-12-23", "2011-12-30",
>> "2012-01-06"), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
>> 3L, 3L, 3L), .Label = c("Price", "Yield", "CDS Spread"), class = "factor"),
>>    value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132,
>>    576.866, 573.564)), .Names = c("Date", "variable", "value"
>> ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L,
>> 306L), class = "data.frame")
>>
>>> g
>>          Date   variable value
>> 100 2011-12-23      Price  86.8
>> 101 2011-12-30      Price  86.0
>> 102 2012-01-06      Price  86.4
>> 202 2011-12-23      Yield   9.7
>> 203 2011-12-30      Yield   9.5
>> 204 2012-01-06      Yield   9.7
>> 304 2011-12-23 CDS Spread 580.1
>> 305 2011-12-30 CDS Spread 576.9
>> 306 2012-01-06 CDS Spread 573.6
>>> gp<-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line()
>>> gp
>>> #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I 
>>> WOULD LIKE TO USE SCALE_X_DATE)
>>> gp<-gp+   scale_x_date()
>>> gp
>> Error in seq.int(r1$year, to$year, by) : 'from' must be finite
>> In addition: Warning messages:
>> 1: In get(x, envir = this, inherits = inh)(this, ...) :
>>  NAs introduced by coercion
>> 2: In min(x) : no non-missing arguments to min; returning Inf
>> 3: In max(x) : no non-missing arguments to max; returning -Inf
>> 4: In min(x) : no non-missing arguments to min; returning Inf
>> 5: In max(x) : no non-missing arguments to max; returning -Inf
>>>
>>
>> In previous help requests, the workaround of specifying the unit was 
>> suggested:
>> gp<-gp+   scale_x_date(major="years")
>> but this doesn't work for me (same error).
>>
>> Any help is very much appreciated! Thanks in advance.
>>
>> Aidan
>>
>> R version 2.14.1 (2011-12-22)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
>> LC_MONETARY=English_Ireland.1252
>> [4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252
>>
>> attached base packages:
>>  [1] datasets  tools     grDevices grid      splines   graphics  stats
>>    tcltk     utils
>> [10] methods   base
>>
>> other attached packages:
>>  [1] micEcon_0.6-6      miscTools_0.6-12   np_0.40-11
>> cubature_1.0       boot_1.3-3
>>  [6] RODBC_1.3-3        sqldf_0.4-6.1      chron_2.3-41
>> gsubfn_0.5-7       DBI_0.2-5
>> [11] Haver_1.0          xtable_1.5-6       plm_1.2-7
>> sandwich_2.2-7     MASS_7.3-16
>> [16] Formula_1.0-1      nlme_3.1-102       bdsmatrix_1.0
>> RBloomberg_0.4-150 rJava_0.9-1
>> [21] gtools_2.6.2       gdata_2.8.2        ggplot2_0.8.9
>> proto_0.3-9.2      zoo_1.7-4
>> [26] reshape_0.8.4      plyr_1.6           svSocket_0.9-52
>> TinnR_1.0.3        R2HTML_2.2
>> [31] Hmisc_3.8-3        survival_2.36-10
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.14.1        digest_0.5.0          lattice_0.20-0
>> RSQLite_0.10.0
>> [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63
>>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/



Re: [R] coxme with frailty

2012-02-10 Thread Terry Therneau
 A couple of clarifications for you.

1. I write mixed effects Cox models as exp(X beta + Z b), beta = fixed
effects coefficients and b = random effects coefficients.  I'm using
notation that is common in linear mixed effects models (on purpose).
  About 2/3 of the papers use exp(X beta)* c, i.e., pull the random
effects out of the exponent.  Does it make a difference?  Not much: b
will be Gaussian and c will be log-normal. However, it's much easier to
write down complicated (crossed, nested, etc.) random effects in the
notation I chose.  The description you give from H&L is using the "c"
style.
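
In symbols (my shorthand, not a quote from any particular paper):

  \lambda_i(t) = \lambda_0(t) \exp(X_i \beta + Z_i b),  with b ~ N(0, \Sigma(\theta))
  \lambda_i(t) = \lambda_0(t) \exp(X_i \beta) c_i,      with c_i = \exp(b_i)  (log-normal)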

 2. If var(b) is known, solution of beta and b is straightforward.  It's
a small modification of the usual coxph internals for a fixed effects
model.  That is, a bit of C code that has been optimized and exercised
over a decade, so it makes sense to re-use it.  So all the routines I know
of solve mixed effects in a nested fashion: an outer maximizer looks for
the parameters of var(b), and for each trial value calls the known
routine (which itself iterates) for the corresponding (beta, b) pair.
The H&L formula you quote only holds in a specific case.

Terry T

 begin included message --
Thank you very much for taking the time to answer. This pointed me in
the right direction.
For those interested, I found a useful explanation  of the derivation
of the estimated variance of the random effect in Hosmer & Lemeshow's
Applied Survival Analysis (1999), pp.321-26 (I love your book, Dr.
Therneau, but I needed something easier...). They proceed in 5 steps:

1. Obtain the cumulative hazard function for each subject.
2. Choose an arbitrary value for the variance parameter of the
frailties (call it theta).
3. Compute for each subject an estimate of the value of their
frailties, USING this variance parameter theta:
frailty_i = \frac{1 + \theta \times c_i}{1 + \theta \times H_i} (formula on
p. 321), where H is the cumulative hazard for the subject. So if theta
is 0 (no variance), then frailty=1 (i.e., no excess frailty). As theta
goes to infinity, the estimated frailty tends to the ratio c_i/H_i, i.e.
0 or 1/(cumulative risk so far), depending on
whether the subject is still alive or not.
4. fit the proportional hazard model with the same covariates as
before, but this time including the frailties in the hazard function.
5. calculate a log-likelihood for the model.

Repeat this for all possible values of theta (in practice,
sweep through them according to some algorithm), and pick the one with
the highest log-likelihood.

SO IF I understand correctly, the frailties are calculated GIVEN a
variance parameter of the frailties, and not the reverse. I.e., theta
is chosen first, and the random effects are estimated accordingly.
Which is why the variance of the estimated random effects is NOT the
same as theta. Did I get this right?



[R] Find interval between numbers in list or vector

2012-02-10 Thread alend
hi,

  I have a vector as follow:

  myVec = c(1,4,10,30)

How can I get a vector that shows the intervals between the numbers?

 I need to get this result:

  distance
>3,6,20

--
View this message in context: 
http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html
Sent from the R help mailing list archive at Nabble.com.



[R] Help needed please

2012-02-10 Thread Jaymin Shah
I have coded a time series from simulated data:

simtimeseries <- arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 
2.6535,  -0.9258),sd=sqrt(1)))
#show roots are outside unit circle
plot.ts(simtimeseries, xlab="", ylab="", main="Time Series of Simulated Data")

# Yule 


q1 <- cbind(simtimeseries[1:1024])
q2 <- t(q1)%*%q1
s0 <- q2/1024          # sample autocovariance at lag 0 (divide by n = 1024)
r1 <- cbind(simtimeseries[1:1023])
r2 <- cbind(simtimeseries[2:1024])
r3 <- t(r1)%*%r2
s1 <- r3/1024          # lag 1
t1 <- cbind(simtimeseries[1:1022])
t2 <- cbind(simtimeseries[3:1024])
t3 <- t(t1)%*%t2
s2 <- t3/1024          # lag 2
u1 <- cbind(simtimeseries[1:1021])
u2 <- cbind(simtimeseries[4:1024])
u3 <- t(u1)%*%u2
s3 <- u3/1024          # lag 3
v1 <- cbind(simtimeseries[1:1020])
v2 <- cbind(simtimeseries[5:1024])
v3 <- t(v1)%*%v2
s4 <- v3/1024          # lag 4

i0 <- c(s0,s1,s2,s3)
i1 <- c(s1,s0,s1,s2)
i2 <- c(s2,s1,s0,s1)
i3 <- c(s3,s2,s1,s0)

gamma <- cbind(i0,i1,i2,i3)
eta <-c(s1,s2,s3,s4)
inversegamma <- solve(gamma)
phihat <- inversegamma%*%eta
phihat

Phihat <- cbind(phihat)
s <- c(s1,s2,s3,s4)
S <- cbind(s)
sigmasquaredyule <- s0 - (t(Phihat)%*%S)
sigmasquaredyule



I did a Yule-Walker estimate on the simulated data and wanted to work out phi
hat, which is a vector of 4 values, and sigmasquaredyule, which is one value.
However, I want to run the simulation 100 times, i.e. in a for loop, and
then take the averages of the phi hat values and the sigmasquaredyule value.

How would I repeat this simulated time series many times (e.g. 100 times)
and store the average values of phi hat and sigmasquaredyule?
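
A minimal sketch of what I have in mind (wrapping the estimation in a function
and using replicate(); I am assuming acf() with type="covariance" and
demean=FALSE reproduces the hand-computed s0..s4 above):

yule.once <- function() {
  x <- arima.sim(n = 1024,
                 list(order = c(4, 0, 0),
                      ar = c(2.7607, -3.8106, 2.6535, -0.9258)))
  s <- acf(x, lag.max = 4, type = "covariance",
           demean = FALSE, plot = FALSE)$acf[, 1, 1]   # s0..s4
  gamma  <- toeplitz(s[1:4])
  eta    <- s[2:5]
  phihat <- solve(gamma, eta)
  c(phihat, sigma2 = s[1] - sum(phihat * eta))
}
res <- replicate(100, yule.once())   # 5 x 100 matrix of estimates
rowMeans(res)                        # averaged phi hat and sigma^2

Would something along those lines be the right way to go?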

Thank you


[[alternative HTML version deleted]]



Re: [R] Find interval between numbers in list or vector

2012-02-10 Thread Dimitris Rizopoulos

Have a look at function diff(), e.g.,

diff(c(1,4,10,30))


I hope it helps.

Best,
Dimitris


On 2/10/2012 1:33 PM, alend wrote:

hi,

   I have a vector as follow:

   myVec = c(1,4,10,30)

How can I get a vector that show the interval of the numbers.

  I need to get this result:

   distance

3,6,20


--
View this message in context: 
http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/



[R] Q - scatterplots

2012-02-10 Thread Jhope
I was able to make a scatterplot but ...

1) what does the "86" mean? The "86" shows up on the graph as well. 

> scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze)
[1] "86"

2) Also how do you change the Y axis title? I don't want it to read
Shells/TotalEggs, instead I would like it to read Average Hatching Rate (%).

3) What does this error mean? Rayos is composed of sections 1, 2, 3, 4 and 5.

> scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

J

--
View this message in context: 
http://r.789695.n4.nabble.com/Q-scatterplots-tp4375632p4375632.html
Sent from the R help mailing list archive at Nabble.com.



[R] Split matrix into square submatices

2012-02-10 Thread xiddw
Hi everybody, 

I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)

For example

If I have 

>  h
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    9   17   25   33   41   49   57
[2,]    2   10   18   26   34   42   50   58
[3,]    3   11   19   27   35   43   51   59
[4,]    4   12   20   28   36   44   52   60
[5,]    5   13   21   29   37   45   53   61
[6,]    6   14   22   30   38   46   54   62
[7,]    7   15   23   31   39   47   55   63
[8,]    8   16   24   32   40   48   56   64

and I want to split matrix h into 16 small submatrices:

> g[1]
 [,1] [,2]
[1,]    1    9
[2,]    2   10

> g[2]
 [,1] [,2]
[1,]   17   25
[2,]   18   26

...

> g[4]
 [,1] [,2]
[1,]   49   57
[2,]   50   58

...

> g[16]
 [,1] [,2]
[1,]   55   63
[2,]   56   64

The big matrix will always be square, with nrow and ncol equal to a power
of two, so it can always be split at least into halves.

Until now, I'm able to split a matrix but keeping the number of original
matrix columns.

I mean, if I want to split into 16 submatrices, using the following command
what I get is 4 lists.
But I haven't found a way to split the resulting lists in order to get 4
matrices from each list.

> g <- split(as.data.frame(h), (1:4))
> g
$`1`
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  9 17 25 33 41 49 57
5  5 13 21 29 37 45 53 61

$`2`
  V1 V2 V3 V4 V5 V6 V7 V8
2  2 10 18 26 34 42 50 58
6  6 14 22 30 38 46 54 62

$`3`
  V1 V2 V3 V4 V5 V6 V7 V8
3  3 11 19 27 35 43 51 59
7  7 15 23 31 39 47 55 63

$`4`
  V1 V2 V3 V4 V5 V6 V7 V8
4  4 12 20 28 36 44 52 60
8  8 16 24 32 40 48 56 64


I would appreciate any hint. Thanks.
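
For what it is worth, here is a minimal sketch of the kind of solution I am
after (block indexing done by hand; the numbering runs along block rows, as
in the example above):

h  <- matrix(1:64, 8)
bs <- 2                                        # block size
idx <- expand.grid(col = seq(1, ncol(h), bs),  # columns vary fastest
                   row = seq(1, nrow(h), bs))
g <- lapply(seq_len(nrow(idx)), function(k)
  h[idx$row[k] + 0:(bs - 1), idx$col[k] + 0:(bs - 1)])
g[[1]]    #  1  9 /  2 10
g[[16]]   # 55 63 / 56 64

Is there a more elegant or faster way to do this for big matrices?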

--
View this message in context: 
http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4375563.html
Sent from the R help mailing list archive at Nabble.com.



[R] Rpart and splitting criteria

2012-02-10 Thread MYRIAM TABASSO
Dear All,
I have questions about the function "rpart" for constructing a regression
tree in R.
My problem is how to change the splitting criterion.

In "rpart" we have parms=list(split=".."); is it possible to use another
splitting criterion in this argument, as a substitute for the default
criteria (gini or information)?

Can someone help me?
Thank you,
Myriam Tabasso

[[alternative HTML version deleted]]



[R] how to made .exe of any gui project in R

2012-02-10 Thread sagarnikam123
I made a jDialog through the JGR package (Java GUI for R).
I want to export it, together with the imported external packages (e.g. bio3d,
RCOR), into a .exe,

or, if that is not possible, convert the code to an executable jar file.
How should I go about this? Please give me a package name and example code
(if possible).

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-made-exe-of-any-gui-project-in-R-tp4375938p4375938.html
Sent from the R help mailing list archive at Nabble.com.



[R] access results of graph.maxflow and program max-flow algorithm

2012-02-10 Thread Wendy
Hi all, 

I want to use the graph.maxflow function for my work. Given a network, I can
get a result, but I could not figure out how the function gets the result.
Has anyone had the same problem before? How did you resolve it? Does anyone
know of any other functions/programs for max-flow analysis?

Thank you in advance.
Wendy

--
View this message in context: 
http://r.789695.n4.nabble.com/access-results-of-graph-maxflow-and-program-max-flow-algorithm-tp4375744p4375744.html
Sent from the R help mailing list archive at Nabble.com.



[R] clustering and the region of integration

2012-02-10 Thread Barbara Uszczynska
Dear R users,

I'm having trouble with calculating pvalues for my 2d dataset. First I
performed clustering and I would like to get some info about the strength
of cluster membership for each point. I've calculated (thanks to nice
people help) the multivariate normal densities (mnd) using dmvnorm function:

pd11=mvtnorm::dmvnorm(dataset1,mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1])

I've obtained a vector of mnds for each cluster:

NA12043  NA12249  NA12264  NA12707  NA12716  NA12717
   NA12751  NA12762  NA12864  NA12873  NA07034  NA07048
 NA07055  NA07345  NA07348  NA07357  NA10830
 NA10835
8.627681e+00 8.465797e+00 1.522724e+01 2.047262e+01 1.780368e+01
2.443946e+01 8.687642e+00 5.024366e+00 2.163811e+01 6.093326e-01
1.503374e+00 2.263341e+00 2.177880e+01 2.851877e+01 1.240402e+01
7.498245e+00 1.186389e+01 1.229760e+01
 NA12154  NA12234  NA12236  NA12763  NA12801
 NA12812  NA12813  NA12878  NA10851  NA10854  NA10857
   NA10859  NA10861  NA10863  NA11839  NA11840  NA11881
 NA11882
8.293616e+00 4.019101e-19 2.733848e+01 2.623284e+01 2.320810e+01
5.112927e-01 1.432336e+01 1.000314e+01 1.675454e+01 8.239816e+00
2.449679e+01 2.655419e+01 2.294064e+01 2.218329e-17 8.844933e+00
2.911991e+00 2.170381e+01 3.089883e+00
 NA11994  NA12044  NA12056  NA12057  NA12891
 NA12892
1.668749e+01 1.588963e+01 5.913443e+00 2.924297e+01 1.765777e+01
7.935129e+00

Next, what I would like to do is to calculate the p-value for each point
that was assigned to a particular cluster. In order to do this I'm using the
pmvnorm function, but I found it difficult to set the region of
integration. As I understand it, to get the probability of cluster membership
I should define how 'far' my point is from the cluster mean. However, I've
got a 2d dataset and my mean is also 2d:

   [1,] [2,]
 1.348992  1.269590

but I've got only one density value for each point.

Using:

pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1],
lower=2.218329e-17, upper=as.vector(dataset1MC$parameters$mean[,1]))

gives strange results, since for 2.218329e-17 the output is:

 [1] 0.348126
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"

and

pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1],
lower= as.vector(dataset1MC$parameters$mean[,1]) , upper=2.924297e+01)

gives:

[1] 0.348126
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"

If it is possible I would like to get some info about:

Is my idea of calculating the probability of cluster membership
correct? How can I properly set the region of integration?

I would be grateful for any help.

Best Wishes,

Bas.

[[alternative HTML version deleted]]



[R] Trust in a glm.nb model results with an itereation limit reached

2012-02-10 Thread Lucas
Hello to everyone.
I'm fitting a glm.nb model to count data, using about 8 predictor
variables.
Once I run the script I do get a result, but it tells me that the iteration
limit has been reached.
So, can I trust the results given by the model?
Could it be a multicollinearity problem?
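
A minimal sketch of what I have tried so far (formula and data names are just
placeholders; I understood that control = glm.control(maxit = ...) raises the
iteration limit, but I am not sure that is the right knob):

library(MASS)
fit <- glm.nb(count ~ x1 + x2 + x3, data = mydata,
              control = glm.control(maxit = 100, trace = TRUE))
fit$theta
fit$converged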

Thank you for taking the time to help me.
Greetings

Lucas.


--
View this message in context: 
http://r.789695.n4.nabble.com/Trust-in-a-glm-nb-model-results-with-an-itereation-limit-reached-tp4376073p4376073.html
Sent from the R help mailing list archive at Nabble.com.



[R] Need to aggregate large dataset by week...

2012-02-10 Thread danielrev
Hi all, 

I have a large dataset with ~8600 observations that I want to compress to
weekly means. There are 9 variables (columns), and I have already added a
"week" column with 51 weeks. I have been looking at the functions:
aggregate, tapply, apply, etc. and I am just not savvy enough with R to
figure this out on my own, though I'm sure it's fairly easy. I also have the
Dates (month/day/year) for all of the observations, but I figured just
having a week column may be easier. If someone wanted to show me how to
organize this data using a date function and aggregating by month that would
be useful too!  

Here's an example of the data set, with only 5 of the variables and 10 of
8600 obs.:

    week rainfall windspeed   winddir  temp oakdepth
1      1   0.2000   0.89000  245.9200  1.15     4.40
2      1   0.0000   0.84000  292.8800  1.19     5.30
3      1   0.2000   0.74000  258.5400  1.36     6.00
4      1   0.0000   0.93000    3.7000  1.43     4.40
5      1   0.2000   0.69000   37.8200  1.56     5.20
6      1   0.0000   0.80000   17.2900  1.69     4.40
7      1   0.2000   0.70000   28.7300  1.88     5.00
8      1   0.2000   1.12000  294.3700  1.93     6.00
9      1   0.0000   1.21000  274.9700  1.80     4.40
10     1   0.0000   1.31000  279.2400  1.86     5.80

...so after about 170 observations it changes to week 2, and so on.

I've tried something like this, but it's only one variable's mean, and I
would rather have rows = weeks and columns = the different variables.

> tapply(metdata$rainfall,metdata$week,FUN=mean)
  1   2   3   4   5   6 
0.080952381 0.101190476 0.379761905 0.179761905 0.0 0.295238095 
  7   8   9  10  11  12 
0.146428571 0.015476190 0.16389 0.098809524 0.065476190 0.215476190 
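
A minimal sketch of the layout I am after (untested, using the column names
from the excerpt above):

wkmeans <- aggregate(cbind(rainfall, windspeed, winddir, temp, oakdepth) ~ week,
                     data = metdata, FUN = mean)
head(wkmeans)

# or by calendar month, assuming a Date column in month/day/year format:
metdata$month <- format(as.Date(metdata$Date, "%m/%d/%Y"), "%Y-%m")
aggregate(cbind(rainfall, windspeed, winddir, temp, oakdepth) ~ month,
          data = metdata, FUN = mean)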

Hope this is enough information and that I'm not just re-asking an old
question. Thanks so much in advance for any help. 



--
View this message in context: 
http://r.789695.n4.nabble.com/Need-to-aggregate-large-dataset-by-week-tp4376154p4376154.html
Sent from the R help mailing list archive at Nabble.com.



[R] best option for big 3D arrays?

2012-02-10 Thread Djordje Bajic
Hi all,

I am trying to fill a 904x904x904 array, but at some point in the loop R
states that the 5.5Gb vector is too big to allocate. I have looked at
packages such as "bigmemory", but I need help to decide which is the best
way to store such an object. It would be perfect to store it in this "cube"
form (for indexing and computation purposes). If that is not possible, maybe
the best is to store the 904 matrices separately and read them individually
when needed?
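
One option I have been considering (a minimal sketch, assuming the ff package;
I have not tried it at this scale) is a disk-backed array that keeps the cube
indexing without holding everything in RAM:

library(ff)
a <- ff(vmode = "double", dim = c(904, 904, 904),
        filename = "cube.ff")      # roughly 5.5 Gb on disk, not in memory
a[1, 2, 3] <- 4.2                  # indexed like an ordinary array
a[1:5, 1:5, 1]

Would that be a reasonable approach, or is bigmemory (or plain per-slice
files) a better fit here?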

Never dealt with such a big dataset, so any help will be appreciated.

(R+ESS, Debian 64bit, 4Gb RAM, 4core)

[[alternative HTML version deleted]]



[R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold
When I tried to run svm on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost right away and the function never
returned (has been running for ~15 hours now). ^C does not stop it.
This is most unusual, libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with ubuntu).

> * Sam Steingold  [2012-02-09 21:43:30 -0500]:
>
> I did this:
> nb <- naiveBayes(users, platform)
> pl <- predict(nb,users)
> nrow(users) ==> 314781
> ncol(users) ==> 109
>
> 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
> (tens of minutes).  why?
>
> 2. the predict results were completely off the mark (quite the opposite
> of the expected overfitting).  suffice it to show the tables:
>
> pl:
>
>android blackberry   ipad iphone lg  linuxmac 
>  3  5 11 14 312723  5 11 
> mobile  nokiasamsungsymbianunknownwindows 
>   1864 17 16112  0  0 
>
> platform:
>android blackberry   ipad iphone lg  linuxmac 
>  18013   1221   2647   1328  4   2936  34336 
> mobile  nokiasamsungsymbianunknownwindows 
> 18 88 39103   2660 251388 
>
> i.e., nb classified nearly everything as "lg" while in the actual data
> "lg" is virtually nonexistent.
>
> 3. when I print "nb", I see "A-priori probabilities" (which are what I
> expected) and "Conditional probabilities" which are confusing because
> there are only two of them, e.g.:
>
>  android0.048464998 0.43946764
>  blackberry 0.001638002 0.04045564
>  ipad   0.322251606 1.84940588
>  iphone 0.030873494 0.23250250
>  lg 0.0 0.
>  linux  0.023501362 0.34698919
>  mac0.082653774 1.22535027
>  mobile 0.0 0.
>  nokia  0.0 0.
>  samsung0.0 0.
>  symbian0.0 0.
>  unknown0.003759398 0.08219078
>  windows0.021158528 0.32916970
>
> the predictors are integers.
> is the first column for the 0 predictors and the second for all non-0?
> Is there a way to ask naiveBayes to differentiate between non-0 values?
>
> thanks!
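
One note on the naiveBayes() output quoted above: for numeric predictors the
two columns in each conditional table are the per-class mean and standard
deviation, not "0" versus "non-0". A minimal sketch of making the function
treat the integer codes as categories instead (data frame and vector names are
taken from the post):

library(e1071)
users_f <- as.data.frame(lapply(users, factor))  # each distinct value becomes a level
nb <- naiveBayes(users_f, platform)
pl <- predict(nb, users_f)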

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
http://jihadwatch.org http://camera.org http://www.memritv.org
Don't ascribe to malice what can be adequately explained by stupidity.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to combine two matrix or vectors

2012-02-10 Thread summer
such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine two matrix or vectors

2012-02-10 Thread R. Michael Weylandt
How could you combine a and b as given to get your "new one"? It
doesn't have the same elements...

?union
?intersect
?rbind
?c
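
For example, with the vectors from the post:

a <- c(1, 2, 3)
b <- c(2, 4)
union(a, b)            # 1 2 3 4  (unique values from both)
c(a, b)                # 1 2 3 2 4  (simple concatenation)
sort(unique(c(a, b)))  # 1 2 3 4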

Michael

On Fri, Feb 10, 2012 at 9:55 AM, summer  wrote:
> such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you.
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split matrix into square submatices

2012-02-10 Thread Petr Savicky
On Fri, Feb 10, 2012 at 12:02:25AM -0800, xiddw wrote:
> Hi everybody, 
> 
> I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
> nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)
> 
> For example
> 
> If I have 
> 
> >  h
>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> [1,]19   17   25   33   41   49   57
> [2,]2   10   18   26   34   42   50   58
> [3,]3   11   19   27   35   43   51   59
> [4,]4   12   20   28   36   44   52   60
> [5,]5   13   21   29   37   45   53   61
> [6,]6   14   22   30   38   46   54   62
> [7,]7   15   23   31   39   47   55   63
> [8,]8   16   24   32   40   48   56   64
> 
> and I want to split matriz h into 16 small submatrices:
> 
> > g[1]
>  [,1] [,2]
> [1,]19
> [2,]2   10
> 
> > g[2]
>  [,1] [,2]
> [1,]   17   25
> [2,]   18   26
> 
> ...
> 
> > g[4]
>  [,1] [,2]
> [1,]   49   57
> [2,]   50   58
> 
> ...
> 
> > g[16]
>  [,1] [,2]
> [1,]   55   63
> [2,]   56   64
> 
> Always the big matrix would be a square matrix and also would have a nrow
> and ncol in order of  a power of two, so it always could  be splitted in at
> least halves. 
> 
> Until now, I'm able to split a matrix but keeping the number of original
> matrix columns.
> 
> I mean, if I want to split into 16 submatrices, using the following command
> what I get its 4 lists: 
> But I haven't found a way to split resultant lists in order to get 4
> matrices for each list.

Hi.

Try the following.

  k <- 3
  n <- 2^k
  m <- 2^(k - 2)
  a <- matrix(1:n^2, nrow=n, ncol=n)
  g <- vector("list", length=16)
  
  k <- 1
  for (j in 0:3) {
  for (i in 0:3) {
  g[[k]] <- a[m*i + 1:m, m*j + 1:m]
  k <- k + 1
  }
  }
  
  g[1:5]
  
  [[1]]
   [,1] [,2]
  [1,]19
  [2,]2   10
  
  [[2]]
   [,1] [,2]
  [1,]3   11
  [2,]4   12
  
  [[3]]
   [,1] [,2]
  [1,]5   13
  [2,]6   14
  
  [[4]]
   [,1] [,2]
  [1,]7   15
  [2,]8   16
  
  [[5]]
   [,1] [,2]
  [1,]   17   25
  [2,]   18   26

  ...

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nice report generator?

2012-02-10 Thread Aleksandar Blagotić
Hi guys,

AFAICT, this conversation now discusses some domain-specific issues.
However, in reply to OP and OQ, I'll shamelessly advertise the package that
Gergely Daróczi and I have released recently. It's named "rapport", and you
can grab it from CRAN or GitHub (https://github.com/aL3xa/rapport/). Check
it out, as its purpose is to deliver reports in "bosses-friendly" format.
It relies on pandoc, so you can easily convert it to external formats like
ODT, DOCX, PDF, LaTeX, etc. Check out the project homepage for more
details: http://rapport-package.info/

Cheers,
aL3xa



On Tue, Feb 7, 2012 at 16:45, Martin Studer wrote:

> Hi Michael, Hi all,
>
> an alternative for reading/writing tables from/to Excel is the package
> XLConnect. It is platform-independent and therefore runs under Windows,
> UNIX/Linux & Mac. It supports reading/writing worksheets as well as named
> ranges. In addition, for reporting purposes, there is support for cell
> styles. You can find an example here:
>
> http://miraisolutions.wordpress.com/2011/08/31/xlconnect-a-platform-independent-interface-to-excel/
> XLConnect – A platform-independent interface to Excel . With respect to
> cell
> styles: you either completely code up the styling according to your needs
> using /setCellStyle/ (which can be cumbersome if you need to apply a lot of
> styling) or you can use so-called "style actions" controlled via
> /setStyleAction/. See the XLConnect reference manual for more information.
>
> Best regards,
> Martin
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/nice-report-generator-tp4169939p4365144.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split matrix into square submatices

2012-02-10 Thread Gabor Grothendieck
On Fri, Feb 10, 2012 at 3:02 AM, xiddw  wrote:
> Hi everybody,
>
> I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
> nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)
>
> For example
>
> If I have
>
>>  h
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> [1,]    1    9   17   25   33   41   49   57
> [2,]    2   10   18   26   34   42   50   58
> [3,]    3   11   19   27   35   43   51   59
> [4,]    4   12   20   28   36   44   52   60
> [5,]    5   13   21   29   37   45   53   61
> [6,]    6   14   22   30   38   46   54   62
> [7,]    7   15   23   31   39   47   55   63
> [8,]    8   16   24   32   40   48   56   64
>
> and I want to split matriz h into 16 small submatrices:
>
>> g[1]
>     [,1] [,2]
> [1,]    1    9
> [2,]    2   10
>
>> g[2]
>     [,1] [,2]
> [1,]   17   25
> [2,]   18   26
>
> ...
>
>> g[4]
>     [,1] [,2]
> [1,]   49   57
> [2,]   50   58
>
> ...
>
>> g[16]
>     [,1] [,2]
> [1,]   55   63
> [2,]   56   64
>
> Always the big matrix would be a square matrix and also would have a nrow
> and ncol in order of  a power of two, so it always could  be splitted in at
> least halves.
>
> Until now, I'm able to split a matrix but keeping the number of original
> matrix columns.
>
> I mean, if I want to split into 16 submatrices, using the following command
> what I get its 4 lists:
> But I haven't found a way to split resultant lists in order to get 4
> matrices for each list.
>
>> g <- split(as.data.frame(h), (1:4))
>> g
> $`1`
>  V1 V2 V3 V4 V5 V6 V7 V8
> 1  1  9 17 25 33 41 49 57
> 5  5 13 21 29 37 45 53 61
>
> $`2`
>  V1 V2 V3 V4 V5 V6 V7 V8
> 2  2 10 18 26 34 42 50 58
> 6  6 14 22 30 38 46 54 62
>
> $`3`
>  V1 V2 V3 V4 V5 V6 V7 V8
> 3  3 11 19 27 35 43 51 59
> 7  7 15 23 31 39 47 55 63
>
> $`4`
>  V1 V2 V3 V4 V5 V6 V7 V8
> 4  4 12 20 28 36 44 52 60
> 8  8 16 24 32 40 48 56 64
>
>

Try this:

h <- matrix(1:64, 8) # input

k <- kronecker(matrix(1:16, 4, byrow = TRUE), matrix(1, 2, 2))
g <- lapply(split(h, k), matrix, nr = 2)
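
A quick check against the blocks requested in the original post (the list
returned by split() here is named "1" ... "16"):

g[["2"]]
#      [,1] [,2]
# [1,]   17   25
# [2,]   18   26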

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nice report generator?

2012-02-10 Thread Hadley Wickham
>> To be strictly correct, shouldn't that be:
>>
>> formula<- eval(substitute( value*v*LEFT ~ RIGHT, list(LEFT=LEFT,
>> RIGHT=RIGHT)))
>>
>> ?
>
>
> I think it probably doesn't matter.  The difference is that mine gives a
> pure language object, whereas yours gives a formula object.  The formula
> object has a class which means some methods will work differently, and it
> also has an environment attached, which defines where the variables in it
> should be resolved.  I suspect the variables shouldn't be resolved in the
> environment where "formula" was being created, so it's probably better not
> to attach an environment at all, but the tabular function ignores the
> environment of the formula (it uses its data argument for that), so it
> doesn't make a big difference.

Yes, sorry, I was probably being excessively picky :/  Formula
semantics are tricky though!
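
A small illustration of the distinction, using hypothetical LEFT/RIGHT symbols:

LEFT  <- quote(Species)
RIGHT <- quote(Sepal.Length)

lang <- substitute(value * v * LEFT ~ RIGHT, list(LEFT = LEFT, RIGHT = RIGHT))
inherits(lang, "formula")   # FALSE: an unevaluated call, no environment attached

form <- eval(lang)
inherits(form, "formula")   # TRUE: evaluating the ~ builds a formula object
environment(form)           # the environment picked up at eval() time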

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trust in a glm.nb model's results with an iteration limit reached

2012-02-10 Thread David Winsemius


On Feb 10, 2012, at 7:14 AM, Lucas wrote:

> Hello to everyone.
> I'm fitting a glm.nb model to count data, using about 8 predictive
> variables.
> Once I run the script I do get a result, but it tells me that the
> iteration limit has been reached.


Most regression functions have a control parameter that can increase  
the maximum number of iterations, but you provided no code so all is  
guesswork from here.
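
For what it is worth, a hedged sketch of raising that limit in MASS::glm.nb
(the formula, data and variable names here are hypothetical):

library(MASS)
fit <- glm.nb(counts ~ x1 + x2 + x3, data = mydata,
              control = glm.control(maxit = 100, trace = TRUE))
# collinearity among the predictors can then be screened with, e.g., car::vif(fit)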



> So, can I trust the results given by the model?


Probably not. Some functions will not even record estimates if they  
have failed the convergence tests.



> Could it be a multicollinearity problem?


Could be.



> Thank you for taking the time to help me.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q - scatterplots

2012-02-10 Thread John Fox
Dear J.,

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Jhope
> Sent: February-10-12 3:44 AM
> To: r-help@r-project.org
> Subject: [R] Q - scatterplots
> 
> I was able to make a scatterplot but ...
> 
> 1) what does the "86" mean? The "86" shows up on the graph as well.
> 
> > scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze)
> [1] "86"

You don't provide enough information to answer your questions. I assume, for
example, that this is the scatterplot() function in the car package. If so,
the function returns the names (in your case, numbers) of outlying
observations, which are also labelled on the graph, but that shouldn't have
happened unless you set the id.n or id.method arguments appropriately.
Beyond the command, you haven't provided any information about the version
of the car package that you're using, or even whether that's what you're
using.

> 
> 2) Also how do you change the Y axis title? I don't want it to read
> Shells/TotalEggs, instead I would like it to read Average Hatching
> Rate (%).

See ?scatterplot, in particular, the ylab argument, assuming again that
you're using scatterplot() in the car package.
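
Assuming it is indeed car's scatterplot(), a minimal sketch of that argument:

library(car)
scatterplot(Shells/TotalEggs ~ Sector, data = data.to.analyze,
            ylab = "Average Hatching Rate (%)")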

> 
> 3) What does this error mean? Rayos if composed of section 1, 2, 3, 4
> and 5
> 
> > scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze)
> Error in plot.window(...) : need finite 'xlim' values In addition:
> Warning messages:
> 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by
> coercion
> 2: In min(x) : no non-missing arguments to min; returning Inf
> 3: In max(x) : no non-missing arguments to max; returning -Inf

I suspect that Rayos isn't numeric but without the data it's impossible to
tell.

Please see the posting guide for r-help
 for how to formulate an
answerable question.

Best,
 John


John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox



> 
> J
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Q-
> scatterplots-tp4375632p4375632.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold
> * Sam Steingold  [2012-02-10 10:01:54 -0500]:
>
> When I tried to run svm on the same data frame, memory usage as reported
> by top(1) doubled to 4GB almost right away and the function never
> returned (has been running for ~15 hours now). ^C does not stop it.
> This is most unusual, libsvm has always seemed very fast.

looks like it _is_ libsvm:

#0  0x72aedc64 in Solver::select_working_set (this=0x7fff97f0, 
out_i=@0x7fff95a0, out_j=@0x7fff95b0) at svm.cpp:852
#1  0x72aef91d in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., 
p_=, y_=, alpha_=0x6023fb60, Cp=1, 
Cn=1, eps=, si=0x7fff9980, shrinking=1) at svm.cpp:573
#2  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, 
alpha=0x6023fb60, param=, prob=0x7fff9c30) at svm.cpp:1444
#3  svm_train_one (prob=0x7fff9c30, param=, Cp=1, Cn=1) at 
svm.cpp:1641
#4  0x72af4a8e in svm_train (prob=, 
param=0x7fff9d40) at svm.cpp:2179
#5  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, 
c=, y=, rowindex=, 
colindex=, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, 
degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0, 
nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, 
cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50, 
shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, 
probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40, 
index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, 
coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8, 
cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at 
Rsvm.c:275
#6  0x7792cefc in ?? () from /usr/lib/R/lib/libR.so
#7  0x7795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#8  0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#9  0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#11 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#13 0x779ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#14 0x779ada47 in ?? () from /usr/lib/R/lib/libR.so
#15 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#16 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#17 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#19 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x7795db9b in ?? () from /usr/lib/R/lib/libR.so
#21 0x7795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#23 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#25 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x77998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#27 0x779982e0 in ?? () from /usr/lib/R/lib/libR.so
#28 0x77998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#29 0x0040078b in main ()
#30 0x772d930d in __libc_start_main () from 
/lib/x86_64-linux-gnu/libc.so.6
#31 0x004007bd in _start ()


#0  0x72aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890) at 
svm.cpp:295
#1  0x72af7a25 in Kernel::kernel_rbf (this=, 
i=, j=) at svm.cpp:239
#2  0x72af782c in SVC_Q::get_Q (this=0x7fff9870, i=187701, 
len=208039) at svm.cpp:1271
#3  0x72aef9ab in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., 
p_=, y_=, alpha_=0x6023fb60, Cp=1,
Cn=1, eps=, si=0x7fff9980, shrinking=1) at svm.cpp:591
#4  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, 
alpha=0x6023fb60, param=, prob=0x7fff9c30) at svm.cpp:1444
#5  svm_train_one (prob=0x7fff9c30, param=, Cp=1, Cn=1) at 
svm.cpp:1641
#6  0x72af4a8e in svm_train (prob=, 
param=0x7fff9d40) at svm.cpp:2179
#7  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, 
c=, y=, rowindex=,
colindex=, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, 
degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, 
cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, 
probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40,
index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, 
coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8,
cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at 
Rsvm.c:275
#8  0x7792cefc in ?? () from /usr/lib/R/lib/libR.so
#9  0x7795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#11 

Re: [R] Importing a CSV file

2012-02-10 Thread David L Carlson
First, make sure the file name is correct.

Second, try 

list.files("Users:/sezginozcan/Downloads/")

to see if the file is listed and you are looking in the correct folder.

Third, you may be able to locate and open the file using

A <- read.table(file.choose(), sep="\t", row.names=1, header=T)

I believe that "clipboard" works only on Windows computers.
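
The path in the error message also looks malformed for OS X (a ':' where a '/'
should be); a sketch assuming the file really is in that Downloads folder:

a <- read.csv("/Users/sezginozcan/Downloads/beer.data.csv", row.names = 1)
# or, relative to the home directory:
a <- read.csv("~/Downloads/beer.data.csv", row.names = 1)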

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of sezgin ozcan
Sent: Thursday, February 09, 2012 9:21 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] Importing a CSV file

I have been trying to import a csv file to r. but I get the same message
everytime. the message is

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
file or directory

I use mac.
I tried this command also
a<-read.table("clipboard",sep="\t",row.names=1,header=T)
Error: unexpected input in "a<-read.table("clipboard",sep=,"

I will appreciate if you help me before I get crazy.
thank you

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Importing a CSV file

2012-02-10 Thread Gyanendra Pokharel
-- Forwarded message --
From: Gyanendra Pokharel 
Date: Fri, Feb 10, 2012 at 11:13 AM
Subject: Re: [R] Importing a CSV file
To: sezgin ozcan 


Save your file in a folder of your choice (on the desktop, in Documents, or
wherever you want) and then search for the file using the command

mydatafile <- read.csv(file.choose())

then open the file
Cheers


On Thu, Feb 9, 2012 at 10:20 PM, sezgin ozcan  wrote:

> I have been trying to import a csv file to r. but I get the same message
> everytime. the message is
>
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") :
>  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
> file or directory
>
> I use mac.
> I tried this command also
> a<-read.table("clipboard",sep=”\t”,row.names=1,header=T)
> Error: unexpected input in "a<-read.table("clipboard",sep=‚"
>
> I will appreciate if you help me before I get crazy.
> thank you
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to have columns lined up?

2012-02-10 Thread 538280
A little more on topic is that there are tools for R to nicely format output.

The simplest is probably the R2wd package, which will transfer objects
and text to Word: you can send a matrix or data frame to Word as a
nicely formatted table, or wdVerbatim will send output to Word, but make
sure that the font is a fixed-width type so that things line up.

If you are going to be doing a lot of this then you should investigate
the variations on Sweave (Sword, knitr, odfWeave, etc.) which allow
you to create a template file with R code and report text, then
process it to replace the R code with its output.
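
If all that is needed is a quick fix, a minimal base-R sketch (not the R2wd
route) is to write the printed output to a text file and keep a fixed-width
font in Word; 'fit' here is a hypothetical object:

capture.output(summary(fit), file = "fit-output.txt")
# paste the file contents into Word and set the font to Courier New (or any
# other monospaced font) so the columns keep their alignment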

On Tue, Feb 7, 2012 at 9:07 PM, hithit168  wrote:
>  Hi there,
>
> Everytime when I paste My R output in WORD, the columns couldn't line up
> nicely like they appear in  R console. Is there a way to fix this problem?
> Thanks for any help!
>
>
> time n.risk n.event survival std.err lower 95% CI upper 95% CI
>    6     21       3    0.857  0.0764        0.707        1.000
>    7     17       1    0.807  0.0869        0.636        0.977
>   10     15       1    0.753  0.0963        0.564        0.942
>   13     12       1    0.690  0.1068        0.481        0.900
>   16     11       1    0.627  0.1141        0.404        0.851
>   22      7       1    0.538  0.1282        0.286        0.789
>   23      6       1    0.448  0.1346        0.184        0.712
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-have-columns-lined-up-tp4367928p4367928.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread vaneet
Hello,

I've encountered a very weird issue with the function subset(), or maybe this
is something I don't know about it: when you're subsetting
based on the columns of a data frame, can you only use constants (0.1, 2.3,
2.2) instead of variables?

Here's a look at my data frame called 'ea.cad.pwr':
*>ea.ca.pwr[1:5,]
   MAF   OR  POWER
1 0.02 0.01 0.
2 0.02 0.02 0.9998
3 0.02 0.03 0.9997
4 0.02 0.04 0.9995
5 0.02 0.05 0.9993*

Here's my subset lines which finds no rows:

*power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
power2 = subset(ea.cad.pwr, MAF == maf2 & OR == odds)
*
Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
there's a row with these values:
*> ea.cad.pwr[1430:1432,]
 MAF   OR  POWER
1430 0.2 0.58 0.9996
1431 0.2 1.20 0.3092
1432 0.2 1.22 0.3914*

I have code working in a loop and each previous iteration the subset()
function is working fine, but in this iteration some different lines are
executed which are relevant to these variables, here they are:
*
maf1 = maf.adj - 0.01
maf2 = maf.adj + 0.01*

Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
case maf.adj isn't in the table.  maf.adj is read from another dataframe,
when I use it to subset it always works fine but when I do this innocent
subtraction for some reason it doesn't work.  If I rewrite statements like
this it works:

*power1 = subset(ea.cad.pwr, MAF == 0.2 & OR == odds)
power2 = subset(ea.cad.pwr, MAF == 0.22 & OR == odds)
*

Even if I write this first:

maf1 = 0.2

Then:

power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)

It works as well! That's what's really confusing, how can this subtraction
mess everything up?  Please help if you can..thank you!

Vaneet

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376759.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread Petr Savicky
On Fri, Feb 10, 2012 at 08:15:39AM -0800, vaneet wrote:
> Hello,
> 
> I've encountered a very weird issue with the method subset(), or maybe this
> is something I don't know about said method that when you're subsetting
> based on the columns of a data frame you can only use constants (0.1, 2.3,
> 2.2) instead of variables?
> 
> Here's a look at my data frame called 'ea.cad.pwr':
> *>ea.ca.pwr[1:5,]
>MAF   OR  POWER
> 1 0.02 0.01 0.
> 2 0.02 0.02 0.9998
> 3 0.02 0.03 0.9997
> 4 0.02 0.04 0.9995
> 5 0.02 0.05 0.9993*
> 
> Here's my subset lines which finds no rows:
> 
> *power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
> power2 = subset(ea.cad.pwr, MAF == maf2 & OR == odds)
> *
> Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
> there's a row with these values:
> *> ea.cad.pwr[1430:1432,]
>  MAF   OR  POWER
> 1430 0.2 0.58 0.9996
> 1431 0.2 1.20 0.3092
> 1432 0.2 1.22 0.3914*
> 
> I have code working in a loop and each previous iteration the subset()
> function is working fine, but in this iteration some different lines are
> executed which are relevant to these variables, here they are:
> *
> maf1 = maf.adj - 0.01
> maf2 = maf.adj + 0.01*
> 
> Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
> I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
> case maf.adj isn't in the table.  maf.adj is read from another dataframe,
> when I use it to subset it always works fine but when I do this innocent
> subtraction for some reason it doesn't work.  If I rewrite statements like
> this it works:
> 
> *power1 = subset(ea.cad.pwr, MAF == 0.2 & OR == odds)
> power2 = subset(ea.cad.pwr, MAF == 0.22 & OR == odds)
> *
> 
> Even if I write this first:
> 
> maf1 = 0.2
> 
> Then:
> 
> power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
> 
> It works as well! That's what's really confusing, how can this subtraction
> mess everything up?  Please help if you can..thank you!

Hi.

This may be a rounding problem. Try

  0.3 - 0.1 == 0.2

  [1] FALSE

Explicit rounding to a not too large number of decimal
digits can help.

  round(0.3 - 0.1, digits=7) == 0.2

  [1] TRUE


See also FAQ 7.31 or http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread Sarah Goslee
This is likely a representation issue, as in R FAQ 7.31.

?"==" suggests that using identical and all.equal is a better strategy:

 x1 <- 0.5 - 0.3
 x2 <- 0.3 - 0.1
 x1 == x2   # FALSE on most machines
 identical(all.equal(x1, x2), TRUE) # TRUE everywhere

Sarah

On Fri, Feb 10, 2012 at 11:15 AM, vaneet  wrote:
> Hello,
>
> I've encountered a very weird issue with the method subset(), or maybe this
> is something I don't know about said method that when you're subsetting
> based on the columns of a data frame you can only use constants (0.1, 2.3,
> 2.2) instead of variables?
>
> Here's a look at my data frame called 'ea.cad.pwr':
> *>ea.ca.pwr[1:5,]
>   MAF   OR  POWER
> 1 0.02 0.01 0.
> 2 0.02 0.02 0.9998
> 3 0.02 0.03 0.9997
> 4 0.02 0.04 0.9995
> 5 0.02 0.05 0.9993*
>
> Here's my subset lines which finds no rows:
>
> *power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
> power2 = subset(ea.cad.pwr, MAF == maf2 & OR == odds)
> *
> Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
> there's a row with these values:
> *> ea.cad.pwr[1430:1432,]
>     MAF   OR  POWER
> 1430 0.2 0.58 0.9996
> 1431 0.2 1.20 0.3092
> 1432 0.2 1.22 0.3914*
>
> I have code working in a loop and each previous iteration the subset()
> function is working fine, but in this iteration some different lines are
> executed which are relevant to these variables, here they are:
> *
> maf1 = maf.adj - 0.01
> maf2 = maf.adj + 0.01*
>
> Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
> I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
> case maf.adj isn't in the table.  maf.adj is read from another dataframe,
> when I use it to subset it always works fine but when I do this innocent
> subtraction for some reason it doesn't work.  If I rewrite statements like
> this it works:
>
> *power1 = subset(ea.cad.pwr, MAF == 0.2 & OR == odds)
> power2 = subset(ea.cad.pwr, MAF == 0.22 & OR == odds)
> *
>
> Even if I write this first:
>
> maf1 = 0.2
>
> Then:
>
> power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
>
> It works as well! That's what's really confusing, how can this subtraction
> mess everything up?  Please help if you can..thank you!
>
> Vaneet
>


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to stop a loop for?

2012-02-10 Thread 538280
If you want to pause for the person to look at a plot before going on
to the next plot then just do:

> par(ask=TRUE)

This will actually allow your loop to continue with calculations while
the user looks at the plot but will pause before drawing the next plot
(hitting enter in the command window or clicking on the plot window
will allow the code to continue).

If you want to pause for something other than a plot then use readline
like Michael suggests.
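
For completeness, a minimal sketch of the readline() variant, using the data
frame from the quoted example below:

for (i in 1:ncol(data)) {
  plot(density(data[, i]), main = paste("histogram", i))
  readline("Press <Enter> for the next plot: ")
}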

On Wed, Feb 8, 2012 at 12:45 PM, Juan Andres Hernandez
 wrote:
> Hi all, I have some time trying to find a way to stop a loop for( ) until the
> user presses the enter key or any other one and the loop can continue.
> This could
> be an example:
>
>  library(MASS)
>  data <- data.frame(mvrnorm(1000,rep(0,5),Sigma=diag(1,5)))
>  for(i in 1:dim(data)[2]){
>  plot(density(data[,i]), main=paste('histogram',i))
>  #here something like waituntil command
>  }
>
> Thank's in advance
> Juan A. Hernandez
> Spain.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread vaneet
Thanks guys, both those solutions work.  I really appreciate the help!

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376826.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How indices calculated in package "boot"

2012-02-10 Thread 538280
In your second example the boot function will still generate its
random indices before your internal function calls sample, so the
seed will be different when you call sample from what you originally
set it to.  If you really want to know what the boot function does,
look at its code (it does some things to try and be more efficient
which might give different results than a simpler version).
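
In other words, the statistic function should use the indices that boot()
hands to it and return the statistic itself; a minimal sketch:

library(boot)
boot_mean <- function(data, indices) mean(data[indices])
set.seed(123)
res <- boot(data = 1:10, statistic = boot_mean, R = 100)
res$t0        # the statistic on the original data
head(res$t)   # the bootstrap replicates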

On Wed, Feb 8, 2012 at 1:45 PM, Deng Nina  wrote:
> Hi,there,
>
> I am using R package "boot" to bootstrap. I have one question here: does
> anybody possibly know how the boot package generates the "indices" which is
> used in the statistic function?
>
> I thought "indices = sample(data, replace=TRUE)", but when I replaced
> "indices" with this command and used "boot", I got different results.
>
> Specifically, below are the codes for illustration.
>
> (1) The typical way by generating indices in the package:
> boot1 <- function (data, indices)
> {
>  d <- data[indices]
>  return(d)
> }
> AA <- c(1:10)
> require(boot)
> set.seed(123)
> results1 <- boot(data= AA, statistic=boot1, R=100)
>
> (2) The alternative way by calculating "indices" myself:
> boot2 <- function (data,indices)
> {
>  indices <- sample(data, replace=TRUE)
>  d <- data[indices]
>  return(d)
>  }
> AA <- c(1:10)
> set.seed(123)
> results2 <- boot(data= AA, statistic=boot2, R=100)
>
> When I looked up using results1$t and results2$t, I had totoally different
> bootstrap samples. I found this even had great impacts on the results in my
> study. Does the second approach have any problem? Anyone could provide any
> inputs on this? Thank you very much in advance!
> Regards
> Nina
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] function arrows.circular not working

2012-02-10 Thread Chosid, David (MISC)
I have started using the circular package but it is not recognizing the 
function arrows.circular.  I attempted to use the example provided in the 
circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

My error is:  Error: could not find function "arrows.circular"

Any help would be greatly appreciated.  Thanks.

David Chosid
Conservation Engineering
MA Division of Marine Fisheries
1213 Purchase St., 3rd Floor
New Bedford, MA 02740
508-990-2860
david.cho...@state.ma.us


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function arrows.circular not working

2012-02-10 Thread Sarah Goslee
Your sessionInfo() would be helpful. Also, just to check: you did do
library(circular)
right?

Sarah

On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC)
 wrote:
> I have started using the circular package but it is not recognizing the 
> function arrows.circular.  I attempted to use the example provided in the 
> circular manual.  Here is the example code using the circular package:
>
>  plot(rvonmises(10, circular(0), kappa=1))
>  arrows.circular(rvonmises(10, circular(0), kappa=1))
>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)
>
> My error is:  Error: could not find function "arrows.circular"
>
> Any help would be greatly appreciated.  Thanks.
>
-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function arrows.circular not working

2012-02-10 Thread Jorge I Velez
Hi David,

You need to load the package before running your code:

# install.packages('circular')
require(circular)
 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

HTH,
Jorge.-


On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) <> wrote:

> I have started using the circular package but it is not recognizing the
> function arrows.circular.  I attempted to use the example provided in the
> circular manual.  Here is the example code using the circular package:
>
>  plot(rvonmises(10, circular(0), kappa=1))
>  arrows.circular(rvonmises(10, circular(0), kappa=1))
>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)
>
> My error is:  Error: could not find function "arrows.circular"
>
> Any help would be greatly appreciated.  Thanks.
>
> David Chosid
> Conservation Engineering
> MA Division of Marine Fisheries
> 1213 Purchase St., 3rd Floor
> New Bedford, MA 02740
> 508-990-2860
> david.cho...@state.ma.us
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help needed please

2012-02-10 Thread ilai
Your script is rather inefficient with spurious cbind calls. Any
particular reason not to use
?ar directly ?

Call:
ar.yw.default(x = simtimeseries, order.max = 4)

Coefficients:
  1234
 1.9440  -1.9529   0.8450  -0.2154

Order selected 4  sigma^2 estimated as  15.29

To repeat the sim, you could use a for() loop but ?sapply is better:

out<- sapply(1:100,function(...){
  simtimeseries <- arima.sim(n=1024,list(order=c(4,0,0),
ar=c(2.7607, -3.8106, 2.6535,
-0.9258),sd=sqrt(1)))
  aryule <- ar.yw(simtimeseries,order.max=4)
  c( c(aryule$ar,NA)[1:4] , aryule$var.pred )
  }
)
rowMeans(out[1:4,])   # mean phi(1),...,4 see ?rowMeans for dealing with NA's
mean(out[5,])# mean sig^2

Cheers




On Fri, Feb 10, 2012 at 6:42 AM, Jaymin Shah  wrote:
> I have coded a time series from simulated data:
>
> simtimeseries <- arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 
> 2.6535,  -0.9258),sd=sqrt(1)))
> #show roots are outside unit circle
> plot.ts(simtimeseries, xlab="", ylab="", main="Time Series of Simulated Data")
>
> # Yule 
> 
>
> q1 <- cbind(simtimeseries[1:1024])
> q2 <- t(q1)%*%q1
> s0 <- q2/1024
> r1 <- cbind(simtimeseries[1:1023])
> r2 <- cbind(simtimeseries[2:1024])
> r3 <- t(r1)%*%r2
> s1 <- r3/1024
> t1 <- cbind(simtimeseries[1:1022])
> t2 <- cbind(simtimeseries[3:1024])
> t3 <- t(t1)%*%t2
> s2 <- t3/1024
> u1 <- cbind(simtimeseries[1:1021])
> u2 <- cbind(simtimeseries[4:1024])
> u3 <- t(u1)%*%u2
> s3 <- u3/1024
> v1 <- cbind(simtimeseries[1:1020])
> v2 <- cbind(simtimeseries[5:1024])
> v3 <- t(v1)%*%v2
> s4 <- v3/1024
>
> i0 <- c(s0,s1,s2,s3)
> i1 <- c(s1,s0,s1,s2)
> i2 <- c(s2,s1,s0,s1)
> i3 <- c(s3,s2,s1,s0)
>
> gamma <- cbind(i0,i1,i2,i3)
> eta <-c(s1,s2,s3,s4)
> inversegamma <- solve(gamma)
> phihat <- inversegamma%*%eta
> phihat
>
> Phihat <- cbind(phihat)
> s <- c(s1,s2,s3,s4)
> S <- cbind(s)
> sigmasquaredyule <- s0 - (t(Phihat)%*%S)
> sigmasquaredyule
>
>
>
> I did a yule walker estimate on the simulated data and wanted to work out phi 
> hat which is a vector of 4 values and sigmasquaredyule which is one value. 
> However,  I want to run the simulated data 100 times i.e. in a for loop and 
> then take the averages of the phi hat values and sigmasquaredyule value.
>
> How would i repeat this simulated time series lots of times (e.g. a 100 
> times) and store the average value of phi hat and sigmasquaredyule.
>
> Thank you
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function arrows.circular not working

2012-02-10 Thread Chosid, David (MISC)
Yes, sorry to not be clear.  library(circular) was run.


From: Jorge I Velez [mailto:jorgeivanve...@gmail.com]
Sent: Friday, February 10, 2012 12:03 PM
To: Chosid, David (FWE)
Cc: r-help@r-project.org
Subject: Re: [R] function arrows.circular not working

Hi David,

You need to load the package before running your code:

# install.packages('circular')
require(circular)
 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

HTH,
Jorge.-


On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) <> wrote:
I have started using the circular package but it is not recognizing the 
function arrows.circular.  I attempted to use the example provided in the 
circular manual.  Here is the example code using the circular package:

 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

My error is:  Error: could not find function "arrows.circular"

Any help would be greatly appreciated.  Thanks.

David Chosid
Conservation Engineering
MA Division of Marine Fisheries
1213 Purchase St., 3rd Floor
New Bedford, MA 02740
508-990-2860
david.cho...@state.ma.us>


   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Outlier removal techniques

2012-02-10 Thread S Ellison
 

> -Original Message-
> I wonder why it is still standard practice in some circles to 
> search for "outliers" as opposed to using robust/resistent methods.  

At the risk of extending an old debate and driving us off list topic, here are 
four possible reasons:
i) Identifying outliers is important when you want to find possible mistakes in 
measurement or data entry - so irrespective of whether you use robust methods, 
you probably want to ask questions like 'why has that result been entered as 
almost exactly 1000 times the value I expected?' (typically a unit error, btw). 
And although graphical outlier checking is the obvious way to do that, eyeballs 
see oddity in chance; an outlier test can help you distinguish oddity from 
chance and save some (arguably) unnecessary follow-up. 

ii) because supervised outlier rejection at around the 99% level performs - for 
simple problems - about as well as Huber's with c set to 1.5 and is a lot 
easier to explain to, er, people who don't understand iterative numerical 
methods.

iii) Because it's written into some international Standards for statistical 
processing of data (ie, it's standard practice because it's Standard practice).

iv) because you can't do robust analysis in Excel* 

Not that all these are necessarily _good_ reasons ... ;-)

However, I do NOT understand why schools in the UK teach physics students that 
outliers should automatically and always be thrown out; that's a much larger 
leap.

*You can actually; with R or several add-ins. But that is off topic.
***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split dataframe into new dataframes

2012-02-10 Thread 538280
I think one of your problems (the others have been addressed by
others) is that you want the expression x$y to represent a column of x
whose name is stored in y (not the name y itself).  The problem here
is that the $ notation is a magical shortcut and like any other magic
if used incorrectly is likely to do the programmatic equivalent of
turning yourself into a toad.  It is better to use '[[' instead of the
shortcut; try replacing x$y with x[[y]] and see if that fixes that
problem.
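
A minimal sketch of that change, passing the grouping column as a character
string and using the example data frame from the quoted post below:

Fsplit <- function(x, y) split(x, f = x[[y]])

df  <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)
res <- Fsplit(df, "A")   # note the quotes around the column name
res$A1
res$A2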

On Wed, Feb 8, 2012 at 2:11 PM, Johannes Radinger  wrote:
> Hi,
>
> I want to split a dataframe based on a grouping variable (in one column). The 
> resulting new
> dataframes should be stored in a new variable. I tried to split the dataframe 
> using split() and
> to store it using a FOR loop, but thats not working so far:
>
> df <- data.frame(A=c("A1","A1","A2","A2"),B=seq(1:4))
>
> Fsplit <- function(x,y){
>        ls <- split(x,f=x$y)
>        for (i in names(ls)){
>                i <- ls$i
>        }
> }
>
> Fsplit(df,A) #1st argument is dataframe to split, 2nd argument grouping 
> variable
>
>
> Any suggestions how to get that done?
>
> Best regards
> Johannes
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function arrows.circular not working

2012-02-10 Thread Sarah Goslee
Hi,

On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC)
 wrote:
> Yes, sorry for not being more clear.  Here is the sessionInfo():
>
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32

That is definitely the place to start. Your version of R is 2.5 years old,
and circular is up to version 0.4-3 on CRAN. I just checked, and circular
0.3-8 doesn't *have* arrows.circular. So you must be using some
documentation from the internet for a newer version, rather than using
the docs that go with what you have installed on your computer.

You need to update R and your packages.

Sarah


> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MONETARY=English_United 
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid      tcltk     stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] relimp_1.0-1    Rcmdr_1.4-10    car_1.2-14      circular_0.3-8
> [5] boot_1.2-37     lattice_0.17-25 RODBC_1.3-0
>
>
>
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Friday, February 10, 2012 12:02 PM
> To: Chosid, David (FWE)
> Cc: r-help@r-project.org
> Subject: Re: [R] function arrows.circular not working
>
> Your sessionInfo() would be helpful. Also, just to check: you did do
> library(circular)
> right?
>
> Sarah
>
> On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) 
>  wrote:
>> I have started using the circular package but it is not recognizing the 
>> function arrows.circular.  I attempted to use the example provided in the 
>> circular manual.  Here is the example code using the circular package:
>>
>>  plot(rvonmises(10, circular(0), kappa=1))
>>  arrows.circular(rvonmises(10, circular(0), kappa=1))
>>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>> col=2)
>>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>>    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)
>>
>> My error is:  Error: could not find function "arrows.circular"
>>
>> Any help would be greatly appreciated.  Thanks.
>>

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] debug in a loop

2012-02-10 Thread ikuzar
Hi, 

I'd like to debug in a loop (using debug() and browser() etc., but not
print()). I'm looking for the first occurrence of NA.
For instance:

tab = c(1:300)
tab[250] = NA
len = length(tab)
for (i in 1:len){
   if(i != len){
 tab[i] = tab[i]+tab[i+1]
   }
}

I do not want to do "Browse[2]> n" for each step ... I'd like to declare a
"browser()" in the loop with a condition. But how to write "stop running
when you encounter NA" ?

Thanks for your help

--
View this message in context: 
http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Formatting Y axis.

2012-02-10 Thread sock.o
I've looked around and I just can't find anything that will work for my
needs. This is a bit of a two-part question, but both parts pertain to the same
topic, so bear with me. 

The first is with my qq plot. On the Y axis of my qq plot it'll have my
sample quantiles, but because my data is log-normal it'll show numbers
between 0 and 5 (depending on the data). I'd like to know how to get it
to display the actual values instead of 0-5, so that when reading it
I know where my potential problems are. That would just save a lot of time and
be an easier, more understandable read. Here's sample code to plug in and
see. 

x <- rlnorm(400, 2.3, .8)
y <- rlnorm(400, 3, 1.6)

plot.new()
q1 <- qqnorm(log1p(y))
q2 <- qqnorm(log1p(x))
points( q1, col = "red")
points( q2, col = "blue")
abline( qqline(log1p(y), col = "red") )
abline( qqline(log1p(x), col = "blue") )
legend( "bottomright", inset = 0.02, title = "Loc_ID", 
c("Sayincom",
"Sayout"), fill = c( "red", "blue"), horiz = TRUE)

So what I want is for the y axis, or even a secondary y axis if it's lined up
properly, to read the actual values rather than the logged values. 
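
One possible way to do that with base graphics, reusing the simulated y above:
suppress the default axis and label it with back-transformed tick values.

q1 <- qqnorm(log1p(y), yaxt = "n", ylab = "Sample values (original scale)")
ticks <- c(0, 1, 5, 10, 50, 100, 500)
axis(2, at = log1p(ticks), labels = ticks)
qqline(log1p(y), col = "red")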

The second part of this is in regard to my boxplots. In order to fit all of
the data in a readable manner I've been setting log = "y". That is fine, but
when trying to estimate the difference via notches it's difficult to
judge how far apart they really are when the axis is logged. You can use
the same x as above, or y, whichever floats your boat. 

boxplot ( x,  notch = TRUE, log = "y", boxwex = .50)

Versus 

boxplot ( x,  notch = TRUE, boxwex = .50)

What I need from this is the ability to zoom in on a particular
location: rather than being forced to look at things in a log format on my y
axis, I want to be able to say, OK, I have my mean at 25 and my standard
deviation is 10, so I want to concentrate this boxplot between 5 and 45 to
view 2 standard deviations. (The mean and standard deviation here
are arbitrary numbers, but the concept remains.)
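
For that, one option is to keep the linear axis and simply restrict its range,
for example with the arbitrary 5 to 45 window mentioned:

boxplot(x, notch = TRUE, boxwex = 0.5, ylim = c(5, 45))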

Any help would be much appreciated. If there are particular packages
required for any suggestions please include their name in your response. 

If you need actual imaging of what the actual graphs I'm working with look
like as opposed to the randomly generated values that I'm supplying then
just say so and I'll post the actual graphs I'm trying to work with. 

Sincerely, 

Sock


--
View this message in context: 
http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Find exit status inside on.exit

2012-02-10 Thread McGehee, Robert
Hello,
Does anyone know if it is possible to find out whether a calling function 
exited with an error or not inside of an on.exit() call? I'd like to use the 
same error cleanup function inside of several functions, and for aesthetic 
reasons would prefer not to surround the body of all of these function with a 
tryCatch.

Something like this:

isError <- function() {
parentError <- ???
if (!parentError) cat("No errors here!\n")
}

X <- function() {
on.exit(isError())
1+1
}

> X()
[1] 2
No errors here!

Thanks,
Robert

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Direct: (617)392-8396

This e-mail, and any attachments hereto, are intended fo...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split matrix into square submatices

2012-02-10 Thread xiddw
Thanks to Peter and Gabriel for their comments,

The first way, which Peter commented on, was my first approach; however, since
I use it on a 'large' matrix (in my case nrow = ncol = 512) and I want to split
it into very small matrices (e.g. nrow = ncol = {4, 8, 16}), I think the two
nested loops are quite expensive, and that's why I was looking for a
better way.

Gabor's approach works perfectly for my needs, thanks :)
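
For anyone else who finds this thread: one loop-free sketch for splitting a
square matrix into equal square blocks (my own illustration, not necessarily
the solution referred to above) is

split_blocks <- function(M, b) {
  # split the square matrix M into b x b blocks using a single lapply
  n <- nrow(M)
  stopifnot(n == ncol(M), n %% b == 0)
  starts <- seq(1, n, by = b)
  idx <- expand.grid(row = starts, col = starts)
  lapply(seq_len(nrow(idx)), function(k)
    M[idx$row[k] + 0:(b - 1), idx$col[k] + 0:(b - 1)])
}

M <- matrix(rnorm(512^2), 512, 512)
blocks <- split_blocks(M, 8)   # 4096 blocks of size 8 x 8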

--
View this message in context: 
http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4376952.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] debug in a loop

2012-02-10 Thread Justin Haynes
You can add

if(is.na(tab[i])) browser()

or

if(is.na(tab[i])) break

see inline
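
A minimal self-contained version (using the data from the original post) that
drops into the browser at the first NA would look roughly like this:

tab <- c(1:300)
tab[250] <- NA
len <- length(tab)
for (i in 1:len) {
  if (i != len) {
    if (is.na(tab[i])) browser()   # pause here the first time an NA shows up
    tab[i] <- tab[i] + tab[i + 1]
  }
}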

On Fri, Feb 10, 2012 at 7:22 AM, ikuzar  wrote:

> Hi,
>
> I'd like to debug in a loop (using debug() and browser() etc but not
> print()
> ). I'm looking for the first occurrence of NA.
> For instance:
>
> tab = c(1:300)
> tab[250] = NA
> len = length(tab)
> for (i in 1:len){
>   if(i != len){
>
   if(is.na(tab[i])) browser()

> tab[i] = tab[i]+tab[i+1]
>   }
> }
>
> I do not want to do "Browse[2]> n" for each step ... I'd like to declare a
> "browser()" in the loop with a condition. But how to write "stop running
> when you encounter NA" ?
>
> Thanks for your help
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function arrows.circular not working

2012-02-10 Thread Sarah Goslee
It's a good idea to acknowledge that you've found a
solution on the R-help list, rather than just to me.

That way the answer appears in the list archives, and
other people will know you no longer need help with
this problem.

Sarah

On Fri, Feb 10, 2012 at 12:57 PM, Chosid, David (MISC)
 wrote:
> Ah.  Thanks.  I didn't realize that I needed an updated R... Just the 
> library.  Works now.
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Friday, February 10, 2012 12:32 PM
> To: Chosid, David (FWE); r-help
> Subject: Re: [R] function arrows.circular not working
>
> Hi,
>
> On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC) 
>  wrote:
>> Yes, sorry for not being more clear.  Here is the sessionInfo():
>>
>>> sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> i386-pc-mingw32
>
> That is definitely the place to start. Your version of R is 2.5 years old, 
> and circular is up to version 0.4-3 on CRAN. I just checked, and circular
> 0.3-8 doesn't *have* arrows.circular. So you must be using some documentation 
> from the internet for a newer version, rather than using the docs that go 
> with what you have installed on your computer.
>
> You need to update R and your packages.
>
> Sarah
>
>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid      tcltk     stats     graphics  grDevices utils
>> datasets [8] methods   base
>>
>> other attached packages:
>> [1] relimp_1.0-1    Rcmdr_1.4-10    car_1.2-14      circular_0.3-8 [5]
>> boot_1.2-37     lattice_0.17-25 RODBC_1.3-0
>>
>>
>>
>>
>> -Original Message-
>> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
>> Sent: Friday, February 10, 2012 12:02 PM
>> To: Chosid, David (FWE)
>> Cc: r-help@r-project.org
>> Subject: Re: [R] function arrows.circular not working
>>
>> Your sessionInfo() would be helpful. Also, just to check: you did do
>> library(circular)
>> right?
>>
>> Sarah
>>
>> On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) 
>>  wrote:
>>> I have started using the circular package but it is not recognizing the 
>>> function arrows.circular.  I attempted to use the example provided in the 
>>> circular manual.  Here is the example code using the circular package:
>>>
>>>  plot(rvonmises(10, circular(0), kappa=1))
>>>  arrows.circular(rvonmises(10, circular(0), kappa=1))
>>>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>>> col=2)
>>>  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
>>>    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)
>>>
>>> My error is:  Error: could not find function "arrows.circular"
>>>
>>> Any help would be greatly appreciated.  Thanks.
>>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a) t-tests on loess splines; b) linear models, type II SS for unbalanced ANOVA

2012-02-10 Thread Peter Davenport
Dear all,

I have some questions regarding the validity and implementation of
statistical tests based on linear models and loess. I've searched the
R-help archives and found several informative threads that relate to my
questions, but there are still a few issues I'm not clear about. I'd be
grateful for guidance.

Background and data set:
I wish to compare the growth and metabolism of 2 strains of bacteria. I
have grown the 2 strains in 3 replicates on 2 different occasions and on
each occasion taken hourly measurements of the population density of
bacteria (represented by "od" (optical density)) and the concentrations of
several chemicals, such as glycerol, in their growth medium (environment).

The data looks like this:

> head(df.yx.t)
   uid  cid date  g t        od   glycerol    alanine
6    6 Ma,1   25 Ma 1 0.1428571 3025750049 7843841013
7    7 Ma,1   25 Ma 2 0.4542857 3026668036 7902016686
8    8 Ma,1   25 Ma 3 1.4542857 3086406597 8017589237
9    9 Ma,1   25 Ma 4 2.587     2821935918 6595302338
10  10 Ma,1   25 Ma 5 3.093     2674142017 6144141491
11  11 Ma,1   25 Ma 6 3.600     3026824144 7009788499

> summary(df.yx.t)
      uid          cid       date        g            t            od              glycerol            alanine
 6      :  1   Ma,1   :10   25:60   Ma   :60   1      :12   Min.   :0.1171   Min.   :7.305e+08   Min.   :9.858e+07
 7      :  1   Ma,2   :10   26:60   RILI2:60   2      :12   1st Qu.:1.3921   1st Qu.:1.757e+09   1st Qu.:3.199e+09
 8      :  1   Ma,3   :10                      3      :12   Median :3.8042   Median :2.541e+09   Median :5.664e+09
 9      :  1   Ma,4   :10                      4      :12   Mean   :3.7822   Mean   :2.354e+09   Mean   :5.264e+09
 10     :  1   Ma,5   :10                      5      :12   3rd Qu.:5.9495   3rd Qu.:3.026e+09   3rd Qu.:7.699e+09
 11     :  1   Ma,6   :10                      6      :12   Max.   :8.7375   Max.   :3.296e+09   Max.   :8.501e+09
 (Other):114   (Other):60                      (Other):48

uid = unique id for an observation
cid = culture id (a within subjects factor identifying each bacterial
culture)
date = date of culture; data was collected in 2 batches, with a
balanced split of observations (3 replicates of each genotype in each batch)
g = genotype (the name of the bacterial strain)
t = time (1-10 hours)
od = optical density (a proxy for population density of the bacteria)
glycerol = glycerol "concentration"
alanine = alanine "concentration"

I have tested whether Genotype affects the growth (od) of the bacteria
using lm() and anova().

#define the model
m1<-lm(log10(od)~date*g*t,data=df.yx.t)
#perform anova
> anova(m1)
Analysis of Variance Table

Response: log10(od)
          Df Sum Sq Mean Sq   F value    Pr(>F)    
date       1  0.006  0.0056    5.3763   0.02297 *  
g          1  0.107  0.1066  103.2951 4.646e-16 ***
t          9 33.646  3.7385 3620.9069 < 2.2e-16 ***
date:g     1  0.003  0.0030    2.8731   0.09396 .  
date:t     9  0.015  0.0017    1.6542   0.11417    
g:t        9  0.021  0.0023    2.2329   0.02796 *  
date:g:t   9  0.005  0.0005    0.4950   0.87383    
Residuals 80  0.083  0.0010                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So far so good. (Although I do wonder whether I should be using a within
subjects predictor variable (e.g. via car::Anova).) However, I also want to
test whether genotype affects the concentrations of metabolites, e.g.
glycerol, in the growth medium at a given population density. I envisage
testing this by one of 2 methods, neither of which I am confident is
correct, despite digging around in various fora and statistics texts.

Method 1. Loess splines
I could fit a loess curve to glycerol~od, (the date effect disappears
when plotting against od rather than t).

m1<-loess(glycerol~od, data=df.yx.t[df.yx.t$g=="Ma",])
m2<-loess(glycerol~od, data=df.yx.t[df.yx.t$g=="RILI2",])

However, I'm not confident that I can perform valid statistical tests
based on this.

I could try to test whether there is a significant difference in
glycerol concentration at any given od as follows:

#test for a difference at od=7
v.pred<-7
pred1<-predict(m1,newdata=data.frame(od=v.pred),se=T)
pred2<-predict(m2,newdata=data.frame(od=v.pred),se=T)


 
dfit <- data.frame(g=c("Ma","RILI2"), fit=c(pred1$fit,pred2$fit), se=c(pred1$se.fit,pred2$se.fit))
alpha<-0.05
dfit$lo<-with(dfit,fit-se*abs(qnorm(alpha/2)))
dfit$hi<-with(dfit,fit+se*abs(qnorm(alpha/2)))

Is this legitimate?

I might also try to test whether the curves are significantly different
using anova(m1,m2), though again I'm not sure of the validity of this
approach since the 2 models are based on different (and different numbers
of) observations.



Method 2. Linear 

[R] Schwefel Function Optimization

2012-02-10 Thread Vartanian, Ara
All,

I am looking for an optimization library that does well on something as chaotic 
as the Schwefel function:

schwefel <- function(x) sum(-x * sin(sqrt(abs(x))))

With these guys, not much luck:

> optim(c(1,1), schwefel)$value
[1] -7.890603
> optim(c(1,1), schwefel, method="SANN", control=list(maxit=1))$value
[1] -28.02825
> optim(c(1,1), schwefel, lower=c(-500,-500), upper=c(500,500), 
> method="L-BFGS-B")$value
[1] -7.890603
> optim(c(1,1), schwefel, method="BFGS")$value
[1] -7.890603
> optim(c(1,1), schwefel, method="CG")$value
[1] -7.890603

All trapped in local minima. I get the right answer when I pick a starting 
point that's close:

> optim(c(400,400), schwefel, lower=c(-500,-500), upper=c(500,500), 
> method="L-BFGS-B")$value
[1] -837.9658

Of course I can always roll my own:

r <- vector()
for(i in 1:1000) {
  x <- runif(2, -500,500)
  m <- optim(x, schwefel, lower=c(-500,-500), upper=c(500,500), 
method="L-BFGS-B")
  r <- rbind(r, c(m$par, m$value))
}

And this does fine. I'm just wondering if this is the right approach, or if 
there is some other package that wraps this kind of multi-start up so that the 
user doesn't have to think about it.
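
A possible pointer for whoever finds this in the archives: CRAN also has
dedicated global optimisers, e.g. the DEoptim and GenSA packages, which wrap
this kind of search. A rough sketch, assuming the DEoptim package is installed:

library(DEoptim)
set.seed(1)
out <- DEoptim(schwefel, lower = c(-500, -500), upper = c(500, 500),
               control = DEoptim.control(itermax = 200, trace = FALSE))
out$optim$bestval   # should come out near -837.97 for the 2-D case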

Best,

Ara


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] range and anisotropy with RandomFields

2012-02-10 Thread jusin
Hello, 

I am presently trying to get a feel for the various packages out there that
allow me to both analyze and simulate random fields. The package
RandomFields is nice, but there are still a few aspects of its
implementation that are confusing to me and I was hoping someone could help
clarify things for me. It could also be that my questions reflect a lack of
knowledge pertaining to random fields in general and not the RandomFields
package specifically, in which case someone can set me straight. 

I like the versatility of the extended definition form that can be used in
RandomFields. For example, specifying an isotropic exponential model with a
nugget of 1 as: 

modelA=list("+", 
  list("$", var=2, scale=2, list("exp")),
  list("$", var=1, list("nugget"))
)

Ultimately I would like to work with anisotropic models (and data), and this
is where I get confused. I can specify a model with (geographic) anisotropy
by defining an anisotropy matrix, m and incorporating that into the
definition: 

m = matrix( c(1,0,0,0.1), nrow=2)
modelA=list("+", 
  list("$", var=2, anis=m, list("exp")),
  list("$", var=1, list("nugget"))
)

First, nowhere in the RandomFields documentation does it say how  the
anisotropy matrix is actually defined, so I am assuming it is the usual  

A = [ 1  0 ] * [ cos(phi)  -sin(phi) ]
    [ 0  R ]   [ sin(phi)   cos(phi) ]

Where R is the ratio of the major range/minor range, and phi is the angle of
rotation.  Can anyone confirm this? 
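
In R terms, the matrix I am assuming would be built like this (a sketch of my
reading with hypothetical values, not something taken from the RandomFields
documentation):

phi <- pi / 4                      # rotation angle (hypothetical)
R   <- 0.1                         # ratio of the ranges (hypothetical)
A <- matrix(c(1, 0, 0, R), nrow = 2) %*%
     matrix(c(cos(phi), sin(phi), -sin(phi), cos(phi)), nrow = 2)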

Also, there now seems to be no way to explicitly specify the range of the
major axis (it states this in the documentation, and an error is returned
if I try to set e.g. scale=2 in the definition).  Thus when I use e.g.
GaussRF() to simulate a field using the above anisotropic model definition,
I have no idea how the range (scale) is being determined.  Does anyone know
how RandomFields works in this context? 

Thanks. 


--
View this message in context: 
http://r.789695.n4.nabble.com/range-and-anisotropy-with-RandomFields-tp4377104p4377104.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Discrete Event Simulation problem

2012-02-10 Thread jism7690
Hi All,

I am attempting to simulate an inventory model in R and I am having some
problems. I believe the problem is that when I define my demand rate it stays
constant throughout, so when I do need to reorder the model does not
recognise it. I have the initial supply arrival time set to infinity at the
start, since no order has been placed, but I have an if statement saying that
if the level falls below a certain point an order should be placed. I have
the expected time to order arrival set to be before the sale, but this is
not working.

Any Help would be appreciated. Thanks

maxStock = 100
minStock = 20
t.max=1100.0

LAST = t.max
START = 0

GetDemand<-function() START + runif(1,min=0,max=2)

main <- function(t.max,maxStock,minStock)
{
index = 0
  t.current = START    ### Starting Conditions
t.demand = START
t.supply = START
inventory = 50
order_costs = 0
storage_costs = 0
sales = 0
sum = list(inventory = 50,order_costs = 0,storage_costs = 0,sales =0)

while(index < LAST){
index = index + 1
t.demand = GetDemand()  ### expected time to next sale
t.supply = Inf          ### expected time to arrival of order; Infinity as order has not been placed
t.next = min(t.demand, t.supply)  ### next event (sale or supply) is the one with imminent arrival
k = maxStock - inventory

  if(inventory > minStock)
  {
    t.supply = Inf  ### expected time to arrival of order; Infinity as order has not been placed
  }
  if(inventory > 0)
  storage_costs = (t.next-t.current)*0.10*inventory
  t.current = t.next
  
 if (inventory < minStock)
   {###Need to Order
 k = maxStock - inventory
 order_costs = 50 + 0.02*k
 sum$order_costs = sum$order_costs + order_costs
 t.supply =  t.current + 1.0
}

if(t.current ==t.demand)
  {
  sum$inventory = sum$inventory - 1.0   ### Sale made
sum$sales = sum$sales + 1.0
t.demand = runif(1,min=0,max=2)
}

if(t.current == t.supply)
  {   ### Order Arrives
sum$inventory = sum$inventory + k
k = 0
t.supply = Inf
}

  if(inventory < maxStock)
  {
  k = maxStock - inventory
  sum$storage_costs = sum$storage_costs + storage_costs
  sum$order_costs = sum$order_costs + order_costs
  sum$inventory = sum$invneotry + k - sum$sales
  sum$sales = sum$sales + sales
  sum$sales = sum$sales + sales
  }

}
sis = list(Time = index,StorageCosts =sum$storage_costs,OrderCosts=
sum$order_costs, FinalInventoryLevel=sum$inventory,sales=sum$sales ,
AverageCosts =((sum$order_costs + sum$storage_costs)/index))
return(sis)
}
main()

--
View this message in context: 
http://r.789695.n4.nabble.com/Discrete-Event-Simulation-problem-tp4377276p4377276.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing a CSV file

2012-02-10 Thread MacQueen, Don
I think you must have given the path to the file wrong.

Assuming you are using R.app, try

  infile <- file.choose()

And then give infile to the read.csv() function.

Then print the value of infile to find out what the path really is.

Probably, you wanted
  '/Users/sezginozcan/Downloads/beer.data.csv'
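
Once the path is known, something along these lines should work (assuming the
file really is in your Downloads folder):

infile <- "/Users/sezginozcan/Downloads/beer.data.csv"   # note the leading "/"
beer <- read.csv(infile, header = TRUE)
str(beer)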

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 2/9/12 7:20 PM, "sezgin ozcan"  wrote:

>I have been trying to import a csv file to r. but I get the same message
>everytime. the message is
>
>Error in file(file, "rt") : cannot open the connection
>In addition: Warning message:
>In file(file, "rt") :
>  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
>file or directory
>
>I use mac.
>I tried this command also
>a<-read.table("clipboard",sep=”\t”,row.names=1,header=T)
>Error: unexpected input in "a<-read.table("clipboard",sep=‚"
>
>I will appreciate if you help me before I get crazy.
>thank you
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Yang Zhang
Sorry for not being more clear - I'm interested in accessing these
indices from within the trainControl summaryFunction, not afterward
(from the train object).

As for the weights, I'm referring to the weights argument passed into
train.
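
For anyone searching the archives later: the summaryFunction passed to
trainControl() receives a data frame with obs, pred and, when classProbs =
TRUE, one probability column per class level, so a probability-based metric
can be sketched roughly like this (my own illustration, not taken from the
caret documentation; caret's built-in twoClassSummary works along similar
lines):

library(caret)
library(pROC)

## sketch of a two-class, probability-based summary function
probSummary <- function(data, lev = NULL, model = NULL) {
  rocObj <- pROC::roc(response = data$obs,
                      predictor = data[, lev[1]],
                      levels = rev(lev))
  c(ROC = as.numeric(pROC::auc(rocObj)))
}

ctrl <- trainControl(method = "cv", classProbs = TRUE,
                     summaryFunction = probSummary)
## then: train(..., trControl = ctrl, metric = "ROC")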

On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn  wrote:
> I think you need to read the man pages and the four vignettes. A lot
> of your questions have answers there.
>
> If you don't specify the resampling indices, the ones generated for
> you are saved in the train object:
>
>> data(iris)
>> TrainData <- iris[,1:4]
>> TrainClasses <- iris[,5]
>>
>> knnFit1 <- train(TrainData, TrainClasses,
> +                  method = "knn",
> +                  preProcess = c("center", "scale"),
> +                  tuneLength = 10,
> +                  trControl = trainControl(method = "cv"))
> Loading required package: class
>
> Attaching package: ‘class’
>
> The following object(s) are masked from ‘package:reshape’:
>
>    condense
>
> Warning message:
> executing %dopar% sequentially: no parallel backend registered
>> str(knnFit1$control$index)
> List of 10
>  $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
>  $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
>  $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
>  $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
>  $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
>  $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
>  $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
>  $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
>  $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
>  $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...
>
> There is also a savePredictions argument that gives you the hold-out results.
>
> I'm not sure which weights you are referring to.
>
> On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang  wrote:
>> Actually, is there any way to get at additional information beyond the
>> classProbs?  In particular, is there any way to find out the
>> associated weights, or otherwise the row indices into the original
>> model matrix corresponding to the tested instances?
>>
>> On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang  wrote:
>>> Oops, found trainControl's classProbs right after I sent!
>>>
>>> On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang  wrote:
 I'm dealing with classification problems, and I'm trying to specify a
 custom scoring metric (recall@p, ROC, etc.) that depends on not just
 the class output but the probability estimates, so that caret::train
 can choose the optimal tuning parameters based on this metric.

 However, when I supply a trainControl summaryFunction, the data given
 to it contains only class predictions, so the only metrics possible
 are things like accuracy, kappa, etc.

 Is there any way to do this that I'm overlooking?  If not, could I put
 this in as a feature request?  Thanks!

 --
 Yang Zhang
 http://yz.mit.edu/
>>>
>>>
>>>
>>> --
>>> Yang Zhang
>>> http://yz.mit.edu/
>>
>>
>>
>> --
>> Yang Zhang
>> http://yz.mit.edu/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Max



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Yang Zhang
(I couldn't find answers to this question in the documentation)

On Fri, Feb 10, 2012 at 11:59 AM, Yang Zhang  wrote:
> Sorry for not being more clear - I'm interested in accessing these
> indices from within the trainControl summaryFunction, not afterward
> (from the train object).
>
> As for the weights, I'm referring to the weights argument passed into
> train.
>
> On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn  wrote:
>> I think you need to read the man pages and the four vignettes. A lot
>> of your questions have answers there.
>>
>> If you don't specify the resampling indices, the ones generated for
>> you are saved in the train object:
>>
>>> data(iris)
>>> TrainData <- iris[,1:4]
>>> TrainClasses <- iris[,5]
>>>
>>> knnFit1 <- train(TrainData, TrainClasses,
>> +                  method = "knn",
>> +                  preProcess = c("center", "scale"),
>> +                  tuneLength = 10,
>> +                  trControl = trainControl(method = "cv"))
>> Loading required package: class
>>
>> Attaching package: ‘class’
>>
>> The following object(s) are masked from ‘package:reshape’:
>>
>>    condense
>>
>> Warning message:
>> executing %dopar% sequentially: no parallel backend registered
>>> str(knnFit1$control$index)
>> List of 10
>>  $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
>>  $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
>>  $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
>>  $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
>>  $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
>>  $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
>>  $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
>>  $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
>>  $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
>>  $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...
>>
>> There is also a savePredictions argument that gives you the hold-out results.
>>
>> I'm not sure which weights you are referring to.
>>
>> On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang  wrote:
>>> Actually, is there any way to get at additional information beyond the
>>> classProbs?  In particular, is there any way to find out the
>>> associated weights, or otherwise the row indices into the original
>>> model matrix corresponding to the tested instances?
>>>
>>> On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang  wrote:
 Oops, found trainControl's classProbs right after I sent!

 On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang  wrote:
> I'm dealing with classification problems, and I'm trying to specify a
> custom scoring metric (recall@p, ROC, etc.) that depends on not just
> the class output but the probability estimates, so that caret::train
> can choose the optimal tuning parameters based on this metric.
>
> However, when I supply a trainControl summaryFunction, the data given
> to it contains only class predictions, so the only metrics possible
> are things like accuracy, kappa, etc.
>
> Is there any way to do this that I'm overlooking?  If not, could I put
> this in as a feature request?  Thanks!
>
> --
> Yang Zhang
> http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/
>>>
>>>
>>>
>>> --
>>> Yang Zhang
>>> http://yz.mit.edu/
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Max
>
>
>
> --
> Yang Zhang
> http://yz.mit.edu/



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Michael Friendly

On 2/9/2012 6:24 PM, ilai wrote:

You do not provide mlm.influence() so your code can't be reproduced.

Or did you mean to put lm.influence() in the formals to your hatvalues.mlm ?

If yes, then 1) you have a typo 2) lm.influence doesn't allow you to
pass on arguments, maybe try influence.lm instead.


No, I've written an S3 method, influence.mlm and a computational method,
mlm.influence, both of which take an m= argument.  I didn't post all the 
code because I thought the question might have an easy answer based on 
what I provided.


I include all the code and a test case in the attached .txt file that
can be sourced.

-Michael



Elai

On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendly  wrote:

I'm trying to write some functions extending influence measures to
multivariate linear models and also
allow subsets of size m>=1 to be considered for deletion diagnostics.  I'd
like these to work roughly parallel
to those functions for the univariate lm where only single case deletion
(m=1) diagnostics are considered.

Corresponding to stats::hatvalues.lm, the S3 method for class "lm" objects,


hatvalues <- function (model, ...)
    UseMethod("hatvalues")


hatvalues.lm <- function (model, infl = lm.influence(model, do.coef = FALSE), ...)
{
    hat <- infl$hat
    names(hat) <- names(infl$wt.res)
    hat
}

I have, for class "mlm" objects

hatvalues.mlm <- function(model, m=1,
                          infl=mlm.influence(model, m=m, do.coef = FALSE), ...)
{
    hat <- infl$H
    m <- infl$m
    names(hat) <- if (m==1) infl$subsets else apply(infl$subsets, 1, paste, collapse=',')
    hat
}

where mlm.influence() does the calculations, but also allows the m= argument
to specify subset size.
Yet when I test this I can't seem to pass the m= argument directly, so that
it gets stuffed in to the infl=
call to mlm.influence.

# fit an mlm
library(heplots)
Rohwer2<- subset(Rohwer, subset=group==2)
rownames(Rohwer2)<- 1:nrow(Rohwer2)
Rohwer.mod<- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)


class(Rohwer.mod)

[1] "mlm" "lm"


## this doesn't work, as I would like it to, calling the hatvalues.mlm
method, but passing m=2:

hatvalues(Rohwer.mod, m=2)

Error in UseMethod("hatvalues") :
  no applicable method for 'hatvalues' applied to an object of class
"c('double', 'numeric')"

I don't understand why this doesn't just call hatvalues.mlm, since
Rohwer.mod is of class "mlm".

# These work -- calling hatvalues.mlm explicitly, or passing the infl=
argument with the call to
# mlm.influence

hatvalues.mlm(Rohwer.mod, m=2)
hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2))


Can someone help me understand what is wrong and how to make the .mlm method
allow m= to be passed
directly to the infl= computation?

thx,
-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
# S3 method for class "mlm"
influence.mlm <- function(model, m=1, do.coef=TRUE, ...)
mlm.influence(model, m=m, do.coef = do.coef, ...)


# main computation a la lm.influence
mlm.influence <-
function (model, m=1, do.coef = TRUE, ...) 
{

# helper functions
  vec <- function(M) {
R <- matrix(M, ncol=1)
if (is.vector(M)) return(R)
nn<-expand.grid(dimnames(M))[,2:1]
rownames(R) <- apply(as.matrix(nn), 1, paste, collapse=":")
R
}

X <- model.matrix(model)
data <- model.frame(model)
Y <- as.matrix(model.response(data))
r <- ncol(Y)
n <- nrow(X)
p <- ncol(X)
labels <- rownames(X)
call <- model$call

B <- coef(model)
E <- residuals(model)
XPXI <- solve(crossprod(X))
EPEI <- solve(crossprod(E))
vB <- vec(t(B))
S <- crossprod(E)/(n-p);
V <- solve(p*S) %x% crossprod(X)

subsets <- t(combn(n, m))
nsub <- nrow(subsets) 


Beta <- as.list(rep(0, nsub))
R <- L <- H <- Q <- as.list(rep(0, nsub))
CookD <- as.vector(rep(0, nsub))

# FIXME: naive, direct computations can use qr() or update()
for (i in seq(nsub)) {
I <- c(subsets[i,])
rows <- which(!(1:n) %in% I)
XI <- X[rows,]
YI <- Y[rows,]
BI <- solve(crossprod(XI)) %*% t(XI) %*% YI
EI <- (Y - X %*% BI)[I, , drop=FALSE]
 

Re: [R] Formatting Y axis.

2012-02-10 Thread ilai
yaxt='n' in ?par and ?axis are your friends.

# A plot on log scale labeled with original:
plot(x,log(y),yaxt='n')
axis(2, at=pretty(log(y)), labels=round(exp(pretty(log(y)))))

Works for qqnorm and boxplots, as well as other top level fun.

By the way this is a FAQ.
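
For the boxplot part of the question the same trick applies, and ylim can be
used to zoom in on a region. A quick sketch, using the x from the post quoted
below (the 5-45 range is just the arbitrary example from the post):

boxplot(x, notch = TRUE, boxwex = 0.5, ylim = c(5, 45))

## or keep the log scale but label the axis with the original values
boxplot(log(x), notch = TRUE, boxwex = 0.5, yaxt = 'n')
axis(2, at = pretty(log(x)), labels = round(exp(pretty(log(x)))))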


On Fri, Feb 10, 2012 at 9:43 AM, sock.o  wrote:
> I've looked around and I just can't find anything that will work for my
> needs. This is a bit of a 2 part question but pertaining to the same topic
> so bare with me.
>
> The first is with my qq plot. On the Y axis of my qq plot it'll have my
> sample quantities but because my data is log-normal it'll show numbers
> between 0 - 5 (depending on the data). I'd like to know how to get it,
> instead of displaying 0-5 to display actual values so that way when reading
> it I know where my potential problems are. Would just save a lot of time and
> be an easier more understandable read. Here's sample code to plug in and
> see.
>
> x <- rlnorm(400, 2.3, .8)
> y <- rlnorm(400, 3, 1.6)
>
> plot.new()
>        q1 <- qqnorm(log1p(y))
>        q2 <- qqnorm(log1p(x))
>                points( q1, col = "red")
>                points( q2, col = "blue")
>                abline( qqline(log1p(y), col = "red") )
>                abline( qqline(log1p(x), col = "blue") )
>                legend( "bottomright", inset = 0.02, title = "Loc_ID", 
> c("Sayincom",
> "Sayout"), fill = c( "red", "blue"), horiz = TRUE)
>
> So what I want is for the y axis. or even secondary y axis if it's lined up
> properly to read the actual values rather than the logged values.
>
> The second part of this is in regards to my boxplots. In order to fit all of
> the data in a readable manner I've been making log = "y". Which is fine, but
> when trying to estimate the difference via notches it's difficult to
> estimate how far it really is based on the axis being logged. You can use
> the same x as above, or y, whichever floats your boat.
>
> boxplot ( x,  notch = TRUE, log = "y", boxwex = .50)
>
> Versus
>
> boxplot ( x,  notch = TRUE, boxwex = .50)
>
> What I need from this is for the ability to zoom in on a particular
> location. Rather than be forced to look at things in a log format on my y
> axis to just be able to say, ok I have my mean at 25 and my standard
> deviation is 10 so I want to concentrate this boxplot between 5 and 45 to
> view 2 standard deviations. (That whole mean thing and standard deviations
> are arbitrary numbers but the concept remains)
>
> Any help would be much appreciated. If there are particular packages
> required for any suggestions please include their name in your response.
>
> If you need actual imaging of what the actual graphs I'm working with look
> like as opposed to the randomly generated values that I'm supplying then
> just say so and I'll post the actual graphs I'm trying to work with.
>
> Sincerely,
>
> Sock
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Henrik Bengtsson
For this type of setup, I typically turn to "default" values, e.g.

hatvalues.mlm <- function(model, m=1, infl=NULL, ...)
{
   if (is.null(infl)) {
 infl <- mlm.influence(model, m=m, do.coef=FALSE);
   }

   hat <- infl$H
   m <- infl$m
   names(hat) <- if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
   hat
}

Some people may prefer to do the following:

hatvalues.mlm <- function(model, m=1, infl, ...)
{
   if (missing(infl)) {
 infl <- mlm.influence(model, m=m, do.coef=FALSE);
   }

   hat <- infl$H
   m <- infl$m
   names(hat) <- if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
   hat
}

The downside with this approach is that args() doesn't reveal the
default behavior; instead you need to document it clearly in the
help().

My $.02

/H

On Fri, Feb 10, 2012 at 12:13 PM, Michael Friendly  wrote:
> On 2/9/2012 6:24 PM, ilai wrote:
>>
>> You do not provide mlm.influence() so your code can't be reproduced.
>>
>> Or did you mean to put lm.influence() in the formals to your hatvalues.mlm
>> ?
>>
>> If yes, then 1) you have a typo 2) lm.influence doesn't allow you to
>> pass on arguments, maybe try influence.lm instead.
>
>
> No, I've written an S3 method, influence.mlm and a computational method,
> mlm.influence, both of which take an m= argument.  I didn't post all the
> code because I thought the question might have an easy answer based on what
> I provided.
>
> I include all the code and a test case in the attached .txt file that
> can be sourced.
>
> -Michael
>
>
>>
>> Elai
>>
>> On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendly
>>  wrote:
>>>
>>> I'm trying to write some functions extending influence measures to
>>> multivariate linear models and also
>>> allow subsets of size m>=1 to be considered for deletion diagnostics.
>>>  I'd
>>> like these to work roughly parallel
>>> to those functions for the univariate lm where only single case deletion
>>> (m=1) diagnostics are considered.
>>>
>>> Corresponding to stats::hatvalues.lm, the S3 method for class "lm"
>>> objects,
>>>
 hatvalues<-function (model, ...)
>>>
>>> UseMethod("hatvalues")
>>>
 hatvalues.lm<-
>>>
>>> function (model, infl = lm.influence(model, do.coef = FALSE),    ...)
>>> {
>>>    hat<- infl$hat
>>>    names(hat)<- names(infl$wt.res)
>>>    hat
>>> }
>>>
>>> I have, for class "mlm" objects
>>>
>>> hatvalues.mlm<- function(model, m=1, infl=mlm.influence(model, m=m,
>>> do.coef
>>> = FALSE), ...)
>>> {
>>>    hat<- infl$H
>>>    m<- infl$m
>>>    names(hat)<- if(m==1) infl$subsets else apply(infl$subsets,1, paste,
>>> collapse=',')
>>>    hat
>>> }
>>>
>>> where mlm.influence() does the calculations, but also allows the m=
>>> argument
>>> to specify subset size.
>>> Yet when I test this I can't seem to pass the m= argument directly, so
>>> that
>>> it gets stuffed in to the infl=
>>> call to mlm.influence.
>>>
>>> # fit an mlm
>>> library(heplots)
>>> Rohwer2<- subset(Rohwer, subset=group==2)
>>> rownames(Rohwer2)<- 1:nrow(Rohwer2)
>>> Rohwer.mod<- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)
>>>
 class(Rohwer.mod)
>>>
>>> [1] "mlm" "lm"
>>>
>>>
>>> ## this doesn't work, as I would like it to, calling the hatvalues.mlm
>>> method, but passing m=2:

 hatvalues(Rohwer.mod, m=2)
>>>
>>> Error in UseMethod("hatvalues") :
>>>  no applicable method for 'hatvalues' applied to an object of class
>>> "c('double', 'numeric')"
>>>
>>> I don't understand why this doesn't just call hatvalues.mlm, since
>>> Rohwer.mod is of class "mlm".
>>>
>>> # These work -- calling hatvalues.mlm explicitly, or passing the infl=
>>> argument with the call to
>>> # mlm.influence

 hatvalues.mlm(Rohwer.mod, m=2)
 hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2))
>>>
>>>
>>> Can someone help me understand what is wrong and how to make the .mlm
>>> method
>>> allow m= to be passed
>>> directly to the infl= computation?
>>>
>>> thx,
>>> -Michael
>>>
>>> --
>>> Michael Friendly     Email: friendly AT yorku DOT ca
>>> Professor, Psychology Dept.
>>> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
>>> 4700 Keele Street    Web:   http://www.datavis.ca
>>> Toronto, ONT  M3J 1P3 CANADA
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, mi

[R] apply pairs function to multiple columns in a data frame

2012-02-10 Thread jarvisma
I am very new to R and programming and thank you in advance for your patience
and help with a complete novice!

I am working with a large multivariate data set that has 10 explanatory
environmental variables (e.g. temp, depth) and over 60 response variables
(each is a separate species).  My data frame is set up like the simplified
version below:

JulianDay  Temperature  Salinity  Depth  Copepod  Barnacle  Gastropod 
Bivalve
222  12.1330.3 500  
 
756 0 178
222  12.333.2 1.1145
111 0 0
223  11.133.1 7752  
 
234 12   0

Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod,
and Bivalve are the column headers.

I am using the pairs function in R to explore my data.  My data frame is
named Sunset.
Using this code:

Z=cbind(Sunset$Copepod,Sunset[,c(1:10)])   #the first 10 columns of my data
frame are explanatory variables such as temp
Pairs.Copepod=pairs(Z,main="Copepods vs. explanatory variables",
panel=panel.smooth)

I get a great pair plot of Copepods vs. all of the 10 explanatory
environmental variables.  I would like to do this for each of my 60+
species.  I can't just make one big pair plot of all of my explanatory
variables vs. all of my species because I have too many.  So instead I would
like a separate pair plot for each species (each column of data after the
first 10 columns of explanatory variables)

I would like to be able to write a loop that creates all of these plots at
once, but haven't been able to do so.  Ideally, each pair plot would have
the main title be the column header (e.g. Copepod) for that plot.  I would
love to include a way to save each pair plot as a separate jpeg to a folder
on my desktop with a file name that includes the species name (e.g.
Copepod).  

Again, I am very new to R and to programming so I would GREATLY appreciate
anyone patient and kind enough to respond with lots of detail so I can
hopefully follow your answer and suggestions.  I have searched but haven't
been able to find a way to write a loop that uses each column of data, so I
would love some help!

Thank you!
Marley

--
View this message in context: 
http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Michael Friendly

On 2/10/2012 4:09 PM, Henrik Bengtsson wrote:

So people may prefer to do the following:

hatvalues.mlm<- function(model, m=1, infl, ...)
{
if (missing(infl)) {
  infl<- mlm.influence(model, m=m, do.coef=FALSE);
}

hat<- infl$H
m<- infl$m
names(hat)<- if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
hat
}
Thanks;  I tried exactly that, but I still can't pass m=2 to the mlm 
method through the generic


> hatvalues(Rohwer.mod)
         1          2          3          4          5          6          7          8 
0.16700926 0.21845327 0.14173469 0.07314341 0.56821462 0.15432157 0.04530969 0.17661104 
         9         10         11         12         13         14         15         16 
0.05131298 0.45161152 0.14542776 0.17050399 0.10374592 0.12649927 0.33246744 0.33183461 
        17         18         19         20         21         22         23         24 
0.17320579 0.26353864 0.29835817 0.07880597 0.14023750 0.19380286 0.04455330 0.20641708 
        25         26         27         28         29         30         31         32 
0.15712604 0.15333879 0.36726467 0.11189754 0.30426999 0.08655434 0.08921878 0.07320950 

> hatvalues(Rohwer.mod, m=2)
Error in UseMethod("hatvalues") :
  no applicable method for 'hatvalues' applied to an object of class 
"c('double', 'numeric')"


## This works:
> hatvalues.mlm(Rohwer.mod, m=2)
   ... output snipped

> hatvalues
function (model, ...)
UseMethod("hatvalues")


>

-Michael


--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply pairs function to multiple columns in a data frame

2012-02-10 Thread Phil Spector

A good way to solve problems like this is to write a function that
will work with one variable, and then use one of the apply family
of functions (in this case sapply) to do it for each of your 
variables.  In this case, such a function would be something like this:

make1plot = function(var){
    jpeg(paste(var, 'jpg', sep='.'))
    pairs(Sunset[, c(var, names(Sunset)[1:10])],
          main = paste(var, 'vs. explanatory variables'),
          panel = panel.smooth)
    dev.off()
}

(Notice that indexing precludes the need for cbind when you're creating a 
sub-matrix.)


Now suppose the names of your dependent variables are stored in a vector 
called "deps".  Then


sapply(deps,make1plot)

should do what you want.
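
If the response columns are simply everything after the first 10 explanatory
columns, that vector can be built with, for example:

deps <- names(Sunset)[-(1:10)]   # assumes columns 1-10 are the explanatory variables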

I couldn't test this, since you didn't provide a reproducible example.  If you
use this list often, you'll find that providing a reproducible example goes a 
long way towards getting a good answer.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu

On Fri, 10 Feb 2012, jarvisma wrote:


I am very new to R and programming and thank you in advance for your patience
and help with a complete novice!

I am working with a large multivariate data set that has 10 explanatory
environmental variables (e.g. temp, depth) and over 60 response variables
(each is a separate species).  My data frame is set up like the simplified
version below:

JulianDay  Temperature  Salinity  Depth  Copepod  Barnacle  Gastropod
Bivalve
222  12.1330.3 500
756 0 178
222  12.333.2 1.1145
111 0 0
223  11.133.1 7752
234 12   0

Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod,
and Bivalve are the column headers.

I am using the pairs function in R to explore my data.  My data frame is
named Sunset.
Using this code:

Z=cbind(Sunset$Copepod,Sunset[,c(1:10)])   #the first 10 columns of my data
frame are explanatory variables such as temp
Pairs.Copepod=pairs(Z,main="Copepods vs. explanatory variables",
panel=panel.smooth)

I get a great pair plot of Copepods vs. all of the 10 explanatory
environmental variables.  I would like to do this for each of my 60+
species.  I can't just make one big pair plot of all of my explanatory
variables vs. all of my species because I have too many.  So instead I would
like a separate pair plot for each species (each column of data after the
first 10 columns of explanatory variables)

I would like to be able to write a loop that creates all of these plots at
once, but haven't been able to do so.  Ideally, each pair plot would have
the main title be the column header (e.g. Copepod) for that plot.  I would
love to include a way to save each pair plot as  separate jpeg to a folder
on my desktop with a file name that includes the species name (e.g.
Copepod).

Again, I am very new to R and to programing so I would GREATLY appreciate
anyone patient and kind enough to respond with lots of detail so I can
hopefully follow your answer and suggestions.  I have searched but haven't
been able to find a way to write a loop that uses each column of data, so I
would love some help!

Thank you!
Marley

--
View this message in context: 
http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Table rearranging

2012-02-10 Thread Jeffrey Joh

Hi David,
Totally forgot line 3!  So "513 red" should be ok but "420 red" and "917 
yellow" aren't.

Jeffrey


> CC: r-help@r-project.org
> From: dwinsem...@comcast.net
> To: johjeff...@hotmail.com
> Subject: Re: [R] Table rearranging
> Date: Tue, 7 Feb 2012 21:51:55 -0500
>
>
> On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote:
>
> >
> > Hi David, I am not sure how ddply/summarize solves my issue. I have
> > the following table:
> >
> > ID measurement date door color
> > 1 0.93529385 513 open red
> > 2 0.97419293 420 open red
> > 3 0.962053514 513 closed red
> > 4 0.963909937 1230 open blue
> > 5 0.97652034 1230 open green
> > 6 0.989310795 1230 closed blue
> > 7 0.9941022 917 closed yellow
> > 8 0.8945757 1230 open blue
> >
> > I only want to keep the lines that have corresponding open/closed
> > measurements. For example, I want to keep lines 4,6,8 because for
> > the "1230 blue" condition, there exists both open and closed
> > measurements.
> >
> > However, the "513 red" condition has an open measurement, but no
> > closed measurement.
>
> Huh? what about line 3?
>
> --
> David,
> > Therefore, line 1 should be deleted.
> >
> > Jeffrey
> >
> >
> > 
> >> CC: r-help@r-project.org; wdun...@tibco.com
> >> From: dwinsem...@comcast.net
> >> To: johjeff...@hotmail.com
> >> Subject: Re: [R] Table rearranging
> >> Date: Tue, 7 Feb 2012 09:08:00 -0500
> >>
> >>
> >> On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote:
> >>
> >>>
> >>> Thank you for your help, Bill.
> >>>
>  From the original table (not the plyr output), I would like to
>  remove all the lines that do not have a corresponding open/closed
>  measurement. For example, if there is a Closed yellow measurement
>  on 0917, but not an Open yellow 0917 measurement, then the Closed
>  yellow should be deleted.
> >>>
> >>> How can I make this change?
> >>>
> >>
> >> In R you need to assign the results of a function to an object name
> >> so
> >> you code would look like:
> >>
> >> modified_data <- ddply(d, .(date, color), summarize,
> >> meanClosed=mean(measurement[door=="closed"]),
> >> nClosed=sum(door=="closed"))
> >>
> >> --
> >> David
> >>> Jeffrey
> >>>
> >>>
> >>> 
>  From: wdun...@tibco.com
> >>>
>  To: johjeff...@hotmail.com; r-help@r-project.org
> >>>
>  Subject: RE: [R] Table rearranging
> >>>
>  Date: Tue, 7 Feb 2012 00:43:25 +
> >>>
> 
> >>>
>  Install and load the "plyr" package and try something like:
> >>>
> 
> >>>
> > ddply(d, .(date, color), summarize,
> >>>
>  + ddply(d, .(date, color), summarize
> >>>
>  + meanClosed=mean(measurement[door=="closed"]),
>  nClosed=sum(door=="closed"))
> >>>
>  date color meanOpen nOpen meanClosed nClosed
> >>>
>  1 420 red 0.9741929 1 NaN 0
> >>>
>  2 513 red 0.9352938 1 0.9620535 1
> >>>
>  3 917 yellow NaN 0 0.9941022 1
> >>>
>  4 1230 blue 0.9639099 1 0.9893108 1
> >>>
>  5 1230 green 0.9765203 1 NaN 0
> >>>
> 
> >>>
>  Bill Dunlap
> >>>
>  Spotfire, TIBCO Software
> >>>
>  wdunlap tibco.com
> >>>
> 
> >>>
> > -Original Message-
> >>>
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
> > ] On Behalf Of Jeffrey Joh
> >>>
> > Sent: Monday, February 06, 2012 4:28 PM
> >>>
> > To: r-help@r-project.org
> >>>
> > Subject: [R] Table rearranging
> >>>
> >
> >>>
> >
> >>>
> > I have a table that looks like this:
> >>>
> >
> >>>
> > measurement date door color
> >>>
> > 0.93529385 513 open red
> >>>
> > 0.97419293 420 open red
> >>>
> > 0.962053514 513 closed red
> >>>
> > 0.963909937 1230 open blue
> >>>
> > 0.97652034 1230 open green
> >>>
> > 0.989310795 1230 closed blue
> >>>
> > 0.9941022 917 closed yellow
> >>>
> >
> >>>
> > I would like to create a table that has: Open measurement, Closed
> > measurement, date, color. For every
> >>>
> > date/color combination, there should be two columns to represent
> > the door open/closed measurement.
> >>>
> >
> >>>
> > If there are multiple datapoints with a given door/date/color
> > combination, then they should be
> >>>
> > averaged.
> >>>
> > I would also like to make two columns to represent the number of
> >>>
> > datapoints that were averaged in determining the open/closed
> >>>
> > measurements.
> >>>
> >
> >>>
> > Jeffrey
> >>>
> >
> >>>
> > __
> >>>
> > R-help@r-project.org mailing list
> >>>
> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> >>>
> > and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >>> __
> >>> R-help@r-project.org mailing list

[R] stepwise variable selection with multiple dependent variables

2012-02-10 Thread Fugate, Michael L
Good Day,

I fit a multivariate linear regression model with 3 dependent variables and 
several predictors using the lm function.  I would like to use stepwise 
variable selection to produce a set of candidate models.  However, when I pass 
the fitted lm object to step() I get the following error:

Error from R:
Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k,  : 
  no 'drop1' method for "mlm" models

My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns.  
The predictors are in X where X is 35 by 6

The steps I used were:
m.fit <- lm(ymat ~ ., data=X)
m.step <- step(m.fit)

If variable selection is not possible with step() is there another package that 
will perform variable selection in a multivariate setting?

System information:
platform   x86_64-apple-darwin9.8.0 
arch   x86_64   
os darwin9.8.0  
system x86_64, darwin9.8.0  
status  
major  2
minor  13.1 
year   2011 
month  07   
day08   
svn rev56322
language   R
version.string R version 2.13.1 (2011-07-08)

Thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Table rearranging

2012-02-10 Thread David Winsemius


On Feb 10, 2012, at 4:47 PM, Jeffrey Joh wrote:



Hi David,
Totally forgot line 3!  So "513 red" should be ok but "420 red" and  
"917 yellow" aren't.




I don't have a plyr solution but this is a base solution:

dat[ as.logical( ave(dat$door, dat$date,
   FUN=function(x) {"open" %in% x & "closed" %in% x} )) , ]
  ID measurement date   door color
1  1   0.9352938  513   open   red
3  3   0.9620535  513 closed   red
4  4   0.9639099 1230   open  blue
5  5   0.9765203 1230   open green
6  6   0.9893108 1230 closed  blue
8  8   0.8945757 1230   open  blue

(You could have used Dunlap's result and tested for both the meanOpen  
and meanClosed values being non-NA.)
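
If the grouping is meant to be by both date and color (as in the original
question), the same idea extends with interaction(); a sketch:

grp <- interaction(dat$date, dat$color)
dat[ as.logical( ave(dat$door, grp,
      FUN=function(x) {"open" %in% x & "closed" %in% x} )) , ]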


--
David


Jeffrey



CC: r-help@r-project.org
From: dwinsem...@comcast.net
To: johjeff...@hotmail.com
Subject: Re: [R] Table rearranging
Date: Tue, 7 Feb 2012 21:51:55 -0500


On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote:



Hi David, I am not sure how ddply/summarize solves my issue. I have
the following table:

ID measurement date door color
1 0.93529385 513 open red
2 0.97419293 420 open red
3 0.962053514 513 closed red
4 0.963909937 1230 open blue
5 0.97652034 1230 open green
6 0.989310795 1230 closed blue
7 0.9941022 917 closed yellow
8 0.8945757 1230 open blue

I only want to keep the lines that have corresponding open/closed
measurements. For example, I want to keep lines 4,6,8 because for
the "1230 blue" condition, there exists both open and closed
measurements.

However, the "513 red" condition has an open measurement, but no
closed measurement.


Huh? what about line 3?

--
David,

Therefore, line 1 should be deleted.

Jeffrey




CC: r-help@r-project.org; wdun...@tibco.com
From: dwinsem...@comcast.net
To: johjeff...@hotmail.com
Subject: Re: [R] Table rearranging
Date: Tue, 7 Feb 2012 09:08:00 -0500


On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote:



Thank you for your help, Bill.


From the original table (not the plyr output), I would like to
remove all the lines that do not have a corresponding open/closed
measurement. For example, if there is a Closed yellow measurement
on 0917, but not an Open yellow 0917 measurement, then the Closed
yellow should be deleted.


How can I make this change?



In R you need to assign the results of a function to an object name
so
you code would look like:

modified_data <- ddply(d, .(date, color), summarize,
meanClosed=mean(measurement[door=="closed"]),
nClosed=sum(door=="closed"))

--
David

Jeffrey




From: wdun...@tibco.com
To: johjeff...@hotmail.com; r-help@r-project.org
Subject: RE: [R] Table rearranging
Date: Tue, 7 Feb 2012 00:43:25 +

Install and load the "plyr" package and try something like:

> ddply(d, .(date, color), summarize,
+   meanOpen=mean(measurement[door=="open"]), nOpen=sum(door=="open"),
+   meanClosed=mean(measurement[door=="closed"]), nClosed=sum(door=="closed"))
  date  color  meanOpen nOpen meanClosed nClosed
1  420    red 0.9741929     1        NaN       0
2  513    red 0.9352938     1  0.9620535       1
3  917 yellow       NaN     0  0.9941022       1
4 1230   blue 0.9639099     1  0.9893108       1
5 1230  green 0.9765203     1        NaN       0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeffrey Joh
Sent: Monday, February 06, 2012 4:28 PM
To: r-help@r-project.org
Subject: [R] Table rearranging

I have a table that looks like this:

measurement  date  door    color
0.93529385    513  open    red
0.97419293    420  open    red
0.962053514   513  closed  red
0.963909937  1230  open    blue
0.97652034   1230  open    green
0.989310795  1230  closed  blue
0.9941022     917  closed  yellow

I would like to create a table that has: Open measurement, Closed
measurement, date, color. For every date/color combination, there should be
two columns to represent the door open/closed measurement.

If there are multiple datapoints with a given door/date/color combination,
then they should be averaged. I would also like to make two columns to
represent the number of datapoints that were averaged in determining the
open/closed measurements.

Jeffrey



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

Re: [R] Choosing glmnet lambda values via caret

2012-02-10 Thread Yang Zhang
Yes, I know how to statically specify the lambda choices.  I was asking
whether there's a way to let glmnet guide that, instead of specifying
them in advance.  Again, sorry if that wasn't clear.
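
For reference, a minimal sketch of the fixed-grid route discussed in the
reply quoted below (my own illustration, not from the thread; x and y stand
for a predictor matrix and a response, and the tuneGrid column names
alpha/lambda are those used by recent caret versions, older ones expected
.alpha/.lambda):

library(caret)
library(glmnet)

grid <- expand.grid(alpha  = 1,                        # lasso
                    lambda = 10^seq(-4, 0, length = 20))

fit <- train(x, y, method = "glmnet",
             tuneGrid  = grid,                         # candidate lambdas fixed in advance
             trControl = trainControl(method = "cv", number = 10))
fit$bestTune                                           # lambda/alpha picked by resampling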

On Thu, Feb 9, 2012 at 7:15 PM, Max Kuhn  wrote:
> You can adjust the candidate set of tuning parameters via the tuneGrid
> argument in train() and the process by which the optimal choice is
> made (via the 'selectionFunction' argument in trainControl()). Check
> out the package vignettes.
>
> The latest version also has an update.train() function that lets the
> user manually specify the tuning parameters after the call to train().
>
> On Thu, Feb 9, 2012 at 7:00 PM, Yang Zhang  wrote:
>> Usually when using raw glmnet I let the implementation choose the
>> lambdas.  However when training via caret::train the lambda values are
>> predetermined.  Is there any way to have caret defer the lambda
>> choices to glmnet and thus choose the optimal lambda
>> dynamically?
>>
>> --
>> Yang Zhang
>> http://yz.mit.edu/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Max



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stepwise variable selection with multiple dependent variables

2012-02-10 Thread Mitchell Maltenfort
"Anova.mlm" would be one way to do model selection.

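A rough sketch of that suggestion (mine, not Mitchell's code; it assumes the
car package and the ymat / X objects from the original post, and 'x1' is a
placeholder for whichever predictor you decide to drop):

library(car)

m.fit <- lm(ymat ~ ., data = X)
Anova(m.fit)                         # multivariate (Pillai) test for each predictor

m.red <- update(m.fit, . ~ . - x1)   # drop a weak predictor by hand
anova(m.fit, m.red)                  # multivariate comparison of the two fits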


On Fri, Feb 10, 2012 at 4:29 PM, Fugate, Michael L  wrote:
> Good Day,
>
> I fit a multivariate linear regression model with 3 dependent variables and 
> several predictors using the lm function.  I would like to use stepwise 
> variable selection to produce a set of candidate models.  However, when I 
> pass the fitted lm object to step() I get the following error:
>
> Error from R:
> Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k,  :
>  no 'drop1' method for "mlm" models
>
> My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns.  
> The predictors are in X where X is 35 by 6
>
> The steps I used were:
> m.fit <- lm(ymat ~ ., data=X)
> m.step <- step(m.fit)
>
> If variable selection is not possible with step() is there another package 
> that will perform variable selection in a multivariate setting?
>
> System information:
> platform       x86_64-apple-darwin9.8.0
> arch           x86_64
> os             darwin9.8.0
> system         x86_64, darwin9.8.0
> status
> major          2
> minor          13.1
> year           2011
> month          07
> day            08
> svn rev        56322
> language       R
> version.string R version 2.13.1 (2011-07-08)
>
> Thanks in advance.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Scriptable Integration

2012-02-10 Thread Hasan Diwan
My data:
> dput(mydata)
structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
)), .Names = c("V1", "V2", "V3"), row.names = c(NA, 6L), class = "data.frame")

I'd like to apply an arbitrary number of integrations to the spline
function fitted from mydata$V2 and mydata$V3, so the result should itself
be a function. Does anyone have any idea as to a package that allows this?
Many thanks! -- H
-- 
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply pmax to dataframe with different args based on dataframe factor

2012-02-10 Thread Idris Raja
damn, that's pretty clever. +1

thanks
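
For the record, the reason the indexing trick works is that a factor used as
an index is treated as its underlying integer codes, so each row picks up the
threshold for its own level. A tiny illustration (using the df defined below):

levels(df$track)           # "A" "B" "C"  -> integer codes 1, 2, 3
c(.2, .4, .5)[df$track]    # one threshold per row, in row order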

On Thu, Feb 9, 2012 at 8:11 PM, ilai  wrote:

> Your attempt was just overly complicated. All you needed was
>
> threshold <- c( .2 , .4 , .5 )[ df$track ]
> df$value <- pmax(threshold, df$value)
> df # desired outcome
>
> Cheers
>
> On Thu, Feb 9, 2012 at 3:56 PM, Idris Raja  wrote:
> > # I have a dataframe in the following form:
> >
> > track <- c(rep('A', 3), rep('B', 4), rep('C', 4))
> > value <- c(0.15, 0.25, 0.35, 0.05, 0.99, 0.32, 0.13, 0.80, 0.75, 0.60,
> 0.44)
> > df <- data.frame(track=factor(track), value=value)
> >
> > #> print(df)
> >   #track value
> > #1  A  0.15
> > #2  A  0.25
> > #3  A  0.35
> > #4  B  0.05
> > #5  B  0.99
> > #6  B  0.32
> > #7  B  0.13
> > #8  C  0.80
> > #9  C  0.75
> > #10 C  0.60
> > #11 C  0.44
> >
> >
> > # If any of the values are below a threshold value, I want to replace it
> > with the
> > # threshold value. The twist is that there is a different threshold value
> > for
> > # every track.
> >
> > # I tried something like this, but it's not working
> >
> > threshold <- list()
> > threshold['A'] <- 0.2
> > threshold['B'] <- 0.4
> > threshold['C'] <- 0.5
> >
> >
> > for (track in levels(df$track)){
> >df[df$track==track,]$outcome <- pmax(df[df$track==track,]$outcome,
> > threshold[track])
> > }
> > # Warning messages:
> > # 1: In is.na(mmm) : is.na() applied to non-(list or vector) of type
> 'NULL'
> > # 2: In is.na(mmm) : is.na() applied to non-(list or vector) of type
> 'NULL'
> > # 3: In is.na(mmm) : is.na() applied to non-(list or vector) of type
> 'NULL'
> >
> >
> > #**
> > # Desired Results:
> >
> > #> print(df)
> >   #track value
> > #1  A  0.20 # value changed
> > #2  A  0.25
> > #3  A  0.35
> > #4  B  0.40 # value changed
> > #5  B  0.99
> > #6  B  0.40 # value changed
> > #7  B  0.40 # value changed
> > #8  C  0.80
> > #9  C  0.75
> > #10 C  0.60
> > #11 C  0.50 # value changed
> >
> >
> > # Any ideas? Thanks for reading.
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple histograms from a dataframe

2012-02-10 Thread Adel ESSAFI
Hi list


I need some help drawing some histograms.

I have a data frame with columns, say,
X Y Z T

I want to draw a histogram of Z against T for each value of the couple (X, Y).
When I use this syntax

> library(lattice)
> histogram(law[,3] ~ law[,66] | law[,1] )

it draws multiple histograms, but conditioned only on the distinct values of law[,1].
The goal is to do the same thing, but conditioning on a couple of columns at once.

Thanks in advance for help

Adel



-- 
PhD candidate in Computer Science
Address
3 avenue lamine, cité ezzahra, Sousse 4000
Tunisia
tel: +216 97 246 706 (+33640302046 jusqu'au 15/6)
fax: +216 71 391 166

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple histograms from a dataframe

2012-02-10 Thread David Winsemius


On Feb 10, 2012, at 7:05 PM, Adel ESSAFI wrote:


Hi list


I need some help drawing some histograms.

I have a data frame with columns, say,
X Y Z T

I want to draw a histogram of Z against T for each value of the couple (X, Y).
When I use this syntax


library(lattice)
histogram(law[,3] ~ law[,66] | law[,1] )


Perhaps (but untested in the absence of data):

 histogram( Z ~ T | interaction(X, Y)  , data=dfrmname )
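
A self-contained toy of the interaction() idea (made-up data, and a
one-sided formula since only the conditioning is being illustrated):

library(lattice)
d <- data.frame(X = sample(c("a", "b"), 200, replace = TRUE),
                Y = sample(c("u", "v"), 200, replace = TRUE),
                Z = rnorm(200))
histogram(~ Z | interaction(X, Y), data = d)   # one panel per (X, Y) couple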



it draws multiple histograms, but conditioned only on the distinct values of
law[,1].

The goal is to do the same thing, but conditioning on a couple of columns at once.

Thanks in advance for help

Adel



--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] garchSim with starting error and variance given?

2012-02-10 Thread jfca283
Hello,
I need to simulate data for an ARCH and a GARCH process, n=1000.
I've been generating ARCH and GARCH data, but I can't find a way to set a
starting error, epsilon(t=0), and a starting variance, sigma(t=0), when
running garchSim(). I generated the process in Excel and I know that the
ARCH process has a mean of approximately 10 and the GARCH process a mean of
approximately 25. The errors come from a random walk,
y_t = epsilon_t
Thanks for your time and interest.

--
View this message in context: 
http://r.789695.n4.nabble.com/garchSim-with-starting-error-and-variance-given-tp432p432.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scriptable Integration

2012-02-10 Thread R. Michael Weylandt
I'm not quite sure what you mean, but perhaps this will help:

library(splines)
mydata <- structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
)), .Names = c("V1", "V2", "V3"), row.names = c(NA, 6L), class = "data.frame")

spf <- splinefun(mydata$V2)
splInt <- function(low, up) integrate(spf, low, up)$value

splInt(0, 1)

splInt(0, 2)

Michael

On Fri, Feb 10, 2012 at 6:34 PM, Hasan Diwan  wrote:
> My data:
>> dput(mydata)
> structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
> 1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
> 0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
> 0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
> 0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
> )), .Names = c("V1", "V2", "V3"), row.names = c(NA, 6L), class = "data.frame")
>
> I'd like to apply an arbitrary number of integrations to the spline
> function fitted from mydata$V2 and mydata$V3, so the result should itself
> be a function. Does anyone have any idea as to a package that allows this?
> Many thanks! -- H
> --
> Sent from my mobile device
> Envoyait de mon portable
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] issues with read.spss

2012-02-10 Thread Jeroen Ooms
Someone supplied me with an SPSS datafile that caused a buffer
overflow and then a crash when reading it in R. Unfortunately I can't
supply the dataset at hand and I have a hard time reproducing it with
a toy example. But I found at least 2 issues that might be related. I
would like to know which of these are expected behavior, and which are
bugs. I reproduced it on R 2.14.1 both on Ubuntu Linux and Windows
7...

Below some code. The files that are referenced in the code are
available for download on http://www.stat.ucla.edu/~jeroen/spss/

#load library
library(foreign)

#problem one: long string variable is converted to multiple variables.
x <- read.spss("longstring.sav");
summary(x); #4 variables??

#problem two: use.labels does not deal correctly with duplicate labels
and generates a bad factor.
x <- read.spss("duplicate_labels.sav", use.value.labels=T);

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scriptable Integration

2012-02-10 Thread Hasan Diwan
Michael,
On 10 February 2012 18:11, R. Michael Weylandt
 wrote:
> I'm not quite sure what you mean, but perhaps this will help:
> spf <- splinefun(mydata$V2)
> splInt <- function(low, up) integrate(spf, low, up)$value

Sort of, I'm looking to get the nth order integral, where the only
thing I know about n is that it is greater than 1, but in practice
will be under 4. Something like:
spf <- rep(splinefun(myData$V2), times=(n-1))
splInt <- integrate(spf, min(myData$V2), max(myData$V3))

However, I can't seem to figure out how to get splinefun to evaluate
itself. Hope that's clearer and thanks for the help!
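
For what it's worth, one way to iterate the integration so the result is
again a function (a rough sketch of my own, not from the thread: it
re-splines the running antiderivative on the original grid at each pass, so
some accuracy is lost near the ends of the grid):

iterated_integral <- function(x, y, n = 2) {
  f <- splinefun(x, y)
  for (i in seq_len(n)) {
    # antiderivative of the current function, evaluated back on the x grid
    F.vals <- sapply(x, function(u) integrate(f, min(x), u)$value)
    f <- splinefun(x, F.vals)
  }
  f   # a function, as requested
}

F2 <- iterated_integral(mydata$V1, mydata$V2, n = 2)
F2(mydata$V1)
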
-- 
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply pairs function to multiple columns in a data frame

2012-02-10 Thread 538280
You might find the pairs2 function in the TeachingDemos package useful.
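
For the looping/saving part of the question, a rough sketch (my own,
untested; it assumes the data frame Sunset with the explanatory variables in
columns 1 to 10 and an existing folder called "pairplots" for the output):

species <- names(Sunset)[-(1:10)]                # the 60+ response columns
for (sp in species) {
  jpeg(file.path("pairplots", paste(sp, ".jpg", sep = "")),
       width = 1200, height = 1200)
  pairs(Sunset[, c(sp, names(Sunset)[1:10])],
        main = paste(sp, "vs. explanatory variables"),
        panel = panel.smooth)
  dev.off()
}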

On Fri, Feb 10, 2012 at 1:13 PM, jarvisma  wrote:
> I am very new to R and programming and thank you in advance for your patience
> and help with a complete novice!
>
> I am working with a large multivariate data set that has 10 explanatory
> environmental variables (e.g. temp, depth) and over 60 response variables
> (each is a separate species).  My data frame is set up like the simplified
> version below:
>
> JulianDay  Temperature  Salinity  Depth  Copepod  Barnacle  Gastropod  Bivalve
> 222        12.1         33        0.3    500      756       0          178
> 222        12.3         33.2      1.1    145      111       0          0
> 223        11.1         33.1      7      752      234       12         0
>
> Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod,
> and Bivalve are the column headers.
>
> I am using the pairs function in R to explore my data.  My data frame is
> named Sunset.
> Using this code:
>
> Z=cbind(Sunset$Copepod,Sunset[,c(1:10)])   #the first 10 columns of my data
> frame are explanatory variables such as temp
> Pairs.Copepod=pairs(Z,main="Copepods vs. explanatory variables",
> panel=panel.smooth)
>
> I get a great pair plot of Copepods vs. all of the 10 explanatory
> environmental variables.  I would like to do this for each of my 60+
> species.  I can't just make one big pair plot of all of my explanatory
> variables vs. all of my species because I have too many.  So instead I would
> like a separate pair plot for each species (each column of data after the
> first 10 columns of explanatory variables)
>
> I would like to be able to write a loop that creates all of these plots at
> once, but haven't been able to do so.  Ideally, each pair plot would have
> the main title be the column header (e.g. Copepod) for that plot.  I would
> love to include a way to save each pair plot as  separate jpeg to a folder
> on my desktop with a file name that includes the species name (e.g.
> Copepod).
>
> Again, I am very new to R and to programming so I would GREATLY appreciate
> anyone patient and kind enough to respond with lots of detail so I can
> hopefully follow your answer and suggestions.  I have searched but haven't
> been able to find a way to write a loop that uses each column of data, so I
> would love some help!
>
> Thank you!
> Marley
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.