[R] Extract values from a predict() result... how?

2010-02-15 Thread Jay
Hello,

a silly question, I suppose, but somehow I can't manage to extract the
values from the result of predict() on a glm object:

> str(res)
 Named num [1:9] 0.00814 0.01877 0.025 0.02941 0.03563 ...
 - attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...

which I got from:

# A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
model <- glm(lot1 ~ log(u), data=clotting, family=Gamma)
res <- predict(model, clotting)

I want to transfer the values "0.00814 0.01877 0.025 0.02941
0.03563 ..." to a separate vector. How do I do this?
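For what it's worth, the names can simply be dropped with unname() or as.vector(); a sketch using the same clotting example (note that predict() on a glm defaults to the link scale, so type = "response" is needed for fitted values on the response scale):

```r
# the clotting example from the question
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))
model <- glm(lot1 ~ log(u), data = clotting, family = Gamma)
res <- predict(model, clotting)

vals <- unname(res)      # plain numeric vector, names dropped
# equivalently: as.vector(res)

# predict() defaults to the link (here: inverse) scale; for fitted values
# on the response scale, ask for type = "response"
fitted_vals <- predict(model, clotting, type = "response")
```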

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Checking the assumptions for a proper GLM model

2010-02-17 Thread Jay
Hello,

Are there any packages/functions available for testing the assumptions
underlying a good GLM model? Something like linktest in Stata, or
similar. If not, could somebody please describe their work process
when checking the validity of a logit/probit model?

Regards,
Jay



Re: [R] Checking the assumptions for a proper GLM model

2010-02-18 Thread Jay
So what I'm looking for is readily available tools/packages that could
produce some of the following:

3.6 Summary of Useful Commands (STATA: Source:
http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalog3.htm)

* linktest -- performs a link test for model specification, in our
case to check if logit is the right link function to use. This command
is issued after the logit or logistic command.
* lfit -- performs a goodness-of-fit test, calculating either the
Pearson chi-square or the Hosmer-Lemeshow chi-square goodness-of-fit
statistic, depending on whether the group option is used.
* fitstat -- a post-estimation command that computes a variety of
measures of fit.
* lsens -- graphs sensitivity and specificity versus probability
cutoff.
* lstat -- displays summary statistics, including the classification
table, sensitivity, and specificity.
* lroc -- graphs and calculates the area under the ROC curve based on
the model.
* listcoef -- lists the estimated coefficients for a variety of
regression models, including logistic regression.
* predict dbeta -- Pregibon delta-beta influence statistic.
* predict deviance -- deviance residual.
* predict dx2 -- Hosmer and Lemeshow change-in-chi-square influence
statistic.
* predict dd -- Hosmer and Lemeshow change-in-deviance statistic.
* predict hat -- Pregibon leverage.
* predict residual -- Pearson residuals, adjusted for the covariate
pattern.
* predict rstandard -- standardized Pearson residuals, adjusted for
the covariate pattern.
* ldfbeta -- influence of each individual observation on the
coefficient estimate (not adjusted for the covariate pattern).
* graph with the [weight=some_variable] option.
* scatlog -- produces a scatter plot for logistic regression.
* boxtid -- performs power transformation of independent variables and
a nonlinearity test.

But, since I'm new to GLM, I would greatly appreciate hearing how you/
others go about testing the validity of a GLM model.
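For reference, several of the Stata commands above have base-R counterparts; a sketch on built-in data (the mtcars fit is purely illustrative, and the packages named in the comments are contributed, not base R):

```r
# illustrative logit fit on built-in data (not the poster's data)
fit <- glm(am ~ mpg, data = mtcars, family = binomial)

summary(fit)                # coefficients, Wald tests, deviance
plot(fit)                   # standard residual/diagnostic plots
hatvalues(fit)              # leverage (cf. Stata's 'predict hat')
rstandard(fit)              # standardized residuals (cf. 'predict rstandard')
residuals(fit, "pearson")   # Pearson residuals (cf. 'predict residual')
cooks.distance(fit)         # influence (cf. dbeta/ldfbeta-style checks)

# for ROC curves (lroc) or Hosmer-Lemeshow tests (lfit), contributed
# packages such as pROC and ResourceSelection are commonly suggested
```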


On Feb 18, 1:18 am, Jay  wrote:
> Hello,
>
> Are there any packages/functions available for testing the assumptions
> underlying assumptions for a good GLM model? Like linktest in STATA
> and smilar. If not, could somebody please describe their work process
> when they check the validity of a logit/probit model?
>
> Regards,
> Jay
>



Re: [R] Checking the assumptions for a proper GLM model

2010-02-18 Thread Jay
Well, yes and no. Obviously I was not asking for a complete recap of
all the theory on the subject. My main concern is finding readily
available CRAN functions and packages that would help me in the
process. I've found the UCLA site to be very informative and have
spent a lot of time there over the last couple of days. However, their
section on using R for validating the assumptions is lacking.
Naturally, links like google.com and amazon.com will eventually get me
there, but if somebody has other recommendations, I would be grateful
for even more help.

BR,
Jay

On Feb 18, 7:01 pm, David Winsemius  wrote:
> At one time the "answer" would have been to buy a copy of Venables and  
> Ripley's "Modern Applied Statistics with S" (and R), and that would  
> still be a sensible strategy. There are now quite a few other R-
> centric texts that have been published in the last few years. Search  
> Amazon if needed. You seem to be asking for a tutorial on generalized
> linear modeling (which, if you read the Posting Guide, you will find is
> not a service offered by the r-help list). Perhaps you should have
> edited the link you provided in the obvious fashion:
>
> http://www.ats.ucla.edu/stat/R/
>
> Perhaps one of these pages: http://www.ats.ucla.edu/stat/R/dae/default.htm
>
> The UCLA Statistics website used to be dismissive of R, but they more  
> recently appear to have seen the light. There is also a great amount  
> of contributed teaching material on CRAN:
>
> http://cran.r-project.org/other-docs.html
>
> ... and more would be readily available via Googling with "r-project"  
> as part of a search strategy. Frank Harrell's material is in  
> particular quite useful:
>
> http://biostat.mc.vanderbilt.edu/wiki/Main/StatComp
>
> --
> David.
>
> On Feb 18, 2010, at 8:32 AM, Jay wrote:
> > So what I'm looking for is readily available tools/packages that could
> > produce some of the following:
>
> > 3.6 Summary of Useful Commands (STATA: Source:
> >http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalo...)
>
> >    * linktest--performs a link test for model specification, in our
> > case to check if logit is the right link function to use. This command
> > is issued after the logit or logistic command.
> 
> > and performs nonlinearity test.
>
> > But, since I'm new to GLM, I owuld greatly appreciate how you/others
> > go about and test the validity of a GLM model.
>
> > On Feb 18, 1:18 am, Jay  wrote:
> >> Hello,
>
> >> Are there any packages/functions available for testing the  
> >> assumptions
> >> underlying assumptions for a good GLM model? Like linktest in STATA
> >> and smilar. If not, could somebody please describe their work process
> >> when they check the validity of a logit/probit model?
>
> >> Regards,
> >> Jay
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>



[R] Extract information from S4 object

2010-02-22 Thread Jay
The ROCR function performance() returns this:

Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name  : chr "Cutoff"
  ..@ y.name  : chr "Accuracy"
  ..@ alpha.name  : chr "none"
  ..@ x.values:List of 1
  .. ..$ : Named num [1:89933] Inf 2.23 2.22 2.17 2.16 ...
  .. .. ..- attr(*, "names")= chr [1:89933] "" "36477" "56800"
"41667" ...
  ..@ y.values:List of 1
  .. ..$ : num [1:89933] 0.5 0.5 0.5 0.5 0.5 ...
  ..@ alpha.values: list()

Now, since I want to match each prediction with its original case, I
need to extract the names, i.e. the information in "- attr(*,
"names")= chr [1:89933] "" "36477" "56800" "41667" ..." so I can use
it with a simple datafile[names,] query.

How do I get these names in plain number formats?
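A sketch of one way to pull the names out; the mock list below just mimics the @x.values slot structure, and 'perf' is a hypothetical name for the real ROCR performance object:

```r
# minimal mock of the @x.values slot (a list holding one named vector)
xvals <- list(setNames(c(Inf, 2.23, 2.22, 2.17),
                       c("", "36477", "56800", "41667")))

ids <- names(xvals[[1]])                     # character vector of names
num_ids <- suppressWarnings(as.numeric(ids)) # numeric; "" becomes NA

# with a real performance object this would read:
# ids <- names(perf@x.values[[1]])
```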



Re: [R] Extract information from S4 object

2010-02-22 Thread Jay
Thank you. But when I try the command you both suggested, I get NULL
as the result.

> names(object1@x.values)
NULL

Where did I go wrong?


On Feb 22, 4:34 pm, David Winsemius  wrote:
> On Feb 22, 2010, at 8:05 AM, Jay wrote:
> > The function prediction() returns this:
>
> > Formal class 'performance' [package "ROCR"] with 6 slots
> >  ..@ x.name      : chr "Cutoff"
> >  ..@ y.name      : chr "Accuracy"
> >  ..@ alpha.name  : chr "none"
> >  ..@ x.values    :List of 1
> >  .. ..$ : Named num [1:89933] Inf 2.23 2.22 2.17 2.16 ...
> >  .. .. ..- attr(*, "names")= chr [1:89933] "" "36477" "56800"
> > "41667" ...
> >  ..@ y.values    :List of 1
> >  .. ..$ : num [1:89933] 0.5 0.5 0.5 0.5 0.5 ...
> >  ..@ alpha.values: list()
>
> > Now, since I want to match each prediction with its original case, I
> > need to extract the names, i.e. the information in "- attr(*,
> > "names")= chr [1:89933] "" "36477" "56800" "41667" ..." so I can use
> > it with a simple datafile[names,] query.
>
> > How do I get these names in plain number formats?
>
> Not sure what you mean by "plain number formats", but this should get
> you a vector of "names", assuming the prediction object is named
> "predobject":
>
> names(predobject@x.values)
>
> If you wanted them "as.numeric", then that is the name of the  
> appropriate function.
>
> --
> David



[R] Cross-validation for parameter selection (glm/logit)

2010-04-02 Thread Jay
If my aim is to select a good subset of parameters for my final logit
model built using glm(). What is the best way to cross-validate the
results so that they are reliable?

Let's say that I have a large dataset of thousands of observations. I
split this data into two groups, one that I use for training and
another for validation. First I use the training set to build a model,
and then stepAIC() with a forward-backward search. BUT, if I base my
parameter selection purely on this result, I suppose it will be
somewhat skewed due to the one-time data split (I use only one
training dataset).

What is the correct way to perform this variable selection? And are
there readily available packages for this?

Similarly, when I have my final parameter set, how should I go about
making the final assessment of the model's predictive ability?
Cross-validation? What package?
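For illustration, a hand-rolled k-fold cross-validation sketch on simulated data (the data and the 0.5 cutoff are assumptions; boot::cv.glm automates the prediction-error part, though not the selection step):

```r
# simulated stand-in data: binary response y, two numeric predictors
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - dat$x2))

k <- 10
folds <- sample(rep(1:k, length.out = n))   # random fold assignment
acc <- sapply(1:k, function(i) {
  fit <- glm(y ~ x1 + x2, data = dat[folds != i, ], family = binomial)
  p <- predict(fit, dat[folds == i, ], type = "response")
  mean((p > 0.5) == dat$y[folds == i])      # held-out accuracy
})
mean(acc)   # out-of-fold classification accuracy
```

To reduce selection bias, the stepAIC() search itself would have to be repeated inside each fold, not run once on the full training set.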


Thank you in advance,
Jay



[R] xyplot ontop a contourplot (package: lattice)

2010-04-16 Thread Jay
Hello,

I have a contourplot that shows the data I want. However, I would like
to mark a certain number of points on top of this plot, as with
xyplot().

Example:

x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x=x, y=y)
grid$z <- cos(r^2) * exp(-r/(pi^3))
levelplot(z~x*y, grid, cuts = 50, panel.xyplot(x~y))


But the point does not show up. What is the correct way to achieve
this?
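One route that should make the points appear is drawing them from inside a custom panel function; a sketch with the same surface (the two highlighted points are hypothetical):

```r
library(lattice)

x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x = x, y = y)
grid$z <- cos(r^2) * exp(-r / (pi^3))

p <- levelplot(z ~ x * y, grid, cuts = 50,
               panel = function(...) {
                 panel.levelplot(...)          # draw the surface first
                 panel.xyplot(x = c(5, 10),    # then overlay the points
                              y = c(5, 10),
                              pch = 19, col = "black")
               })
p
```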



[R] Check for overdispersion in logit model

2010-04-16 Thread Jay
A quick question for those who are familiar with the subject: is it OK
to check for overdispersion in a logit model using

sum(resid(model, type = "pearson")^2) / df.residual(model)

Are there other commands? Packages?
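For illustration, the same ratio computed on a built-in grouped-binomial example (esoph stands in for the poster's data):

```r
# grouped-binomial fit on the built-in esoph data
model <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp,
             data = esoph, family = binomial)

phi <- sum(resid(model, type = "pearson")^2) / df.residual(model)
phi   # values well above 1 hint at overdispersion

# refitting with family = quasibinomial reports a dispersion estimate
# in summary(), which is one common follow-up
```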

/Jay



[R] Positioning plots on top of each other (aligment & borders)

2009-12-30 Thread Jay
Hello,

I want to place two plots on top of each other. However, the problem
is that I can't figure out a simple way to align them correctly. Is
there a way to specify this?
Since the data is a bunch of coordinates and the second layer is an
outline of a map (a .ps file I import using the grImport package), I
suppose one option would be to specify a set of "artificial"
coordinates that make up the very corners of that plot, and then have
the second layer fill this same space. Any ideas how to do this?
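A base-graphics sketch of the "artificial corners" idea: fix identical user coordinates for both layers, then overlay with par(new = TRUE) (the c(0, 1) bounds are placeholders):

```r
# layer 1: the coordinate data, with exact (non-padded) axis limits
plot(runif(20), runif(20), xlim = c(0, 1), ylim = c(0, 1),
     xaxs = "i", yaxs = "i", xlab = "", ylab = "")

par(new = TRUE)   # next high-level plot draws over, not after

# layer 2: an empty plot with the SAME limits, so anything drawn into it
# (e.g. the imported map) shares the coordinate system of layer 1
plot(c(0, 1), c(0, 1), type = "n", xlim = c(0, 1), ylim = c(0, 1),
     xaxs = "i", yaxs = "i", axes = FALSE, xlab = "", ylab = "")
# ...map drawing would go here
```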


//John



[R] xyplot: several plots in one creates y-scale problem

2010-01-02 Thread Jay
Hello,

I've been looking for a solution to this problem for some time now,
but I seem unable to solve it. So, this is the case: I want to plot 4
time series in the same graph using xyplot(). When I do this with

xyplot(mydata[,2]+mydata[,3]+mydata[,4]+mydata[,5] ~ mydata[,1], data
= mydata,
type = "l",
auto.key = list(space="right", lines = T, points = F),
par.settings = simpleTheme(lty = c(1,2,3,4))
)

I get a graph where all lines are "maximized" to cover the entire
y-scale width. I.e., they each use their own scale, independent of the
others (my data has some columns that are an order of magnitude
smaller than the others). How do I force them all to use the same
y-scale?

I found this thread:
http://n4.nabble.com/superimposing-xyplots-on-same-scale-td905525.html,
but I'm not really sure what is going on there. Any ideas?
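One robust workaround (a sketch on simulated stand-in data): reshape to long format and use groups=, which guarantees a single shared y-scale within the panel:

```r
library(lattice)

# stand-in for 'mydata': one x column, four series of very different scales
set.seed(1)
mydata <- data.frame(t = 1:50,
                     a = rnorm(50, 1000, 50), b = rnorm(50, 100, 5),
                     c = rnorm(50, 10, 1),    d = rnorm(50, 1, 0.1))

# wide -> long: one 'value' column plus a 'series' label
long <- reshape(mydata, direction = "long",
                varying = c("a", "b", "c", "d"), v.names = "value",
                timevar = "series", times = c("a", "b", "c", "d"))

p <- xyplot(value ~ t, data = long, groups = series, type = "l",
            auto.key = list(space = "right", lines = TRUE, points = FALSE))
p
```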

/J



[R] xyplot: problems with column names & legend

2010-01-02 Thread Jay
Hello!

one more question about xyplot. If I have data with spaces in the
column names, say "xyz 123", how do I create a working graph where
this text is displayed in the legend key?

Right now, when I try something like xyplot("xyz 123" ~ variable1,
data = mydata, ...) I get nothing.
Also, is it possible to generate the graph with xyplot(mydata[,1] ~
variable1, data = mydata, ...) and then later in the code specify
the names that should be displayed in the legend?

Thank you!



Re: [R] xyplot: problems with column names & legend

2010-01-03 Thread Jay
Thanks, the backticks got the code working. However, now I can't get
it to draw the legend/key.
For example, look at this figure:
http://osiris.sunderland.ac.uk/~cs0her/Statistics/xyplot5.png
My graph is similar, but instead of 1,2,...,8 as the names of the
series, I want it to say "Data one" (a string with spaces) and so on.


On Jan 3, 10:58 am, baptiste auguie 
wrote:
> Hi,
>
> Using backticks might work to some extent,
>
> library(lattice)
> `my variable` = 1:10
> y=rnorm(10)
> xyplot(`my variable` ~ y)
>
> but if your data is in a data.frame the names should have been converted,
>
> make.names('my variable')
> [1] "my.variable"
>
> HTH,
>
> baptiste
>
> 2010/1/3 Jay :
> > Hello!
>
> > one more question about xyplot. If I have data which have space in the
> > column names, say "xyz 123". How do I create a working graph where
> > this text is displayed in the legend key?
>
> > Now when I try something like xyplot("xyz 123" ~ variable1, data =
> > mydata, ...) I get nothing.
> > Also, is it possible to genrate the graph with xyplot(mydata[,1] ~
> > variable1, data = mydata, ...) and then later in the code specify
> > the names that should be displayed in the legend?
>
> > Thank you!
>



Re: [R] xyplot: problems with column names & legend

2010-01-05 Thread Jay
Anybody? It is frustrating to be unable to solve this silly little
problem...

On Jan 3, 12:48 pm, Jay  wrote:
> Thanks, the backtickes got the code working. However, now I cant get
> it to draw the legend/key.
> For example, look at this figure:
> http://osiris.sunderland.ac.uk/~cs0her/Statistics/xyplot5.png
> My graph is similar, but instead of 1,2,...,8 as the names of the
> series I want it to say "Data one" (a string with spaces) and so on.
>
> On Jan 3, 10:58 am, baptiste auguie 
> wrote:
> > Hi,
>
> > Using backticks might work to some extent,
>
> > library(lattice)
> > `my variable` = 1:10
> > y=rnorm(10)
> > xyplot(`my variable` ~ y)
>
> > but if your data is in a data.frame the names should have been converted,
>
> > make.names('my variable')
> > [1] "my.variable"
>
> > HTH,
>
> > baptiste
>
> > 2010/1/3 Jay :
>
> > > Hello!
>
> > > one more question about xyplot. If I have data which have space in the
> > > column names, say "xyz 123". How do I create a working graph where
> > > this text is displayed in the legend key?
>
> > > Now when I try something like xyplot("xyz 123" ~ variable1, data =
> > > mydata, ...) I get nothing.
> > > Also, is it possible to genrate the graph with xyplot(mydata[,1] ~
> > > variable1, data = mydata, ...) and then later in the code specify
> > > the names that should be displayed in the legend?
>
> > > Thank you!



[R] xyplot: adjusting the scale (min, max & tick)

2010-01-05 Thread Jay
Hi,

I'm terribly sorry, but it seems I cannot figure this one out by
myself, so please, if somebody could help I would be very grateful.
When I plot with xyplot() I get a y-axis that is very ugly...
starting from a random number and having so many ticks that it becomes
unreadable.

How do I tell xyplot how to draw the axis? E.g., start from 100, end
at 200, with 25 units between ticks/labels?
Can somebody give me an example?

Thanks!



Re: [R] xyplot: adjusting the scale (min, max & tick)

2010-01-06 Thread Jay

Perfect, that piece of code did exactly what I wanted. However, I
stumbled upon a new problem: now my data is plotted on a totally wrong
scale. The y-values are all between 160k and 500k, BUT with that
option I find that the plots are between 0 and 50 (?!?). What did I do
wrong?

This plots the data OK, even though it should be between 160k and 500k
on the y-scale:

xyplot(data1[,2]+data1[,3]~data1[,1], data = data1,
type = "l",
xlab = "x",
ylab = "y",
auto.key = list(space="top", lines = T, points = F, text=c("text1",
"text2")),
par.settings = simpleTheme(lty = c(1,2)),
scales=list(
  x=list(alternating=FALSE,tick.number = 11),
  y=list(limits=c(0,50))
)
)

If I remove the "y=list(limits=c(0,50))" the data is plotted as it
should be.
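If the 160k-500k range is right, the limits copied from the example are the likely culprit: c(0, 50) clips everything outside it, so the limits need to span the actual data. A sketch with stand-in data on that scale:

```r
library(lattice)

# stand-in data on the 160k-500k scale described above
d <- data.frame(x = 1:20,
                y1 = seq(160000, 500000, length.out = 20),
                y2 = seq(200000, 450000, length.out = 20))

p <- xyplot(y1 + y2 ~ x, data = d, type = "l",
            scales = list(
              x = list(alternating = FALSE, tick.number = 11),
              # limits must cover the data range; c(0, 50) would clip it all
              y = list(limits = c(150000, 510000), tick.number = 5)))
p
```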



Peter Ehlers wrote:
> 
> Have a look at the 'scales' argument. For example:
> 
> # default plot
> xyplot(Sepal.Length ~ Petal.Length | Species, data = iris)
> 
> # modified plot
> xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
>  scales=list(y=list(at=c(-5,0,5,10), limits=c(-5,10))))
> 
>   -Peter Ehlers
> 
> Jay wrote:
>> Hi,
>> 
>> I'm terribly sorry but it seems it cannot figure this one out by
>> myself so, please, if somebody could help I would be very grateful.
>> So, when I plot with xyplot() I get an y-axis that is very ugly...
>> starting from a random number and having so many ticks that it becomes
>> unreadable.
>> 
>> How do I tell xyplot how to draw the axis? E.g., start from 100, end
>> at 200 with 25 units between ticks/labels?
>> Can somebody give me an example?
>> 
>> Thanks!
>> 
> 
> -- 
> Peter Ehlers
> University of Calgary
> 403.202.3921
> 




Re: [R] Results from clogit out of range?

2013-03-01 Thread Jay
I'm not positive which question you are asking, because I lost some of
the initial messages in the thread, but I think

predict(model, type = "expected")

gives fitted probabilities.

Apologies if I answered a question no one asked.

On Feb 28, 2013, at 7:45 PM, lisa  wrote:

> I do appreciate this answer. I heard that in SAS, conditional logistic models
> do predictions in the same way. However, this formula can only deal with
> in-sample predictions. What about the out-of-sample case? Is it, as one of
> the former responses by Thomas suggested, impossible to do out-of-sample
> prediction?



[R] nnclust: nnfind() distance metric?

2010-05-06 Thread Jay
Hello,

pardon my ignorance, but what distance metric is used by the nnfind()
function in the nnclust package?
The manual only says:

"Find the nearest neighbours of points in one data set from another
data set. Useful for Mallows-type
distance metrics."


BR,
Jay



[R] Lattice: location of key inside a xyplot()

2010-05-11 Thread Jay
Hello,

adding the line

space = 'inside'

to my xyplot() call provides a partial solution to my problem.
However, this puts the key in a default corner inside the plot, not in
the corner where I want it. The help page lists no additional options
for "inside", but surely the position must be controllable somehow?
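For what it's worth, the key list also accepts x, y, and corner components (in normalized panel units), which position a key inside the plot region; a sketch on built-in data:

```r
library(lattice)

# corner = c(1, 1) pins the key's upper-right corner at the point (x, y),
# given here in [0, 1] coordinates of the plot region
p <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
            auto.key = list(x = 0.98, y = 0.98, corner = c(1, 1)))
p
```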


BR,
Jay



[R] RODBC: overwrite into a named range in Excel

2010-05-20 Thread Jay
Hello,

Let's say that I have a data frame of n numbers that I want to
transfer into an Excel spreadsheet. I have opened the connection to
the file using RODBC, and I can query the content of these n cells
without problem. However, how do I transfer my new values to these
cells, i.e., overwrite them?

Should I use sqlSave() or sqlUpdate()?

Using the update I get the error: "cannot update ‘data_001’ without
unique column"

sqlUpdate(connection_name, my_new_data_frame,
"name_of_the_range_in_excel")

Let's say that the range in Excel is E10:E20 (if it matters).


BR,
Jay



[R] Shared nearest neighbor (SNN) clustering algorithm implementation?

2011-02-18 Thread Jay
Hello,

is there an implementation available for a shared nearest neighbor
(SNN) clustering algorithm?
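I am not aware of a dedicated CRAN implementation; for reference, the shared-nearest-neighbour similarity that such an algorithm builds on is easy to sketch in base R (toy data, not a full clustering):

```r
# two points are SNN-similar when their k-nearest-neighbour lists overlap
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)   # 20 points in 2-D
n <- nrow(x)
k <- 5

d <- as.matrix(dist(x))
# k nearest neighbours of each point, excluding the point itself (k x n)
knn <- apply(d, 1, function(r) order(r)[2:(k + 1)])

snn <- matrix(0L, n, n)
for (i in 1:n) for (j in 1:n)
  snn[i, j] <- length(intersect(knn[, i], knn[, j]))

# snn[i, j] counts shared neighbours; thresholding it gives the SNN graph
# on which an SNN clustering algorithm (e.g. Jarvis-Patrick) would operate
```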


//Jay



[R] Fuzzy Discriminant Analysis (DA)

2011-02-23 Thread Jay
Hello all,

as the combination DA and R is rather new to me I would like to know:
are there packages that implement a fuzzy version of Discriminant
Analysis?



Thanks!



[R] kohonen: "Argument data should be numeric"

2011-02-25 Thread Jay
Hi,

I'm trying to use the kohonen package to build SOMs. However, when I
try this on my data I get the error:

"Argument data should be numeric"

when running som(data.train, grid = somgrid(6, 6, "hexagonal")). As
you can see, there is a problem with the data type of data.train,
which is a list. When I try to convert it to "numeric" I get the
error:

(list) object cannot be coerced to type 'double'

What should I do? I can convert data.train if I take only one column
of the list, data.train[[1]], but that is naturally not what I want.
How did I end up with this data format?

What I did:
data1 <- read.csv("data1.txt", sep = ";")
training <- sample(nrow(data1), 1000)
data.train <- data1[training,2:20]

I tried using scan() as the import method (I read about this
somewhere) and unlist(), but I'm not really sure how to get the data
to numeric and working.
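For reference, a data frame is internally a list even when every column is numeric, which is what som() objects to; coercing to a matrix first should help (a sketch; the commented lines mirror the hypothetical objects in the question):

```r
# a toy all-numeric data frame standing in for data1[training, 2:20]
df <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), c = c(10, 20, 30))

m <- data.matrix(df)   # or as.matrix(df) when every column is numeric
is.matrix(m) && is.numeric(m)

# beware: data.matrix() silently turns factor columns into integer codes,
# so check column types first

# for the code in the question this would be:
# data.train <- data.matrix(data1[training, 2:20])
# som(data.train, grid = somgrid(6, 6, "hexagonal"))
```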



Thanks,
Jay



[R] ggplot2: "ndensity" and "density" parameters

2011-03-27 Thread Jay
Hello,

if I want to compare the distributions of two datasets using ggplot2,
how should I choose the density type?
More exactly, what assumptions are behind the "ndensity" and
"density" parameters, and when should each be used?

See http://had.co.nz/ggplot2/stat_bin.html

While I understand that one is scaled and the other is not, I do not
understand which one I should rely on. The distributions look very
different when I try both alternatives.


Thanks



[R] Extracting columns with specific string in their names

2011-08-22 Thread Jay
Hi,

Let's say that I have a set of column names that begin with the string
"Xyz". How do I extract these specific columns? I tried to do the
following:

dataframe1[,grep("Xyz",colnames(dataframe1))]

But it does not work. What is wrong with my expression?
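For reference, the posted expression looks correct as far as it goes; one common gotcha is that data.frame()/read.csv() mangle names containing spaces unless check.names = FALSE is used, and an unanchored pattern can match more (or fewer) names than intended. A self-contained sketch:

```r
df <- data.frame(`Xyz 1` = 1:3, `Xyz Two` = 4:6, other = 7:9,
                 check.names = FALSE)   # keep the original names

# anchor the pattern so only names *beginning* with "Xyz" match
df[, grep("^Xyz", colnames(df)), drop = FALSE]

# grepl() gives a logical mask instead of indices; same result
df[, grepl("^Xyz", colnames(df)), drop = FALSE]
```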



Re: [R] Extracting columns with specific string in their names

2011-08-22 Thread Jay
Sorry, my mistake. The thing is that the command returns no results at
all. However, when I just tried a simpler version of this (with no
capital letters or spaces in the string), it worked fine. I can't
figure it out; I think it all boils down to the fact that I'm no
expert at regexps...




On Aug 22, 5:53 pm, "R. Michael Weylandt" 
wrote:
> Can you say a little more about what you mean "it does not work"? I'd guess
> you have a regular expression mistake and are probably getting more columns
> than desired, but without an example, it's hard to be certain.
>
> Use dput() and head() to give a small cut-and-paste-able example.
>
> Michael
>
> On Mon, Aug 22, 2011 at 10:33 AM, Jay  wrote:
> > Hi,
>
> > Let's say that I have a set of column names that begin with the string
> > "Xyz". How do I extract these specific columns? I tried to do the
> > following:
>
> > dataframe1[,grep("Xyz",colnames(dataframe1))]
>
> > But it does not work. What is wrong with my expression?
>



[R] rpart: plot without scientific notation

2011-08-25 Thread Jay
While I'm very pleased with the results I get with rpart and
rpart.plot, I would like the dependent variable in the plots to be
shown as plain integers rather than in scientific notation. Right now
all my numbers with 5 or more digits are displayed in scientific
notation.

I managed to find this:
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8423.html
but I do not fully understand what to change, and to what.
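One global workaround, assuming the plot labels come from R's default number formatting, is to raise the `scipen` option so that fixed notation is preferred over scientific before the tree is drawn:

```r
# scipen biases R's choice between fixed and scientific notation;
# a large positive value makes fixed (integer-style) notation win.
options(scipen = 999)

format(1e7)   # "10000000" rather than "1e+07"

# ...then re-draw the tree, e.g.:
# library(rpart); fit <- rpart(...); plot(fit); text(fit)
```

This changes formatting session-wide, so it may be worth resetting with `options(scipen = 0)` afterwards.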



[R] Decision tree with the group median as response?

2011-08-25 Thread Jay
As I am only familiar with the basics of decision trees, I would like
to ask, at the risk of asking a silly question: is it possible to
perform recursive partitioning with the group median as the
response/objective?

For example, instead of rpart focusing on means, could a similar tree
be created with medians?


Thanks
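It can be sketched: rpart supports user-written splitting functions (described in the rpart vignette on user-written split rules), so an L1 criterion — sum of absolute deviations from the median — can replace the usual sum of squares. The sketch below is illustrative and untuned (the function names `madev`, `itemp`, `etemp`, `stemp` are my own), and it handles continuous predictors only:

```r
library(rpart)

# L1 "deviance": sum of absolute deviations from the median.
madev <- function(y) sum(abs(y - median(y)))

# init: describe the response (one response value, one column of y).
itemp <- function(y, offset, parms, wt) {
  list(y = y, parms = NULL, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("  median =", format(signif(yval, digits)),
               ", deviance =", format(signif(dev, digits))))
}

# eval: node label is the median, node deviance is the L1 deviance.
etemp <- function(y, wt, parms) list(label = median(y), deviance = madev(y))

# split: for a continuous x (y arrives sorted by x), the goodness of
# the i-th cutpoint is the reduction in L1 deviance it achieves.
stemp <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("this sketch handles continuous predictors only")
  n <- length(y)
  goodness <- vapply(seq_len(n - 1), function(i)
    madev(y) - (madev(y[1:i]) + madev(y[(i + 1):n])), numeric(1))
  # direction = -1: observations below the cutpoint go left
  list(goodness = goodness, direction = rep(-1, n - 1))
}

fit <- rpart(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris,
             method = list(init = itemp, split = stemp, eval = etemp))
print(fit)
```

Each node's label is then a group median, and splits are chosen to minimize absolute (rather than squared) error.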



[R] rpart: apply tree to new data to get "counts"

2011-08-29 Thread Jay
Hi,

when I have made a decision tree with rpart, is it possible to "apply"
this tree to a new set of data in order to find out the distribution
of observations? Ideally I would like to plot my original tree, with
the counts (at each node) of the new data.


Regards,
Jay



Re: [R] rpart: apply tree to new data to get "counts"

2011-08-29 Thread Jay
I tried that, though I find the documentation a bit short, but the only
result I get from this is a probability distribution of my data (I'm
building a tree with 2 classes). How do I plot a tree where the counts
are shown at each step/node?

BR,
Jay
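One hedged approach, assuming the partykit package is acceptable (it is not mentioned in this thread): convert the rpart fit with `as.party()`, then `predict(..., type = "node")` returns the terminal node id that each new observation falls into, and tabulating those ids gives per-leaf counts to read alongside the tree plot:

```r
library(rpart)
library(partykit)   # assumption: partykit is installed

fit <- rpart(Species ~ ., data = iris)

# Pretend the last 50 rows are "new" data.
newdata <- iris[101:150, ]

pfit  <- as.party(fit)                      # rpart -> party object
nodes <- predict(pfit, newdata = newdata,   # terminal node id per row
                 type = "node")
table(nodes)                                # counts of new data per leaf
plot(pfit)                                  # tree plot to compare against
```

Annotating the plotted nodes with these counts directly would take a custom node-labelling function, but the `table(nodes)` output already gives the distribution of the new observations over the original tree.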

On Aug 29, 9:40 pm, Weidong Gu  wrote:
> ? predict.rpart
>
> Weidong Gu



[R] On-line machine learning packages?

2011-09-11 Thread Jay
What R packages are available for performing on-line classification
tasks? That is, once the predictor has done its job on the dataset
(based on the training set and a range of variables), feedback about
the true labels becomes available, and this information should be
incorporated before the next classification round.

//Jay
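I'm not aware of a single canonical package for this; as a minimal illustration of the feedback loop itself (all data, names, and the refit-per-round strategy below are made up for the sketch), one can simply refit a classifier each round as the true labels arrive, using glm as a stand-in for a true online learner:

```r
# A minimal classify -> observe label -> update loop, refitting a
# logistic regression each round (a stand-in for an online learner).
set.seed(1)
n <- 200
stream <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
stream$y <- rbinom(n, 1, plogis(stream$x1 - stream$x2))

seen <- stream[1:50, ]                 # initial training set
correct <- 0
for (i in 51:n) {
  fit  <- glm(y ~ x1 + x2, data = seen, family = binomial)
  p    <- predict(fit, newdata = stream[i, ], type = "response")
  yhat <- as.integer(p > 0.5)
  correct <- correct + (yhat == stream$y[i])  # feedback: true label arrives
  seen <- rbind(seen, stream[i, ])            # incorporate it for next round
}
correct / (n - 50)                            # running accuracy
```

Refitting from scratch is wasteful for large streams; truly incremental updates would need a model with an update rule, which is what distinguishes online learning from this naive loop.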



Re: [R] On-line machine learning packages?

2011-09-11 Thread Jay
Hi,

I used the rseek search engine to look for suitable solutions, but as
I was unable to find anything useful, I'm asking for help.
Does anybody have experience with these kinds of problems? I looked
into dynaTree, but information is a bit scarce and, as I understand
it, it might not be what I'm looking for (?)


BR,
Jay


On Sep 11, 7:15 pm, David Winsemius  wrote:
> You should look at CRAN Task Views. Extremely easy to find from the  
> main R-project page.
>
> --
> David Winsemius, MD
> West Hartford, CT



Re: [R] On-line machine learning packages?

2011-09-11 Thread Jay
If the answer is so obvious, could somebody please spell it out?


On Sep 11, 10:59 pm, Jason Edgecombe  wrote:
> Try this:
>
> http://cran.r-project.org/web/views/MachineLearning.html



Re: [R] On-line machine learning packages?

2011-09-12 Thread Jay
In my mind this sequential classification task with feedback is
somewhat different from a completely offline, one-off classification.
Am I wrong?
However, it looks like the mentality on this topic is to refer me to
CRAN/Google so that I look for solutions myself. Obviously I know
about these sources, and as I said, I used rseek.org among other
sources to look for solutions. I did not start this topic for fun; I'm
asking for help finding a suitable machine learning package that
readily incorporates feedback loops and online learning. If somebody
has experience with these kinds of problems in R, please respond.


Or will
"http://cran.r-project.org
Look for 'Task Views'"
be my next piece of advice?

On Sep 12, 11:31 am, Dennis Murphy  wrote:
> http://cran.r-project.org/web/views/
>
> Look for 'machine learning'.
>



Re: [R] On-line machine learning packages?

2011-09-13 Thread Jay
How does sequential classification differ from running a one-off
classifier for each run?
-> Because feedback from the previous round can and needs to be
incorporated into the next round.


http://lmgtfy.com/?q=R+machine+learning
-> That is a new low. I was hoping to get help; obviously I was wrong
to use this forum in the hope that somebody had already battled these
kinds of problems in R.


On Sep 13, 1:52 am, Jason Edgecombe  wrote:
> I already provided the link to the task view, which provides a list of
> the more popular machine learning algorithms for R.
>
> Do you have a particular algorithm or technique in mind? Does it have a
> name?
>
> How does sequential classification differ form running a one-off
> classifier for each run?

[R] Factor analysis on ordinal & nominal data

2011-06-14 Thread Jay
Hi,

are there readily available R packages that are able to perform FA on
ordinal and/or nominal data?
If not, what other approaches and helpful packages would you suggest?



BR,
Jay
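For ordinal items, one commonly suggested route, assuming the psych package is acceptable, is factor analysis on polychoric correlations, which `psych::fa()` can compute via `cor = "poly"`; for nominal data, multiple correspondence analysis (e.g. `FactoMineR::MCA`) is often used instead. A small sketch with made-up data:

```r
library(psych)   # assumption: psych is installed

# Hypothetical ordinal items scored 1-4 (random, so loadings are
# not meaningful; this only shows the calling pattern).
set.seed(42)
items <- as.data.frame(matrix(sample(1:4, 100 * 6, replace = TRUE),
                              ncol = 6))

# cor = "poly" tells fa() to build polychoric correlations first.
fa_poly <- fa(items, nfactors = 2, cor = "poly")
fa_poly$loadings
```

Polychoric correlations assume the ordinal categories discretize an underlying continuous normal variable, so that assumption is worth checking before interpreting the factors.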



[R] Count occurances in integers (or strings)

2011-06-15 Thread Jay
Hi,

I have a dataframe column from which I want to calculate the number of
1's in each entry. Some column values could, for example, be
"0001001000" and "111".

To get the number of occurrences from a string I use this:
sum(unlist(strsplit(mydata[,"my_column"], "")) == "1")

However, my data is not in string form. How do I convert it? I
tried:
lapply(mydata[,"my_column"], toString)

but I do not seem to get it right (or at least I do not understand the
output format).

Also, are there other options? Can I easily calculate the occurrences
directly from the integers?
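One sketch that avoids the string-conversion pitfalls: `as.character()` on large numbers can produce scientific notation, so `format(..., scientific = FALSE)` is safer, and note that numeric storage has already dropped any leading zeros, so codes like "0001001000" should be kept as character columns if those zeros matter:

```r
x <- c(1001000, 111)   # numeric codes; leading zeros are already lost

# Convert to fixed-notation text, drop every non-"1" character
# (including padding spaces), and count what is left.
count_ones <- function(v) nchar(gsub("[^1]", "", format(v, scientific = FALSE)))
count_ones(x)   # 2 3

# The strsplit approach, applied per element, gives the same result:
sapply(strsplit(format(x, scientific = FALSE), ""),
       function(ch) sum(ch == "1"))
```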



[R] split data, but ensure each level of the factor is represented

2008-10-13 Thread Jay
Hello,

I'll use part of the iris dataset for an example of what I want to
do.

> data(iris)
> iris<-iris[1:10,1:4]
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1   5.1 3.5  1.4 0.2
2   4.9 3.0  1.4 0.2
3   4.7 3.2  1.3 0.2
4   4.6 3.1  1.5 0.2
5   5.0 3.6  1.4 0.2
6   5.4 3.9  1.7 0.4
7   4.6 3.4  1.4 0.3
8   5.0 3.4  1.5 0.2
9   4.4 2.9  1.4 0.2
10  4.9 3.1  1.5 0.1

Now if I want to split this data using the vector
> a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 1 2 3 2 3

Then the function split works fine
> split(iris,a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6  5.4 3.9  1.7 0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4  4.6 3.1  1.5 0.2
7  4.6 3.4  1.4 0.3
9  4.4 2.9  1.4 0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1   5.1 3.5  1.4 0.2
2   4.9 3.0  1.4 0.2
3   4.7 3.2  1.3 0.2
5   5.0 3.6  1.4 0.2
8   5.0 3.4  1.5 0.2
10  4.9 3.1  1.5 0.1


My problem is when the vector lacks one of the values from 1:n. For
example if the vector is
> a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 2 2 3 2 3

then split will return a list without a $`1`. I would like to have the
$`1` be a vector of 0's with the same length as the number of columns
in the dataset. In other words I want to write a function that returns

> mysplit(iris,a)
$`1`
[1] 0 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4  4.6 3.1  1.5 0.2
6  5.4 3.9  1.7 0.4
7  4.6 3.4  1.4 0.3
9  4.4 2.9  1.4 0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1   5.1 3.5  1.4 0.2
2   4.9 3.0  1.4 0.2
3   4.7 3.2  1.3 0.2
5   5.0 3.6  1.4 0.2
8   5.0 3.4  1.5 0.2
10  4.9 3.1      1.5 0.1

Thank you for your time,

Jay
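A sketch of such a mysplit(): build on split() with an explicit factor so empty levels survive, then substitute a zero vector for any empty piece (`mysplit` is a hypothetical helper name):

```r
# Split df by a, guaranteeing one list element per level 1..n;
# an empty level becomes a zero vector with one entry per column.
mysplit <- function(df, a, n = max(a)) {
  parts <- split(df, factor(a, levels = 1:n))
  lapply(parts, function(p) if (nrow(p) == 0) rep(0, ncol(df)) else p)
}

iris10 <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)   # level 1 never occurs
res <- mysplit(iris10, a, n = 3)
res$`1`   # 0 0 0 0
```

The key line is `factor(a, levels = 1:n)`: split() then emits an (empty) data frame for every declared level, which the lapply step replaces with zeros.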



Re: [R] split data, but ensure each level of the factor is represented

2008-10-13 Thread Jay
Thanks so much.

On Oct 13, 1:14 pm, "Henrique Dallazuanna" <[EMAIL PROTECTED]> wrote:
> Try this:
>
> a<-factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
> split(iris, a)
>
> lapply(split(iris, a), dim)



[R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities

2017-06-23 Thread Jay Zola
Dear sir/madame,


I am currently writing a meta-analysis on the complication and reoperation
rates of 5 different treatment modalities after a distal radius fracture. I was
able to pool the rates of the 5 different modalities using R. Now I have to
compare the pooled rates of the 4 treatment modalities with the gold standard
separately. I thought the chi-squared test would be the best method. How do I
do that using R? The R code I used for the former calculation is added as a
Word-file attachment. Your help would be highly appreciated.


Yours sincerely,


Student
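Assuming the pooled results are available as event counts x out of N per modality (the numbers below are made up for illustration), each modality can be compared with the gold standard using `prop.test()`, which carries out the chi-squared test for equality of two proportions; with four comparisons, a multiplicity adjustment is worth considering:

```r
# Hypothetical pooled counts: events x out of N observations.
x_pc <- 30; n_pc <- 400   # gold standard (plaster casting)
x_ef <- 55; n_ef <- 350   # one comparator, e.g. external fixation

# Chi-squared test for equality of two proportions.
prop.test(c(x_ef, x_pc), c(n_ef, n_pc))

# All four comparisons against PC, with a multiplicity adjustment
# (comparator counts are again hypothetical):
comparators <- list(EF = c(55, 350), IMN = c(40, 300),
                    KW = c(45, 320), VPO = c(35, 380))
pvals <- sapply(comparators, function(z)
  prop.test(c(z[1], x_pc), c(z[2], n_pc))$p.value)
p.adjust(pvals, method = "holm")
```

Note this treats each pooled proportion as if it came from a single sample, ignoring between-study heterogeneity; a meta-regression with treatment as a moderator would respect the study structure better.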


Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities

2017-06-23 Thread Jay Zola
Dear sir/madame,


Currently I am writing a meta-analysis on complications and reoperations of 5
treatment modalities after an extra-articular distal radius fracture. The
treatment modalities are EF, IMN, KW, VPO, and PC as the gold standard. We
have included 22 studies, 10 RCTs and 12 prospective studies, all examining
different treatment methods. We retrieved the data from these studies and
pooled the complication and reoperation rates (n/N). Now we want to compare the
pooled proportion of each treatment modality to the gold standard of
PC (plaster casting). So I want to do 4 separate comparisons using the
chi-squared method. I looked this up online and in the guide to the meta
package (the package I used), but was unable to find useful information. I
first posted my question on the stats.stackexchange website but was redirected
to the r-help mailing list. I have added a picture of the most important parts
of the code (not the Egger's regression, funnel, trim-and-fill, and outcome.pdf
parts, because they didn't fit). I have added the data as Excel and SPSS files
to my Dropbox, and the complete R code as a Word file as well. The links below
will take you to them, as preferred by the posting guide. Hopefully someone
can help me.


Thank you very much.



Excel datafile: 
https://www.dropbox.com/s/19402gt0x1agt9f/Excel%20file%20Distal%20Radius%20Fracture%20basic.xlsx?dl=0




SPSS datafile: 
https://www.dropbox.com/s/h81pphxkfk74hzo/Meta-Analyse%20Complications%20and%20Reoperations.sav?dl=0



Rcode file Word: 
https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0






https://stats.stackexchange.com/questions/286920/comparing-pooled-propotions-using-r-for-a-meta-analysis









From: David Winsemius
Sent: Friday, June 23, 2017, 20:18
To: Jay Zola
CC: r-help@r-project.org
Subject: Re: [R] Comparing pooled proportions (complication and reoperation
rates) of different treatment modalities



Not an acceptable format to the listserv program. Policy is set by the host 
institution. Use plain text.


David Winsemius
Alameda, CA, USA



Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities

2017-06-26 Thread Jay Zola
What is the best way to change my R code so that I can compare the pooled
proportions (complication and reoperation rates) with the chi-squared method?

Sent from my iPhone

> On Jun 24, 2017, at 14:18, Michael Dewey wrote:
> 
> Note though that this has been put on hold on stats.stackexchange.com as 
> off-topic.
> 
>> On 23/06/2017 19:33, Bert Gunter wrote:
>> Probably the wrong list. R-help is concerned with R programming, not
>> statistics methodology questions, although the intersection can be
>> nonempty.
>> 
>> I suggest you post on stats.stackexchange.com instead, which *is*
>> concerned with statistics methodology questions.
>> 
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> -- 
> Michael
> http://www.dewey.myzen.co.uk/home.html



Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities

2017-06-26 Thread Jay Zola
On Jun 26, 2017, at 15:22, Jay Zola wrote:

What is the best way to change my R code to be able to compare the pooled 
proportions(complication and reoperation rates) with the Chi square method?

Just adding an adjustment to the links because they were not working correctly.


Dataset on my dropbox: 
https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0


R code on my dropbox: 
https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0


Sent from my iPhone

Op 24 jun. 2017 om 14:18 heeft Michael Dewey 
mailto:li...@dewey.myzen.co.uk>> het volgende 
geschreven:

Note though that this has been put on hold on 
stats.stackexchange.com<http://stats.stackexchange.com> as off-topic.

On 23/06/2017 19:33, Bert Gunter wrote:
Probably the wrong list. R-help is concerned with R programming, not
statistics methodology questions, although the intersection can be
nonempty.

I suggest you post on stats.stackexchange.com<http://stats.stackexchange.com> 
instead, which *is*
concerned with statistics methodology questions.


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jun 23, 2017 at 5:53 AM, Jay Zola <jayjay.1...@hotmail.nl> wrote:
Dear sir/madame,


I am currently writing a meta-analysis on the complication and reoperation 
rates of 5 different treatment modalities after a distal radius fracture. I was 
able to pool the rates of the 5 treatments using R. Now I have to compare the 
pooled rates of 4 of the treatment modalities with the gold standard 
separately. I thought the chi-squared test would be the best method. How do I 
do that using R? The R code I have used for the former calculation is added as 
a Word-file attachment. Your help would be highly appreciated.
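For the bare mechanics of a chi-squared comparison of two raw proportions in base R, here is a sketch only: the counts (3 of 23 vs 4 of 25) are taken from the dataset excerpt posted elsewhere in this thread, and a plain test of proportions ignores study-level heterogeneity, which is partly why the replies below redirect the question to a statistics forum.

```r
# Sketch: chi-squared test of equal proportions for two raw counts
# (3 events in 23 patients vs 4 events in 25 patients, from the excerpt
# elsewhere in this thread). Not a substitute for a meta-analytic comparison.
events <- c(3, 4)
totals <- c(23, 25)
res <- prop.test(events, totals, correct = FALSE)  # equivalent to a 2x2 chi-squared test
res$p.value
```

With counts this small the chi-squared approximation is rough; `prop.test()` will warn accordingly.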


Yours sincerely,


Student
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Model studies in one analysis using treatment as a five level moderator in a meta-regression

2017-06-26 Thread Jay Zola
Hello,


I am a medical student, writing a meta-analysis on complication and reoperation 
rates after the five most common treatments of distal radius fractures. I have 
been busy with the statistics for months by myself, but find it quite hard 
since our classes were very basic. Now I want to compare the treatment 
modalities to see if there are significant differences. Using R I was able to 
synthesize the complication rates and reoperation rates for each treatment 
method. But I never had any R course and managed by trial and error, so the 
code probably doesn't look that great. Someone told me I could best model the 
data in one analysis using treatment as a five-level moderator in a 
meta-regression. Can someone help me with the R code to do this? Your help 
would be very much appreciated.


Thank you,


Jay


Study| Event Type| Treatment| Number of Events (n)| N| n/N|
Kumaravel| Complications| EF| 3| 23| 0,1304348|
Franck| Complications| EF| 2| 20| 0,1|
Schonnemann| Complications| EF| 8| 30| 0,267|
Aita| Complications| EF| 1| 16| 0,0625|
Hove| Complications| EF| 31| 39| 0,7948718|
Andersen| Complications| EF| 26| 75| 0,347|
Krughaug| Complications| EF| 22| 75| 0,293|
Moroni| Complications| EF| 0| 20| 0|
Plate| Complications| IMN| 3| 30| 0,1|
Chappuis| Complications| IMN| 4| 16| 0,25|
Gradl| Complications| IMN| 12| 66| 0,1818182|
Schonnemann| Complications| IMN| 6| 31| 0,1935484|
Aita| Complications| IMN| 1| 16| 0,0625|
Dremstrop| Complications| IMN| 17| 44| 0,3863636|
Wong| Complications| PC| 1| 30| 0,033|
Kumaravel| Complications| PC| 4| 25| 0,16|
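As a rough base-R illustration of "treatment as a multi-level moderator" using only the excerpt above: an unweighted logistic regression with `Treatment` as a factor. This is a stand-in for intuition, not the meta-regression (meta/metafor) the poster was advised to run, since it ignores random effects and study weighting, and the excerpt only covers three of the five treatments.

```r
# Hypothetical sketch: treat each study row as binomial counts and let
# treatment enter as a factor (EF is the reference level here).
# A real meta-regression (e.g. metafor::rma) would add random effects.
dat <- data.frame(
  study  = c("Kumaravel","Franck","Schonnemann","Aita","Hove","Andersen",
             "Krughaug","Moroni","Plate","Chappuis","Gradl","Schonnemann",
             "Aita","Dremstrop","Wong","Kumaravel"),
  trt    = factor(c(rep("EF", 8), rep("IMN", 6), rep("PC", 2))),
  events = c(3, 2, 8, 1, 31, 26, 22, 0, 3, 4, 12, 6, 1, 17, 1, 4),
  n      = c(23, 20, 30, 16, 39, 75, 75, 20, 30, 16, 66, 31, 16, 44, 30, 25))
fit <- glm(cbind(events, n - events) ~ trt, family = binomial, data = dat)
summary(fit)$coefficients  # rows: intercept (EF baseline), trtIMN, trtPC
```

The non-intercept rows are log-odds-ratio contrasts against the EF baseline; relevel the factor to contrast against a different reference treatment.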


Dataset on my dropbox: 
https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0





library(meta)
library(stargazer)
library(foreign)

All <-read.spss("C:\\Users\\313635aa.STUDENT\\Desktop\\Meta-Analyse 
Complications and Reoperations.sav",to.data.frame = T, use.value.labels = T)
All <- na.omit(All)

Complications <- All[which(All[,"Event_Type"] == "Complications"),]
Re_operation <- All[which(All[,"Event_Type"] == "Reoperations"),]

EF <- All[which(All[,"Treatment"] == "EF"),]
IMN <- All[which(All[,"Treatment"] == "IMN"),]
pc <- All[which(All[,"Treatment"] == "PC"),]
KW <- All[which(All[,"Treatment"] == "KW"),]
VPO <- All[which(All[,"Treatment"] == "VPO"),]

EF_C <- EF[which(EF[,"Event_Type"] == "Complications"),]
EF_R <- EF[which(EF[,"Event_Type"] == "Reoperations"),]

IMN_C <- IMN[which(IMN[,"Event_Type"] == "Complications"),]
IMN_R <- IMN[which(IMN[,"Event_Type"] == "Reoperations"),]

pc_C <- pc[which(pc[,"Event_Type"] == "Complications"),]
pc_R <- pc[which(pc[,"Event_Type"] == "Reoperations"),]

KW_C <- KW[which(KW[,"Event_Type"] == "Complications"),]
KW_R <- KW[which(KW[,"Event_Type"] == "Reoperations"),]

VPO_C <- VPO[which(VPO[,"Event_Type"] == "Complications"),]
VPO_R <- VPO[which(VPO[,"Event_Type"] == "Reoperations"),]

Output <- function(x, y, k.min = 10) {
  file <- metaprop(Events_n, N, Study_ID, data = x)

  forest.meta(file, studlab = T, pooled.totals = T, bysort = F)

  dev.copy2pdf(file = y, width = 11.69, height = 8.27)
  print(file)
}

R code on my dropbox: 
https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0







[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Model studies in one analysis using treatment as a five level moderator in a meta-regression

2017-06-26 Thread Jay Zola
Dear Vito,

Thank you for your reply. I tried to contact the statistics department 
numerous times, but did not receive any reply. That is why I started to look on 
the internet for help.

Yours sincerely,

Jay

Sent from my iPhone

> On 26 Jun 2017, at 22:05, Vito Michele Rosario Muggeo 
>  wrote:
> 
> hi Jay,
> Consult a local statistician. Statistics is not what you think it is (namely 
> simple computations, R, and probably plotting...).
> 
> regards,
> vito
> 
> 
> 
> Jay Zola  wrote:
> 
>> Hello,
>> 
>> 
>> I am medical student, writing a meta-analysis on complication and 
>> reoperation rates after the five most common treatments of distal radius 
>> fractures. I have been busy with the statistics for months by my self, but 
>> find it quite hard since our classes were very basic. Now I want to compare 
>> the treatment modalities to see if there are significant differences. Using 
>> R I was able to synthesize the complication rates and reoperation rates for 
>> each treatment method. But I never had any R course and managed by trial and 
>> error, so the code probably doesn't look that great. Someone told me I could 
>> best model the data in one analysis using treatment as a five level 
>> moderator in a meta-regression. Can some help me with the R code to do this? 
>> Your help would be very much appreciated.
>> 
>> 
>> Thank you,
>> 
>> 
>> Jay
>> 
>> 
>> Study| Event Type| Treatment| Number of Events (n)| N| n/N|
>> 
>> Kumaravel| Complications| EF| 3| 23| 0,1304348|
>> 
>> Franck| Complications| EF| 2| 20| 0,1|
>> 
>> Schonnemann| Complications| EF| 8| 30| 0,267|
>> 
>> Aita| Complications| EF| 1| 16| 0,0625|
>> 
>> Hove| Complications| EF| 31| 39| 0,7948718|
>> 
>> Andersen| Complications| EF| 26| 75| 0,347|
>> 
>> Krughaug| Complications| EF| 22| 75| 0,293|
>> 
>> Moroni| Complications| EF| 0| 20| 0|
>> 
>> Plate| Complications| IMN| 3| 30| 0,1|
>> 
>> Chappuis| Complications| IMN| 4| 16| 0,25|
>> 
>> Gradl| Complications| IMN| 12| 66| 0,1818182|
>> 
>> Schonnemann| Complications| IMN| 6| 31| 0,1935484|
>> 
>> Aita| Complications| IMN| 1| 16| 0,0625|
>> 
>> Dremstrop| Complications| IMN| 17| 44| 0,3863636|
>> 
>> Wong| Complications| PC| 1| 30| 0,033|
>> 
>> Kumaravel| Complications| PC| 4| 25| 0,16|
>> 
>> 
>> Dataset on my dropbox: 
>> https://urlsand.esvalabs.com/?u=https%3A%2F%2Fwww.dropbox.com%2Fs%2Fj1urqzr99bt76ip%2FBasics%2520excel%2520file%2520complication%2520and%2520reoperation%2520rate.xlsx%3Fdl%3D0&e=541e9c83&h=065e9ef9&f=y
>> 
>> Basics excel file complication and reoperation 
>> rate.xlsx<https://urlsand.esvalabs.com/?u=https%3A%2F%2Fwww.dropbox.com%2Fs%2Fj1urqzr99bt76ip%2FBasics%2520excel%2520file%2520complication%2520and%2520reoperation%2520rate.xlsx%3Fdl%3D0&e=541e9c83&h=065e9ef9&f=y>
>> https://urlsand.esvalabs.com/?u=http%3A%2F%2Fwww.dropbox.com&e=541e9c83&h=4bc36151&f=y
>> Shared with Dropbox
>> 
>> 
>> 
>> 
>> library(meta)
>> library(stargazer)
>> library(foreign)
>> 
>> All <-read.spss("C:\\Users\\313635aa.STUDENT\\Desktop\\Meta-Analyse 
>> Complications and Reoperations.sav",to.data.frame = T, use.value.labels = T)
>> All <- na.omit(All)
>> 
>> Complications <- All[which(All[,"Event_Type"] == "Complications"),]
>> Re_operation <- All[which(All[,"Event_Type"] == "Reoperations"),]
>> 
>> EF <- All[which(All[,"Treatment"] == "EF"),]
>> IMN <- All[which(All[,"Treatment"] == "IMN"),]
>> pc <- All[which(All[,"Treatment"] == "PC"),]
>> KW <- All[which(All[,"Treatment"] == "KW"),]
>> VPO <- All[which(All[,"Treatment"] == "VPO"),]
>> 
>> EF_C <- EF[which(EF[,"Event_Type"] == "Complications"),]
>> EF_R <- EF[which(EF[,"Event_Type"] == "Reoperations"),]
>> 
>> IMN_C <- IMN[which(IMN[,"Event_Type"] == "Complications"),]
>> IMN_R <- IMN[which(IMN[,"Event_Type"] == "Reoperations"),]
>> 
>> pc_C <- pc[which(pc[,"Event_Type"] == "Complications"),]
>> pc_R <- pc[which(pc[,"Event_Type"] == "Reoperations"),]
>> 
>> KW_C <- KW[which(KW[,"Event_Type"] == "Complications"),]

[R] Change Rcode for a meta-analysis(netmeta) to use a random effects model instead of a mixed effects model

2017-06-29 Thread Jay Zola
Hello,


I am writing a meta-analysis on the complication and reoperation rates after 5 
treatment modalities of a distal radius fracture. I have code to compare the 
complication and reoperation rates. Currently it uses a mixed-effects model. 
Is it possible to change the code so that a random-effects model is used?


Thank you very much,


Jay



R code


library(meta)
library(readxl)
All <- read_excel("Basics excel file complication and reoperation rate.xlsx", sheet=1)
names(All) <- c("Study_ID","Event_Type","Treatment","Events_n","N","nN")
All$Treatment <- factor(All$Treatment, levels=c("PC","EF","IMN","KW","VPO"))
# Outcomes
Complications <- subset(All, Event_Type=="Complications")
Reoperations <- subset(All, Event_Type=="Reoperations")
# Comparison of treatment effects to gold standard in the Complications subset
mtpr1 <- metaprop(Events_n, N, Study_ID, data = Complications)
meta::metareg(mtpr1, ~Treatment)
# Comparison of treatment effects to gold standard in the Reoperations subset
mtpr2 <- metaprop(Events_n, N, Study_ID, data = Reoperations)
meta::metareg(mtpr2, ~Treatment)
# Comparison of treatment effects to gold standard in the All dataset
# Interaction effects have been considered
mtpr <- metaprop(Events_n, N, Study_ID, data = All)
meta::metareg(mtpr, ~Treatment*Event_Type)
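For intuition about what a random-effects pooled proportion means computationally, here is a hand-rolled DerSimonian-Laird estimate on the logit scale for the EF complication rows of the dataset excerpt. This is a sketch under simplifying assumptions (a 0.5 continuity correction because one study has zero events); `metaprop()` in the meta package does the real work with more care.

```r
# EF complication counts from the dataset excerpt in this thread
x <- c(3, 2, 8, 1, 31, 26, 22, 0); n <- c(23, 20, 30, 16, 39, 75, 75, 20)
xc <- x + 0.5; nc <- n + 1                     # continuity correction (Moroni has 0 events)
y <- log(xc/(nc - xc))                         # logit proportions
v <- 1/xc + 1/(nc - xc)                        # approximate within-study variances
w <- 1/v                                       # fixed-effect weights
Q <- sum(w * (y - sum(w*y)/sum(w))^2)          # Cochran's Q heterogeneity statistic
k <- length(y)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2)/sum(w)))  # DerSimonian-Laird tau^2
wr <- 1/(v + tau2)                             # random-effects weights
pooled <- sum(wr*y)/sum(wr)                    # pooled logit
plogis(pooled)                                 # back-transform to a proportion
```

When `tau2` is zero the random-effects weights collapse to the fixed-effect ones, which is the whole difference between the two models.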


A part of the dataset:

Study| Event Type| Treatment| Number of Events (n)| N| n/N|
Kumaravel| Complications| EF| 3| 23| 0,1304348|
Franck| Complications| EF| 2| 20| 0,1|
Schonnemann| Complications| EF| 8| 30| 0,267|
Aita| Complications| EF| 1| 16| 0,0625|
Hove| Complications| EF| 31| 39| 0,7948718|
Andersen| Complications| EF| 26| 75| 0,347|
Krughaug| Complications| EF| 22| 75| 0,293|
Moroni| Complications| EF| 0| 20| 0|
Plate| Complications| IMN| 3| 30| 0,1|
Chappuis| Complications| IMN| 4| 16| 0,25|
Gradl| Complications| IMN| 12| 66| 0,1818182|
Schonnemann| Complications| IMN| 6| 31| 0,1935484|
Aita| Complications| IMN| 1| 16| 0,0625|
Dremstrop| Complications| IMN| 17| 44| 0,3863636|
Wong| Complications| PC| 1| 30| 0,033|
Kumaravel| Complications| PC| 4| 25| 0,16|


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change Rcode for a meta-analysis(netmeta) to use a random effects model instead of a mixed effects model

2017-06-29 Thread Jay Zola
Link Dropbox R code: 
https://www.dropbox.com/s/9u6e89t6dq39r53/Rcode%20metaregression.docx?dl=0





Link Dropbox part of dataset: 
https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0




From: Viechtbauer Wolfgang (SP)
Sent: Thursday, 29 June 2017 19:47
To: Jay Zola; r-help@r-project.org
Subject: RE: Change Rcode for a meta-analysis(netmeta) to use a random 
effects model instead of a mixed effects model

The code in your mail is a mangled mess, since you posted in HTML. Please 
configure your email client to send emails in plain text.

Could you explain what exactly you mean by "Currently it is using a mixed 
effects model. Is it possible to change the code so a random effects model is 
used?"

Best,
Wolfgang

>-Original Message-
>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jay Zola
>Sent: Thursday, June 29, 2017 19:38
>To: r-help@r-project.org
>Subject: [R] Change Rcode for a meta-analysis(netmeta) to use a random
>effects model instead of a mixed effects model
>
>Hello,
>
>I am writing a meta-analysis on the complication and reoperation rates
>after 5 treatment modalities of a distal radius fracture. I have a code to
>compare the complication and reoperation rates. Currently it is using a
>mixed effects model. Is it possible to change the code so a random effects
>model is used?
>
>Thank you very much,
>
>Jay
>
>R code
>
>library(meta) library(readxl) All <- read_excel("Basics excel file
>complication and reoperation rate.xlsx", sheet=1) names(All) <-
>c("Study_ID","Event_Type","Treatment","Events_n","N","nN") All$Treatment
><- factor(All$Treatment, levels=c("PC","EF","IMN","KW","VPO")) # Outcomes
>Complications <- subset(All, Event_Type=="Complications") Reoperations <-
>subset(All, Event_Type=="Reoperations") # Comparison of treatment effects
>to gold standard in the Complications subset mtpr1 <- metaprop(Events_n,
>N, Study_ID, data = Complications) meta::metareg(mtpr1, ~Treatment) #
>Comparison of treatment effects to gold standard in the Reoperations
>subset mtpr2 <- metaprop(Events_n, N, Study_ID, data = Reoperations)
>meta::metareg(mtpr2, ~Treatment) # Comparison of treatment effects to gold
>standard in the All dataset # Interaction effects have been considered
>mtpr <- metaprop(Events_n, N, Study_ID, data = All) meta::metareg(mtpr,
>~Treatment*Event_Type)
>
>A part of the dataset:
>
>Study| Event Type| Treatment| Number of Events (n)| N| n/N|
>Kumaravel| Complications| EF| 3| 23| 0,1304348|
>Franck| Complications| EF| 2| 20| 0,1|
>Schonnemann| Complications| EF| 8| 30| 0,267|
>Aita| Complications| EF| 1| 16| 0,0625|
>Hove| Complications| EF| 31| 39| 0,7948718|
>Andersen| Complications| EF| 26| 75| 0,347|
>Krughaug| Complications| EF| 22| 75| 0,293|
>Moroni| Complications| EF| 0| 20| 0|
>Plate| Complications| IMN| 3| 30| 0,1|
>Chappuis| Complications| IMN| 4| 16| 0,25|
>Gradl| Complications| IMN| 12| 66| 0,1818182|
>Schonnemann| Complications| IMN| 6| 31| 0,1935484|
>Aita| Complications| IMN| 1| 16| 0,0625|
>Dremstrop| Complications| IMN| 17| 44| 0,3863636|
>Wong| Complications| PC| 1| 30| 0,033|
>Kumaravel| Complications| PC| 4| 25| 0,16|
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-
>guide.html
>and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Qvalue package: I am getting back 1, 000 q values when I only want 1 q value.

2017-01-17 Thread Jay Tanzman
What you're doing makes no sense.  Given p-values p_i, i=1...n, resulting
from hypothesis tests t_i, i=1...n, the q-value of p_i is the expected
proportion of false positives among all n tests if the significance level
of each test is α=p_i. Thus a q-value is only defined for an observed
p-value.  Assuming that you have stored n observed p-values in an R vector
P, and the ith p-value P[i]==.05, then the R syntax to obtain the q-value
for P[i] is qvalue(P)$qvalues[i].

If, instead (as I suspect), that .05 is not among your observed p-values,
but you want to know what the FDR would be, given your sequence of
p-values, if the significance level of every test were .05, then the R
syntax would be
max(qvalue(P)$qvalues[P<=.05]).
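A base-R sketch of the same indexing, using `p.adjust()` instead of the qvalue package (assumption: Benjamini-Hochberg adjusted p-values are q-values with the true-null proportion pi0 fixed at 1, so they are conservative relative to `qvalue()`):

```r
# Hypothetical data: one p-value of interest (0.05) plus 999 uniform nulls.
set.seed(1)
P <- c(0.05, runif(999))
q <- p.adjust(P, method = "BH")   # Benjamini-Hochberg, i.e. q-values with pi0 = 1
q[1]                              # q-value attached to the observed P[1] == 0.05
max(q[P <= 0.05])                 # FDR if every test used alpha = 0.05
```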

On Fri, Jan 13, 2017 at 2:08 AM, Thomas Ryan 
wrote:

> Jim,
>
> Thanks for the reply. Yes I'm just playing around with the data at the
> minute, but regardless of where the p values actually come from, I can't
> seem to get a Q value that makes sense.
>
> For example, in one case, I have an actual P value of 0.05.  I have a list
> of 1,000 randomised p values: range of these randomised p values is 0.002
> to 0.795, average of the randomised p values is 0.399 and the median of the
> randomised p values is 0.45.
>
> So I thought it would be reasonable to expect the FDR Q Value (i.e the
> number of expected false positives over the number of significant results)
> to
> be at least over 0.05, given that 869 of the randomised p values are >
> 0.05?
>
> When I run the code:
>
> library(qvalue)
> list1 <-scan("ListOfPValues")
>
> qobj <-qvalue(p=list1)
>
> qobj$pi0
>
>
> The answer is 0.0062. That's why I thought qobj$pi0 isn't the right
> variable to be looking at? So my problem (or my mis-understanding) is that
> I have an actual P value of 0.05, but then a Q value that is lower, 0.006?
>
>
> Thanks again for your help,
>
> Tom
>
>
>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 9:27 PM, Jim Lemon  wrote:
>
> > Hi Tom,
> > From a quick scan of the docs, I think you are looking for qobj$pi0.
> > The vector qobj$qvalue seems to be the local false discovery rate for
> > each of your randomizations. Note that the manual implies that the p
> > values are those of multiple comparisons within a data set, not
> > randomizations of the data, so I'm not sure that your usage is valid
> > for the function..
> >
> > Jim
> >
> >
> > On Fri, Jan 13, 2017 at 4:12 AM, Thomas Ryan 
> > wrote:
> > > Hi all, I'm wondering if someone could put me on the right path to
> using
> > > the "qvalue" package correctly.
> > >
> > > I have an original p value from an analysis, and I've done 1,000
> > > randomisations of the data set. So I now have an original P value and
> > 1,000
> > > random p values. I want to work out the false discovery rate (FDR) (Q;
> as
> > > described by Storey and Tibshriani in 2003) for my original p value,
> > > defined as the number of expected false positives over the number of
> > > significant results for my original P value.
> > >
> > > So, for my original P value, I want one Q value, that has been
> calculated
> > > as described above based on the 1,000 random p values.
> > >
> > > I wrote this code:
> > >
> > > pvals <- c(list_of_p_values_obtained_from_randomisations)
> > > qobj <-qvalue(p=pvals)
> > > r_output1 <- qobj$pvalue
> > > r_output2 <- qobj$qvalue
> > >
> > > r_output1 is the list of 1,000 p values that I put in, and r_output2 is
> > a q
> > > value for each of those p values (i.e. so there are 1,000 q values).
> > >
> > > The problem is I don't want there to be 1,000 Q values (i.e one for
> each
> > > random p value). The Q value should be the false discovery rate (FDR)
> > (Q),
> > > defined as the number of expected false positives over the number of
> > > significant results. So I want one Q value for my original P value, and
> > to
> > > calculate that one Q value using the 1,000 random P values I have
> > generated.
> > >
> > > Could someone please tell me where I'm going wrong.
> > >
> > > Thanks
> > > Tom
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Setting fixed size for segement plot using stars() (axes size vs print size)

2008-07-30 Thread Jay Douillard

I have been making some segment plots with five variables. They work great, 
especially when I used a different scale function, which scales them by the 
area of the circle rather than the radius:


scale <- function(x, Mr = 1, Mx = 100) { ((x/Mx)^.5) * Mr }

Where x is the value, Mr is the maximum radius, and Mx is the maximum data 
value. You could change the exponent .5 to .57 if you wanted Flannery 
compensation.
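A quick sanity check of the area-based scaling (note, as an aside, that naming the function `scale` masks `base::scale`): quadrupling the data value should only double the radius, since the area grows with the square of the radius.

```r
# Redefine the poster's function and verify the area-proportional property.
scale <- function(x, Mr = 1, Mx = 100) ((x/Mx)^.5) * Mr
r <- scale(c(25, 100))  # radii for data values 25 and 100
r[2] / r[1]             # 2: a 4x value gives a 2x radius
```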


My problem is that I want the print size of these proportional symbols to be 
the same regardless of the number of data points,

as in this example, where exporting these two plots as PDF (which have been 
scaled) will produce different-sized symbols for the same value when compared 
side by side. I've tried manually setting the ncol and nrow attributes, and it 
still produces different results for the data sets.

stars(large[2:6], draw.segments = TRUE, labels = large$size, scale = FALSE, 
flip.labels = TRUE, axes = TRUE)
stars(small[2:6], draw.segments = TRUE, labels = small$size, scale = FALSE, 
flip.labels = TRUE, axes = TRUE)


Thanks!


small <-
structure(list(size = c(5, 10, 15, 20, 25, 30, 50), one = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548), two = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548), three = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548), four = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548), five = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548)), .Names = c("size", "one", 
"two", "three", "four", "five"), row.names = c(NA, 7L), class = "data.frame")


large <-
structure(list(size = c(5L, 10L, 15L, 20L, 25L, 30L, 50L, 5L, 
10L, 15L, 20L, 25L, 30L, 50L, 5L, 10L, 15L, 20L, 25L, 30L, 50L, 
5L, 10L, 15L, 20L, 25L, 30L, 50L), one = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 
0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 
0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 
0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 
0.5, 0.547722557505166, 0.707106781186548), two = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 
0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 
0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 
0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 
0.5, 0.547722557505166, 0.707106781186548), three = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 
0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 
0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 
0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 
0.5, 0.547722557505166, 0.707106781186548), four = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 
0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 
0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 
0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 
0.5, 0.547722557505166, 0.707106781186548), five = c(0.223606797749979, 
0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 
0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 
0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 
0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 
0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 
0.5, 0.547722557505166, 0.707106781186548)), .Names = c("size", 
"one", "two", "three", "four", "five"), row.names = c(NA, -28L
), class = "data.frame")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] call perl

2008-08-21 Thread Jay an
Hi,
 
It may be an old question.
Can anyone tell me how to call Perl from R?
Thanks
 
Y.
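A minimal sketch, assuming a perl interpreter is on the PATH: base R can shell out with system() or system2(), the latter letting you capture standard output directly.

```r
# Run a one-line Perl program and capture its output as a character vector.
# Assumes "perl" is installed and on the PATH; the quoting via shQuote()
# is appropriate for Unix-like shells.
if (nzchar(Sys.which("perl"))) {
  out <- system2("perl", c("-e", shQuote("print 2 + 2")), stdout = TRUE)
  print(out)
}
```

For richer integration (passing R objects back and forth) a bridge package would be needed; plain system2() only exchanges text.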
 
 


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reading web log file into R

2009-09-23 Thread Jay Emerson
Sebastian,

There is rarely a completely free lunch, but fortunately for us R has
some wonderful tools
to make this possible.  R supports regular expressions with commands
like grep(),
gsub(), strsplit(), and others documented on the help pages.  It's
just a matter of
constructing and algorithm that does the job.  In your case, for
example (though please
note there are probably many different, completely reasonable approaches in R):

x <- scan("logfilename", what="", sep="\n")

should give you a vector of character strings, one line per element.  Now, lines
containing "GET" seem to identify interesting lines, so

x <- x[grep("GET", x)]

should trim it to only the interesting lines.  If you want information
from other lines, you'll
have to treat them separately.  Next, you might try

y <- strsplit(x, "[[:space:]]+")

which splits each line on runs of whitespace, returning a list (one component
per line) of vectors
based on the split.  Try it.  If it looks good, you might check

lapply(y, length)

to see if all lines contain the same number of records.  If so, you
can then get quickly into
a matrix,

z <- matrix(unlist(y), ncol=K, byrow=TRUE)

where K is the common length you just observed.  If you think this is
cool, great!  If not, well...
hire a programmer, or if you're lucky Microsoft or Apache have tools
to help you with this.
There might be something in the Perl/Python world.  Or maybe there's a
package in R designed
just for this, but I encourage students to develop the raw skills...
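The whole recipe above can be tried end-to-end on a tiny in-memory example; the three log lines below are fabricated stand-ins for what scan() would read from a real file.

```r
# Three made-up Apache-style lines, standing in for
# scan("logfilename", what="", sep="\n") on a real log file.
x <- c('10.0.0.1 - - [23/Sep/2009] "GET /index.html HTTP/1.1" 200 512',
       '10.0.0.2 - - [23/Sep/2009] "POST /form HTTP/1.1" 200 128',
       '10.0.0.3 - - [23/Sep/2009] "GET /about.html HTTP/1.1" 404 90')
x <- x[grep("GET", x)]             # keep the interesting (GET) lines
y <- strsplit(x, "[[:space:]]+")   # one character vector per line
K <- unique(lengths(y))            # do all lines have the same field count?
stopifnot(length(K) == 1)
z <- matrix(unlist(y), ncol = K, byrow = TRUE)
z[, 6]                             # the requested paths: "/index.html" "/about.html"
```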

Jay



-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RMySQL_0.7-4 core dumped on dbWriteTable

2010-03-09 Thread Jay Castino
Good Afternoon:

Have an R script that uses RMySQL package.

Everything works great within 32 bit ubuntu linux environment
(/usr/sbin/mysqld: ELF 32-bit LSB shared object, Intel 80386, version
1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15,
stripped).
> mysqlClientLibraryVersions()
5.1.41 5.1.37
 50141  50137

Now testing on 64 bit ubuntu linux environment (/usr/sbin/mysqld:ELF
64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked
(uses shared libs), for GNU/Linux 2.6.15, stripped).
> mysqlClientLibraryVersions()
5.0.75 5.0.75
 50075  50075

Followed instructions for RMySQL installation (specifying MySQL
headers and library directories)
export PKG_CPPFLAGS="-I/usr/include/mysql"
export PKG_LIBS="-L/usr/lib/ -lmysqlclient" (This is where the
'/usr/lib64/mysql' symbolic link ends up).

Made sure I could successfully query and write to the database
otherwise (with RODBC).

So far, can successfully connect and disconnect using RMySQL

Also, am able to execute dbGetQuery command.

However, upon executing the dbWriteTable command (see partial .RHistory below),

R crashes with "***buffer overflow detected***:
/usr/lib64/R/bin/exec/R terminated"

How can I fix this?

Appreciate your help.

Sincerely,

Jay James Castino, PE
Principal
JJCENG.COM, PC
www.jjceng.com
+1 (541) 633-7990
1560 NE 1st ST. #14
Bend, OR USA 97701

## partial .RHistory ###
>sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

> library(RMySQL)
Loading required package: DBI
> con<- dbConnect(dbDriver("MySQL"), dbname = "knottlf_local",user="mysql", 
> password="xx", host="localhost")
> ch4dd<-data.frame(6,"2010-03-06")
> names(ch4dd)<-c("scada_terminal_id","timestamp_on")
> ch4dd
  scada_terminal_id timestamp_on
1 6   2010-03-06
> dbGetQuery(con, "SELECT LAST_INSERT_ID() FROM 
> `knottlf_local`.`R_ch4_concentrations`")
data frame with 0 columns and 0 rows
> dbWriteTable(con, name = "R_ch4_concentrations",ch4dd, append = TRUE, 
> row.names = FALSE)
*** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
=== Backtrace: =
/lib/libc.so.6(__fortify_fail+0x37)[0x7fb4375292c7]
/lib/libc.so.6[0x7fb437527170]
/lib/libc.so.6[0x7fb437526519]
/lib/libc.so.6(_IO_default_xsputn+0x96)[0x7fb4374a0426]
/lib/libc.so.6(_IO_vfprintf+0x348d)[0x7fb437472e2d]
/lib/libc.so.6(__vsprintf_chk+0x99)[0x7fb4375265b9]
/lib/libc.so.6(__sprintf_chk+0x80)[0x7fb437526500]
/home/jbiztino/R/x86_64-pc-linux-gnu-library/2.10/RMySQL/libs/RMySQL.so(RS_MySQL_exec+0x1be)[0x7fb4348630de]
/usr/lib64/R/lib/libR.so[0x7fb437847ace]
/usr/lib64/R/lib/libR.so(Rf_eval+0x6b6)[0x7fb437877ed6]
/usr/lib64/R/lib/libR.so[0x7fb43787a0e0]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so[0x7fb43787a1ce]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2d3)[0x7fb43787ba93]
/usr/lib64/R/lib/libR.so(Rf_eval+0x3c3)[0x7fb437877be3]
/usr/lib64/R/lib/libR.so[0x7fb43787b39c]
/usr/lib64/R/lib/libR.so(R_execMethod+0x241)[0x7fb43787b6d1]
/usr/lib64/R/library/methods/libs/methods.so[0x7fb435259655]
/usr/lib64/R/lib/libR.so[0x7fb4378c205c]
/usr/lib64/R/lib/libR.so(Rf_eval+0x5dc)[0x7fb437877dfc]
/usr/lib64/R/lib/libR.so[0x7fb43787802f]
/usr/lib64/R/lib/libR.so(Rf_eval+0x26d)[0x7fb437877a8d]
/usr/lib64/R/lib/libR.so(Rf_eval+0x64b)[0x7fb437877e6b]
/usr/lib64/R/lib/libR.so[0x7fb43787802f]
/usr/lib64/R/lib/libR.so(Rf_eval+0x26d)[0x7fb437877a8d]
/usr/lib64/R/lib/libR.so(Rf_eval+0x64b)[0x7fb437877e6b]
/usr/lib64/R/lib/libR.so[0x7fb437878f7d]
/usr/lib64/R/lib/libR.so(Rf_eval+0x58e)[0x7fb437877dae]
/usr/lib64/R/lib/libR.so[0x7fb43787a0e0]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so[0x7fb43787a1ce]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2d3)[0x7fb43787ba93]
/usr/lib64/R/lib/libR.so(Rf_eval+0x3c3)[0x7fb437877be3]
/usr/lib64/R/lib/libR.so[0x7fb43787ae41]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so[0x7fb43787a7e6]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so[0x7fb43787a1ce]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/libR.so[0x7fb43787a1ce]
/usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e]
/usr/lib64/R/lib/lib
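The backtrace ends in RS_MySQL_exec, which formats the query with sprintf before sending it. A hedged workaround (an editorial sketch, not a confirmed fix): build the INSERT statement yourself and send it with dbGetQuery(), sidestepping the dbWriteTable() code path. Table and column names are taken from the .RHistory above.

```r
# Hypothetical workaround: hand-build the INSERT and send it directly.
sql <- sprintf(
  "INSERT INTO R_ch4_concentrations (scada_terminal_id, timestamp_on) VALUES (%d, '%s')",
  6L, "2010-03-06")
sql
# With the connection opened earlier via dbConnect():
# dbGetQuery(con, sql)
```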

[R] ' R ' - General Question (newbie)

2009-07-10 Thread Jay Mistry
Hi,

First-off, I apologize if this is the wrong list to post to, but I would
like to install and try out 'R', as an alternative to 'SAS' . As a newbie,
could you pl let me know about the following (in terms of online resources
and print books)

I have previously used SAS/BASE in a Biostatistics/ Epidemiology (Public
Health) class, and familiar with very basic terminology and SAS-BASE use.

1) Basics of 'R'

2) Where to download & How to install it on Windows (XP), and any needed
add-on modules (for Data Analysis and Biostatistics procedures) + others
similar to ODS of SAS.

2) Any print/ online documentation for the beginning user of R.

Thanks,

Jay



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Building a big.matrix using foreach

2009-07-19 Thread Jay Emerson
Michael,

If you have a big.matrix, you just want to iterate over the rows.  I'm not
in R and am just making this up on the fly (from a bar in Beijing, if you
believe that):

foreach(i=1:nrow(x),.combine=c) %dopar% f(x[i,])

should work, essentially applying the function f() to the rows of x.  But
perhaps I misunderstand you.  Please feel free to email me or Mike
(michael.k...@yale.edu) directly with questions about bigmemory; we are very
interested in applications of it to real problems.

Note that the package foreach uses package iterators, and is very flexible,
in case you need more general iteration in parallel.

Regards,

Jay
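A minimal, self-contained version of the pattern above, using %do% so it runs even without a registered parallel backend (swap in %dopar% once one is registered):

```r
library(foreach)

x <- matrix(1:12, nrow = 4)                      # stand-in for a big.matrix
# Apply f() (here, sum) to each row and combine the results with c()
res <- foreach(i = 1:nrow(x), .combine = c) %do% sum(x[i, ])
res                                              # 15 18 21 24
```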



Original message:
Hi there!
I have become a big fan of the 'foreach' package allowing me to do a
lot of stuff in parallel. For example, evaluating the function f on
all elements in a vector x is easily accomplished:
foreach(i=1:length(x),.combine=c) %dopar% f(x[i])
Here the .combine=c option tells foreach to combine output using the
c()-function. That is, to return it as a vector.
Today I discovered the 'bigmemory' package, and I would like to
construct a big.matrix in a parallel fashion row by row. To use foreach
I see no other way than to come up with a substitute for c in the
.combine option. I have checked out the big.matrix manual, but I can't
find a function suitable for just that.
Actually, I wouldn't even know how to do it for a usual matrix. Any clues?
Thanks!
--
Michael Knudsen
micknud...@gmail.com
http://lifeofknudsen.blogspot.com/



-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] kmeans.big.matrix

2009-07-22 Thread Jay Emerson
This sort of question is ideal to send directly to the maintainer.

We've removed kmeans.big.matrix for the time being and will place it in a
new package, bigmemoryAnalytics.  bigmemory itself is the core building
block and tool, and we don't want to pollute it with lots of extras.

Allan's point is right: big data packages (like bigmemory and ff) can't
be used directly with R functions (like lm).  And because of R's design you
can't extract subsets with more than 2^31-1 elements, even though the
big.matrix can be as large as you need (with filebacking).

I hope that helps.

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] Mosaic plots

2010-03-23 Thread Jay Emerson
As pointed out by others, vcd supports mosaic plots on top of the grid
engine (which is extremely helpful for those of us who love playing around
with grid).  The standard mosaicplot() function is directly available (it
isn't clear if you knew this).  The proper display of names is a real
challenge faced by all of us with these plots, so  you should try each
version.  I'm not sure what you intend to do with a legend, but if you want
the ability to customize and hack code, I suggest you look at grid and a
modification to vcd's version to suit your purposes.

Jay
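For reference, a base-graphics sketch (no extra packages) showing the arguments that usually tame overlapping labels:

```r
# mosaicplot() on a built-in table; las = 2 rotates the axis labels and
# cex.axis shrinks them, which eases the alignment problem.
pdf(f <- tempfile(fileext = ".pdf"))             # draw off-screen
mosaicplot(HairEyeColor, main = "Hair vs. Eye colour",
           las = 2, cex.axis = 0.8, color = TRUE)
dev.off()
```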

>
>
>
>
>> Subject: [R] Mosaic Plots
>> Message-ID: <1269256874432-1677468.p...@n4.nabble.com>
>> Content-Type: text/plain; charset=us-ascii
>>
>>
>> Hello Everyone
>>
>> I want to plot mosaic plots; I have tried them using the iplots package
>> (using imosaic). The problem is the names don't get aligned properly. Is
>> there a way to align the names and provide a legend in mosaic plots using R?
>>
>> Also, I would like to know of any other packages with which I can plot
>> mosaic plots.
>>
>>
>> Thank you in advance
>> Sunita
>> --
>>
>>
> --
> John W. Emerson (Jay)
> Associate Professor of Statistics
> Department of Statistics
> Yale University
> http://www.stat.yale.edu/~jay <http://www.stat.yale.edu/%7Ejay>
>



-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] large dataset

2010-03-27 Thread Jay Emerson
A little more information would help, such as the number of columns; I
imagine it must be large, because 100,000 rows isn't overwhelming.  Second,
does the read.csv() fail, or does it work but only after a long time?  And
third, how much RAM do you have available?

R Core provides some guidelines in the Installation and Administration
documentation suggesting that a single object around 10% of your RAM is
reasonable, but beyond that things can become challenging, particularly once
you start working with your data.

There is a wide range of packages to help with large data sets.  For example,
RMySQL supports MySQL databases.  At the other end of the spectrum, there are
possibilities discussed on a nice page by Dirk Eddelbuettel which you might
look at:

http://cran.r-project.org/web/views/HighPerformanceComputing.html

Jay
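If the read itself is the slow part, declaring column classes and the row count up front usually speeds read.csv() noticeably; a self-contained sketch:

```r
# Write a small CSV, then read it back with pre-declared colClasses
# (skips type-guessing) and nrows (avoids repeated re-allocation).
f <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), f, row.names = FALSE)
d <- read.csv(f, colClasses = c("integer", "numeric"), nrows = 1000)
str(d)
```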

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

(original message below)
--

Message: 128
Date: Sat, 27 Mar 2010 10:19:33 +0100
From: "n\.vial...@libero\.it" 
To: "r-help" 
Subject: [R] large dataset
Message-ID: 
Content-Type: text/plain; charset=iso-8859-1

Hi, I have a question:
as I'm not able to import a CSV file which contains a big dataset (100,000
records), does someone know how many records R can handle without giving
problems?
What I'm facing when I try to import the file is that R generates more than
100,000 records and is very slow...
thanks a lot!!!



Re: [R] Huge data sets and RAM problems

2010-04-20 Thread Jay Emerson
Stella,

A few brief words of advice:

1. Work through your code a line at a time, making sure that each is what
you would expect.  I think some of your later problems are a result of
something
early not being as expected.  For example, if the read.delim() is in fact
not
giving you what you expect, stop there before moving onwards.  I suspect
some funny character(s) or character encodings might be a problem.

2. 32-bit Windows can be limiting. With 2 GB of RAM, you're probably not
going to be able to work effectively in native R with objects over 200-300 MB,
and the error indicates that something (you or a package you're using) has
simply run out of memory.  So...

3. Consider more RAM (and preferably with 64-bit R).  Other solutions might
be possible, such as using a database to handle the data transition into R.
2.5 million rows by 18 columns is apt to be around 360 MB.  Although you
can afford 1 (or a few) copies of this, it doesn't leave you much room for
the memory overhead of working with such an object.
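The 360 MB figure is easy to verify with object.size() on a scaled-down matrix:

```r
# 10,000 rows x 18 numeric columns; 2.5 million rows is 250 times this.
x <- matrix(0, nrow = 1e4, ncol = 18)
print(object.size(x), units = "MB")   # roughly 1.4 MB, so 250x is ~360 MB
```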

Part of the original message below.

Jay

-

Message: 80
Date: Mon, 19 Apr 2010 22:07:03 +0200
From: Stella Pachidi 
To: r-h...@stat.math.ethz.ch
Subject: [R]  Huge data sets and RAM problems
Message-ID:
   
Content-Type: text/plain; charset=ISO-8859-1

Dear all,



I am using R 2.10.1 in a laptop with Windows 7 - 32bit system, 2GB RAM
and CPU Intel Core Duo 2GHz.

.

Finally, another problem I have is when I perform association mining
on the data set using the package arules: I turn the data frame into
transactions table and then run the apriori algorithm. When I put too
low support in order to manage to find the rules I need, the vector of
rules becomes too big and I get problems with the memory such as:
Error: cannot allocate vector of size 923.1 Mb
In addition: Warning messages:
1: In items(x) : Reached total allocation of 153Mb: see help(memory.size)

Could you please help me with how I could allocate more RAM? Or, do
you think there is a way to process the data by loading them into a
document instead of loading all into RAM? Do you know how I could
manage to read all my data set?

I would really appreciate your help.

Kind regards,
Stella Pachidi


-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] bigmemory package woes

2010-04-24 Thread Jay Emerson
Zerdna,

Please note that the CRAN version 3.12 is about
to be replaced by a new cluster of packages now on R-Forge; we consider the
new bigmemory >= 4.0 to be "stable" and recommend you start using it
immediately.  Please see http://www.bigmemory.org.

In your case, two comments:

(1) Your for() loop will generate three identical copies of filebackings on
disk, yes.  Note that when the loop exits, the R object xx will reference
only the 3rd of these, so xx[1,1] <- 1 will modify only the third
filebacking, not the first two.  You'll need to use the separate descriptor
files (probably created automatically for you, but we recommend naming them
specifically using descriptorfile=) to attach.big.matrix() whichever of
these you really want to be using.

(2) In the problem with "hanging" I believe you have exhausted the shared
resources on your system.  This problem will no longer arise in the >= 4.0
packages, as we're handling mutexes separately rather than automatically.
These shared resource limits are mysterious, depending on the OS as well
as the hardware and other jobs or tasks in existence at any given point in
time.
But again, it shouldn't be a problem with the new version.

The CRAN update should take place early next week, along with some revised
documentation.

Regards,

Jay
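A sketch of the descriptor-file workflow described in (1), assuming bigmemory >= 4.0 is installed (file names here are illustrative):

```r
library(bigmemory)

dir <- tempdir()
r <- matrix(rnorm(100), nrow = 10)
# Name the descriptor explicitly so the filebacking can be re-attached later.
xx <- as.big.matrix(r, backingfile = "r1.bin",
                    descriptorfile = "r1.desc", backingpath = dir)
rm(xx)
# Re-attach the same filebacked matrix via its descriptor file.
yy <- attach.big.matrix(file.path(dir, "r1.desc"))
yy[1, 1]
```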

---


Message: 125
Date: Fri, 23 Apr 2010 13:51:32 -0800 (PST)
From: zerdna 
To: r-help@r-project.org
Subject: [R] bigmemory package woes
Message-ID: <1272059492009-2062996.p...@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii


I have pretty big data sizes, like matrices of .5 to 1.5GB so once i need to
juggle several of them i am in need of disk cache. I am trying to use
bigmemory package but getting problems that are hard to understand. I am
getting seg faults and machine just hanging. I work by the way on Red Hat
Linux, 64 bit R version 10.

Simplest problem is just saving matrices. When i do something like

r<-matrix(rnorm(100), nr=10); library(bigmemory)
for(i in 1:3) xx<-as.big.matrix(r, backingfile=paste("r",i, sep="",
collapse=""), backingpath=MyDirName)

it works just fine -- saves small matrices  as three different matrices on
disc. However, when i try it with real size, like

with r<-matrix(rnorm(5000), nr=1000)

I am either getting a seg fault on saving the third big matrix, or it hangs
forever.

Am i doing something obviously wrong, or is it an unstable package at the
moment? Could anyone recommend something similar that is reliable in this
case?


-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



[R] Help producing plot for assessing forecasting accuracy

2009-10-09 Thread Jay Ulfelder
Dear colleagues,

I'm trying (and failing) to write the script required to generate a
chart that would help me assess the forecasting accuracy of a logistic
regression model by plotting the cumulative proportion of observed
events occurring in cases across the range of possible predicted
probabilities. In other words, let:

x = any value on 0-1 scale

phat_i = predicted probability of event Y from logit model for case i

y_i = observed outcome (0/1) for case i

Y_cond = sum(y_i) conditional on phat_i <= x

Y_tot = total number of events observed in sample

What I'm trying to plot is (Y_cond)/(Y_tot) across all values of x. I
would be grateful for any guidance you can offer, and I'm sorry if
I've overlooked some really simple solution; I'm fairly new to R and
learning by doing.

Regards,
Jay

-- 
Jay Ulfelder, Ph.D.
Research Director
Political Instability Task Force
Science Applications International Corp. (SAIC)
jay_ulfel...@stanfordalumni.org
(301) 588-8478 [home office]
(301) 580-8736 [mobile]
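An editorial base-R sketch of the requested curve, on simulated data (variable names follow the poster's notation):

```r
set.seed(1)
phat <- runif(200)                  # predicted probabilities from the model
y    <- rbinom(200, 1, phat)        # observed 0/1 outcomes
xs   <- sort(unique(phat))
# Cumulative share of all observed events among cases with phat <= x
frac <- sapply(xs, function(x) sum(y[phat <= x]) / sum(y))
plot(xs, frac, type = "s",
     xlab = "x (predicted-probability cutoff)", ylab = "Y_cond / Y_tot")
```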



Re: [R] Estimation in a changepoint regression with R

2009-10-16 Thread Jay Emerson
Package bcp does Bayesian changepoint analysis, though not in the general
regression framework.  The most recent reference is Bioinformatics 24(19)
2143-2148, doi:10.1093/bioinformatics/btn404; slightly older is JSS 23(3).
Both reference some alternatives you might want to consider (including
strucchange, among others).


Jay
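As a package-free baseline (an editorial sketch on simulated data), a crude grid search over a single break in a broken-stick regression:

```r
set.seed(2)
x <- 1:100
y <- ifelse(x <= 60, 1 + 0.5 * x, 31 + 1.5 * (x - 60)) + rnorm(100)
# Fit y ~ x plus a hinge term at each candidate changepoint and keep
# the break that minimises the residual sum of squares.
cands <- 10:90
rss <- sapply(cands, function(cp) deviance(lm(y ~ x + pmax(x - cp, 0))))
cp.hat <- cands[which.min(rss)]
cp.hat                               # should land near the true break, 60
```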



Message: 4
Date: Thu, 15 Oct 2009 03:56:22 -0700 (PDT)
From: FMH 
Subject: [R] Estimation in a changepoint regression with R
To: r-help@r-project.org
Message-ID: <365399.56401...@web38303.mail.mud.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Dear All,

I'm trying to do the estimation in a changepoint regression problem
via R, but have never found any suitable function which might help me to
do this.

Could someone give me a hand on this matter?

Thank you.

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] Multicore package: sharing/modifying variable accross processes

2009-10-31 Thread Jay Emerson
Renaud,

Package bigmemory can help you with shared-memory matrices, either in RAM or
filebacked.  Mutex support currently exists as part of the package, although
for various reasons will soon be abstracted from the package and provided
via a new package, synchronicity.

bigmemory works beautifully with multicore.  Feel free to email us with
questions, and we appreciate feedback.

Jay
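A hedged sketch of the combination (assumes bigmemory is installed; multicore's functionality now lives in the parallel package, whose mclapply forks only on Unix):

```r
library(bigmemory)
library(parallel)

# A shared-memory matrix that forked workers can attach by descriptor.
x <- big.matrix(nrow = 4, ncol = 3, init = 0, type = "double")
desc <- describe(x)
cores <- if (.Platform$OS.type == "unix") 2L else 1L
invisible(mclapply(1:4, function(i) {
  y <- attach.big.matrix(desc)   # each worker attaches the same segment
  y[i, ] <- i                    # writes land in shared memory
  NULL
}, mc.cores = cores))
x[, ]
```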



Original message:

Hi,

I want to parallelize some computations when it's possible on multicore
machines.
Each computation produces a big object that I don't want to store if
not necessary: in the end, only the object that best fits my data has to
be returned. In non-parallel mode, a single global object is updated if
the current computation gets a better result than the best previously found.
My plan was to use package multicore. But there is obviously an issue of
concurrent access to the global result variable.
Is there a way to implement something like a lock/mutex to make
the procedure thread-safe?
Maybe something already exists to deal with such things?
It looks like package multicore runs the different processes in different
environments with copy-on-change of everything when forking. Has anybody
experimented with a shared environment with package multicore?



-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] bigmemory - extracting submatrix from big.matrix object

2009-06-02 Thread Jay Emerson
Thanks for trying this out.

Problem 1.  We'll check this.  Options should certainly be available.  Thanks!

Problem 2. Fascinating.  We just (yesterday) implemented a
sub.big.matrix() function doing exactly this, creating something that is a
big matrix but which just references a contiguous subset of the original
matrix.  This will be available in an upcoming version (hopefully in the
next week).  A more specialized function would create an entirely new
big.matrix from a subset of a first big.matrix, making an actual copy, but
this is something else altogether.  You could do this entirely within R
without much work, by the way, with only 2x memory overhead.

Problem 3. You can count missing values using mwhich().  For other
exploration (e.g. skewness), at the moment you should just extract a single
column (variable) at a time into R, study it, then get the next column,
etc.  We will not be implementing all of R's functions directly with
big.matrix objects.  We will be creating a new package "bigmemoryAnalytics"
and would welcome contributions to the package.

Feel free to email us directly with bugs, questions, etc...

Cheers,

Jay


--

From: utkarshsinghal 
Date: Tue, Jun 2, 2009 at 8:25 AM
Subject: [R] bigmemory - extracting submatrix from big.matrix object
To: r help 
I am using library(bigmemory) to handle large datasets, say 1 GB,
and am facing the following problems. Any hints from anybody would be helpful.
Problem 1:
I am using the "read.big.matrix" function to create a filebacked big
matrix of my data and get the following warning:
> x = 
> read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile
>  = "backup", backingpath = "/home/utkarsh.s")
Warning message:
In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type,  :
 A descriptor file has not been specified.  A descriptor named
backup.desc will be created.
However, there is no such argument in "read.big.matrix".  There is an
argument "descriptorfile" in the function "as.big.matrix", but if I try to
use it in "read.big.matrix", I get an error showing it as an unused argument
(as expected).
Problem 2:
I want to get a filebacked *sub*matrix of "x", say only selected
columns: x[, 1:100]. Is there any way of doing that without actually
loading the data into R memory?
Problem 3:
There are functions available like summary, colmean, colsd, ... for
standard summary statistics. But is there any way to calculate other
summaries, say the number of missing values or the skewness of each variable,
without loading the whole data into R memory?
Regards
Utkarsh

--
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



Re: [R] bigmemory - extracting submatrix from big.matrix object

2009-06-03 Thread Jay Emerson
Utkarsh,

Thanks again for the feedback and suggestions on bigmemory.

A follow-up on counting NAs: we have exposed a new function colna()
to the user in upcoming release 3.7.  Of course mwhich() can still be
helpful.

As for the last topic -- applying any function to columns of a big.matrix
object.  Once you peel away the shell, a big.matrix column
is identical to an R matrix column (or vector) -- a pointer and a length
and knowledge of the type is sufficient.  Because we (ideally) want to
support our current 4 types (and hopefully add complex and maybe
more, soon), we rely on C++ template functions for the summaries we
have implemented to date.  But yes, looking at our implementation
of colmean(), for example, would be a good place to start.

Keep in mind that there are differences between big.matrix objects
and R internals.  bigmemory indexes everything using longs instead of
integers (and uses numerics when passing indices between R and
C/C++).  So simply using an existing R function (or the C function  under
the hood of R) would be limiting not only with respect to the various
types of big.matrix objects, but also with respect to the size.  On 64-bit
R platforms, there is no practical limit to the size of a filebacked big.matrix
(other than your disk space or filesystem limitations).  But R won't handle
vectors in excess of 2 billion elements, even if you have the RAM to
support such beasts.  Operating on chunks within R is of course another
possibility.

Further discussion of development ideas would be great, but should
probably be moved offline or over to R-devel.  As always, we appreciate
feedback, complaints, bug reports, etc...

Thanks,

Jay


On Wed, Jun 3, 2009 at 3:16 AM, utkarshsinghal
 wrote:
> Thanks for the really valuable inputs, developing the package and updating
> it regularly. I will be glad if I can contribute in any way.
>
> In problem three, however, I am interested in knowing a generic way to apply
> any function on columns of a big.matrix object (obviously without loading
the data into R). Maybe the source code of the function "colmean" can help,
> if that is not too much to ask for. Or if we can develop a function similar
> to "apply" of the base R.
>
>
> Regards
> Utkarsh
>
>
>
>
> Jay Emerson wrote:
>>
>> We also have ColCountNA(), which is not currently exposed to the user
>> but will be in the next version.
>>
>> Jay
>>
>> On Tue, Jun 2, 2009 at 2:08 PM, Jay Emerson  wrote:
>>
>>>
>>> Thanks for trying this out.
>>>
>>> Problem 1.  We'll check this.  Options should certainly be available.
>>>  Thanks!
>>>
>>> Problem 2. Fascinating.  We just (yesterday) implemented a
>>> sub.big.matrix() function doing exactly
>>> this, creating something that is a big matrix but which just
>>> references a contiguous subset of the
>>> original matrix.  This will be available in an upcoming version
>>> (hopefully in the next week).  A more
>>> specialized function would create an entirely new big.matrix from a
>>> subset of a first big.matrix,
>>> making an actual copy, but this is something else altogether. You
>>> could do this entirely within R
>>> without much work, by the way, and only 2* memory overhead.
>>>
>>> Problem 3. You can count missing values using mwhich().  For other
>>> exploration (e.g. skewness)
>>> at the moment you should just extract a single column (variable)  at a
>>> time into R, study it, then get the
>>> next column, etc... .  We will not be implementing all of R's
>>> functions directly with big.matrix objects.
>>> We will be creating a new package "bigmemoryAnalytics" and would
>>> welcome contributions to the
>>> package.
>>>
>>> Feel free to email us directly with bugs, questions, etc...
>>>
>>> Cheers,
>>>
>>> Jay
>>>
>>>
>>> --
>>>
>>> From: utkarshsinghal 
>>> Date: Tue, Jun 2, 2009 at 8:25 AM
>>> Subject: [R] bigmemory - extracting submatrix from big.matrix object
>>> To: r help 
>>> I am using the library(bigmemory) to handle large datasets, say 1 GB,
>>> and facing following problems. Any hints from anybody can be helpful.
>>> _Problem-1:
>>> _
>>> I am using "read.big.matrix" function  to create a filebacked big
>>> matrix of my data and get the following warning:
>>>
>>>>
>>>> x =
>>>> read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,ba

[R] [R-pkgs] Major bigmemory revision released.

2009-04-16 Thread Jay Emerson
The re-engineered bigmemory package is now available (Version 3.5
and above) on CRAN.  We strongly recommend you cease using
the older versions at this point.

bigmemory now offers completely platform-independent support for
the big.matrix class in shared memory and, optionally, as filebacked
matrices for larger-than-RAM applications.  We're working on updating
the package vignette, and a draft is available upon request (just send
me an email if you're interested).  The user interface is largely unchanged.

Feedback, bug reports, etc... are welcome.

Jay Emerson & Michael Kane

-- 
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay


___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] Bivariate pdf and marginal pdf of non-standard distributions

2009-05-02 Thread Jay Liu
Hi,

I am trying to find marginal pdfs when the joint pdf of two continuous
variables is given.

A simple case is like this:

The joint pdf is

f(x,y) = 4xy, where x lies between 0 and 1, and y also lies between 0 and 1.

I tried the following crude way of doing it, but it is not complete.

Could you post a suggestion?
Thanks,
Jay Liu, University of Tennessee at Knoxville

##
# f(x,y) = 4xy, range of x is (0,1), range of y is (0,1)
# Checking if f (x,y) is a joint pdf
Pxy<-integrate(function(y) { 
sapply(y, function(y) {
integrate(function(x) {
sapply(x, function(x) (4*x*y))
}, 0, 1)$value
})
}, 0, 1)
 
print("Value of int int f(x,y)dx dy")
Pxy
# To find the marginal distribution, I tried this, but it is incorrect
# because x is not considered while integrating the joint pdf
# (x is assumed to be a constant):
# Px1 = f(x)
Px1<-integrate(function(y) { 
 4*y
}, 0, 1)$value
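To complete the attempt (an editorial note): for fixed x, the marginal is f_X(x) = integral over (0,1) of 4xy dy = 2x, so treat x as a parameter inside the integrand:

```r
# Marginal pdf of X: integrate the joint pdf over y at each fixed x.
fX <- function(x) sapply(x, function(xx)
  integrate(function(y) 4 * xx * y, 0, 1)$value)
fX(c(0.25, 0.5, 1))   # equals 2*x: 0.5, 1, 2
```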


  


Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Jay Emerson
>>> See inline for responses.  But people are always welcome to contact
>>> us directly.

Hi all,

I'm on a Linux server with 48Gb RAM. I did the following:

x <-
big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
#Gets around the 2^31 issue - yeah!

>>> We strongly discourage use of dimnames.

in Unix, when I hit the "top" command, I see R is taking up about 18Gb
RAM, even though the object x is 0 bytes in R. That's fine: that's how
bigmemory is supposed to work I guess. My question is how do I return
that RAM to the system once I don't want to use x any more? E.g.,

rm(x)

then "top" in Unix, I expect that my RAM footprint is back ~0, but it
remains at 18Gb. How do I return RAM to the system?

>>> It can take a while for the OS to free up memory, even after a gc().
>>> But it's available for re-use; if you want to be really sure, have a
look
>>> in /dev/shm to make sure the shared memory segments have been
>>> deleted.
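R's own accounting can be inspected directly, since the figure reported by top may lag:

```r
x <- matrix(0, 2000, 2000)   # about 30 MB of doubles
rm(x)
g <- gc()                    # force a collection; memory returns to the allocator
g                            # Ncells/Vcells currently used and peak usage
```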

Thanks,

Matt

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay <http://www.stat.yale.edu/%7Ejay>



[R] Stats Package Fix

2019-08-07 Thread Jay Waldron
 Good afternoon,

Using this file, with the tab for Growth_Value (I have pasted some code
below), if I add the argument *exact = F* it produces the Wilcoxon signed
rank test with continuity correction. The description documentation leads
me to believe that's the wrong argument (it should be the "correct"
argument, not the "exact" argument). I am using RStudio Version 1.2.1335
running R 3.6.1.

This question was posed in our Business Statistics class, taught at Utah
Valley University, where I am a senior undergraduate.

A response by tomorrow would be ideal, but I would still like an answer
even if that timeline is too aggressive. Thank you for your consideration.
My contact information is:

Jay Spencer Waldron
Personal Email (This is the one I am subscribed to R-Help with):
gateofti...@gmail.com
(385) 335-7879

#---Code paste begins---
> library(readxl)
> Growth_Value <- read_excel("2019
SUMMER/ECON-3340-003/Ch20/Chapter20.xlsx",
+ sheet = "Growth_Value")
> View(Growth_Value)
> wilcox.test(Growth_Value$'Growth', mu=5, alternative="greater")

Wilcoxon signed rank test

data:  Growth_Value$Growth
V = 40, p-value = 0.1162
alternative hypothesis: true location is greater than 5

>
>
> wilcox.test(Growth_Value$'Growth', Growth_Value$'Value',
alternative="two.sided", paired=TRUE)

Wilcoxon signed rank test

data:  Growth_Value$Growth and Growth_Value$Value
V = 40, p-value = 0.2324
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(Growth_Value$'Growth', mu=5, alternative="greater", exact=F)

Wilcoxon signed rank test with continuity correction

data:  Growth_Value$Growth
V = 40, p-value = 0.1106
alternative hypothesis: true location is greater than 5


#---Code Paste Ends---

*Documentation referenced*




*exact a logical indicating whether an exact p-value should be
computed.correct a logical indicating whether to apply continuity
correction in the normal approximation for the p-value.*


Thank you for your time,

Jay
Educational Email:
10809...@my.uvu.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-statistic for independent samples

2013-04-17 Thread Jay Kerns
Dear David,

On Wed, Apr 17, 2013 at 6:24 PM, David Arnold  wrote:
> Hi,

[snip]

>
> D.

Before posting to StackExchange, check out the Wikipedia entry for
"Behrens-Fisher problem".

Cheers,
Jay


-- 
G. Jay Kerns, Ph.D.
Youngstown State University
http://people.ysu.edu/~gkerns/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] practical to loop over 2million rows?

2012-10-10 Thread Jay Rice
New to R and having issues with loops. I am aware that I should use
vectorization whenever possible and use the apply functions; however,
sometimes a loop seems necessary.

I have a data set of 2 million rows and have tried running a couple of loops of
varying complexity to test efficiency. If I do a very simple loop, such as
adding every item in a column, I get an answer quickly.

If I use a nested ifelse statement in a loop, it takes me 13 minutes to get
an answer on just 50,000 rows. I am aware of a few methods to speed up
loops: preallocate memory and compute as much outside of the loop
as possible (or create functions and loop over those), but
it seems that even with these speed-ups I might have too much data to run
loops.  Here is the loop I ran that took 13 minutes. I realize I can
accomplish the same goal using vectorization (and in fact did so).

y<-numeric(length(x))

for (i in 1:length(x)) {
  if (!is.na(x[i])) {
    y[i] <- x[i]
  } else if (strataID[i + 1] == strataID[i]) {
    y[i] <- x[i + 1]
  } else {
    y[i] <- x[i - 1]
  }
}
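For reference, the same fill can be written without a loop (a sketch; it assumes strataID is a vector aligned with x, as in the loop above):

```r
# Vectorized equivalent: keep x where it is non-NA; otherwise borrow the
# next value when it lies in the same stratum, else the previous value.
n    <- length(x)
nxt  <- c(x[-1], NA)                       # x[i+1], NA past the end
prv  <- c(NA, x[-n])                       # x[i-1], NA before the start
same <- c(strataID[-1], NA) == strataID    # strataID[i+1] == strataID[i]
y <- ifelse(!is.na(x), x, ifelse(same, nxt, prv))
```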

Presumably, complicated loops would be more intensive than the nested if
statement above. If I write more efficient loops time will come down but I
wonder if I will ever be able to write efficient enough code to perform a
complicated loop over 2 million rows in a reasonable time.

Is it useless for me to try to do any complicated loops on 2 million rows,
or if I get much better at programming in R will it be manageable even for
complicated situations?


Jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error msg using na.approx "x and index must have the same length"

2012-10-11 Thread Jay Rice
Below I have written out some simplified data from my dataset. My goal is
to interpolate Price based on timestamp. Therefore the closer a Price is in
time to another price, the more like that price it will be. I want the
interpolations for each St and not across St (St  is a factor with levels
A, B, and C). Unfortunately, I get error messages from code I wrote.

In the end only IDs 10 and 14 will receive interpolated values because all
other NAs occur at the beginning of a level.  My code is given below the
dataset.

ID is int
St is factor with 3 levels
timestamp is POSIXlt
Price is num

Data.frame name is portfolio

ID   St   timestamp                 Price
1    A    2012-01-01 12:50:24.760   NA
2    A    2012-01-01 12:51:25.860   72.09
3    A    2012-01-01 12:52:21.613   72.09
4    A    2012-01-01 12:52:42.010   75.30
5    A    2012-01-01 12:52:42.113   75.30
6    B    2012-01-01 12:56:20.893   NA
7    B    2012-01-01 12:56:46.023   67.70
8    B    2012-01-01 12:57:19.300   76.06
9    B    2012-01-01 12:58:20.750   77.85
10   B    2012-01-01 12:58:20.797   NA
11   B    2012-01-01 12:59:19.527   79.57
12   C    2012-01-01 13:00:21.847   81.53
13   C    2012-01-01 13:00:21.860   81.53
14   C    2012-01-01 13:00:21.873   NA
15   C    2012-01-01 13:00:43.493   84.69
16   D    2012-01-01 12:01:21.520   24.63
17   D    2012-01-01 12:02:18.880   21.13

I tried the following using na.approx from zoo package

interpolatedPrice<-unlist(tapply(portfolio$Price, portfolio$St, na.approx,
portfolio$timestamp, na.rm=FALSE))

but keep getting error
"Error in na.approx.default(X[[1L]], ...) :
  x and index must have the same length"

I checked the length of every variable in the formula and they all have the
same length so I am not sure why I get the error message.
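For what it's worth, a common workaround is to split the data by group so that each call to na.approx sees a price vector and a timestamp vector of the same length (a sketch; it assumes the portfolio data frame above, with timestamp converted to POSIXct, since zoo's indexing works better with POSIXct than POSIXlt):

```r
library(zoo)

# na.approx prefers a numeric or POSIXct index
portfolio$timestamp <- as.POSIXct(portfolio$timestamp)

# Interpolate Price within each level of St, using the matching timestamps
filled <- lapply(split(portfolio, portfolio$St),
                 function(d) na.approx(d$Price, d$timestamp, na.rm = FALSE))

# Put the per-group results back in the original row order
portfolio$PriceInterp <- unsplit(filled, portfolio$St)
```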

Jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] First time Rcpp user needs compiler help, I think

2012-10-15 Thread Jay Rice
I am trying to write a C++ function to be called from R and have never done
this before.  I have done the following:
require(Rcpp)
require(inline)
src <- 'blahblahblah'
fun <- cxxfunction(signature(a="numeric",b="numeric"),src,plugin="Rcpp")

That last line generates the error message:

Error in compileCode(f, code, language = language, verbose = verbose) :
  Compilation ERROR, function(s)/method(s) not created!
In addition: Warning message:
running command 'C:/PROGRA~1/R/R-215~1.1/bin/i386/R CMD SHLIB
file26476eb7fd0.cpp 2> file26476eb7fd0.cpp.txt' had status 1

I am assuming that I don't have the necessary compiler installed on my
machine.  I tried installing Rtools, which created a C:/Rtools/ directory
with the files, but I am not sure if my R software knows how to properly
access it.  I was advised to make sure my path is set correctly, but wasn't
sure which path--my library search path in R, my Rtools installation
setting for "PATH", my "PATH" in DOS, or what.

I am operating on Windows 7.

I would greatly appreciate some instructions on how to set this up.  All
the manuals and posts seem to gloss over the details of setting up the C++
compiler (I have looked at The Art of R Programming, and also the
R-admin.pdf file, among other sources).  I assume it is something that can
be done in 5 minutes, if I only knew how ...
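One way to narrow this down is to check, from within R, whether the Rtools toolchain is visible on the PATH (a sketch; the paths shown are illustrative and depend on where and which version of Rtools was installed):

```r
# Where does R think the compiler is?  An empty result means Rtools's
# bin directories are not on the PATH that R sees.
Sys.which("g++")
Sys.getenv("PATH")

# If missing, prepend the Rtools directories for this session
# (adjust the hypothetical paths to your actual Rtools install):
# Sys.setenv(PATH = paste("C:/Rtools/bin;C:/Rtools/gcc-4.6.3/bin",
#                         Sys.getenv("PATH"), sep = ";"))
```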

Thanks.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parallel computing on Windows (foreach) (Sergey Goriatchev)

2010-06-16 Thread Jay Emerson
foreach (or virtually anything you might use for concurrent programming)
only really makes sense if the work the "clients" are doing is substantial
enough to overwhelm the communication overhead.  And there are many ways to
accomplish the same task more or less efficiently (for example, doing blocks
of tasks in chunks rather than passing each one as an individual job).

But more to the point, doSNOW works just fine on an SMP, no problem, it
doesn't require a cluster.
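A minimal sketch of what that looks like on a dual-core Windows machine (object names are arbitrary):

```r
library(doSNOW)                        # also loads foreach and snow

cl <- makeCluster(2, type = "SOCK")    # two local workers, one per core
registerDoSNOW(cl)                     # make them the foreach backend

res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

stopCluster(cl)                        # shut the workers down when done
```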

Jay


<>

Not only is the sequential foreach much slower than the simple
for-loop (at least in this particular instance), but I am not quite
sure how to make foreach run in parallel. Where would I get this parallel
backend? I looked at doMC and doRedis, but these do not run on
Windows, as far as I understand. And doSNOW is something to use when
you have a cluster, while I have a simple dual-core PC.

It is not really clear to me how to make parallel computing work. Please
help.

Regards,
Sergey

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (help) This is an R workspace memory processing question

2010-06-23 Thread Jay Emerson
You should look at packages like ff, bigmemory, RMySQL, and so on.  However,
you should really consider moving to a different platform for large-data
work (Linux, Mac, or Windows 7 64-bit).

Jay


-

This is an R workspace memory processing question.


  There is a method which from the R will control 10GB data at 500MB units?


  my work environment :

  R version : 2.11.1
  OS : WinXP Pro sp3

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error installing tkrplot

2010-07-17 Thread Jay-pea

Hi,

I've been trying to install tkrplot and have been coming across this error.



Loading required package: tcltk
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for
'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared library
'/Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so':
 
dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so,
10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
  Referenced from:
/Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so
  Reason: image not found
Error: package 'tcltk' could not be loaded

I have been clicking on 'tkrplot' in 'R Package manager', yet I get this
error saying that I'm trying to load the package above that in the list
'tcltk'.

Anybody know why this is happening. Is there another way to load it?

Thanks,

James

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Error-installing-tkrplot-tp2292646p2292646.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prediction plot for logistic regression output

2010-09-23 Thread Jay Vaughan
How do I construct a figure showing predicted value plots for the dependent 
variable as a function of each explanatory variable (separately) using the 
results of a logistic regression? It would also be helpful to know how to show 
uncertainty in the prediction (95% CI or SE).
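In case it helps a future reader, one standard approach is to predict on the link scale with standard errors, then back-transform through the inverse logit (a sketch with hypothetical model, data, and variable names; the real names will differ):

```r
# Hypothetical fit: binary response y, explanatory variables x1 and x2
fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Vary x1 over its range while holding x2 at its mean
nd <- data.frame(x1 = seq(min(dat$x1), max(dat$x1), length.out = 100),
                 x2 = mean(dat$x2))

# Predict on the link scale so the CI can be back-transformed
p <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)

plot(nd$x1, plogis(p$fit), type = "l", ylim = c(0, 1),
     xlab = "x1", ylab = "Predicted probability")
lines(nd$x1, plogis(p$fit + 1.96 * p$se.fit), lty = 2)  # ~95% upper band
lines(nd$x1, plogis(p$fit - 1.96 * p$se.fit), lty = 2)  # ~95% lower band
```

Repeat for each explanatory variable in turn to get the separate panels.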

Thanks-



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging and working with big data sets

2010-10-12 Thread Jay Emerson
I can't speak for ff and filehash, but bigmemory's data structure
doesn't allow "clever" merges (for good reasons).  However,
it is still probably less painful (and faster) than the other options,
though we don't implement it: we leave it to the user because the details
may vary depending on the example and the code is trivial.

- Allocate an empty new filebacked big.matrix of the proper size.
- Fill it in chunks (typically a column at a time if you can afford
the RAM overhead, or a portion of a column at a time).   Column
operations are more efficient than row operations (again, because of
the internals of the data structure).
- Because you'll be using filebackings, RAM limitations won't matter
other than the overhead of copying each chunk.

I should note: if you used separated=TRUE, each column would have a
separate binary file, and a "smart" cbind() would be possible simply
by manipulating the descriptor file.  Again, not something we advise
or formally provide, but it wouldn't be hard.
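The chunked-fill approach described above might look something like this (a sketch; A, B, n, k1, and k2 are hypothetical source matrices and dimensions):

```r
library(bigmemory)

# Allocate an empty filebacked big.matrix of the final size up front
merged <- filebacked.big.matrix(nrow = n, ncol = k1 + k2, type = "double",
                                backingfile = "merged.bin",
                                descriptorfile = "merged.desc")

# Fill a column at a time: column operations are cheap for this structure
for (j in 1:k1) merged[, j]      <- A[, j]   # copy the first source
for (j in 1:k2) merged[, k1 + j] <- B[, j]   # then append the second
```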

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] big data and lmer

2010-10-22 Thread Jay Emerson
Though bigmemory, ff, and other big data solutions (databases, etc...)
can help easily manage massive data, their data objects are not
natively compatible with all the advanced functionality of R.
Exceptions include lm and glm (both ff and bigmemory support his via
Lumley's biglm package), kmeans, and perhaps a few other things.  In
many cases, it's just a matter of someone deciding to port a
tool/analysis of interest to one of these different object types -- we
welcome collaborators and would be happy to offer advice if you want
to adapt something for bigmemory structures!
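As an illustration of the lm/glm exception, a sketch of fitting a linear model on a file-backed big.matrix via biganalytics (the descriptor file and column names are hypothetical):

```r
library(bigmemory)
library(biganalytics)

# Re-attach an existing filebacked matrix via its descriptor file
x <- attach.big.matrix("mydata.desc")

# biglm.big.matrix wraps Lumley's biglm for big.matrix objects
fit <- biglm.big.matrix(response ~ pred1 + pred2, data = x)
summary(fit)
```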

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Fwd: adding more columns in big.matrix object of bigmemory package]

2010-12-17 Thread Jay Emerson
For good reasons (having to do with avoiding copies of massive things)
we leave such merging to the user: create a new filebacking of the
proper size, and fill it (likely a column at a time, assuming you have
enough RAM to support that).

Jay

On Fri, Dec 17, 2010 at 2:16 AM, utkarshsinghal
 wrote:
>
> Hi,
>
> With reference to the mail below, I have large datasets, coming from various 
> different sources, which I can read into filebacked big.matrix using library 
> bigmemory. I want to merge them all into one 'big.matrix' object. (Later, I 
> want to run regression using library 'biglm').
>
> I am unsuccessfully trying to do this from quite some time now. Can you 
> please suggest some way? Am I missing some already available function?
>
> Even a functionality of the following will work for me:
>
> Just appending more columns in an existing big.matrix object (not merging).
> If the individual datasets are small enough to be read into R as usual, it is
> just the combined dataset that is huge.
>
> Any thoughts are welcome.
>
> Thanks,
> Utkarsh
>
>
>  Original Message 
> Subject: adding more columns in big.matrix object of bigmemory package
> Date: Thu, 16 Dec 2010 18:29:38 +0530
> From: utkarshsinghal 
> To: r help 
>
> Hi all,
>
> Is there any way I can add more columns to an existing filebacked big.matrix 
> object.
>
> In general, I want a way to modify an existing big.matrix object, i.e., add 
> rows/columns, rename colnames, etc.
> I tried the following:
>
> > library(bigmemory)
> > x = 
> > read.big.matrix("test.csv",header=T,type="double",shared=T,backingfile="test.backup",descriptorfile="test.desc")
> > x[,"v4"] = "new"
> Error in mmap(j, colnames(x)) :
>   Couldn't find a match to one of the arguments.
> (The above functionality is presently there in usual data.frames in R.)
>
>
> Thanks in advance,
> Utkarsh
>



--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bigmemory: Error Running Example

2010-08-11 Thread Jay Emerson
It seems very likely you are working on a 32-bit version of R, but it's a
little surprising still that you would have a problem with any single year.
Please tell us the operating system and version of R.  Did you preprocess
the airline CSV file using the utilities provided on bigmemory.org?  If you
don't, then anything character will be converted to NA.  Is your R
environment empty, or did you have other objects in memory?

It might help to just do some tests yourself:

x <- big.matrix(nrow=100, ncol=10, ... other options ...)

Make sure it works, then increase the size until you get a failure.  This
sort of exercise is extremely helpful in situations like this.
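Such a test might be scripted as follows (a sketch; the sizes are arbitrary and should be tuned up or down for the machine in question):

```r
library(bigmemory)

# Grow the allocation until it fails, to locate the practical limit
for (n in c(1e6, 1e7, 1e8)) {
  cat("trying", n, "rows... ")
  x <- tryCatch(big.matrix(nrow = n, ncol = 10, type = "integer"),
                error = function(e) NULL)
  cat(if (is.null(x)) "failed\n" else "ok\n")
  rm(x); gc()   # release before the next, larger attempt
}
```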

Jay


Subject: [R] Bigmemory: Error Running Example
Content-Type: text/plain

Hi,

I am trying to run the bigmemory example provided at
http://www.bigmemory.org/

The example runs on the "airline data" and generates a summary of the CSV
files:

library(bigmemory)
library(biganalytics)
x <- read.big.matrix("2005.csv", type="integer", header=TRUE,
backingfile="airline.bin",
 descriptorfile="airline.desc",
extraCols="Age")
summary(x)


This runs fine for the provided CSV for year 1987 (size=121MB). However, for
big files like the one for year 2005 (size=639MB), it gives the following errors:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  :
 Problem creating filebacked matrix.

Error: object 'x' not found
Error in summary(x) :
 error in evaluating the argument 'object' in selecting a method for
function 'summary'

Here is the output from running the memory.limit() :-
[1] 2047

Here is the output from running the memory.profile() :-

  NULL  symbolpairlist closure environment promise
 19381  3255706477 7443710
  language special builtinchar logical integer
121940 1781600   1506895188981
double complex   character ... anylist
  7983  17   47593   0   04073
 expressionbytecode externalptr weakref raw  S4
 2   0 618 117 1191838


Anyone who has previously worked with bigmemory before could throw some
light on it.
Were you able to run  the examples successfully?

Thanks in advance.

Harsh Yadav

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] DNA sequence Fst

2010-08-23 Thread blue jay
Hi,

I want to analyse DNA sequence data (mtDNA) in R, as in calculate Fst,
heterozygosity and similar summary statistics. Package adegenet converts the
DNA sequence by retaining only the polymorphic sites and then
calculates Fst... but is there any other way to do this? I mean, analyse the
DNA sequence as it is and calculate the statistics?


Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bigmemory doubt

2010-09-08 Thread Jay Emerson
By far the easiest way to achieve this would be to use the bigmemory
C++ structures in your program itself.  However, if you do something
on your own (but fundamentally have a column-major matrix in shared
memory), it should be possible to play around with the pointer with
R/bigmemory to accomplish this, yes.  Feel free to email us directly
for advice.

Jay


> Message: 153
> Date: Wed, 8 Sep 2010 10:52:19 +0530 (IST)
> From: "raje...@cse.iitm.ac.in" 
> To: r-help  
> Subject: [R] bigmemory doubt
> Message-ID:
>   <1204692515.13855.1283923339865.javamail.r...@mail.cse.iitm.ac.in>
> Content-Type: text/plain
>
> Hi,

> Is it possible for me to read data from shared memory created by a vc++ 
> program into R using bigmemory?

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] "Bayesian change point" package bcp 2.2.0 available

2010-05-10 Thread Jay Emerson
Version 2.2.0 of package bcp is now available.  It replaces the suggests of
NetWorkSpaces (previously used for optional parallel MCMC) with the
dependency on package foreach, giving greater flexibility and supporting a
wider range of parallel backends (see doSNOW, doMC, etc...).

For those unfamiliar with foreach (thanks to Steve Weston for this
contribution), it's a beautiful and highly portable looping construct which
can run sequentially or in parallel based on the user's actions (rather than
the programmer's choices).  We think other package authors might want to
consider taking advantage of it for tasks that might be computationally
intensive and could be easily done in parallel.  Some vignettes are
available at http://cran.r-project.org/web/packages/foreach/index.html.

Jay Emerson & Chandra Erdman

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Bayesian change point" package bcp 2.2.0 available

2010-05-17 Thread Jay Emerson
Version 2.2.0 of package bcp is now available.  It replaces the
suggests of NetWorkSpaces (previously used for optional parallel MCMC)
with the dependency on package foreach, giving greater flexibility and
supporting a wider range of parallel backends (see doSNOW, doMC,
etc...).

For those unfamiliar with foreach (thanks to Steve Weston for this
contribution), it's a beautiful and highly portable looping construct
which can run sequentially or in parallel based on the user's actions
(rather than the programmer's choices).  We think other package
authors might want to consider taking advantage of it for tasks that
might be computationally intensive and could be easily done in
parallel.  Some vignettes are available at
http://cran.r-project.org/web/packages/foreach/index.html.

Jay Emerson & Chandra Erdman

(Apologies, the first version of this announcement was not plain-text.)

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] bigmemory 4.2.3

2010-05-17 Thread Jay Emerson
The long-promised revision to bigmemory has arrived, with package
4.2.3 now on CRAN.  The mutexes (locks) have been extracted and will
be available through package synchronicity (on R-Forge, soon to appear
on CRAN).  Initial versions of packages biganalytics and bigtabulate
are on CRAN, and new versions which resolve the warnings and have
streamlined CRAN-friendly configurations will appear shortly.  Package
bigalgebra will remain on R-Forge for the time being as the
user-interface is developed and the configuration possibilities
expand.

For more information, please feel free to email us or visit
http://www.bigmemory.org/.

Jay Emerson & Mike Kane

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm without intercept

2011-02-18 Thread Jay Emerson
No, this is a cute problem, though: the definition of R^2 changes
without the intercept, because the
"empty" model used for calculating the total sums of squares is always
predicting 0 (so the total sums
of squares are sums of squares of the observations themselves, without
centering around the sample
mean).

Your interpretation of the p-value for the intercept in the first
model is also backwards: 0.9535 is extremely
weak evidence against the hypothesis that the intercept is 0.  That
is, the intercept might be near zero, but
could also be something veru different.  With a standard error of 229,
your 95% confidence interval
for the intercept (if you trusted it based on other things) would have
a margin of error of well over 400.  If you
told me that an intercept of, say 350 or 400 were consistent with your
knowledge of the problem, I wouldn't
blink.

This is a very small data set: if you sent an R command such as:

x <- c(x1, x2, ..., xn)
y <- c(y1, y2, ..., yn)

you might even get some more interesting feedback.  One of the many
good intro stats textbooks might
also be helpful as you get up to speed.
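To see why the reported R^2 jumps, one can recompute it both ways (a sketch; x and y stand for the original poster's data vectors):

```r
# No-intercept fit, as in the second model
fit0 <- lm(y ~ x - 1)
rss  <- sum(resid(fit0)^2)

# R^2 with the usual total sums of squares, centered at mean(y):
1 - rss / sum((y - mean(y))^2)

# R^2 as summary() reports it for a no-intercept model: the "empty"
# model predicts 0, so the total sums of squares are not centered
1 - rss / sum(y^2)
```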

Jay
-
Original post:

Message: 135
Date: Fri, 18 Feb 2011 11:49:41 +0100
From: Jan 
To: "R-help@r-project.org list" 
Subject: [R] lm without intercept
Message-ID: <1298026181.2847.19.camel@jan-laptop>
Content-Type: text/plain; charset="UTF-8"

Hi,

I am not a statistics expert, so I have this question. A linear model
gives me the following summary:

Call:
lm(formula = N ~ N_alt)

Residuals:
   Min  1Q  Median  3Q Max
-110.30  -35.80  -22.77   38.07  122.76

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5177   229.0764   0.059   0.9535
N_alt 0.2832 0.1501   1.886   0.0739 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 56.77 on 20 degrees of freedom
 (16 observations deleted due to missingness)
Multiple R-squared: 0.151, Adjusted R-squared: 0.1086
F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386

The regression is not very good (high p-value, low R-squared).
The Pr value for the intercept seems to indicate that it is zero with a
very high probability (95.35%). So I repeat the regression forcing the
intercept to zero:

Call:
lm(formula = N ~ N_alt - 1)

Residuals:
   Min  1Q  Median  3Q Max
-110.11  -36.35  -22.13   38.59  123.23

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
N_alt 0.292046   0.007742   37.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55.41 on 21 degrees of freedom
 (16 observations deleted due to missingness)
Multiple R-squared: 0.9855, Adjusted R-squared: 0.9848
F-statistic:  1423 on 1 and 21 DF,  p-value: < 2.2e-16

1. Is my interpretation correct?
2. Is it possible that just by forcing the intercept to become zero, a
bad regression becomes an extremely good one?
3. Why doesn't lm suggest a value of zero (or near zero) by itself if
the regression is so much better with it?

Please excuse my ignorance.

Jan Rheinländer


-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-smirnov test

2011-02-28 Thread Jay Emerson
Taylor Arnold and I have developed a package ks.test (available on R-Forge
in beta version) that modifies stats::ks.test to handle discrete null
distributions
for one-sample tests.  We also have a draft of a paper we could provide (email
us).  The package uses methodology of Conover (1972) and Gleser (1985) to
provide exact p-values.  It also corrects an algorithmic problem with
stats::ks.test
in the calculation of the test statistic.  This is not a bug, per se,
because it was
never intended to be used this way.  We will submit this new function for
inclusion in package stats once we're done testing.

So, for example:
# With the default ks.test (ouch):
> stats::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided

# With our new function (what you would want in this toy example):
> ks.test::ks.test(c(0,1), ecdf(c(0,1)))

One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0, p-value = 1
alternative hypothesis: two-sided



Original Message:

Date: Mon, 28 Feb 2011 21:31:26 +1100
From: Glen Barnett 
To: tsippel 
Cc: r-help@r-project.org
Subject: Re: [R] Kolmogorov-smirnov test
Message-ID:
   
Content-Type: text/plain; charset=ISO-8859-1

It's designed for continuous distributions. See the first sentence here:

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

K-S is conservative on discrete distributions

On Sat, Feb 19, 2011 at 1:52 PM, tsippel  wrote:
> Is the Kolmogorov-Smirnov test valid on both continuous and discrete data?
> I don't think so, and the example below helped me understand why.
>
> A suggestion on testing the discrete data would be appreciated.
>
> Thanks,

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exception while using NeweyWest function with doMC

2011-08-29 Thread Jay Emerson
Simon,

Though we're pleased to see another use of bigmemory, it really isn't
clear that it is gaining you anything in your example; anything like
as.big.matrix(matrix(...)) still consumes full RAM for both the inner
matrix() and the new big.matrix -- is the filebacking really necessary?
It also doesn't appear that you are making use of shared memory, so I'm
unsure what the gains are.  However, I don't have any particular insight
into the subsequent problem with NeweyWest (which doesn't seem to be
using the big.matrix objects).

Jay

--
Message: 32
Date: Sat, 27 Aug 2011 21:37:55 +0200
From: Simon Zehnder 
To: r-help@r-project.org
Subject: [R] Exception while using NeweyWest function with doMC
Message-ID:
   
Content-Type: text/plain

Dear R users,

I am using R right now for a simulation of a model that needs a lot of
memory. Therefore I use the *bigmemory* package and - to make it faster -
the *doMC* package. See my code posted on http://pastebin.com/dFRGdNrG

< snip >
-----

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installation of bigmemory fails

2011-06-25 Thread Jay Emerson
Premal,

Package authors generally welcome direct emails.

We've been away from this project since the release of 2.13.0 and I only just
noticed the build errors.  These generally occur because of some (usually
small and solvable) problem with compilers and the BOOST libraries.  We'll
look at it and see what we can do.  Please email us if you don't hear back
in the next week or so.

Thanks,

Jay


---
Hello All,
I tried to intall the bigmemory package from a CRAN mirror site and
received the following output while installing. Any idea what's going
on and how to fix it? The system details are provided below.

- begin error messages ---
* installing *source* package 'bigmemory' ...
 checking for Sun Studio compiler...no
 checking for Darwin...yes
** libs
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic  -O2
-fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c BigMatrix.cpp -o BigMatrix.o
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic  -O2
-fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c SharedCounter.cpp -o SharedCounter.o
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic  -O2
-fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c bigmemory.cpp -o bigmemory.o
bigmemory.cpp: In function 'bool TooManyRIndices(index_type)':
bigmemory.cpp:40:27: error: 'powl' was not declared in this scope
*** Error code 1

Stop in /tmp/Rtmpxwe3p4/R.INSTALL4f539336/bigmemory/src.
ERROR: compilation failed for package 'bigmemory'
* removing '/usr/local/lib/R/library/bigmemory'

The downloaded packages are in
   '/tmp/RtmpMZCOVp/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning message:
In install.packages("bigmemory") :
 installation of package 'bigmemory' had non-zero exit status
- end error messages -----
It's a 64-bit FreeBSD 7.2 system running R version 2.13.0.
Thanks,
Premal

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory limit for Windows 64bit build of R

2012-08-06 Thread Jay Emerson
Alan,

More RAM will definitely help.  But if you have an object needing more than
2^31-1 ~ 2 billion elements, you'll hit a wall regardless.  This could be
particularly limiting for matrices.  It is less limiting for data.frame
objects (where each column could be 2 billion elements).  But many R
analytics under the hood use matrices, so you may not know up front where
you could hit a limit.
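
The limit mentioned above can be checked directly in R (an illustration, not
from the original thread):

```r
# The 2^31 - 1 limit applies to the total number of elements in a
# matrix, not to each dimension separately.
.Machine$integer.max                     # 2147483647 = 2^31 - 1

# A 50000 x 50000 double matrix would need 2.5 billion elements,
# which exceeds the limit:
50000 * 50000 > .Machine$integer.max     # TRUE
```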

Jay

 Original message 
I have a Windows Server 2008 R2 Enterprise machine, with 64bit R installed
running on 2 x Quad-core Intel Xeon 5500 processor with 24GB DDR3 1066 Mhz
RAM.  I am seeking to analyse very large data sets (perhaps as much as
10GB), without the additional coding overhead of a package such as
bigmemory().

My question is this - if we were to increase the RAM on the machine to
(say) 128GB, would this become a possibility?  I have read the
documentation on memory limits and it seems so, but would like some
additional confirmation before investing in any extra RAM.
-


-- 
John W. Emerson (Jay)
Associate Professor of Statistics, Adjunct, and Acting Director of Graduate
Studies
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bigmemory on Solaris

2011-12-01 Thread Jay Emerson
At one point we might have gotten something working (older version?) on
Solaris x86, but we were never successful on Solaris sparc, as far as I
remember -- it isn't a platform we can test and support.  We believe there
are problems with BOOST library compatibilities.

We'll try (again) to clear up the other warnings in the logs, though.  !-)

We should also revisit the possibility of a CRAN BOOST library for use by a
small group of packages (like bigmemory) which might make patches to BOOST
easier to track and maintain.  This might improve things in the long run.

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient coding with foreach and bigmemory

2011-09-30 Thread Jay Emerson
First, we strongly recommend 64-bit R.  Otherwise, you may not be able
to scale up as far as you would like.

Second, as I think you realize, with big objects you may have to do
things in chunks.  I generally recommend working a column at a time
rather than in blocks of rows if possible (better performance,
particularly if the filebacking is used because of matrices exceeding
RAM), and you may find that alternative data organization can really
pay off.  Keep an open mind.

Third, you really need to avoid this runif(1,...) usage.  It can't
possibly be efficient.  If a single call to runif() doesn't work,
break it into chunks, certainly, but going down to chunks of size 1
just can't make any sense.
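
A sketch of the second and third points together (hypothetical sizes and
names), assuming the bigmemory package:

```r
# Fill a big.matrix one column at a time, with one vectorized runif()
# call per column instead of millions of runif(1, ...) calls.
library(bigmemory)

m <- big.matrix(nrow = 1e6, ncol = 10, type = "double")
for (j in 1:ncol(m)) {
  m[, j] <- runif(nrow(m))   # one call generates the whole column
}
```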

Fourth, although you aren't there yet, once you get to the point you
are trying to do things in parallel with foreach and bigmemory, you
*may* need to place the following inside your foreach loop to make use
of the shared memory properly:

mdesc <- describe(m)
foreach(...) %dopar% {
  require(bigmemory)
  m <- attach.big.matrix(mdesc)
  # ... now operate on m ...
}

I say *may* because the backend doMC (not available in Windows) does
not require this, but the other backends do; otherwise, the workers
will not be able to properly address the shared-memory or filebacked
big.matrix.  Some documentation on bigmemory.org may help, and feel
free to email us directly.

Jay


-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Foreach (doMC)

2011-10-19 Thread Jay Emerson
> P.S. Is there any particular reason why there are so seldom answers to posts 
> regarding foreach and all these doMC/doSMP packages ?  Do so few people use 
> these packages or does this have anything to do with the commercial origin of 
> these packages?

Jannis,

An interesting question.  I'm a huge fan of foreach and the parallel
backends, and have used foreach in some of my packages.  It leaves the
choice of backend to the user, rather than forcing some environment.
If you like multicore, great -- the package doesn't care.  Someone
else may use doSNOW.  No problem.

To answer your question, foreach was originally written by (primarily,
at least) Steve Weston, previously of REvolution Computing.  It, along
with some of the parallel backends (perhaps all at this point, I'm out
of touch) are available open-source.  Hence, I'd argue that the
"commercial origin" is a moot point -- it doesn't matter, it will
always be available, and it's really useful.  Steve is no longer with
REvolution, however, and I can't speak for the responsiveness/interest
of current REvolution folks on this point.  Scanning R-help daily for
things relating to my own packages is something I try to do, but it
doesn't always happen.

I would like to think foreach is widely used -- it does have a growing
list of reverse depends/suggests.  And was updated as recently as last
May, I just noticed.
http://cran.r-project.org/web/packages/foreach/index.html

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Foreach (doMC)

2011-10-20 Thread Jay Emerson
Jannis,

I'm not complete sure I understand your first point, but maybe someone
from REvolution will weigh in.  Nobody is forcing anyone to purchase
any products, and there are attractive alternatives such as the CRAN R
and R Studio (to name two).  This issue has arisen many times of the
various lists and you are welcome to search the archives and read many
very intelligent, thoughtful opinions.

As for foreach, etc... if you have fairly focused questions
(preferably with a reproducible example if there is a problem) and if
you have done reading on examples available on using it, then you
might try joining the r-sig-...@r-project.org group.  Clearly there
are far more users of "core" R and hence "mainstream" questions on
r-help are likely to be answered more quickly (on average) than
specialized questions.

Regards,

Jay

On Thu, Oct 20, 2011 at 4:27 PM, Jannis  wrote:
> Dear list members, dear Jay,
>
> Well, I personally do not care about Revolutions Analytics selling their
> products, as this is also included in the idea of many open source
> licences.  Especially as Revolutions provide their packages to the community
> and it is everybody's personal choice to buy their special R version.
>
> I was just wondering about this issue as usually most questions on r-help
> are answered pretty soon and by many different people and I had the
> impression that this is not the case for posts regarding the
> foreach/doMC/doSMP etc packages. This may, however, be also due to the
> probably limited use of these packages for most users who do not need these
> high performance computing things. Or it was just my personal perception or
> pure chance.
>
> Thanks, however, to the authors of such packages!  They were of great help to
> me on several occasions, and I have deep respect for everybody devoting his
> time to open source software!
>
> Jannis
>
>
>
> On 10/19/2011 01:26 PM, Jay Emerson wrote:
>>>
>>> P.S. Is there any particular reason why there are so seldom answers to
>>> posts regarding foreach and all these doMC/doSMP packages ?  Do so few
>>> people use these packages or does this have anything to do with the
>>> commercial origin of these packages?
>>
>> Jannis,
>>
>> An interesting question.  I'm a huge fan of foreach and the parallel
>> backends, and have used foreach in some of my packages.  It leaves the
>> choice of backend to the user, rather than forcing some environment.
>> If you like multicore, great -- the package doesn't care.  Someone
>> else may use doSNOW.  No problem.
>>
>> To answer your question, foreach was originally written by (primarily,
>> at least) Steve Weston, previously of REvolution Computing.  It, along
>> with some of the parallel backends (perhaps all at this point, I'm out
>> of touch) are available open-source.  Hence, I'd argue that the
>> "commercial origin" is a moot point -- it doesn't matter, it will
>> always be available, and it's really useful.  Steve is no longer with
>> REvolution, however, and I can't speak for the responsiveness/interest
>> of current REvolution folks on this point.  Scanning R-help daily for
>> things relating to my own packages is something I try to do, but it
>> doesn't always happen.
>>
>> I would like to think foreach is widely used -- it does have a growing
>> list of reverse depends/suggests.  And was updated as recently as last
>> May, I just noticed.
>> http://cran.r-project.org/web/packages/foreach/index.html
>>
>> Jay
>>
>
>



-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bigmemory

2012-05-11 Thread Jay Emerson
To answer your first question about read.big.matrix(), we don't know what
your acc3.dat file is, but it doesn't appear to have been detected as a
standard file (like a CSV file) or -- perhaps -- doesn't even exist (or
doesn't exist in your current directory)?

Next:

> In addition, I am planning to do a multiple imputation with MICE package
> using the data read by bigmemory package.
> So usually, the multiple imputation code is like this:
 > imp=mice(data.frame,m=50,seed=1234,print=F)
> the data.frame is required. How can I change the big.matrix class
> generated by bigmemory package to a data.frame?

Please read the help files for bigmemory -- only matrix-like objects are
supported.  However, the more serious problem is that you can't expect to
run just any R function on a big.matrix (or on an ff object, if you check
out ff for some nice features).  In particular, for large data sets you
would likely use up all of RAM (other reasons are more subtle and
important, but out of place in this reply).

Jay

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bigmemory

2012-05-11 Thread Jay Emerson
R internally uses 32-bit integers for indexing (though this may change).
For this and other reasons these external objects with specialized purposes
(larger-than-RAM, shared memory) simply can't behave exactly as R objects.
Best case, some R functions will work.  Others would simply break.  Others
would perhaps work if the problem is small enough, but would choke in the
creation of temporary objects in memory.

I understand your sentiment, but it isn't that easy.  If you are
interested, however, we do provide examples of authoring functions in C++
which can work interchangeably on both matrix and big.matrix objects.

Jay

 Hi Jay,
>
> I have a question about your reply.
>
> You mentioned that "the more serious problem is that you can't expect to
> run just any R function on a big.matrix (or on an ff object, if you check
> out ff for some nice features).  "
>
> I am confused why the packages could not communicate with each other. I
> understand that maybe for some programming or statistical reasons, one
> package need its own "class" so that specific algorithm can be implemented.
> However, R as a statistical programming environment, one of its advantages
> is the abundance of the packages under R structure. If different packages
> generate different kinds of object and can not be recognized and used for
> further analysis by other packages, then each package would appear to be
> similar to normal independent software, e.g., SAS, MATLAB... and this could
> reduce R's overall ability to handle complicated analysis situations.
>
> This is just a general thought.
>
> Thank you very much.
>
> --
> ya
>
> --
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Package bigmemory now available on CRAN

2008-06-26 Thread Jay Emerson
Package "bigmemory" is now available on CRAN.  A brief abstract follows:

Multi-gigabyte data sets challenge and frustrate R users even on
well-equipped hardware.
C/C++ and Fortran programming can be helpful, but is cumbersome for interactive
data analysis and lacks the flexibility and power of R's rich
statistical programming environment.
The new package bigmemory bridges this gap, implementing massive matrices
in memory (managed in R but implemented in C++) and supporting their
basic manipulation
and exploration. It is ideal for problems involving the analysis in R
of manageable
subsets of the data, or when an analysis is conducted mostly in C++.
In a Unix environment,
the data structure may be allocated to shared memory with transparent read
and write locking, allowing separate processes on the same computer to
share access to a
single copy of the data set. This opens the door for more powerful
parallel analyses and
data mining of massive data sets.

-- 
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R package building

2008-05-17 Thread Jay Emerson
I agree with others that the packaging system is generally easy to
use, and between the "Writing R Extensions" documentation and other
scattered sources (including these lists) there shouldn't be many
obstacles.  Using "package.skeleton()" is a great way to get started:
I'd recommend just having one data object and one new function in the
session for starters.  You can build up from there.
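
A minimal sketch of that starting point (hypothetical names):

```r
# One function and one data object in the session, then a skeleton.
myfun  <- function(x) x + 1
mydata <- data.frame(a = 1:3, b = letters[1:3])
package.skeleton(name = "mypkg", list = c("myfun", "mydata"))
# Creates ./mypkg with R/, data/, and man/ stubs; after editing the
# .Rd files, check it with:  R CMD check mypkg
```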

I've only run into time-consuming walls on more advanced, obscure
issues.  For example: the "Suggests:" field in DESCRIPTION generated
quite some debate back in 2005, but until I found that thread in the
email lists I didn't understand the issue.  For completeness, I'll
round out this discussion, hoping I'm correct.  In essence, I think
the choice of the word "Suggests:" was intended for the package user,
not for the developer.  The user isn't required to have a suggested
package in order to load and use the desired package.  But the
developer is required (in the R CMD check) to have the suggested
package in order to avoid warnings or fails.  This does, actually,
make sense, because we assume a developer would want/need to check
features that involve the suggested package.  In a few isolated cases
(I think I had one of them), this caused a problem, where a desired
suggested package isn't distributed by CRAN on all platforms, so I
would risk getting into trouble with R CMD check on the platform
without the suggested package.  But this is pretty obscure, and the
issue was obviously well-debated in the past.  The addition of a line
or two about this in "Writing R Extensions" would be friendly (the
current content is correct and minimally sufficient, I believe).  Maybe I
should draft this and submit it to the group.
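
As an illustration of the distinction (hypothetical package names), a
DESCRIPTION file might contain:

```
Package: mypkg
Version: 0.1-0
Depends: R (>= 2.6.0), foo
Suggests: bar
```

Here a user must have 'foo' installed to load and use mypkg, while 'bar' is
only needed to run the examples, vignettes, or a full R CMD check.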

Secondly, I would advise a newbie to the packaging system to avoid S4
at first.  Ultimately, I think it's pretty cool.  But, for example,
documentation on proper documentation (to handle the man pages
correctly) has puzzled me, and even though I can create a package with
S4 that passes R CMD check cleanly, I'm not convinced I've got it
quite right.  If someone has recently created more documentation or a
'white pages' on this, please do spread the word.

Thanks to all who have worked -- and continue to work -- on the system!

Jay

>Subject: [R] R package building
>
>In a few days I'll give a talk on R package development and my
>personal experience, at the 3rd Free / Libre / Open Source Software
>(FLOSS) Conference which will take place on May 27th & 28th 2008, in
>the National Technical University of Athens, in Greece.
>
>I would appreciate if you could share
>your thoughts with me; What are today's obstacles on R package
>building, according to your
>opinion and personal experience.
>
>Thanks,
>--
>Angelos I. Markos
>Scientific Associate, Dpt of Exact Sciences, TEI of Thessaloniki, GR
>"I'm not an outlier; I just haven't found my distribution yet"



-- 
John W. Emerson (Jay)
Assistant Professor of Statistics
Director of Graduate Studies
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using very large matrix

2009-02-26 Thread Jay Emerson
Corrado,

Package bigmemory has undergone a major re-engineering and will be available
soon (available now in Beta version upon request).  The version
currently on CRAN
is probably of limited use unless you're on Linux.

bigmemory may be useful to you for data management, at the very least, where

x <- filebacked.big.matrix(8, 8, init=n, type="double")

would accomplish what you want using filebacking (disk space) to hold
the object.
But even this requires 64-bit R (Linux or Mac, or perhaps a Beta
version of Windows 64-bit
R that REvolution Computing is working on).

Subsequent operations (e.g. extraction of a small portion for analysis) are then
easy enough:

y <- x[1,]

would give you the first row of x as an object y in R.  Note that x is
not itself an R matrix,
and most existing R analytics can't work on x directly (and would max
out the RAM if they
tried, anyway).

Feel free to email me for more information (and this invitation
applies to anyone who is
interested in this).

Cheers,

Jay

#Dear friends,
#
#I have to use a very large matrix. Something of the sort of
#matrix(8,8,n)  where n is something numeric of the sort 0.xx
#
#I have not found a way of doing it. I keep getting the error
#
#Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements specified
#
#Any suggestions? I have searched the mailing list, but to no avail.
#
#Best,
#--
#Corrado Topi
#
#Global Climate Change & Biodiversity Indicators
#Area 18,Department of Biology
#University of York, York, YO10 5YW, UK
#Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



-- 
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using very large matrix

2009-03-02 Thread Jay Emerson
Steve et.al.,

The old version is still on CRAN, but I strongly encourage anyone
interested to email me directly and I'll make the new version available.
In fact, I wouldn't mind just pulling the old version off of CRAN, but of course
that's not a great idea.  !-)

Jay


On Mon, Mar 2, 2009 at 8:47 AM,   wrote:
> I'm very interested in the bigmemory package for windows 32-bit
> environments.  Who do I need to contact to request the Beta version?
>
> Thanks
> Steve
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> steve_fried...@nps.gov
> Office (305) 224 - 4282
> Fax     (305) 224 - 4147
>
>
>
> From: Corrado (sent via r-help-boun...@r-project.org)
> To: john.emer...@yale.edu, Tony Breyal
> Cc: r-help@r-project.org
> Subject: Re: [R] Using very large matrix
> Date: 03/02/2009 10:46 AM GMT
>
> Thanks a lot!
>
> Unfortunately, the R package I have to use for my research was only
> released for 32-bit R on 32-bit MS Windows, and only closed source...  I
> normally use 64-bit R on 64-bit Linux  :)
>
> I tried to use the bigmemory package from CRAN with 32-bit Windows, but I
> had some serious problems.
>
> Best,
>
> On Thursday 26 February 2009 15:43:11 Jay Emerson wrote:
>> Corrado,
>>
>> Package bigmemory has undergone a major re-engineering and will be
>> available soon (available now in Beta version upon request).  The version
>> currently on CRAN
>> is probably of limited use unless you're in Linux.
>>
>> bigmemory may be useful to you for data management, at the very least,
>> where
>>
>> x <- filebacked.big.matrix(8, 8, init=n, type="double")
>>
>> would accomplish what you want using filebacking (disk space) to hold
>> the object.
>> But even this requires 64-bit R (Linux or Mac, or perhaps a Beta
>> version of Windows 64-bit
>> R that REvolution Computing is working on).
>>
>> Subsequent operations (e.g. extraction of a small portion for analysis)
> are
>> then easy enough:
>>
>> y <- x[1,]
>>
>> would give you the first row of x as an object y in R.  Note that x is
>> not itself an R matrix,
>> and most existing R analytics can't work on x directly (and would max
>> out the RAM if they
>> tried, anyway).
>>
>> Feel free to email me for more information (and this invitation
>> applies to anyone who is
>> interested in this).
>>
>> Cheers,
>>
>> Jay
>>
>> #Dear friends,
>> #
>> #I have to use a very large matrix. Something of the sort of
>> #matrix(8,8,n)  where n is something numeric of the sort
>> 0.xx #
>> #I have not found a way of doing it. I keep getting the error
>> #
>> #Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements
>> specified #
>> #Any suggestions? I have searched the mailing list, but to no avail.
>> #
>> #Best,
>> #--
>> #Corrado Topi
>> #
>> #Global Climate Change & Biodiversity Indicators
>> #Area 18,Department of Biology
>> #University of York, York, YO10 5YW, UK
>> #Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
>
>
> --
> Corrado Topi
>
> Global Climate Change & Biodiversity Indicators
> Area 18,Department of Biology
> University of York, York, YO10 5YW, UK
> Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>



-- 
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

