Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

2010-06-03 Thread Gavin Simpson
On Wed, 2010-06-02 at 13:45 +0200, Joris Meys wrote:
> Hi Diederik,
> 
> I can't say immediately why your models fail without seeing the data and
> running a number of tests. You could play around with the other parameters,
> as the problem might be related to the optimization algorithm.
> 
> A different approach would be using the methods in the vegan package. This
> package contains a set of ordination methods that allow environmental
> variables to be mapped onto the ordination space.

vegan is probably not too useful here, as the response is univariate:
counts of ducks.

> You could also use the hurdle model from the pscl library, but this is not
> guaranteed to work better. It is not a zero-inflation model, but a
> two-component model that fits the zeros using eg a binomial and the rest of
> the counts using eg a poisson or a negative binomial.

zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
Neither of these will be a GAM though, so if the OP wants smooth
functions of covariates they may not be so useful, although bs() in
package splines might be used to include smooth terms in these models.

gam() in mgcv has a negative binomial family that might be a useful
starting point for modelling the overdispersion.

Package VGAM has functions to fit ZIP/ZINB or the hurdle models
mentioned above with smooth functions of covariates.
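An untested sketch of two of those routes, using the OP's data frame `data`, its response `duck`, and a single covariate for brevity (the formulas and the fixed theta are purely illustrative, not a recommendation):

```r
library(splines)  # bs()
library(pscl)     # zeroinfl()
library(mgcv)     # gam(), negbin()

## ZINB with a spline term in the count part; "| 1" keeps the
## zero-inflation part to an intercept only
m_zinb <- zeroinfl(duck ~ bs(LFAP200, df = 3) | 1,
                   data = data, dist = "negbin")

## negative binomial GAM as a starting point for the overdispersion
m_nb <- gam(duck ~ s(LFAP200), data = data, family = negbin(3))
```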

In any case, the OP should probably seek help directly from the gamlss
package maintainers to get gamlss working on their data, or to find out
why the fitting isn't working.

HTH

G

> 
> Cheers
> Joris
> 
> On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik wrote:
> 
> > Dear all,
> >
> > I am using gamlss (package gamlss version 4.0-0, R version 2.10.1, Windows
> > XP Service Pack 3 on an HP EliteBook) to relate bird counts to habitat
> > variables. However, most models fail because the global deviance is
> > increasing, and I am not sure what causes this behaviour. The dataset
> > consists of counts of birds (duck) and 5 habitat variables measured in the
> > field (n = 182). The dependent variable (the number of ducks
> > counted) suffers from zero-inflation and overdispersion:
> >
> > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > mean <- mean(data$duck)
> > > var <- var(data$duck)
> > > proportion_non_zero
> > [1] 0.1153846
> > > mean
> > [1] 1.906593
> > > var
> > [1] 37.35587
> >
> > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > variable, but the data used can be found at
> > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> >
> >
> > First, I create a (strong) pattern in the dataset by:
> > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> >
> > I try to analyze these data by fitting several possible distributions
> > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and type II
> > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > splines with df=3. The best-fitting model will then be chosen on the basis
> > of its AIC.
> >
> > However, these models frequently fail to converge, and I am not sure why,
> > and what to do about it. For example:
> >
> > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > PO)
> > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> >
> > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > ZIP)
> > GAMLSS-RS iteration 1: Global Deviance = 326.3819
> > GAMLSS-RS iteration 2: Global Deviance = 225.1232
> > GAMLSS-RS iteration 3: Global Deviance = 319.9663
> > Error in RS() : The global deviance is increasing
> >  Try different steps for the parameters or the model maybe inappropriate
> > In addition: There were 14 warnings (use warnings() to see them)
> >
> > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > NBI)
> > GAMLSS-RS iteration 1: Global Deviance = 291.8607
> > GAMLSS-RS iteration 2: Global Deviance = 291.3291
> > ##..##
> > GAMLSS-RS iteration 4: Global Deviance = 291.1135
> > GAMLSS-RS iteration 20: Global Deviance = 291.107
> > Warning message:
> > In RS() : Algorithm RS has not yet converged
> >
> > > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > NBII)
> > GAMLSS-RS iteration 1: Global Deviance = 284.5993
> > GAMLSS-RS iteration 2: Global Deviance = 281.9548
> > ##..##
> > GAMLSS-RS iteration 5: Global Deviance = 280.7311
> > GAMLSS-RS iteration 15: Global Deviance = 280.6343
> >
> > > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,fam

Re: [R] Sweave glm.fit

2010-06-03 Thread Gavin Simpson
On Wed, 2010-06-02 at 20:29 +0200, Jimmy Söderly wrote:
> Dear R users,
> 
> After running Sweave, this is what I get :
> 
> Warning messages:
> 1: glm.fit: algorithm did not converge
> 2: glm.fit: algorithm did not converge
> There is no glm.fit function in my code.

glm() calls glm.fit(). Did you call glm(), or any code that might use
glm() (or glm.fit())? glm.fit() is the function that does the actual
model fitting, with glm() being a convenience wrapper providing the
formula interface.

> Where does it come from ? From Sweave ? From system.time ?

Your call to glm(), or to any function that calls glm(). This has
nothing to do with Sweave or system.time().

Your model fitting has failed to converge. There may be many reasons why
glm.fit is failing to converge for the model you are fitting.

Either extract the R code by hand or tangle the Sweave file to do the
extraction, and then step through the code to see where the warning is
being generated. Then figure out why glm.fit is not converging (you
might need to increase the number of iterations, provide better
starting values, or rethink the type of model you are trying to fit).
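For example (the file and model names here are hypothetical):

```r
## tangle: write the R chunks of report.Rnw out to report.R
Stangle("report.Rnw")
source("report.R", echo = TRUE)   # run chunk by chunk to locate the warning

## if glm itself needs more iterations, raise the limit via glm.control()
fit <- glm(y ~ x, family = binomial, data = dat,
           control = glm.control(maxit = 50))
```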

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] intersection on more than two sets

2010-06-03 Thread amir
Hi,

I want to do an intersection or union on more than two sets.
Is there any command to do this?

Regards,
Amir



[R] compare results of glms

2010-06-03 Thread Sacha Viquerat

dear list!
i have run several glm analyses to estimate a mean rate of dung decay 
for independent trials. i would like to compare these results 
statistically but can't find any solution. the glm calls are:


dung.glm1 <- glm(STATE ~ DAYS, data = o_cov, family = binomial(link = "logit"))

dung.glm2 <- glm(STATE ~ DAYS, data = o_cov_T12, family = binomial(link = "logit"))

as all the trials have different sample sizes (around 20 each),

anova(dung.glm1, dung.glm2)

is not applicable. has anyone an idea?
thanks in advance!

ps: my advisor urges me to use the z-test (the common test statistic in 
my field of research), but i reject that due to the small sample size.




Re: [R] intersection on more than two sets

2010-06-03 Thread jim holtman
?Reduce

> x <- function(z) Reduce('union', z)
> x(list(1:3, 4:6))
[1] 1 2 3 4 5 6
> x(list(1:3, 4:6, 3:7))
[1] 1 2 3 4 5 6 7
> x <- function(z) Reduce('intersect', z)
> x(list(1:3, 4:6, 3:7))
integer(0)
> x(list(1:3, 2:6, 3:7))
[1] 3


On Thu, Jun 3, 2010 at 3:59 AM, amir  wrote:
> Hi,
>
> I want to do an intersection or union on more than two sets.
> Is there any command to do this?
>
> Regards,
> Amir
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?





[R] increase memory

2010-06-03 Thread azam jaafari
Dear All
 
Please help me to increase the memory available to R.
 
In order to increase memory, I read the FAQ and followed the instructions
below:
 
 
Close R, then right-click on your R program icon (the icon on your desktop or 
in your programs directory). Select ``Properties'', and then select the 
``Shortcut'' tab. Look for the ``Target'' field and, after the closing quotes 
around the location of the R executable, add 
 --max-mem-size=500M 
It does not work. 

When I add --max-mem-size=500M in the Target field, there is an error like 
the one below: 
“The name "C:\Program Files\R\R-2.11.0\bin\Rgui.exe"--max-mem-size=500M 
specified in the Target box is not valid” 
 
I use R 2.11.0 on Windows Vista with 3 GB RAM. 
 
Have a nice day
 
Azam


  
[[alternative HTML version deleted]]



Re: [R] increase memory

2010-06-03 Thread Henrik Bengtsson
On Thu, Jun 3, 2010 at 11:11 AM, azam jaafari  wrote:
> Dear All
>
> Pleas help me to increase the memory in R.
>
> In order increase memory, I read the FAQ and follow the instruction as below
>
>
> Close R, then right-click on your R program icon (the icon on your desktop or 
> in your programs directory). Select ``Properties'', and then select the 
> ``Shortcut'' tab. Look for the ``Target'' field and after the closing quotes 
> around the location of the R executible, add
>  --max-mem-size=500M
> It does not work.
>
> When I add the --max-mem-size=500M in Target field. There is error like as 
> below:
> “The name "C:\Program Files\R\R-2.11.0\bin\Rgui.exe"--max-mem-size=500M 
> specified in the Target box is not valid”

You need a space between

"C:\Program Files\R\R-2.11.0\bin\Rgui.exe"

and

--max-mem-size=500M

The first is the application that Windows calls, and the following
parts are options that Windows passes to that application.  Without the
space, Windows thinks all of it is the application to be called.
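With the space added, the whole Target field would read (path taken from the error message above):

```
"C:\Program Files\R\R-2.11.0\bin\Rgui.exe" --max-mem-size=500M
```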

/H
>
> I use R2.11.0. in window vista with 3Gb RAM.
>
> Have a nice day
>
> Azam
>
>
>



Re: [R] lapply or data.table to find a unit's previous transaction

2010-06-03 Thread Matthew Dowle
William,

Try a rolling join in data.table, something like this (untested) :

setkey(Data, UnitID, TranDt)   # sort by unit then date
previous = transform(Data, TranDt = TranDt - 1)
Data[previous, roll = TRUE]    # look up the prevailing date before, if any,
                               # for each row within that row's UnitID

That's all it is, no loops required. It should be fast and memory 
efficient, hundreds of times faster than a subquery in SQL. 
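Spelled out with toy data (untested, like the sketch above; later versions of data.table may prefer different idioms):

```r
library(data.table)

Data <- data.table(TranID = c("t1", "t2", "t3", "t4"),
                   UnitID = c("u1", "u1", "u1", "u2"),
                   TranDt = as.Date(c("2001-03-01", "2003-07-15",
                                      "2006-11-30", "2002-05-20")))

setkey(Data, UnitID, TranDt)                 # sort by unit then date
previous <- transform(Data, TranDt = TranDt - 1)
setkey(previous, UnitID, TranDt)

## for each row, the rolling join picks the prevailing (most recent
## earlier) transaction date within the same UnitID, if any
Data[previous, roll = TRUE]
```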

If you have trouble please follow up on datatable-help.

Matthew


"William Rogers" wrote:
I have a dataset of property transactions that includes the
transaction ID (TranID), property ID (UnitID), and transaction date
(TranDt). I need to create a data frame (or data table) that includes
the previous transaction date, if one exists.
This is an easy problem in SQL, where I just run a sub-query, but I'm
trying to make R my one-stop-shopping program. The following code
works on a subset of my data, but I can't run this on my full dataset
because my computer runs out of memory after about 30 minutes. (Using
a 32-bit machine.)
Use the following synthetic data for example.

n<- 100
TranID<- lapply(n:(2*n), function(x) (
as.matrix(paste(x, sample(seq(as.Date('2000-01-01'),
as.Date('2010-01-01'), "days"), sample(1:5, 1)), sep= "D"), ncol= 1)))
TranID<- do.call("rbind", TranID)
UnitID<- substr(TranID, 1, nchar(n))
TranDt<- substr(TranID, nchar(n)+2, nchar(n)+11)
Data<- data.frame(TranID= TranID, UnitID= UnitID, TranDt= as.Date(TranDt))

#First I create a list of all the previous transactions by unit

TranList<- as.matrix(Data$TranID, ncol= 1)
PreTran<- lapply(TranList,
function(x) (with(Data,
Data[
UnitID== substr(x, 1, nchar(n))&
TranDt< Data[TranID== x, "TranDt"], ]
))
)

#I do get warnings about missing data because some transactions have
no predecessor.
#Some transactions have no previous transactions, others have many so
I pick the most recent

BeforeTran<- lapply(seq_along(PreTran), function(x) (
with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))

#I need to add the current transaction's TranID to the list so I can merge 
later

BeforeTran<- lapply(seq_along(PreTran), function(x) (
transform(BeforeTran[[x]], TranID= TranList[x, 1])))

#Finally, I convert from a list to a data frame

BeforeTran<- do.call("rbind", BeforeTran)

#I have used a combination of data.table and for loops, but that seems
#cheesy and doesn't perform much better.

library(data.table)

#First I create a list of all the previous transactions by unit

TranList2<- vector(nrow(Data), mode= "list")
names(TranList2)<- levels(Data$TranID)
DataDT<- data.table(Data)

#Use a for loop and data.table to find the date of the previous transaction

for (i in levels(Data$TranID)) {
if (DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt<= (DataDT[TranID== i, TranDt]),
length(TranDt)]> 1)
TranList2[[i]]<- cbind(TranID= i,
DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt< (DataDT[TranID== i, TranDt]),
list(TranDt= max(TranDt))])
}

#Finally, I convert from a list to a data table

BeforeTran2<- do.call("rbind", TranList2)

#My intuition says that this code doesn't take advantage of
#data.table's attributes.
#Are there any ideas out there? Thank you.
#P.S. I've tried plyr and it does not help my memory problem.

--
William H. Rogers



Re: [R] Help barplots

2010-06-03 Thread Jim Lemon

On 06/03/2010 01:27 AM, khush  wrote:

Hi all,

I want to draw a small arrow of any length alongside OsCYP on the right
side, i.e. both parallel and antiparallel arrows:

  text(os[1], 10.2,  pos = 4, "OsCYP", font=1, cex = 1,  col = "red")

and

how can I connect two points with a dotted line? let's say from

bp[1], 10.2  to  bp[3], 15.2

how do I make a dotted line using R? is it possible?


Hi khush,

I'm not sure what you want to do with the arrows, but look at the 
"arrows" function.
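For instance, on an existing plot (the coordinates here are placeholders):

```r
## a small horizontal arrow next to the label, and an antiparallel
## one just below it, pointing the other way
arrows(x0 = 10.4, y0 = 10.2, x1 = 11.4, y1 = 10.2, length = 0.08)
arrows(x0 = 11.4, y0 = 9.8,  x1 = 10.4, y1 = 9.8,  length = 0.08)
```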


for the second one, look at the "segments" function, and the "lty" argument.

segments(10.2,y,15.2,y,lty=2)

It also helps if you change the subject line when you ask a new question.

Jim



Re: [R] pdf function, resize xyplot plot automatically

2010-06-03 Thread Patrick Connolly
On Wed, 02-Jun-2010 at 02:34PM -0400, Afshartous, David wrote:

|> 
|> All,
|> 
|> When saving plots to a pdf file via the pdf function, I would like to be
|> able to automatically expand the graphics device to achieve the same result
|> as when one does this manually (e.g., clicking the green expand button on
|> the upper left of the graph on Mac OS).   Consider simple example below:

If it's a file you want to keep, it makes more sense IMHO to keep a
record of just what you did to get the size you want.  For that
reason, it's a better idea to set the width and/or height in the call
to pdf().  Peter Ehlers has told you how to use the 'width' argument.

For interactive use, the normal X11 window can be resized in the way
you would probably like.  I quite like the way R offers you two quite
different ways of doing such things.
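For example (file name and data are placeholders; width and height are in inches):

```r
library(lattice)
dat <- data.frame(x = 1:10, y = rnorm(10))

pdf("myplot.pdf", width = 10, height = 6)  # size recorded in the script itself
print(xyplot(y ~ x, data = dat))           # lattice plots must be print()ed
dev.off()
```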

[...]

HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



[R] function

2010-06-03 Thread n.via...@libero.it

Dear list,
I would like to ask you a question. I'm trying to build the production 
time series with the Divisia index. The final step requires the 
following calculations:
a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
my question is: how can I tell R to take the value generated in the previous 
step (for example, the production of 2005 needs the value of 
the production of 2006) in order to generate the production time series??
(PS: my data.frame is not set up as a time series)
Thanks for your attention!!





[R] ISO 8601 Weeks/Years on Windows with strptime

2010-06-03 Thread Michael Höhle
Dear R-help,

I am working on a R package for public health surveillance where the ISO
8601 representation of dates is of importance. Especially, the ISO Week
and ISO Year of a date needs to be extracted. I was quite happy to find
all of this implemented in the "Date" class with appropriate calls to
strptime/format (using e.g. %G and %V).

However, only later I realized that this functionality is currently not
implemented on Windows (I'm a happy Mac/Linux user). As this seriously
limits the applicability, I would like to enquire, if there are any plans
to make this functionality available in Windows as well? Or are there any
good workarounds to make

> format.Date("2001-12-31", "%G")

give "2002" instead of "" on Windows?

Best regards,

Michael Höhle

--

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
 [1] surveillance_1.1-6 Matrix_0.999375-31 msm_0.9.7  vcd_1.2-8
 [5] colorspace_1.0-1   MASS_7.3-3 maptools_0.7-34lattice_0.17-26
 [9] foreign_0.8-40 sp_0.9-62  spc_0.3xtable_1.5-6

loaded via a namespace (and not attached):
[1] mvtnorm_0.9-9   splines_2.10.0  survival_2.35-7 tools_2.10.0
>



[R] rules for optimizing samples in CLARA (size and numbers) ?

2010-06-03 Thread vincent vicaire
Hi,

With a 9000-observation dataset, I have noticed significant variability
in the silhouette index when I change the default values for samples (default
5) and sampsize (40 + 2 * number of clusters) in CLARA.

Are there some rules, according to the number of clusters and observations,
for fixing the samples and sampsize parameters efficiently, so as to avoid
under- and oversampling with CLARA on the one hand, while keeping a good
running time on the other?

I did not find any rules of this type on the web (except avoiding biased
samples...).

Gratefully yours.
vincent



Re: [R] mlogit and weights

2010-06-03 Thread Achim Zeileis

On Wed, 2 Jun 2010, Misha Spisok wrote:


Hello,

I can't figure out why using and not using weights in mlogit yields
identical results.  My motivation is for the case when an
"observation" or "individual" represents a number of individuals.  For
example,

library(mlogit)
library(AER)
data("TravelMode", package = "AER")
TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
alt.levels = c("air", "train", "bus", "car"))
myweight = rep(floor(1000*runif(nrow(TravelMode)/4)), each = 4)

summary(mlogit(choice ~ wait + vcost + travel + gcost, data=TM))
summary(mlogit(choice ~ wait + vcost + travel + gcost, weights=income, data=TM))
summary(mlogit(choice ~ wait + vcost + travel + gcost,
weights=myweight, data=TM))

Each gives the same result.


I can't replicate that. For me all three give different results. For 
example, the first two (which do not contain random elements) are


   alttrain  altbus  altcarwait   vcost  travel
-0.84413818 -1.44150828 -5.20474275 -0.10364955 -0.08493182 -0.01333220
  gcost
 0.06929537

and

   alttrain  altbus  altcarwait   vcost  travel
-1.56910793 -1.67020936 -5.44725428 -0.11157800 -0.08866886 -0.01435371
  gcost
 0.08087749

respectively. I'm using the current "mlogit" version from CRAN: 0.1-7.


Am I specifying "weights" incorrectly?


Yes, I think so.


Is there a better way to do what I want to do?  That is, if "myweight"
contains the number of observations represented by an "observation,"
is this the correct approach?


You will get the correct parameter estimates but not the correct 
inference. Following most of the basic model-fitting functions (such as 
lm() or glm()), the weights are _not_ interpreted as case weights. I.e., 
the function treats

  sum(weights > 0)

as the number of observations, and not

  sum(weights)

A simple example using lm():

  x <- 1:5
  y <- c(0, 2, 1, 4, 5)
  w <- rep(2, 5)
  xx <- c(x, x)
  yy <- c(y, y)

Then you can fit both models

  fm1 <- lm(y ~ x, weights = w)
  fm2 <- lm(yy ~ xx)

and you get the same coefficients

  all.equal(coef(fm1), coef(fm2))

(which only mentions that the strings 'xx' and 'x' are different). But fm1 
thinks 2 parameters have been estimated from 5 observations, while fm2 
thinks 2 parameters have been estimated from 10 observations. Hence


  df.residual(fm1) / df.residual(fm2)
  vcov(fm2) / vcov(fm1)

Hope that helps,
Z



If so, what am I doing wrong?  If not,
what suggestions are there?

Thank you for your time.

Best,

Misha






Re: [R] ISO 8601 Weeks/Years on Windows with strptime

2010-06-03 Thread Gustaf Rydevik
2010/6/3 Michael Höhle :
> Dear R-help,
>
> I am working on a R package for public health surveillance where the ISO
> 8601 representation of dates is of importance. Especially, the ISO Week
> and ISO Year of a date needs to be extracted. I was quite happy to find
> all of this implemented in the "Date" class with appropriate calls to
> strptime/format (using e.g. %G and %V).
>
> However, only later I realized that this functionality is currently not
> implemented on Windows (I'm a happy Mac/Linux user). As this seriously
> limits the applicability, I would like to enquire, if there are any plans
> to make this functionality available in Windows as well? Or are there any
> good workarounds to make
>
>> format.Date("2001-12-31", "%G")
>
> give "2002" instead of "" on Windows?
>
> Best regards,
>
> Michael Höhle
>
> -

Hello,

This seems to be a problem that crops up from time to time.
I wrote a small function that gets the ISO week of a Date object; a
bug-fixed version can be found here:
http://tolstoy.newcastle.edu.au/R/e10/help/10/05/5588.html
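One possible OS-independent sketch, using only Date arithmetic (and so avoiding the %G/%V format codes that fail on Windows; untested there):

```r
isoWeekYear <- function(d) {
  d <- as.Date(d)
  wd <- ((as.POSIXlt(d)$wday + 6L) %% 7L) + 1L  # 1 = Monday ... 7 = Sunday
  thursday <- d - wd + 4                        # Thursday decides the ISO year
  isoyear <- as.integer(format(thursday, "%Y"))
  jan4 <- as.Date(paste(isoyear, "01", "04", sep = "-"))
  ## the Monday of the week containing Jan 4 starts ISO week 1
  monday1 <- jan4 - (((as.POSIXlt(jan4)$wday + 6L) %% 7L) + 1L) + 1
  list(year = isoyear, week = as.integer(d - monday1) %/% 7L + 1L)
}
isoWeekYear("2001-12-31")  # ISO year 2002, week 1
```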

Hope this is of help. I agree that it would be of interest to
incorporate OS-independent date management in R, but not being part of
the R development team, I'm not sure how to go about implementing
it...

Best regards,
Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



Re: [R] Draw text with a box surround in plot.

2010-06-03 Thread Jim Lemon

On 06/03/2010 08:57 AM, g...@ucalgary.ca wrote:

text() can draw text on a plot.
Do we have a way/function to draw text with a box surround it?


Hi james,
Have a look at boxed.labels and textbox in the plotrix package.

Jim



Re: [R] function

2010-06-03 Thread Joris Meys
This is what you asked for.

> Prod2007 <- 1:10

> Prod2006 <- Prod2007/1+c(0,diff(Prod2007))

> Prod2005 <- Prod2006+(1+c(0,diff(Prod2006)))

> Prod2004 <- Prod2005+(1+c(0,diff(Prod2005)))

> Prod2006
 [1]  1  3  4  5  6  7  8  9 10 11

> Prod2005
 [1]  2  6  6  7  8  9 10 11 12 13

> Prod2004
 [1]  3 11  7  9 10 11 12 13 14 15

Sure that's what you want?

On Thu, Jun 3, 2010 at 12:30 PM, n.via...@libero.it wrote:

>
> Dear list,
> I would like to ask you a question. I'm trying to build  the time series'
> production with the Divisia index. The final step would require to do the
> following calculations:
> a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
> b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
> c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
> my question is how can I tell R to take the value generated in the previous
> step (for example is the case of the produciton of 2005 that need the value
> of the production of 2006) in order to generate the time series production??
> (PS:my data.frame is not set as a time series)
> Thanks for your attention!!
>
>
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



[R] gam error

2010-06-03 Thread natalieh

Hi all,

I'm trying to use a gam (mgcv package) to analyse some data with a roughly U
shaped curve. My model is very simple with just one explanatory variable:

m1<-gam(CoT~s(incline))

However I just keep getting the error message

"Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
  A term has fewer unique covariate combinations than specified maximum
degrees of freedom"

Just wondering if anyone had come across this before/ could offer any advice
on where the problem might lie

Many thanks,

Natalie
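The message usually indicates that a smooth term has fewer unique covariate values than the basis dimension s() asks for (k = 10 by default for thin-plate splines); a reproduction with made-up data, and the usual remedy of lowering k:

```r
library(mgcv)
incline <- rep(c(-10, -5, 0, 5, 10), each = 8)  # only 5 unique values
CoT <- (incline / 5)^2 + rnorm(40, sd = 0.2)    # roughly U-shaped

try(gam(CoT ~ s(incline)))          # fails: default k exceeds unique x values
m1 <- gam(CoT ~ s(incline, k = 4))  # lower the basis dimension and it fits
```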



-- 
View this message in context: 
http://r.789695.n4.nabble.com/gam-error-tp2241518p2241518.html
Sent from the R help mailing list archive at Nabble.com.



[R] ARIMA order

2010-06-03 Thread nuncio m
Hi all,
Is there any way in R to select the order of an ARIMA model
automatically?
nuncio

-- 
Nuncio.M
Research Scientist
National Center for Antarctic and Ocean research
Head land Sada
Vasco da Gamma
Goa-403804



Re: [R] gam error

2010-06-03 Thread Joris Meys
Data?

On Thu, Jun 3, 2010 at 1:24 PM, natalieh  wrote:

>
> Hi all,
>
> I'm trying to use a gam (mgcv package) to analyse some data with a roughly
> U
> shaped curve. My model is very simple with just one explanatory variable:
>
> m1<-gam(CoT~s(incline))
>
> However I just keep getting the error message
>
> "Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>  A term has fewer unique covariate combinations than specified maximum
> degrees of freedom"
>
> Just wondering if anyone had come across this before/ could offer any
> advice
> on where the problem might lie
>
> Many thanks,
>
> Natalie
>
>
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Fancy Page layout

2010-06-03 Thread Carlos Ortega
Hello,

You can do the same as trellis, but with the standard graphics library, in
the direction you are exploring.
Use par(mfrow=c(5,2)) to display your graphics.

The size and location of the graphics can be managed with the layout()
function in the graphics library.
With the mtext() and text() functions you can write whatever text you like
anywhere on your graphics page.

Regards,
Carlos.
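
The advice above can be sketched as follows (hypothetical data; layout() reserves a full-width bottom cell for the text region):

```r
# 10 plot cells (5 rows x 2 columns) plus one full-width cell for text
m <- rbind(matrix(1:10, nrow = 5, byrow = TRUE), c(11, 11))
layout(m, heights = c(rep(1, 5), 0.6))

# draw the ten per-group plots
for (i in 1:10) plot(rnorm(20), main = paste("Group plot", i))

# use the last cell as a text region
plot.new()
text(0.5, 0.5, "Five-number summaries go here", cex = 1.2)
mtext("Page 1 of 80", side = 1, line = 2)
```

Wrapping this in pdf("page1.pdf", width = 8.5, height = 11) ... dev.off() would produce a letter-sized page.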

On Wed, Jun 2, 2010 at 2:43 AM, Dejian Zhao  wrote:

> I think you can use grid.layout() to create the appropriate layout,
> allocating proper space for the upper plotting area and the bottom text
> region, and then use viewport() with the layout parameter to control the
> output by pushing the viewport at the proper region on the graphical device.
>    Viewport alone can solve your three questions, but with grid.layout the
> layout is better controlled.
>    The above-mentioned functions, namely grid.layout(), viewport() and
> pushViewport(), are in the grid package. Possibly the work can be done by
> combining lattice with grid.
>
>
> On 2010-6-2 1:10, Noah Silverman wrote:
>
>> Thanks Jim,
>>
>> That helps.
>>
>> Ben Bolker had a nice suggestion on how to get the lattice package to
>> easily plot all 22 variables in one window.
>>
>> Ultimately, I'd like to generate a PDF that will print on a standard
>> (8.5 x 11) page.
>>
>> A few things I'm still stuck are:
>> 1) How to use the lattice command AND leave room at the bottom for a
>> text block
>> 2) How to tell lattice the size of the window
>> 3) How to integrate all this together - draw a big window, plot
>> trellis in the top half and then text box in the bottom.
>>
>> Any thoughts?
>>
>> -N
>>
>>
>> On 6/1/10 4:53 AM, Jim Lemon wrote:
>>
>>
>>> On 06/01/2010 04:16 AM, Noah Silverman wrote:
>>>
>>>
 Hi,

 Working on a report that is going to have a large number of graphs and
 summaries.  We have 80 "groups" with 20 variables each.

 Ideally, I'd like to produce ONE page for each group.  It would have two
 columns of 10 graphs and then the 5 number summary of the variables at
 the bottom.
 So, perhaps the top 2/3 of the page has the graphs and the bottom third
 has 20 rows of data summary(maybe a table of sorts.)
 This COULD be done in Latex, but would have to be hand coded for each of
 the 80 groups which would be painfully slow.

 I can easily do the graphs with par(mfrow=c(5,2))  band then draw the
 graphs in a loop.

 But I am stuck from here:

 1) How do I control the size of the plot window.  (Ideally, it should
 print to fill an 8.5 x 11 piece of paper)
 2) Is there a way to "easily" insert a 5 number summary (summary
 command) into the lower half of the page.

 Does anybody have any ideas??



>>> Hi Noah,
>>> One easy way is to leave some space at the bottom, either by using:
>>>
>>> par(mfrow=c(6,2))
>>>
>>> or the more flexible "layout" function, and then use "text" or a
>>> fancier function (textbox, boxed.labels, addtable2plot, etc.) to add
>>> your text after:
>>>
>>> par(xpd=NA)
>>>
>>> allows you to display the text anywhere you please. If you use a
>>> bitmap graphics device, make it big:
>>>
>>> png("numberoneofeighty.png",850,1100)
>>>
>>> so that it won't look lumpy on the printed page.
>>>
>>> Jim
>>>
>>>
>>>


Re: [R] glmnet strange error message

2010-06-03 Thread Dave_F

I think I have figured out the problem. The response variable has one level
with only one observation, so when that observation was in the test set the
training data contained no observations from that level; no logistic model
was fitted for that level, which caused the somewhat cryptic error message.




Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Joris Meys
x <- rnorm(100,10,1)
sqrtx <- sqrt(x)
y <- rbinom(100,1,0.5)

lrm(y~x+sqrtx)

works. What's the problem?

But you wrote linear + square. Don't you mean a quadratic term? Note that ^
has a special meaning inside a formula, so the square must be wrapped in I():
lrm(y ~ x + I(x^2))
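
A quadratic term has to be protected with I() inside a formula, because a bare ^ is parsed as a formula operator rather than arithmetic. A sketch using glm() for illustration (lrm() from the rms/Design package accepts the same formula syntax):

```r
set.seed(42)
x <- rnorm(100, 10, 1)
y <- rbinom(100, 1, 0.5)

# I() makes x^2 an arithmetic square, not a formula crossing
fit <- glm(y ~ x + I(x^2), family = binomial)
coef(fit)
```

poly(x, 2) is an alternative that uses orthogonal polynomials and avoids the collinearity between x and x^2.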

Cheers

On Thu, Jun 3, 2010 at 6:34 AM, zhu yao  wrote:

> Dear r users
>
> I have a question in coding continuous variables in logistic regression.
>
> When "rcs" is used in transforming variables, sometime it gives implausible
> associations with the outcome although the model x2 is high.
>
> So what's your tips and tricks in coding continuous variables.
>
> P.S. How to code variables as linear+square in the formula such as lrm.
> lrm(y~x+sqrt(x)) can't work.
>
> Many thanks.
>
> Yao Zhu.
>





Re: [R] ordinal variables

2010-06-03 Thread Iasonas Lamprianou
Dear colleagues,

I teach statistics using SPSS. I want to use R instead. I hit on one problem 
and I need some quick advice. When I want to work with ordinal variables, in 
SPSS I can compute the median or create a barchart or compute a spearman 
correlation with no problems. In R, if I "read" the ordinal variable as 
numeric, then I cannot do a barplot because I miss the category names. If I 
read the variables as characters, then I cannot run a spearman. How can I read 
a variable as numeric, still have the chance to assign value labels, and be 
able to get table of frequencies etc? I want to be able to do all these things 
in R Commander. My students will probably be scared away if I try anything
else other than R Commander (just writing commands will not make them happy).

I hope I am not asking for too much. Hopefully there is a way






Re: [R] Faster union of polygons?

2010-06-03 Thread nikhil kaza
Reduce might work. Not sure about the speed advantages though. It does
simplify code.

Unionall <- function(x) Reduce("union", x)
leaveout <- Unionall(leaves)


On Tue, Jun 1, 2010 at 9:53 PM, Remko Duursma wrote:

> Dear R-helpers,
>
> thanks for yesterday's speeding-up tip. Here is my next query:
>
> I have lots of polygons (not necessarily convex ones, and they never
> have holes) given by x,y coordinates.
>
> I want to get the polygon that is the union of these polygons. This is
> my current method, but I am hoping there is a faster method (up to
> thousands of polygons, each with ca. 40 xy points).
>
> Example:
>
> library(gpclib)
>
> # A polygon
> leaf <- structure(c(0, 1, 12.9, 16.5, 18.8, 17, 16.8, 15.5, 12.1, 8.2,
> 6.3, 5, 2, 0, -1.5, -4.3, -6.6, -10.3, -14.8, -19.4, -22.2, -23.5,
> -22.2, -17.6, -7.8, 0, 0, -2.4, 2.8, 8.9, 19.9, 33.9, 34.8, 40.4,
> 49.7, 69.2, 77.4, 83.4, 91.4, 99, 92.8, 87.3, 81.2, 71.1, 57.6,
> 45.4, 39.2, 26, 15.6, 5.3, 0.6, 0), .Dim = c(26L, 2L), .Dimnames = list(
>NULL, c("X", "Y")))
>
> # Lots of polygons:
> releaf <-
> function(leaf)cbind(leaf[,1]+rnorm(1,0,50),leaf[,2]+rnorm(1,0,50))
> leaves <- replicate(500, releaf(leaf), simplify=FALSE)
>
> # Make into gpc.poly class:
> leaves <- lapply(leaves, as, "gpc.poly")
>
> # Make union .
> system.time({
> leavesoutline <- union(leaves[[1]], leaves[[2]])
> for(i in 3:length(leaves))leavesoutline <- union(leavesoutline,
> leaves[[i]])
> })
> # about 1sec here.
>
> # Check it:
> plot(leavesoutline)
>
>
>
> thanks!
>
> Remko
>
>
> -
> Remko Duursma
> Research Lecturer
>
> Centre for Plants and the Environment
> University of Western Sydney
> Hawkesbury Campus
> Richmond NSW 2753
>
> Dept of Biological Science
> Macquarie University
> North Ryde NSW 2109
> Australia
>
> Mobile: +61 (0)422 096908
> www.remkoduursma.com
>


[R] rgl problem on Fedora 13

2010-06-03 Thread Håvard Rue
Just installed Fedora 13, and reinstalled the lastest rgl package 0.91.
The problem is that open3d() opens a window, but then halts: nothing
happens and I have to kill R.
On the previous version, Fedora 12, everything worked fine.

Any tests etc I should/could do?

Best,
H
 
-- 
 Håvard Rue
 Department of Mathematical Sciences
 Norwegian University of Science and Technology
 N-7491 Trondheim, Norway
 Voice: +47-7359-3533URL  : http://www.math.ntnu.no/~hrue  
 Fax  : +47-7359-3524Email: havard@math.ntnu.no

 This message was created in a Microsoft-free computing environment.
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control, 
GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
GLX_SGIX_visual_select_group, GLX_INTEL_swap_event
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
GLX_MESA_copy_sub_buffer, GLX_MESA_swap_control, 
GLX_MESA_swap_frame_usage, GLX_OML_swap_method, GLX_OML_sync_control, 
GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync, 
GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap, 
GLX_INTEL_swap_event
GLX version: 1.4
GLX extensions:
GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
GLX_OML_swap_method, GLX_OML_sync_control, GLX_SGI_make_current_read, 
GLX_SGI_swap_control, GLX_SGIS_multisample, GLX_SGIX_fbconfig, 
GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group, 
GLX_EXT_texture_from_pixmap, GLX_INTEL_swap_event
OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Mobile Intel® GM45 Express Chipset GEM 
20100328 2010Q1 
OpenGL version string: 2.1 Mesa 7.8.1
OpenGL shading language version string: 1.20
OpenGL extensions:
GL_ARB_copy_buffer, GL_ARB_depth_texture, GL_ARB_depth_clamp, 
GL_ARB_draw_buffers, GL_ARB_draw_elements_base_vertex, 
GL_ARB_fragment_coord_conventions, GL_ARB_fragment_program, 
GL_ARB_fragment_program_shadow, GL_ARB_fragment_shader, 
GL_ARB_framebuffer_object, GL_ARB_half_float_pixel, 
GL_ARB_half_float_vertex, GL_ARB_map_buffer_range, GL_ARB_multisample, 
GL_ARB_multitexture, GL_ARB_occlusion_query, GL_ARB_pixel_buffer_object, 
GL_ARB_point_parameters, GL_ARB_point_sprite, GL_ARB_provoking_vertex, 
GL_ARB_seamless_cube_map, GL_ARB_shader_objects, 
GL_ARB_shading_language_100, GL_ARB_shading_language_120, GL_ARB_shadow, 
GL_ARB_sync, GL_ARB_texture_border_clamp, GL_ARB_texture_compression, 
GL_ARB_texture_cube_map, GL_ARB_texture_env_add, 
GL_ARB_texture_env_combine, GL_ARB_texture_env_crossbar, 
GL_ARB_texture_env_dot3, GL_ARB_texture_mirrored_repeat, 
GL_ARB_texture_non_power_of_two, GL_ARB_texture_rectangle, 
GL_ARB_transpose_matrix, GL_ARB_vertex_array_bgra, 
GL_ARB_vertex_array_object, GL_ARB_vertex_buffer_object, 
GL_ARB_vertex_program, GL_ARB_vertex_shader, GL_ARB_window_pos, 
GL_EXT_abgr, GL_EXT_bgra, GL_EXT_blend_color, 
GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
GL_EXT_blend_logic_op, GL_EXT_blend_minmax, GL_EXT_blend_subtract, 
GL_EXT_cull_vertex, GL_EXT_compiled_vertex_array, GL_EXT_copy_texture, 
GL_EXT_draw_buffers2, GL_EXT_draw_range_elements, GL_EXT_framebuffer_blit, 
GL_EXT_framebuffer_object, GL_EXT_fog_coord, 
GL_EXT_gpu_program_parameters, GL_EXT_multi_draw_arrays, 
GL_EXT_packed_depth_stencil, GL_EXT_packed_pixels, 
GL_EXT_pixel_buffer_object, GL_EXT_point_parameters, 
GL_EXT_polygon_offset, GL_EXT_provoking_vertex, GL_EXT_rescale_normal, 
GL_EXT_secondary_color, GL_EXT_separate_specular_color, 
GL_EXT_shadow_funcs, GL_EXT_stencil_two_side, GL_EXT_stencil_wrap, 
GL_EXT_subtexture, GL_EXT_texture, GL_EXT_texture3D, 
GL_EXT_texture_cube_map, GL_EXT_texture_edge_clamp, 
GL_EXT_texture_env_add, GL_EXT_texture_env_combine, 
GL_EXT_texture_env_dot3, GL_EXT_texture_filter_anisotropic, 
GL_EXT_texture_lod_bias, GL_EXT_texture_object, GL_EXT_texture_rectangle, 
GL_EXT_texture_sRGB, GL_EXT_texture_swizzle, GL_EXT_vertex_array, 
GL_EXT_vertex_array_bgra, GL_3DFX_texture_compression_FXT1, 
GL_APPLE_client_storage, GL_APPLE_packed_pixels, 
GL_APPLE_vertex_array_object, GL_APPLE_object_purgeable, 
GL_ATI_blend_equation_separate, GL_ATI_envmap_bumpmap, 
GL_ATI_texture_env_combine3, GL_ATI_sep

Re: [R] compare results of glms

2010-06-03 Thread Joris Meys
Mailing this twice ain't going to help you. Reading a course on statistics
might.

The test you want to do answers the following hypothesis: the mean
predicted value of a specific model differs when different datasets are used
to fit it. That seems likely to me whenever the datasets are not almost
identical, so why test it?

About that Z-test: it should be used to compare two proportions that are
not too close to 0 or 1 and that originate from a binomial distribution with
large enough n. Suggesting to use it for comparing a number of series of
around 20 logit-transformed predicted probabilities is plain shocking.

In case you are interested in the difference of the intercepts for these
specific trials, add trial as a fixed effect to your model and do the
appropriate testing. If you want to know whether the relation between STATE
and DAYS differs in slope, add an interaction term and again use the
appropriate testing. To know what the appropriate testing is, see line 1.
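
That suggestion could be sketched like this, assuming the two trial data frames share the STATE and DAYS columns (variable names taken from the original post):

```r
# stack the trials and label them
o_cov$trial     <- "trial1"
o_cov_T12$trial <- "trial2"
both <- rbind(o_cov, o_cov_T12)

m0 <- glm(STATE ~ DAYS,         family = binomial, data = both)
m1 <- glm(STATE ~ DAYS + trial, family = binomial, data = both)  # intercept shift
m2 <- glm(STATE ~ DAYS * trial, family = binomial, data = both)  # slope difference

anova(m0, m1, m2, test = "Chisq")  # likelihood-ratio tests between nested fits
```

The m0-vs-m1 comparison tests the intercept difference between trials; m1-vs-m2 tests whether the decay slope differs.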

Cheers
Joris

On Thu, Jun 3, 2010 at 10:31 AM, Sacha Viquerat
wrote:

> dear list!
> i have run several glm analyses to estimate a mean rate of dung decay for
> independent trials. i would like to compare these results statistically but
> can't find any solution. the glm calls are:
>
> dung.glm1 <- glm(STATE~DAYS, data=o_cov, family=binomial(link="logit"))
>
> dung.glm2 <- glm(STATE~DAYS, data=o_cov_T12, family=binomial(link="logit"))
>
> as all the trials have different sample sizes (around 20 each),
>
> anova(dung.glm1, dung.glm2)
>
> is not applicable. has anyone an idea?
> thanks in advance!
>
> ps: my advisor urges me to use the z-test (the common test statistic in my
> field of research), but i reject that due to the small sample size.
>





Re: [R] gam error

2010-06-03 Thread Gavin Simpson
On Thu, 2010-06-03 at 04:24 -0700, natalieh wrote:
> Hi all,
> 
> I'm trying to use a gam (mgcv package) to analyse some data with a roughly U
> shaped curve. My model is very simple with just one explanatory variable:
> 
> m1<-gam(CoT~s(incline))
> 
> However I just keep getting the error message
> 
> "Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
>   A term has fewer unique covariate combinations than specified maximum
> degrees of freedom"

What this means is that you have fewer unique values for 'incline' than
you have knots in the spline. I forget now what the default (k = -1) in
function s() means, but it will be there in the help.

To set an upper limit, you could try:

K <- length(unique(incline)) - 1
m1 <- gam(CoT ~ s(incline, k = K))

Furthermore, you could try:

str(obj)

where 'obj' is/are your R data objects, to check that R and you have the
same impression of what your data actually are.

HTH

G

> Just wondering if anyone had come across this before/ could offer any advice
> on where the problem might lie
> 
> Many thanks,
> 
> Natalie
> 
> 
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



[R] Comparing a 4-point and 5-point Likert scale

2010-06-03 Thread Simon Kiss

Help with survey data:
Hello R colleagues,
I hope this is an appropriate place to direct this question. It relates
specifically to the comparability of a 5-point Likert scale to a 4-point
Likert scale.


One question in my dataset asks "How much should be done to reduce the  
gap between rich and poor"

Much more, somewhat more, about the same, somewhat less and much less.

The second questions ask:
"People who can afford to, should be able to pay for their own health  
care"

strongly agree, agree, disagree, strongly disagree.

Now, assuming that I rescale them so that 1 equals the most egalitarian
position and the highest number (4 or 5) equals the least egalitarian
position, how can I make these two results comparable?


Two ways come to mind: one is to collapse both into a dichotomous variable
and do a logistic regression on both. The danger here is that I have to
decide what to do with the middle position in the first question: assign it
to the egalitarian or the non-egalitarian category.
A second way would be to multiply the scores in the first question by 4
(to get results that are 4, 8, 12, 16 or 20) and then multiply the second
question by 5 (to get responses that are 5, 10, 15 or 20). My idea is then
to add the two, average them, and use that value as an index of economic
egalitarianism.

Yes / no? Suggestions?
I am an R user and I hope that a purely statistical question is not  
especially misplaced.
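
An alternative to the multiply-by-4/5 trick is to map both items onto a common 0-1 range first and then average. A sketch with hypothetical response vectors q1 (5-point) and q2 (4-point), both coded so 1 is the most egalitarian position:

```r
# linear rescale of x from [lo, hi] to [0, 1]
rescale01 <- function(x, lo, hi) (x - lo) / (hi - lo)

q1 <- c(1, 3, 5, 2)  # 5-point item, 1 = most egalitarian
q2 <- c(1, 2, 4, 4)  # 4-point item, 1 = most egalitarian

# both items now run from 0 (most egalitarian) to 1 (least egalitarian)
index <- (rescale01(q1, 1, 5) + rescale01(q2, 1, 4)) / 2
index
```

This treats the ordinal categories as equally spaced, which is itself an assumption worth stating in the analysis.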

Yours truly,
Simon Kiss
*
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812,
Web: http://www.jfki.fu-berlin.de/index.html



[R] deduplication

2010-06-03 Thread Epi-schnier

Colleagues, 

I am trying to de-duplicate a large (long) database (approx. 1 million
records) of diagnostic tests. Individuals in the database can have up to 25
observations, but most will have only one. IDs for de-duplication (names,
sex, lab number...) are patchy. In a first step, I am using Andreas Borg's
excellent record linkage package (), which leaves me with a list of 'pairs'
looking very much like this:
id1<-c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2<-c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)
id<-data.frame(cbind(id1,id2))
where a pair means that the two records belong to the same individual
(e.g., records 4 and 8; 17 and 18...). My problem now is to get a list of
all records that belong to the same person (in the example, observations
1, 3, 4, 5, 6, 7, 8, 12, 17 and 18 are all from the same person). The
difficulty is finding the link between 1 and 8 (only through 1-4 and 4-8)
and the link between 1 and 17 (through 18). I can do it in my head, but I
am missing the code that would work its way through too many records.

Any clever ideas?
(using R 2.10.1 on Windows XP)
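
Grouping all records that belong to the same person is the connected-components problem on the graph whose edges are the matched pairs. A sketch with the igraph package (assumes it is installed; components() is the name in newer igraph versions, older ones call it clusters()), using the pairs from the post:

```r
library(igraph)

id1 <- c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2 <- c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)

# undirected graph: one edge per matched pair of records
g <- graph_from_data_frame(data.frame(id1, id2), directed = FALSE)

# records in the same connected component belong to the same person
split(V(g)$name, components(g)$membership)
```

On the example pairs this yields one group containing records 1, 3, 4, 5, 6, 7, 8, 12, 17 and 18, as worked out by hand in the post, and this scales to a million records.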

Thanks, 

Christian

 


[R] function

2010-06-03 Thread n.via...@libero.it

Dear list,

I would like to ask you a question. I'm trying to build the production
time series with the Divisia index. The final step requires the following
calculations:
a) PROD(2006) = PROD(2007) / [1 + DELTA_PROD(2007)/100]
b) PROD(2005) = PROD(2006) / [1 + DELTA_PROD(2006)/100]
c) PROD(2004) = PROD(2005) / [1 + DELTA_PROD(2005)/100]
My question is: how can I tell R to take the value generated in the
previous step (for example, the production of 2005 needs the value of the
production of 2006) in order to generate the production time series?
I have the value of the  following variables:
prod(2007)=2
delta_prod(2007)=3

delta_prod(2006)=5
delta_prod(2005)=4
What I would like to do is
prod(2006) = 2/(1 + 3/100), which is roughly 1.94
prod(2005) = 1.94/(1 + 5/100)
and so on...

(PS:my data.frame is not set as a time series)
Thanks for your attention!!
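
The back-recursion needs no explicit loop, because dividing successively by (1 + delta/100) is the same as dividing once by the cumulative product of those factors. A sketch with the values from the post:

```r
prod_2007 <- 2
deltas    <- c(3, 5, 4)  # delta_prod for 2007, 2006, 2005, in percent

# prod(2006), prod(2005), prod(2004) in one step via cumprod()
prod_back <- prod_2007 / cumprod(1 + deltas / 100)
names(prod_back) <- c("2006", "2005", "2004")
round(prod_back, 3)  # 2006 is 2/1.03, roughly 1.942
```

The same idea works for any length of delta series, so the whole back-cast series can be built from one vector of growth rates.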









Re: [R] ordinal variables

2010-06-03 Thread Joris Meys
see ?factor and ?as.factor. On ordered factors you can technically do a
spearman without problem, apart from the fact that a spearman test by
definition cannot give exact p-values with ties present.

x <- sample(c("a","b","c","d","e"),100,replace=T)
y <- sample(c("a","b","c","d","e"),100,replace=T)

x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=T)
x.ordered
y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=T)
y.ordered

cor.test(x.ordered,y.ordered,method="spearman")

require(pspearman)
spearman.test(x.ordered,y.ordered)

R commander has some menu options to deal with factors. R commander also
provides a scripting window. Please do your students a favor, and show them
how to use those commands.

Cheers
Joris


On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou wrote:

> Dear colleagues,
>
> I teach statistics using SPSS. I want to use R instead. I hit on one
> problem and I need some quick advice. When I want to work with ordinal
> variables, in SPSS I can compute the median or create a barchart or compute
> a spearman correlation with no problems. In R, if I "read" the ordinal
> variable as numeric, then I cannot do a barplot because I miss the category
> names. If I read the variables as characters, then I cannot run a spearman.
> How can I read a variable as numeric, still have the chance to assign value
> labels, and be able to get table of frequencies etc? I want to be able to do
> all these things in R commander. My students will probable be scared away if
> I try anything else other than R commander (just writing commands will not
> make them happy).
>
> I hope I am not asking for too much. Hopefully there is a way
>
>
>
>





[R] General-purpose GPU computing in statistics (using R)

2010-06-03 Thread Ravi Varadhan
Hi All,

 

I have been reading about general-purpose GPU (graphics processing unit)
computing for computational statistics. I know very little about this, but
I read that GPUs currently cannot handle double-precision floating point
and also that they are not necessarily IEEE compliant. However, I am not
sure what the practical impact of this limitation is likely to be on
computational statistics problems (e.g. optimization, multivariate
analysis, MCMC, etc.).

 

What are the main obstacles that are likely to prevent widespread use of
this technology in computational statistics? Can algorithms be coded in R
to take advantage of the GPU architecture to speed up computations? I would
appreciate hearing from R sages about their views on the usefulness of
general-purpose GPU computing for computational statistics. I would also
like to hear views on the future of GPGPU - i.e. is it here to stay, or is
it just a gimmick that will quietly disappear into oblivion?

 

Thanks very much.

 

Best regards,

Ravi.


--

Ravi Varadhan, Ph.D.

Assistant Professor, 

Center on Aging and Health, 

Johns Hopkins University School of Medicine

(410)502-2619

rvarad...@jhmi.edu 

http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
tml 

 

 




Re: [R] ordinal variables

2010-06-03 Thread Iasonas Lamprianou
Thank you Joris,
I'll have a look into the commands you sent me. They look convincing. I hope my 
students will also see them in a positive way (although I can force them to 
pretend that they have a positive attitude)!

Dr. Iasonas Lamprianou





Assistant Professor (Educational Research and Evaluation)

Department of Education Sciences

European University-Cyprus

P.O. Box 22006

1516 Nicosia

Cyprus 

Tel.: +357-22-713178

Fax: +357-22-590539





Honorary Research Fellow

Department of Education

The University of Manchester

Oxford Road, Manchester M13 9PL, UK

Tel. 0044  161 275 3485

iasonas.lampria...@manchester.ac.uk

--- On Thu, 3/6/10, Joris Meys  wrote:

From: Joris Meys 
Subject: Re: [R] ordinal variables
To: "Iasonas Lamprianou" 
Cc: r-help@r-project.org
Date: Thursday, 3 June, 2010, 14:35

see ?factor and ?as.factor. On ordered factors you can technically do a 
spearman without problem, apart from the fact that a spearman test by 
definition cannot give exact p-values with ties present.

x <- sample(c("a","b","c","d","e"),100,replace=T)

y <- sample(c("a","b","c","d","e"),100,replace=T)

x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=T)

x.ordered
y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=T)
y.ordered

cor.test(x.ordered,y.ordered,method="spearman")

require(pspearman)

spearman.test(x.ordered,y.ordered)

R commander has some menu options to deal with factors. R commander also 
provides a scripting window. Please do your students a favor, and show them how 
to use those commands. 


Cheers
Joris


On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou  wrote:

Dear colleagues,



I teach statistics using SPSS. I want to use R instead. I hit on one problem 
and I need some quick advice. When I want to work with ordinal variables, in 
SPSS I can compute the median or create a barchart or compute a spearman 
correlation with no problems. In R, if I "read" the ordinal variable as 
numeric, then I cannot do a barplot because I miss the category names. If I 
read the variables as characters, then I cannot run a spearman. How can I read 
a variable as numeric, still have the chance to assign value labels, and be 
able to get table of frequencies etc? I want to be able to do all these things 
in R commander. My students will probable be scared away if I try anything else 
other than R commander (just writing commands will not make them happy).




I hope I am not asking for too much. Hopefully there is a way









__

R-help@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering 
Department of Applied mathematics, biometrics and process control


Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be 
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





  



Re: [R] Seeking help on Vectorize()

2010-06-03 Thread Duncan Murdoch

Megh Dal wrote:

Dear folks, here I have written the following function:
 
fn <- Vectorize(function(x = 1:3, y = 3:6) {

x <- matrix(x, nrow=1)
y <- matrix(y, ncol=1)
dat <- apply(x, 2, function(xx) {
  apply(y, 1, function(yy) {
  return(xx + yy) } ) })
return(dat)}, SIMPLIFY = TRUE)

If I run this function, I get a warning message, and even the format of the 
returned object is not correct. For example:
 
  

> fn(x = 1:3, y = 3:7)
[1] 4 6 8 7 9
Warning message:
In mapply(FUN = function (x = 1:3, y = 3:6)  :
  longer argument not a multiple of length of shorter
 
However if I run individual line of codes like :
 
  

> x <- 1:3; y <- 3:7
> x <- matrix(x, nrow=1)
> y <- matrix(y, ncol=1)
> dat <- apply(x, 2, function(xx) {
+   apply(y, 1, function(yy) {
+   return(xx + yy) } ) })
> dat


     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
[3,]    6    7    8
[4,]    7    8    9
[5,]    8    9   10

 
I get exactly what I want. Where am I going wrong?


I think you don't understand what Vectorize is trying to do.  It is 
basically a way to run a loop, calling your function with each matching 
pair of x and y values, i.e. x=1, y=3 in the first call, then x=2 y=4 in 
the second, etc.  The warning comes because you have three x values and 
five y values, so it will repeat the 1st two x values -- but it warns 
you that's likely not what you intended.
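
As an illustrative sketch (an editor's addition, not part of the original reply): Vectorize() pairs elements up with recycling, while the all-combinations matrix the poster wanted is exactly what outer() computes:

```r
## Vectorize()/mapply() call the function once per matching (x, y) pair,
## recycling the shorter argument
fn <- Vectorize(function(x, y) x + y)
fn(1:3, c(10, 20, 30))   # 11 22 33

## every combination of y (rows) and x (columns) is outer()'s job
outer(3:7, 1:3, "+")     # the 5 x 3 matrix the poster expected
```

With outer() there is no recycling, and hence no warning when the lengths of x and y differ.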


Duncan Murdoch



[R] Tukey HSD

2010-06-03 Thread Claire Berger

Hello,
I am a little out of date and am still using S-Plus instead of R.  I haven't 
been able to find the right place to ask this question, so I thought I would 
ask it here and hope that someone can help.  I am unable to locate the 
TukeyHSD() function in S-Plus.  I have been able to use the GUI to run Tukey's 
test, but the results don't provide a p-value.  Does anyone know how I can go 
about obtaining or calculating a p-value for Tukey's test using S-Plus?
Thank you!
  



[R] Creating dummy variables

2010-06-03 Thread Arantzazu Blanco Bernardeau

Hello R project
I am an R beginner trying to create a dummy variable to classify soil types.
So, I have a column in my database called codtipo (typecode in english) where 
soil type is coded as 
1.1 to 1.4 arenosol (I have 4 types)
2.1 to 2.3 calcisols 
4.1 to 4.4 fluvisols
and so on
To make dummy variables, I understand that I create different columns, for 
example for gipsisols:
datos$gipsi=datos$codsuelo
for (i in 1:length(datos$gipsi)){if(datos$codsuelo[i]>=5.1 && 
(datos$codsuelo[i]<=5.4){datos$gipsi[i]=1}else{0}
}
for cambisols it should be
datos$cambi=datos$codsuelo
for (i in 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1 && 
datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} 
}
and so on... 
but R answers that a necessary TRUE/FALSE value is missing.
What can I do?
thanks a lot!!


Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología  
Universidad de Murcia-Campus de Espinardo







> Date: Thu, 3 Jun 2010 06:51:42 -0700
> From: lampria...@yahoo.com
> To: jorism...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] ordinal variables
> 
> Thank you Joris,
> I'll have a look into the commands you sent me. They look convincing. I hope 
> my students will also see them in a positive way (although I can force them 
> to pretend that they have a positive attitude)!
> 
> Dr. Iasonas Lamprianou
> 
> 
> 
> 
> 
> Assistant Professor (Educational Research and Evaluation)
> 
> Department of Education Sciences
> 
> European University-Cyprus
> 
> P.O. Box 22006
> 
> 1516 Nicosia
> 
> Cyprus 
> 
> Tel.: +357-22-713178
> 
> Fax: +357-22-590539
> 
> 
> 
> 
> 
> Honorary Research Fellow
> 
> Department of Education
> 
> The University of Manchester
> 
> Oxford Road, Manchester M13 9PL, UK
> 
> Tel. 0044  161 275 3485
> 
> iasonas.lampria...@manchester.ac.uk
> 
> --- On Thu, 3/6/10, Joris Meys  wrote:
> 
> From: Joris Meys 
> Subject: Re: [R] ordinal variables
> To: "Iasonas Lamprianou" 
> Cc: r-help@r-project.org
> Date: Thursday, 3 June, 2010, 14:35
> 
> see ?factor and ?as.factor. On ordered factors you can technically do a 
> spearman without problem, apart from the fact that a spearman test by 
> definition cannot give exact p-values with ties present.
> 
> x <- sample(c("a","b","c","d","e"),100,replace=T)
> 
> y <- sample(c("a","b","c","d","e"),100,replace=T)
> 
> x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=T)
> 
> x.ordered
> y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=T)
> y.ordered
> 
> cor.test(x.ordered,y.ordered,method="spearman")
> 
> require(pspearman)
> 
> spearman.test(x.ordered,y.ordered)
> 
> R commander has some menu options to deal with factors. R commander also 
> provides a scripting window. Please do your students a favor, and show them 
> how to use those commands. 
> 
> 
> Cheers
> Joris
> 
> 
> On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou  
> wrote:
> 
> Dear colleagues,
> 
> 
> 
> I teach statistics using SPSS. I want to use R instead. I hit on one problem 
> and I need some quick advice. When I want to work with ordinal variables, in 
> SPSS I can compute the median or create a barchart or compute a spearman 
> correlation with no problems. In R, if I "read" the ordinal variable as 
> numeric, then I cannot do a barplot because I miss the category names. If I 
> read the variables as characters, then I cannot run a spearman. How can I 
> read a variable as numeric, still have the chance to assign value labels, and 
> be able to get table of frequencies etc? I want to be able to do all these 
> things in R commander. My students will probable be scared away if I try 
> anything else other than R commander (just writing commands will not make 
> them happy).
> 
> 
> 
> 
> I hope I am not asking for too much. Hopefully there is a way
> 
> 
> 
> 
> 
> 
> 
> 
> 
> __
> 
> R-help@r-project.org mailing list
> 
> https://stat.ethz.ch/mailman/listinfo/r-help
> 
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> 
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> 
> -- 
> Joris Meys
> Statistical Consultant
> 
> Ghent University
> Faculty of Bioscience Engineering 
> Department of Applied mathematics, biometrics and process control
> 
> 
> Coupure Links 653
> B-9000 Gent
> 
> tel : +32 9 264 59 87
> joris.m...@ugent.be 
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> 
> 
> 
> 
> 
>   
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  

Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread vaneet

Thanks for your replies.

I understand you think the best solution is to find the gfortran and
possibly libgfortran packages from the version of my Linux and gcc
distribution which according to a quick check is the following (also this is
a 64 bit linux machine):

gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The question is: do you know where I can find a free gfortran package for
this distribution, and would I need to completely reinstall the gcc package
from that version that includes gfortran and other compilers, or can I simply
add gfortran to my existing packages?

Thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-linux-install-FORTRAN-compiling-error-tp2240821p2241731.html
Sent from the R help mailing list archive at Nabble.com.



[R] Exporting a list of data frames

2010-06-03 Thread Christoph Scherber
Dear R helpers,

Suppose I have a list of data frames. How can I export them (e.g. as a
text file) without losing the columns of the individual data.frames?

Example:

data(Orange)
mylist=list(Orange1=Orange,Orange2=Orange,Orange3=Orange)


I would like to use something like

write(mylist,"mylist.txt")

or

format (mylist...)

But somehow I cannot get the information contained in "mylist" exported
in a nicely looking way. Any ideas? Many thanks for any help!
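
A minimal sketch of one possibility (an editor's illustration, not from the thread): capture the printed representation of the list, which keeps the element names and all columns.

```r
# dump the printed form of a named list of data frames to a text file
data(Orange)
mylist <- list(Orange1 = Orange, Orange2 = Orange, Orange3 = Orange)
capture.output(mylist, file = "mylist.txt")
```

The resulting file reads the way the list prints at the console, with each element headed by its name.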

Best wishes,
Christoph


(using R 2.11.0 on Windows XP)


-- 
Dr. rer. nat. Christoph Scherber
University of Goettingen
DNPW, Agroecology
Waldweg 26
D-37073 Goettingen, Germany
phone +49 (0)551 39 8807
fax +49 (0)551 39 8806
Homepage http://www.gwdg.de/~cscherb1



[R] moving average on irregular time series

2010-06-03 Thread Gustaf Rydevik
Hi all,


I wonder if there is any way to calculate a moving average on an
irregular time series, perhaps using the rollapply function in zoo?
I have a set of dates where I want to check whether there has been an event
in the 14 days prior to each time point, in order to mark these time points
for removal, and I can't figure out a good way to do it.

Many thanks in advance!

Gustaf


Example data:

exData<-structure(list(Datebegin = structure(c(14476, 14569, 14576, 14621,
14627, 14632, 14661, 14671, 14705, 14715, 14751, 14756, 14495,
14518, 14523, 14526, 14528, 14529, 14545, 14548), class = "Date"),
Event = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE)), .Names = c("Datebegin", "Event"
), row.names = c(NA, 20L), class = "data.frame")

###In this example, row 18 is a date less than 14 days after an event
and should be marked for removal.
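
A sketch of one approach without zoo (an editor's illustration, under one reading of the rule: a non-event row is removed when any event falls 1 to 13 days before it), using the exData defined above:

```r
# day differences from each row to every event date; flag non-event rows
# that fall within 14 days after an event
event.dates <- exData$Datebegin[exData$Event]
within14 <- sapply(seq_len(nrow(exData)), function(i) {
  gap <- as.numeric(exData$Datebegin[i] - event.dates)
  any(gap > 0 & gap < 14)
})
which(within14 & !exData$Event)   # 18
```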



-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



Re: [R] rgl problem on Fedora 13

2010-06-03 Thread Marc Schwartz
On Jun 3, 2010, at 8:00 AM, Håvard Rue wrote:

> Just installed Fedora 13, and reinstalled the lastest rgl package 0.91.
> The problem is that open3d() opens a window, but then halt. Nothing
> happens and I have to kill R.
> In the previous version of Fedora 12, everything worked fine. 
> 
> Any tests etc I should/could do?
> 
> Best,
> H


Hi,

Were there any console error messages at all that might indicate a driver 
incompatibility issue or a missing component?

From the file you attached, it looks like you have an Intel GPU, so some of
the video related changes in F13, such as the Nouveau driver, I don't believe
will be applicable, but that is not absolutely certain. There may yet be other
relevant changes.

Did you do a clean install or an upgrade over F12? If the former, it is 
possible that some system configuration specification is amiss.

Of course, given any brand new Linux release, there is always the possibility 
of a bug that may or may not yet have been fixed since release. Be sure that 
you have fully updated the system with yum since installing.

You might also want to search the Fedora Bugzilla repo for any references to 
OpenGL reports for F13 that might apply.

Also consider contacting the rgl package maintainers to see if there is any 
additional information that they can provide based upon their more intimate 
knowledge.

Finally, there is a R SIG Fedora e-mail list. More info at:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

HTH,

Marc Schwartz



Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread vaneet

P.S.  My linux machine has the following info:

Red Hat Enterprise Linux Server release 5.5 (Tikanga)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-linux-install-FORTRAN-compiling-error-tp2240821p2241770.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Tukey HSD

2010-06-03 Thread RICHARD M. HEIBERGER
Tukey's pairwise comparisons test is included in the multcomp package.
Also, see the MMC function in the HH package, available from CRAN.
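
For the R side (the OP asked about S-Plus, so this is only a hedged sketch on built-in example data), Tukey comparisons with adjusted p-values can be obtained like this:

```r
# base R: TukeyHSD() on an aov fit reports adjusted p-values directly
fit <- aov(breaks ~ tension, data = warpbreaks)
TukeyHSD(fit)

# the multcomp package gives the same comparisons via glht()
library(multcomp)
summary(glht(fit, linfct = mcp(tension = "Tukey")))
```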

Rich




Re: [R] ordinal variables

2010-06-03 Thread S Ellison
If you set them a problem that has them doing the same sort of thing
five times, and compare the time it takes with code pasted from an editor
(e.g. Tinn-R) against the time it takes via menus, you may have more luck
convincing them.

A command line sequence is harder than menus the first two times but
easier for any n iterations thereafter.

Steve ellison

>>> Iasonas Lamprianou  03/06/2010 14:51 >>>
Thank you Joris,
I'll have a look into the commands you sent me. They look convincing. I
hope my students will also see them in a positive way (although I can
force them to pretend that they have a positive attitude)!

Dr. Iasonas Lamprianou





Assistant Professor (Educational Research and Evaluation)

Department of Education Sciences

European University-Cyprus

P.O. Box 22006

1516 Nicosia

Cyprus 

Tel.: +357-22-713178

Fax: +357-22-590539





Honorary Research Fellow

Department of Education

The University of Manchester

Oxford Road, Manchester M13 9PL, UK

Tel. 0044  161 275 3485

iasonas.lampria...@manchester.ac.uk 

--- On Thu, 3/6/10, Joris Meys  wrote:

From: Joris Meys 
Subject: Re: [R] ordinal variables
To: "Iasonas Lamprianou" 
Cc: r-help@r-project.org 
Date: Thursday, 3 June, 2010, 14:35

see ?factor and ?as.factor. On ordered factors you can technically do a
spearman without problem, apart from the fact that a spearman test by
definition cannot give exact p-values with ties present.

x <- sample(c("a","b","c","d","e"),100,replace=T)

y <- sample(c("a","b","c","d","e"),100,replace=T)

x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=T)

x.ordered
y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=T)
y.ordered

cor.test(x.ordered,y.ordered,method="spearman")

require(pspearman)

spearman.test(x.ordered,y.ordered)

R commander has some menu options to deal with factors. R commander
also provides a scripting window. Please do your students a favor, and
show them how to use those commands. 


Cheers
Joris


On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou
 wrote:

Dear colleagues,



I teach statistics using SPSS. I want to use R instead. I hit on one
problem and I need some quick advice. When I want to work with ordinal
variables, in SPSS I can compute the median or create a barchart or
compute a spearman correlation with no problems. In R, if I "read" the
ordinal variable as numeric, then I cannot do a barplot because I miss
the category names. If I read the variables as characters, then I cannot
run a spearman. How can I read a variable as numeric, still have the
chance to assign value labels, and be able to get table of frequencies
etc? I want to be able to do all these things in R commander. My
students will probable be scared away if I try anything else other than
R commander (just writing commands will not make them happy).




I hope I am not asking for too much. Hopefully there is a way









__

R-help@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help 

PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html 

and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering 
Department of Applied mathematics, biometrics and process control


Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be 
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php 





  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.




Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread Marc Schwartz
On Jun 3, 2010, at 9:17 AM, vaneet wrote:

> 
> Thanks for your replies.
> 
> I understand you think the best solution is to find the gfortran and
> possibly libgfortran packages from the version of my Linux and gcc
> distribution which according to a quick check is the following (also this is
> a 64 bit linux machine):
> 
> gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)
> Copyright (C) 2006 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> The question is do you know where I can find a free gfortran package from
> this distribution and would I need to completely reinstall the gcc package
> from that version that includes gfortran and other compilers or can I simply
> add gfortran to my existing packages? 
> 
> Thanks


Sorry that I am coming to this late, but is there a reason that you did not 
install R using already available RPMs other than perhaps not being aware that 
they exist?

For RHEL 4 and 5 there is the EPEL:

  https://fedoraproject.org/wiki/EPEL

which provides pre-built binaries of many add-on packages, including R. R 
2.11.0 is available there and Tom Callaway posted on the Fedora SIG list this 
morning that 2.11.1 is in the updates testing repo.

If you do need a FORTRAN compiler, which you may yet need for add-on CRAN
packages, one should be available via the regular RHEL repos, for which you
would have already paid for access (not to mention direct telephone support
from RH).
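
For example (a sketch: gcc-gfortran is the usual Red Hat package name for the Fortran front end, but check your repos), gfortran can be added alongside the existing gcc without reinstalling it:

```shell
# as root: add the gfortran front end matching the system gcc
yum install gcc-gfortran

# confirm it reports the same 4.1.2 base version as gcc
gfortran --version
```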

For RHEL 5 (based upon your followup), follow the instructions here:

  https://fedoraproject.org/wiki/EPEL/FAQ#howtouse

to gain access to the EPEL.

Then, as root, you can use:

  yum install R-core R-devel

to install R.

By staying with RPMs specifically built for your release, you tend to obviate 
version incompatibility issues, as Prof. Ripley has noted.

HTH,

Marc Schwartz



Re: [R] gam error

2010-06-03 Thread Jannis
How many data points do you have? The spline smoothers assume a default of 10 
degrees of freedom (I think), so if you have fewer than 10 data points I would 
guess that you get this error. The same would be true if you have >10 data 
points but many replicates of CoT~incline combinations.

Either reduce the degrees of freedom of the smoother or gather more data ;-).
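
A sketch of the first option with mgcv (assumption: the error comes from the default basis size of s(); the k argument caps it and must not exceed the number of unique incline values):

```r
library(mgcv)
set.seed(1)
# toy data with only 5 unique incline values (fewer than the default k = 10)
d <- data.frame(incline = rep(c(0, 5, 10, 15, 20), each = 4))
d$CoT <- (d$incline - 10)^2 / 20 + rnorm(nrow(d), sd = 0.2)

# gam(CoT ~ s(incline), data = d) would fail with the poster's error;
# lowering k to the number of unique values fits fine
m1 <- gam(CoT ~ s(incline, k = 5), data = d)
```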

I would not be too sure about the sense of fitting univariate gams... There 
should not be too much of a difference to just plotting CoT~incline and 
fitting some spline to it (I might be wrong here!).

HTH
Jannis

--- On Thu, 3 Jun 2010, natalieh wrote:

> From: natalieh 
> Subject: [R] gam error
> To: r-help@r-project.org
> Date: Thursday, 3 June 2010, 11:24
> 
> Hi all,
> 
> I'm trying to use a gam (mgcv package) to analyse some data
> with a roughly U
> shaped curve. My model is very simple with just one
> explanatory variable:
> 
> m1<-gam(CoT~s(incline))
> 
> However I just keep getting the error message
> 
> "Error in smooth.construct.tp.smooth.spec(object, dk$data,
> dk$knots) : 
>   A term has fewer unique covariate combinations than
> specified maximum
> degrees of freedom"
> 
> Just wondering if anyone had come across this before/ could
> offer any advice
> on where the problem might lie
> 
> Many thanks,
> 
> Natalie
> 
> 
> 
> -- 
> View this message in context: 
> http://r.789695.n4.nabble.com/gam-error-tp2241518p2241518.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 





Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

2010-06-03 Thread Joris Meys
On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson wrote:

>
> vegan is probably not too useful here as the response is univariate;
> counts of ducks.
>

True, if we assume that only one species was counted and is of interest for
the whole study. I (probably wrongly) assumed that data for multiple species
were available.

Without knowledge about the whole research setup it is difficult to say
which method is the best, or even which methods are appropriate. VGAM is
indeed a powerful tool, but :

> proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
means 182 observations in the dataset

> model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,
family= NBI)

is 5 splines with 3 df each, plus an intercept: that is a lot of df for only
182 observations. Using VGAM isn't going to help here. I'd reckon that the
model itself should be reconsidered, rather than the distribution used to fit
the error terms.

Cheers
Joris



> > You could also use the hurdle model from the pscl library, but this is
> not
> > guaranteed to work better. It is not a zero-inflation model, but a
> > two-component model that fits the zeros using eg a binomial and the rest
> of
> > the counts using eg a poisson or a negative binomial.
>
> zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
> Neither of these will be a GAM though, so if the OP was wanting smooth
> functions of covariates they may not so useful, although bs() in package
> splines might be used to include smooth terms into these functions.
>
> gam() in mgcv has the negative binomial family that might be a useful
> starting point to modelling the overdispersion.
>
> Package vgam has functions to fit ZIP/ZINB or the hurdle models
> mentioned above with smooth functions of covariates.
>
> In any case the OP should probably seek help directly from the gamlss
> package maintainers to get gamlss working on their data or suggest why
> the fitting isn't working.
>
> HTH
>
> G
>
> >
> > Cheers
> > Joris
> >
> > On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <
> diederik.stru...@ua.ac.be
> > > wrote:
> >
> > > Dear all,
> > >
> > > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1,
> Windows
> > > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit
> > > variables. However, most models fail because  the global deviance is
> > > increasing and I am not sure what causes this behaviour. The dataset
> > > consists of counts of birds (duck) and 5 habit variables measured in
> the
> > > field (n= 182). The dependent variable (the number of ducks
> > > counted)suffers from zero-inflation and overdisperion:
> > >
> > > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > > mean <- mean(data$duck)
> > > > var <- var(data$duck)
> > > > proportion_non_zero
> > > [1] 0.1153846
> > > > mean
> > > [1] 1.906593
> > > > var
> > > [1] 37.35587
> > >
> > > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > > variable, but the data used can be found at
> > > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> > >
> > >
> > > First, I create a (strong) pattern in the dataset by:
> > > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> > >
> > > I try to analyze these data by fitting several possible distributions
> > > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and
> type II
> > > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > > splines with a df=3. The best fitting model will then be choses on the
> basis
> > > of its AIC.
> > >
> > > However, these models frequently fail to converge, and I am not sure
> why,
> > > and what to do about it. For example:
> > >
> > > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3)
> +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > PO)
> > > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> > >
> > > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) +
> cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > ZIP)
> > > GAMLSS-RS iteration 1: Global Deviance = 326.3819
> > > GAMLSS-RS iteration 2: Global Deviance = 225.1232
> > > GAMLSS-RS iteration 3: Global Deviance = 319.9663
> > > Error in RS() : The global deviance is increasing
> > >  Try different steps for the parameters or the model maybe
> inappropriate
> > > In addition: There were 14 warnings (use warnings() to see them)
> > >
> > > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> cs(LFAP200,df=3),data=data,family=
> > > NBI)
> > > GAMLSS-RS iteration 1: Global Deviance = 291.8607
> > > GAMLSS-RS iteration 2: Global Deviance = 291.3291
> > > ##..##
> > > GAMLSS-RS iteration 4: Global Deviance = 291.1135
> > > GAMLSS-RS iteration 20: Global 

[R] Bug with ESS? R freezes after cat()

2010-06-03 Thread Jannis
Dears,


I am stumbling over something that appears to me to be a bug. If I run the 
following code using ESS in one block (CTRL+C CTRL+R), R (or ESS) just freezes. 
Running the same code line by line does not cause the error. Running the same 
from the R GUI, either in steps or in a block, does not produce the freeze either.

i=2
a=rep('Hallo',times=2)
cat('b')
cat(i)
c=a[i]


Is the freeze reproducible on other machines? Am I using cat() in an 
inappropriate way?

I am using ESS 5.8, R 2.10.1 on a WinXP machine.


Cheers
Jannis



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating dummy variables

2010-06-03 Thread Jannis
Was that the original code that you ran? There appear to be several mistakes 
in the code:

1. In the gipsisol code the parentheses are unbalanced: there is an extra '(' 
before the second condition
2. In the cambisol code both > signs point in the same direction; you 
probably want one >= and one <=


My general suggestion would be to skip the loops altogether and vectorize your 
code:

datos$cambi <- 0  # start from 0, then flag the matching rows
datos$cambi[datos$codsuelo >= 3.1 & datos$codsuelo <= 3.3] <- 1  # & (not &&) for vectors

Another source of your error could be that datos$codsuelo is not numeric. What 
does class(datos$codsuelo) say?


HTH
Jannis
> for (i in
> 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1
> &&
> datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} 
> }

--- On Thu, 3 Jun 2010, Arantzazu Blanco Bernardeau wrote:

> From: Arantzazu Blanco Bernardeau 
> Subject: [R] Creating dummy variables
> To: r-help@r-project.org
> Date: Thursday, 3 June 2010, 14:11
> 
> Hello R project
> I am a R beginner trying to create a dummy variable to
> clasificate soil types.
> So, I have a column in my database called codtipo (typecode
> in english) where soil type is coded as 
> 1.1 to 1.4 arenosol (I have 4 types)
> 2.1 to 2.3 calcisols 
> 4.1 to 4.4 fluvisols
> and so on
> To make dummy variables I understand that, I create
> different columns as for gipsisols
> datos$gipsi=datos$codsuelo
> for (i in
> 1:length(datos$gipsi)){if(datos$codsuelo[i]>=5.1
> &&
> (datos$codsuelo[i]<=5.4){datos$gipsi[i]=1}else{0}
> }
> for cambisols it should be
> datos$cambi=datos$codsuelo
> for (i in
> 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1
> &&
> datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} 
> }
> and so on... 
> but anyway R answers that a necesary value TRUE/FALSE is
> not existing.
> What can I do?
> thanks a lot!!
> 
> 
> Arantzazu Blanco Bernardeau
> Dpto de Química Agrícola, Geología y Edafología  
> Universidad de Murcia-Campus de Espinardo
> 
> 
> 
> 
> 
> 
> 
> > Date: Thu, 3 Jun 2010 06:51:42 -0700
> > From: lampria...@yahoo.com
> > To: jorism...@gmail.com
> > CC: r-help@r-project.org
> > Subject: Re: [R] ordinal variables
> > 
> > Thank you Joris,
> > I'll have a look into the commands you sent me. They
> look convincing. I hope my students will also see them in a
> positive way (although I can force them to pretend that they
> have a positive attitude)!
> > 
> > Dr. Iasonas Lamprianou
> > 
> > 
> > 
> > 
> > 
> > Assistant Professor (Educational Research and
> Evaluation)
> > 
> > Department of Education Sciences
> > 
> > European University-Cyprus
> > 
> > P.O. Box 22006
> > 
> > 1516 Nicosia
> > 
> > Cyprus 
> > 
> > Tel.: +357-22-713178
> > 
> > Fax: +357-22-590539
> > 
> > 
> > 
> > 
> > 
> > Honorary Research Fellow
> > 
> > Department of Education
> > 
> > The University of Manchester
> > 
> > Oxford Road, Manchester M13 9PL, UK
> > 
> > Tel. 0044  161 275 3485
> > 
> > iasonas.lampria...@manchester.ac.uk
> > 
> > --- On Thu, 3/6/10, Joris Meys 
> wrote:
> > 
> > From: Joris Meys 
> > Subject: Re: [R] ordinal variables
> > To: "Iasonas Lamprianou" 
> > Cc: r-help@r-project.org
> > Date: Thursday, 3 June, 2010, 14:35
> > 
> > see ?factor and ?as.factor. On ordered factors you can
> technically do a spearman without problem, apart from the
> fact that a spearman test by definition cannot give exact
> p-values with ties present.
> > 
> > x <- sample(c("a","b","c","d","e"),100,replace=T)
> > 
> > y <- sample(c("a","b","c","d","e"),100,replace=T)
> > 
> > x.ordered <-
> factor(x,levels=c("e","b","a","d","c"),ordered=T)
> > 
> > x.ordered
> > y.ordered <-
> factor(y,levels=c("e","b","a","d","c"),ordered=T)
> > y.ordered
> > 
> > cor.test(x.ordered,y.ordered,method="spearman")
> > 
> > require(pspearman)
> > 
> > spearman.test(x.ordered,y.ordered)
> > 
> > R commander has some menu options to deal with
> factors. R commander also provides a scripting window.
> Please do your students a favor, and show them how to use
> those commands. 
> > 
> > 
> > Cheers
> > Joris
> > 
> > 
> > On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou
> 
> wrote:
> > 
> > Dear colleagues,
> > 
> > 
> > 
> > I teach statistics using SPSS. I want to use R
> instead. I hit on one problem and I need some quick advice.
> When I want to work with ordinal variables, in SPSS I can
> compute the median or create a barchart or compute a
> spearman correlation with no problems. In R, if I "read" the
> ordinal variable as numeric, then I cannot do a barplot
> because I miss the category names. If I read the variables
> as characters, then I cannot run a spearman. How can I read
> a variable as numeric, still have the chance to assign value
> labels, and be able to get table of frequencies etc? I want
> to be able to do all these things in R commander. My
> students will probably be scared away if I try anything else
> other than R commander (just writing commands will not make
> them happy).
> > 
> > 
> > 
> > 
> > I hope I am not asking for too much. Hopefully there
> is a way
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 

Re: [R] Exporting a list of data frames

2010-06-03 Thread Christoph Scherber
Dear all,

I found one possible solution using a single call to write.table:

# assuming that mylist is a named list of data.frames:
data(Orange)
mylist=list(Orange1=Orange,Orange2=Orange,Orange3=Orange)

sapply(1:length(mylist),
   function(x){
 write.table(
   data.frame(mydf = rep(names(mylist)[[x]], nrow(mylist[[x]])), mylist[[x]]),
   sep = "\t", append = TRUE, file = "My.data.frames.txt")
   }
)


Best wishes,
Christoph




> Dear R helpers,
>
> Suppose I have a list of data frames. How can I export them
> (e.g. as a
> text file) without losing the columns of the individual
> data.frames?
>
> Example:
>
> data(Orange)
> mylist=list(Orange1=Orange,Orange2=Orange,Orange3=Orange)
>
>
> I would like to use something like
>
> write(mylist,"mylist.txt")
>
> or
>
> format (mylist...)
>
> But somehow I cannot get the information contained in
> "mylist" exported
> in a nicely looking way. Any ideas? Many thanks for any
> help!
>
> Best wishes,
> Christoph
>
>
> (using R 2.11.0 on Windows XP)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug with ESS? R freezes after cat()

2010-06-03 Thread RICHARD M. HEIBERGER
I will guess that your use of cat without a line end leaves ESS waiting for
a prompt.

cat("b", "\n")

would be the way to avoid that.

The way to unfreeze ESS (and emacs more generally) is Ctrl-g

Please put followup ESS questions on the ESS list
ess-h...@stat.math.ethz.ch



Rich




Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

2010-06-03 Thread Gavin Simpson
On Thu, 2010-06-03 at 17:00 +0200, Joris Meys wrote:
> On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson wrote:
> 
> >
> > vegan is probably not too useful here as the response is univariate;
> > counts of ducks.
> >
> 
> If we assume that only one species is counted and of interest for the whole
> research.  I (probably wrongly) assumed that data for multiple species was
> available.
> 
> Without knowledge about the whole research setup it is difficult to say
> which method is the best, or even which methods are appropriate. VGAM is
> indeed a powerful tool, but :
> 
> > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> means 182 observations in the dataset
> 
> > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,
> family= NBI)
> 
> is 5 splines with 3df, an intercept, that's a lot of df for only 182
> observations. Using VGAM ain't going to help here.

How do you know?

>  I'd reckon that the model
> itself should be reconsidered, rather than the distribution used to fit the
> error terms.

I was going to mention that too, but the OP did reduce this down to a
single smooth and the problem of increasing deviance remained. Hence
trying to fit a /similar/ model in other software might give an
indication of whether the problems are restricted to a single piece of
software or are a more general issue with the data/problem.

At this stage the OP is stuck not knowing what is wrong; (s)he has
nothing on which to do model checking, etc. Trying zeroinfl() and fitting a
parametric model, for example, might be a useful starting point, then
move on to models with smoothers if required.

G

> 
> Cheers
> Joris
> 
> 
> 
> > > You could also use the hurdle model from the pscl library, but this is
> > not
> > > guaranteed to work better. It is not a zero-inflation model, but a
> > > two-component model that fits the zeros using eg a binomial and the rest
> > of
> > > the counts using eg a poisson or a negative binomial.
> >
> > zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
> > Neither of these will be a GAM though, so if the OP was wanting smooth
> > functions of covariates they may not be so useful, although bs() in package
> > splines might be used to include smooth terms into these functions.
> >
> > gam() in mgcv has the negative binomial family that might be a useful
> > starting point to modelling the overdispersion.
> >
> > Package vgam has functions to fit ZIP/ZINB or the hurdle models
> > mentioned above with smooth functions of covariates.
> >
> > In any case the OP should probably seek help directly from the gamlss
> > package maintainers to get gamlss working on their data or suggest why
> > the fitting isn't working.
> >
> > HTH
> >
> > G
> >
> > >
> > > Cheers
> > > Joris
> > >
> > > On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <
> > diederik.stru...@ua.ac.be
> > > > wrote:
> > >
> > > > Dear all,
> > > >
> > > > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1,
> > Windows
> > > > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habitat
> > > > variables. However, most models fail because the global deviance is
> > > > increasing and I am not sure what causes this behaviour. The dataset
> > > > consists of counts of birds (duck) and 5 habitat variables measured in
> > the
> > > > field (n= 182). The dependent variable (the number of ducks
> > > > counted) suffers from zero-inflation and overdispersion:
> > > >
> > > > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > > > mean <- mean(data$duck)
> > > > > var <- var(data$duck)
> > > > > proportion_non_zero
> > > > [1] 0.1153846
> > > > > mean
> > > > [1] 1.906593
> > > > > var
> > > > [1] 37.35587
> > > >
> > > > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > > > variable, but the data used can be found at
> > > > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> > > >
> > > >
> > > > First, I create a (strong) pattern in the dataset by:
> > > > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> > > >
> > > > I try to analyze these data by fitting several possible distributions
> > > > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and
> > type II
> > > > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > > > splines with a df=3. The best fitting model will then be chosen on the
> > basis
> > > > of its AIC.
> > > >
> > > > However, these models frequently fail to converge, and I am not sure
> > why,
> > > > and what to do about it. For example:
> > > >
> > > > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3)
> > +
> > > > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
> > cs(LFAP200,df=3),data=data,family=
> > > > PO)
> > > > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > > > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> > > >
> > > > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) +
> > cs(HHCDI1000,d

Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread vaneet

I tried downloading the pre-built binaries of R from this website and then
installing the RPMs, but it seems they depend on many other packages that must
be installed first.  I tried the R package itself first:

warning: R-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID
217521f6
error: Failed dependencies:
R-devel = 2.11.0-1.el5 is needed by R-2.11.0-1.el5.x86_64
libRmath-devel = 2.11.0-1.el5 is needed by R-2.11.0-1.el5.x86_64

Then when I tried to install those packages there were more dependencies:

warning: R-devel-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY,
key ID 217521f6
error: Failed dependencies:
R-core = 2.11.0-1.el5 is needed by R-devel-2.11.0-1.el5.x86_64
bzip2-devel is needed by R-devel-2.11.0-1.el5.x86_64
gcc-gfortran is needed by R-devel-2.11.0-1.el5.x86_64
libX11-devel is needed by R-devel-2.11.0-1.el5.x86_64
pcre-devel is needed by R-devel-2.11.0-1.el5.x86_64
pkgconfig is needed by R-devel-2.11.0-1.el5.x86_64
tcl-devel is needed by R-devel-2.11.0-1.el5.x86_64
tetex-latex is needed by R-devel-2.11.0-1.el5.x86_64
tk-devel is needed by R-devel-2.11.0-1.el5.x86_64

So I thought perhaps R-core would have everything needed to install and run
R but still:

warning: R-core-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key
ID 217521f6
error: Failed dependencies:
libgfortran.so.1()(64bit) is needed by R-core-2.11.0-1.el5.x86_64
perl(File::Copy::Recursive) is needed by R-core-2.11.0-1.el5.x86_64
tetex-latex is needed by R-core-2.11.0-1.el5.x86_64
xdg-utils is needed by R-core-2.11.0-1.el5.x86_64

Is there any way I can get an RPM that has all the necessary packages built
in to install and run R? Or do I have to find all these packages that are
needed to install these 'R' rpms for which many are not found on the
FedoraProject website (libgfortran, gcc-fortran).  Also which 'R' rpm
would I use to simply run R on the linux machine, would it be R-core?

Thanks



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] import text file into R

2010-06-03 Thread dhanush

can anyone tell me how to import a text file in R? the text file I want to
import is a large file, about 800MB in size. Thanks in advance.

I tried using the following

data<-read.table("file",header=T,sep="\t")





Re: [R] gam error

2010-06-03 Thread natalieh

Data?

The data are measures of energy use (continuous variable) for running on 7
inclines between -90 and +90 degrees (n=7-21).



Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread vaneet

Also would I need to install the R package as root in order for all users to
use it? I'm assuming I would have to, I don't have root access now but I
could possibly obtain it.  For now this is simply a test install so if there
is a way to install the R package simply under my user name please let me
know how.

Thanks



Re: [R] Overlay of barchart and xyplot

2010-06-03 Thread Chen, Huapeng FOR:EX

Hello,

I am new to R. I am trying to use doubleYScale() to overlay a barchart with an
xyplot containing two lines. I have two problems. One is that I could not
figure out how to customize the lines (point and line styles) with
par.settings; it seems that whatever par.settings I used to customize the
barchart was also applied to the xyplot (lines), and I don't know how to apply
par.settings separately to the barchart and the xyplot.

The other problem is how to add a key for both barchart and xyplot in a 
meaningful way.

I attached a piece of R code below I have been struggling to get this to work. 
Your kind help and advice are greatly appreciated.

Thanks,

Huapeng

###
dispersal<-barchart(NTLST_Dispersal_VAR_00_08$LDP_PER*100 + 
NTLST_Dispersal_VAR_00_08$SPP_PER*100 + 
 NTLST_Dispersal_VAR_00_08$SPG_PER*100 ~ NTLST_Dispersal_VAR_00_08$Year 
| NTLST_Dispersal_VAR_00_08$District,
 data=NTLST_Dispersal_VAR_00_08,
 horizontal=FALSE,
 stack=TRUE,
 layout=c(5,5),
 xlab="Year",
 ylab="%",
 strip = strip.custom( bg="light gray")
 #par.settings = simpleTheme(col = c("dark gray", "light gray", 
"white")),
 #auto.key=TRUE
 )



vars<-xyplot(sqrt(NTLST_Dispersal_VAR_00_08$Infestation_NUM) + 
NTLST_Dispersal_VAR_00_08$AI  ~ NTLST_Dispersal_VAR_00_08$Year | 
NTLST_Dispersal_VAR_00_08$District,
data=NTLST_Dispersal_VAR_00_08,
layout=c(5,5),
type="b",
ylab="Square roots of number of infested cells/Landscape aggregation index"
#par.settings = simpleTheme(col = c("black", "black"), pch=c(16,17))
) 

doubleYScale(dispersal, vars,add.ylab2 = TRUE,
#auto.key=list(text=c("LDP","SPP", "SPG","Infestation","Landscape aggregation 
index"),
# points = list(pch=c(16,17)), lines=list(lty=c(1,2)), 
# rectangles = list(size=1.7,border="black", col=c("black", "light 
gray", "white")),
# space = "right")
)

update(trellis.last.object(),
par.settings = simpleTheme(col = c("dark gray", "light gray", 
"white"),pch=c(16,17),lty=c(1,2)))
###
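One way to give each component its own look is to theme the two trellis objects separately *before* combining them. This is only a sketch, under the assumption that `dispersal` and `vars` are built as in the code above and that latticeExtra is the package providing doubleYScale; the colours, symbols, and line types below are illustrative, not the poster's final choices:

```r
library(latticeExtra)
# apply a theme to each trellis object on its own, then combine, so the
# barchart and the xyplot keep distinct settings
dispersal2 <- update(dispersal, par.settings = simpleTheme(
  col = c("dark gray", "light gray", "white")))
vars2 <- update(vars, par.settings = simpleTheme(
  col = "black", pch = c(16, 17), lty = c(1, 2)))
doubleYScale(dispersal2, vars2, add.ylab2 = TRUE)
```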

##
A sample of data used


District  Year  SPG_PER      SPP_PER      LDP_PER      SPP_LDP_PER  Infestation_NUM  AI
DAB       1999  0.446808511  0.205673759  0.34751773   0.553191489  461    48.9143
DAB       2000  0.480769231  0.192307692  0.326923077  0.519230769  332    37.3591
DAB       2001  0.366141732  0.212598425  0.421259843  0.633858268  663    37.3418
DAB       2002  0.663366337  0.222772277  0.113861386  0.336633663  791    41.7671
DAB       2003  0.553278689  0.178278689  0.268442623  0.446721311  2138   54.333
DAB       2004  0.799204771  0.115308151  0.085487078  0.200795229  2921   62.5666
DAB       2005  0.824295011  0.106290672  0.069414317  0.175704989  2783   57.9458
DAB       2006  0.794520548  0.131506849  0.073972603  0.205479452  2210   50.0829
DAB       2007  0.77972028   0.148601399  0.071678322  0.22027972   2466   54.0918
DCC       1999  0.844036697  0.105504587  0.050458716  0.155963303  2077   57.9
DCC       2000  0.7731569    0.096408318  0.130434783  0.2268431    4294   75.5286
DCC       2001  0.905714286  0.065714286  0.028571429  0.094285714  4302   74.709
DCC       2002  0.799256506  0.066914498  0.133828996  0.200743494  6095   80.5838
DCC       2003  0.896703297  0.065934066  0.037362637  0.103296703  7919   85.7303
DCC       2004  0.887254902  0.068627451  0.044117647  0.112745098  8516   84.7028
DCC       2005  0.923728814  0.06779661   0.008474576  0.076271186  7759   79.4637
DCC       2006  0.941935484  0.04516129   0.012903226  0.058064516  6637   74.5994
DCC       2007  0.890909091  0.072727273  0.036363636  0.109090909  4288   68.593
DCH       1999  0.691176471  0.161764706  0.147058824  0.308823529  204    38.2682
DCH       2000  0.245901639  0.151639344  0.602459016  0.754098361  1099   55.3403
DCH       2001  0.525316456  0.189873418  0.284810127  0.474683544  1350   52.3413
DCH       2002  0.326599327  0.1          0.562289562  0.673400673  9332   88.5776
DCH       2003  0.797342193  0.102990033  0.099667774  0.202657807  11416  93.0988
DCH       2004  0.720430108  0.134408602  0.14516129   0.279569892  13120  94.1959
DCH       2005  0.773480663  0.066298343  0.160220994  0.226519337  13726  93.0554
DCH       2006  0.943319838  0.032388664  0.024291498  0.056680162  14393  92.

Re: [R] import text file into R

2010-06-03 Thread Sarah Goslee
That sounds like a good plan.

What went wrong?

You're more likely to get an answer if you tell us what your
problem/question actually is.

Sarah

On Thu, Jun 3, 2010 at 7:32 AM, dhanush  wrote:
>
> can anyone tell me how to import a text file in R? the text file I want to
> import is a large file, about 800MB in size. Thanks in advance.
>
> I tried using the following
>
> data<-read.table("file",header=T,sep="\t")
>



-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] plot polar coordinates

2010-06-03 Thread Thomas Steiner
Hi,
I'd like to plot in in polar coordinates a line which is given as a
vector of lengths and angles.

library("plotrix")
polar.plot(length=c(1,-13,3,-1),polar.pos=c(20,20,45,55),rp.type="p",line.col="red",lwd=3,clockwise=TRUE,label.pos=seq(30,360,by=30),labels=c("ESE","SSE","S","SSW","WSW","W","WNW","NNW","N","NNE","ENE","E"),radial.lim=c(0,90))
lines(seq(0,285,by=15),rep(c(40,60),times=10),col="blue")

My questions:
a) Why is the radial.lim=c(0,90) ignored?
b) Why is there no red line?
c) Why is the blue line not "jumping" between two circles of radius 40
and 60, respectively (the first argument of lines() should be the angle,
the second the length)? Or: which function draws into an existing polar
plot? (Note that the polygon is not closed, e.g. by going all the way around.)

My final goal will be such a plot:
http://www.mysundial.ca/tsp/images/sun_chart_50_solar_polar.jpg which
is the polar-coordinate version of my wikimedia plot
http://commons.wikimedia.org/wiki/File:Sonnenstand.png (made in R, see
source)
Thanks for hints and help,
Thomas

PS: I asked a closely related question last year:
http://www.mail-archive.com/r-help@r-project.org/msg64998.html



[R] For Loop help needed

2010-06-03 Thread Geeti Khan
Hi, 
I have a dataset with three columns like this
x1 x2 Count
1  0  523
1 0 23
0 0 2
0 1  45
0 0  3
1 1 433

I need to create a loop so that when c(x1,x2)=c(1,1), I can add the 
corresponding Counts. When c(x1,x2)=c(1,0), I can add the corresponding counts, 
and so on. Can anyone help me?
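No loop is needed for this; a hedged sketch, assuming the data frame is called `dat`:

```r
dat <- data.frame(x1 = c(1, 1, 0, 0, 0, 1),
                  x2 = c(0, 0, 0, 1, 0, 1),
                  Count = c(523, 23, 2, 45, 3, 433))
# one total per (x1, x2) combination
aggregate(Count ~ x1 + x2, data = dat, FUN = sum)
# or, as a 2 x 2 table of totals:
xtabs(Count ~ x1 + x2, data = dat)
```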



  



Re: [R] import text file into R

2010-06-03 Thread Erik Iverson



dhanush wrote:

can anyone tell me how to import a text file in R? the text file I want to
import is a large file, about 800MB in size. Thanks in advance.

I tried using the following

data<-read.table("file",header=T,sep="\t")




And what happened??

Your specification is incomplete, as we don't know what your text file 
looks like.  Is it CSV, etc.?  How much memory does your system have?
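For a file this size, telling read.table the column classes up front can cut both time and memory substantially. A sketch with hypothetical column types (the actual classes depend on the file, which the original post does not show):

```r
# colClasses avoids type-guessing passes over an 800MB file;
# comment.char = "" and quote = "" skip unneeded scanning
# (only valid if the file really has no comment or quote characters)
dat <- read.table("file", header = TRUE, sep = "\t",
                  comment.char = "", quote = "",
                  colClasses = c("character", rep("numeric", 5)))  # hypothetical
```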




Re: [R] function

2010-06-03 Thread Joris Meys
That's a bit clearer.
> Prod2007=2

> Delta=c(4,3,5)

> Delta <- 1+Delta/100

> Series <- Prod2007+cumsum(Delta)
> Series
[1] 3.04 4.07 5.12

On Thu, Jun 3, 2010 at 1:21 PM, n.via...@libero.it wrote:

> What I would like to do is for example:
> Suppose that I have the following value
>
> a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
> b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
> c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
>
>
> where prod(2007)=2
>
> DELTA_PROD(2007)=4
>
> DELTA_PROD(2006)=3
>
> DELTA_PROD(2005)=5
>
> so prod(2007) is like the starting value of production from wich starts the
> construction of its the time series. So:
>
> prod(2006)=2+[1+4/100] which is equal to 3.04
>
> so i will have:
>
> prod(2005)=3.04+ [1+3/100]
>
> and so on
>
>
>
>
>
>
>
> Messaggio originale
> Da: jorism...@gmail.com
> Data: 03/06/2010 13.05
> A: "n.via...@libero.it"
> Cc: 
> Ogg: Re: [R] function
>
>
> This is what you asked for.
>
> > Prod2007 <- 1:10
>
> > Prod2006 <- Prod2007/1+c(0,diff(Prod2007))
>
> > Prod2005 <- Prod2006+(1+c(0,diff(Prod2006)))
>
> > Prod2004 <- Prod2005+(1+c(0,diff(Prod2005)))
>
> > Prod2006
>  [1]  1  3  4  5  6  7  8  9 10 11
>
> > Prod2005
>  [1]  2  6  6  7  8  9 10 11 12 13
>
> > Prod2004
>  [1]  3 11  7  9 10 11 12 13 14 15
>
> Sure that's what you want?
>
> On Thu, Jun 3, 2010 at 12:30 PM, n.via...@libero.it wrote:
>
>>
>> Dear list,
>> I would like to ask you a question. I'm trying to build  the time series'
>> production with the Divisia index. The final step would require to do the
>> following calculations:
>> a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
>> b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
>> c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
>> my question is how can I tell R to take the value generated in the
>> previous step (for example is the case of the produciton of 2005 that need
>> the value of the production of 2006) in order to generate the time series
>> production??
>> (PS:my data.frame is not set as a time series)
>> Thanks for your attention!!
>>
>>
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical Consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> Coupure Links 653
> B-9000 Gent
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>
>


-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




[R] Question about avoid the for loop

2010-06-03 Thread Carrie Li
Dear R-helpers,

I would like to generate a binary random variable within strata nested
inside strata. Here is a simple example.


## x is the first level strata index, here I have 3 strata.
x=c(rep(1,5), rep(2,5), rep(3,5))

## within x, there is a second strata indexed by t=0 and t=1
t=rep(c(0,0,1,1,1),3)


## and within strata i and t=0 and t=1, I generate the random binomial
variable respectively, and save in y
y=rep(NA, length(x))
for (i in 1:3)
{
y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
}


My question: is there any way to avoid the for loop, since the first-level
stratification has thousands of strata? (Within each x stratum, t only has
2 levels, 0 and 1.)

Thanks for any help!

Carrie
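Since rbinom() accepts a vector of probabilities, the whole nested stratification can be done in a single call; a sketch on the example data:

```r
x <- c(rep(1, 5), rep(2, 5), rep(3, 5))
t <- rep(c(0, 0, 1, 1, 1), 3)
# one draw per observation, with p = 0.2 where t == 0 and p = 0.8 where t == 1;
# the first-level stratum x never enters the probabilities, so it drops out
y <- rbinom(length(x), size = 1, prob = ifelse(t == 0, 0.2, 0.8))
```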




[R] (no subject)

2010-06-03 Thread Rosario Garcia Gil
Dear R users,

I am trying to draw error bars in a bar plot, I use this code (tried many 
others which did not work):

install.packages()
library(gplots)
y <-c(39.02, 46.42)
se <- c(7.57,7.35)
plot <- barplot(y, beside=TRUE, col=0, ylim=c(0,47), axis.lty=1, 
main="far-red", xlab="latitude", names.arg=c("56N", "68N"))
superpose.eb(plot, y, se, col="orange", lwd=2)

Then I get an error saying that it cannot find the superpose.eb function.

Why? Isn't it in gplots?

Thanks
Rosario
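As far as I know, superpose.eb() is a user-written helper that has circulated on this list rather than a function shipped in gplots. A base-graphics sketch of the same error bars using arrows() (ylim widened from the original c(0,47) so the upper bars fit):

```r
y  <- c(39.02, 46.42)
se <- c(7.57, 7.35)
mid <- barplot(y, col = 0, ylim = c(0, 60), axis.lty = 1, main = "far-red",
               xlab = "latitude", names.arg = c("56N", "68N"))
# arrows() with flat 90-degree heads at both ends draws the error bars
arrows(mid, y - se, mid, y + se, angle = 90, code = 3,
       length = 0.1, col = "orange", lwd = 2)
```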



[R] cumsum function with data frame

2010-06-03 Thread n.via...@libero.it

Dear list,
I have a problem with the cumsum function.
I have a data frame like the following one
variable  Year  value
EC01      2005  5
EC01      2006  10
AAO1      2005  2
AAO1      2006  4

what I would like to obtain is

variable  Year  value  cumsum
EC01      2005  5      5
EC01      2006  10     15
AAO1      2005  2      2
AAO1      2006  4      6


If I use the by() function or the aggregate() function, the result is a list or
something else; what I want is a data frame as I showed above.
Does anyone know how to get it?
THANKS A LOT
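ave() returns a vector in the original row order, which makes a grouped cumulative sum a one-liner; a sketch on the posted data:

```r
df <- data.frame(variable = c("EC01", "EC01", "AAO1", "AAO1"),
                 Year = c(2005, 2006, 2005, 2006),
                 value = c(5, 10, 2, 4))
# cumulative sum within each level of 'variable', keeping the data frame shape
df$cumsum <- ave(df$value, df$variable, FUN = cumsum)
```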








Re: [R] Exporting a list of data frames

2010-06-03 Thread Allan Engelhardt

On 03/06/10 15:24, Christoph Scherber wrote:

Dear R helpers,

Suppose I have a list of data frames. How can I export them (e.g. as a
text file) without losing the columns of the individual data.frames?
   


save(mylist, file="/tmp/foo.RData")


But somehow I cannot get the information contained in "mylist" exported
in a nicely looking way. Any ideas? Many thanks for any help!
   


Define "nicely looking"?  Maybe "dput" but more likely writing each data 
frame to its own file (or combining to one big data frame).


Hope this helps a little.

Allan
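For the one-file-per-data-frame route, a minimal sketch (deriving file names from the list names is an assumption, not something the original post specifies):

```r
# assumes 'mylist' is a named list of data frames, e.g. the Orange example
for (nm in names(mylist)) {
  write.table(mylist[[nm]], file = paste(nm, "txt", sep = "."),
              sep = "\t", row.names = FALSE, quote = FALSE)
}
```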



Re: [R] R linux install: FORTRAN compiling error

2010-06-03 Thread Marc Schwartz
On Jun 3, 2010, at 10:38 AM, vaneet wrote:

> 
> I tried downloading the pre-built binaries of R from this website and then
> installing the rpms but is seems they depend on so many other packages to be
> installed first.  I tried simply the R package first:
> 
> warning: R-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID
> 217521f6
> error: Failed dependencies:
>R-devel = 2.11.0-1.el5 is needed by R-2.11.0-1.el5.x86_64
>libRmath-devel = 2.11.0-1.el5 is needed by R-2.11.0-1.el5.x86_64
> 
> Then when I tried to install those packages there were more dependencies:
> 
> warning: R-devel-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY,
> key ID 217521f6
> error: Failed dependencies:
>R-core = 2.11.0-1.el5 is needed by R-devel-2.11.0-1.el5.x86_64
>bzip2-devel is needed by R-devel-2.11.0-1.el5.x86_64
>gcc-gfortran is needed by R-devel-2.11.0-1.el5.x86_64
>libX11-devel is needed by R-devel-2.11.0-1.el5.x86_64
>pcre-devel is needed by R-devel-2.11.0-1.el5.x86_64
>pkgconfig is needed by R-devel-2.11.0-1.el5.x86_64
>tcl-devel is needed by R-devel-2.11.0-1.el5.x86_64
>tetex-latex is needed by R-devel-2.11.0-1.el5.x86_64
>tk-devel is needed by R-devel-2.11.0-1.el5.x86_64
> 
> So I thought perhaps R-core would have everything needed to install and run
> R but still:
> 
> warning: R-core-2.11.0-1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key
> ID 217521f6
> error: Failed dependencies:
>libgfortran.so.1()(64bit) is needed by R-core-2.11.0-1.el5.x86_64
>perl(File::Copy::Recursive) is needed by R-core-2.11.0-1.el5.x86_64
>tetex-latex is needed by R-core-2.11.0-1.el5.x86_64
>xdg-utils is needed by R-core-2.11.0-1.el5.x86_64
> 
> Is there any way I can get an RPM that has all the necessary packages built
> in to install and run R? Or do I have to find all these packages that are
> needed to install these 'R' rpms for which many are not found on the
> FedoraProject website (libgfortran, gcc-fortran).  Also which 'R' rpm
> would I use to simply run R on the linux machine, would it be R-core?
> 
> Thanks


The whole point of using 'yum' from the CLI to install RPMs, rather than 
downloading them separately and using 'rpm' from the CLI, is that yum will 
reconcile the dependencies of any packages required to install the RPMs that 
you actually want. Unless the RPM packagers goofed and messed up the dependency 
specifications (which happens now and then), yum should be pretty 
straightforward.

What you have above, is what is colloquially known as "RPM Dependency Hell", 
which is what yum was designed to avoid.  :-)

If you install the EPEL configuration RPM (as root), as is noted on the 'How To 
Use EPEL' page that I linked to, then all you should need to do is use the yum 
command (as root) that I provided to install R. Any required dependencies will 
be installed during that process. yum will get any needed RPMS from either the 
EPEL or the RHEL repos as may be required, to satisfy them.

And to answer your follow up question, yes, you will need to be root to engage 
in a system-wide R installation (and for the other missing system wide 
components). I don't know that the required RPMs are 'relocatable' to enable a 
local user install. For that, you might very well need to compile R from 
source, with appropriate configure/install time location options defined. 

However, as noted, you are missing required system wide components, for which 
you will need to be root anyway. So you may as well get root access and move 
forward from there.

HTH,

Marc



Re: [R] Shell command help

2010-06-03 Thread Uwe Ligges



On 03.06.2010 04:19, galen kaufman wrote:


Uwe,



Thank you so much for the response. I tried running the command in the format 
you suggested but still could not get the shell command to execute. The 
following is the command that I used per your suggestion and the subsequent 
error message.



shell('"C:\\programx.exe"' -import '"C:\\inputx.inp"' -run -quit)



This cannot work. Is this all expected to be the first argument of 
shell()? If so, you need to quote it as a whole.
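Concretely, the entire command line has to be passed as one string (the paths here are the hypothetical ones from the thread):

```r
# one string, with the inner Windows paths in escaped double quotes
shell('"C:\\programx.exe" -import "C:\\inputx.inp" -run -quit')
```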




#Error that is returned from above code…..


'C:\Program' is not recognized as an internal or external command, operable 
program or batch file


Since the above command cannot work, this error message is the result 
of another call you have not shown.
Particularly the "c:\Program" points us to a problem with spaces in the 
directory "Program Files" but since you have not used that in your call 
from R, it must be something in C:\programx.exe or C:\inputx.inp that 
causes the problem, but it is not R related - unless you have used a 
completely different call.




Warning message:
In shell("\"C:\\programx.exe\" -import>\"C:\\input.inp\" -run -quit") :

  >'"C:\programx.exe" -import "C:\input.inp" -run -quit' execution failed with 
error code 1



These two lines are also from two different calls: the ">" that appears in 
the first line is not in the second line.


So, *please* copy and paste only code and error messages that fit to 
each other!


Uwe Ligges




Thinking that it might be a problem with the command line, I checked with the 
developer of the program and he said that the syntax should be correct.



Any thoughts?



Thank you again,



Galen





Date: Mon, 3 May 2010 09:19:49 +0200
From: lig...@statistik.tu-dortmund.de
To: leavealett...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Shell command help



On 02.05.2010 21:55, galen kaufman wrote:


Dear R Community,

I am trying to run a command line in R that will open an external program, have 
it import a specific input file, run the program, then close the program. The 
command line that I got from the developer of the model to do this looks like 
what you see below:

c:\programx.exe -import 'inputx.inp' -run -quit

I have been trying to use shell() or shell.exec() to run this in R. I can get R 
to open the program but I have not been able to do the rest of the steps.

I have been able to open the model using :


shell('"C:\\programx.exe"')


or


shell.exec("C:\\programx.exe")


To do the rest I have been trying variations of the following:


shell('"C:\\programx.exe"' -import '"C:\\inputx.inp"' -run –quit)




Hm, you need to pass one string to the shell() (and the underlying
system()) call:

shell('"C:\\programx.exe" -import "C:\\inputx.inp" -run -quit')

Best,
Uwe Ligges
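As a hedged editorial aside: in more recent versions of R (system2() was added in R 2.12.0, so this assumes an upgrade from the R 2.10/2.11 of this thread), the program and its arguments can be passed separately, which sidesteps quoting the whole line by hand:

```r
# system2() takes the command and its arguments as separate strings,
# so each argument is quoted for the shell individually
system2("C:\\programx.exe",
        args = c("-import", "C:\\inputx.inp", "-run", "-quit"))
```

Note the plain ASCII hyphen in "-quit"; an en dash pasted from a word processor would be passed through verbatim and rejected by the program.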




but I am not getting anywhere.

Can you provide any insight as to how to do the rest?

Thank you,

Galen










Re: [R] General-purpose GPU computing in statistics (using R)

2010-06-03 Thread Allan Engelhardt
You may be interested in the "gputools" package and associated web site 
which discusses some of your questions in the second paragraph.


Hope this helps a little.

Allan
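For concreteness, a hedged sketch of the kind of call gputools offered at the time (this assumes the package is installed and a CUDA-capable GPU is present; gpuMatMult was one of its drop-in replacements for dense linear algebra):

```r
# Assumption: gputools is installed and a CUDA GPU is available
library(gputools)

A <- matrix(rnorm(1e6), 1000, 1000)
B <- matrix(rnorm(1e6), 1000, 1000)

# Same operation as A %*% B, dispatched to the GPU; historically such
# routines ran in single precision, which bears on Ravi's question
C <- gpuMatMult(A, B)
```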

On 03/06/10 14:43, Ravi Varadhan wrote:

Hi All,



I have been reading about general purpose GPU (graphical processing units)
computing for computational statistics.  I know very little about this, but
I read that GPUs currently cannot handle double-precision floating points
and also that they are not necessarily IEEE compliant.  However, I am not
sure what the practical impact of this limitation is likely to be on
computational statistics problems (e.g. optimization, multivariate analysis,
MCMC, etc.).



What are the main obstacles that are likely to prevent widespread use of
this technology in computational statistics? Can algorithms be coded in R to
take advantage of the GPU architecture to speed up computations?  I would
appreciate hearing from R sages about their views on the usefulness of
general purpose GPU (graphical processing units) computing for computational
statistics.  I would also like to hear about views on the future of GPGPU -
i.e. is it here to stay, or is it just a gimmick that will quietly disappear
into oblivion?



Thanks very much.



Best regards,

Ravi.


--

Ravi Varadhan, Ph.D.

Assistant Professor,

Center on Aging and Health,

Johns Hopkins University School of Medicine

(410)502-2619

rvarad...@jhmi.edu

http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html











Re: [R] problem with 'svyby' function from SURVEY package

2010-06-03 Thread Thomas Lumley

On Thu, 3 Jun 2010, Roni Kobrosly wrote:


Hello,

I'm using a complex survey dataset and my goal is to simply spit out a bunch of 
probability-weighted outcome variable means for the different levels of 
covariate. So I first define the structure of the study design (I'm using the 
CDC's NHANES data):

dhanes <- svydesign(id=~PSU, strat=~STRATA, weight=~lab_weight, data=final, 
nest=TRUE)

No problem there.
Now I use the "svyby" function as follows:

svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha
print(haha)

   covariate   outcome  se.outcome
1          1 0.4961189 0.08828457
2          2 0.4474706 0.22214557
3          3 0.5157026 0.12076008
4          4 0.6773910 0.20605025
NA        NA 0.8728167 0.15622274

...and it works just fine. I get a nice table of the mean and standard error 
for each level of the covariate. I started writing a custom function to 
automate this and I had problems. Consider this really basic custom function 
(that does not seem very different from the above code):

this_is_a_test <-function(outcome, covariate)
{

svyby(~outcome, ~covariate, design=dhanes, svymean, na.rm=T) -> haha

print(haha)


}



You are asking for the mean of a variable called 'outcome', divided up 
according to a variable called 'covariate'. Presumably you don't have variables 
with either of those names, so R is getting confused.

Formulas don't work the way you want them to.  As a simpler example with 
nothing to do with the survey package

this_is_a_simpler_example<-function(outcome){
   ~outcome
}


this_is_a_simpler_example(test)

~outcome

If you want to substitute a variable into a formula, you need to do it 
yourself. In your case, you probably want to use make.formula(), from the 
survey package


make.formula("test")

~test

make.formula(c("fred","barney","wilma"))

~fred + barney + wilma

Presumably you want to do something like

approach_that_works <-function(outcome, covariate, design=dhanes,...) 
svyby(make.formula(outcome),  make.formula(covariate), design,...)

some_outcomes <- colnames(dhanes)[47:63]

some_covariates <- colnames(dhanes)[83:95]

lapply( some_outcomes,
  function(an_outcome) lapply(some_covariates,  
approach_that_works, outcome=an_outcome)
   )


For another recent thread using another approach to a related question, see 
http://tolstoy.newcastle.edu.au/R/e10/help/10/05/5676.html

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] gam error

2010-06-03 Thread Joris Meys
This doesn't tell us much either. What does the variable "incline"
represent, and what does the variable "ToC" represent?
I could guess your data looks something like :
ToC  Incline
x1   -90
x2   -60
x3   -30
x4     0
x5    30
x6    60
x7    90
x8   -90
...  ...

Or incline could be the number of the sample (going from 1 to 7). No way to
know what you did. Please, read the posting guide and take the hints given
there into consideration.

This said, you very likely just have not enough data to use a thin plate
regression spline without limiting k. see ?choose.k and
?null.space.dimension

Cheers
Joris

On Thu, Jun 3, 2010 at 2:49 PM, natalieh  wrote:

>
> Data?
>
> The data are measures of energy use (continuous variable) for running on 7
> inclines between -90 and +90 degrees (n=7-21).
> --
> View this message in context:
> http://r.789695.n4.nabble.com/gam-error-tp2241518p2241608.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] (no subject)

2010-06-03 Thread Sarah Goslee
I don't have any idea where you came up with superpose.ep - rseek.org
doesn't return any results for a function of that name.

If you check the package information for gplots you'll see that
plotCI() is available to plot error bars.

There's also the errbar() function from hmisc, and probably several
others.

Sarah
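As a hedged sketch of the same idea with base graphics alone (no extra package), reusing the OP's numbers: barplot() returns the bar midpoints, which feed arrows() to draw capped error bars.

```r
# Error bars on a barplot using only base graphics
y  <- c(39.02, 46.42)
se <- c(7.57, 7.35)

# barplot() returns the x-positions of the bar centres
mid <- barplot(y, ylim = c(0, 60), main = "far-red",
               xlab = "latitude", names.arg = c("56N", "68N"))

# angle = 90 and code = 3 draw flat caps at both ends of each bar
arrows(mid, y - se, mid, y + se, angle = 90, code = 3, length = 0.1)
```

Note the ylim here is widened to 60 so the upper error bar is not clipped, unlike the ylim = c(0, 47) in the original call.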

On Thu, Jun 3, 2010 at 11:33 AM, Rosario Garcia Gil
 wrote:
> Dear R users,
>
> I am trying to draw error bars in a bar plot, I use this code (tried many 
> others which did not work):
>
> install.packages()
> library(gplots)
> y <-c(39.02, 46.42)
> se <- c(7.57,7.35)
> plot <- barplot(y, beside=TRUE, col=0, ylim=c(0,47), axis.lty=1, 
> main="far-red", xlab="latitude", names.arg=c("56N", "68N"))
> superpose.eb(plot, y, se, col="orange", lwd=2)
>
> Then I get an error saying that it cannot find the superpose.eb function.
>
> Why? Isn't it in gplots?
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Shell command help

2010-06-03 Thread Uwe Ligges



On 03.06.2010 06:40, RICHARD M. HEIBERGER wrote:

#Error that is returned from above code:


'C:\Program' is not recognized as an internal or external command,

operable program or



This statement means that the Windows CMD processor does not realize that
'C:\Program Files' is a single word.  Use the 8.3 version of the filename
'C:\progra~1' instead.  Get the correct 8.3 name for your computer from the
Windows
dir /x
command on the appropriate directory.  Maybe you can use the ENVIRONMENT
variable  %PROGRAMFILES% directly.  Sometimes you can use explicit quotation
marks "c:\Program Files\rest of path\yourprogram.exe" to get past this
issue.



Rich,

actually, 8.3 names are not required if you use the quotes correctly, at 
least.
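Uwe's point can be sketched with shQuote(), which quotes each path for the Windows shell so embedded spaces survive (the "Program Files" path below is hypothetical, for illustration only):

```r
# shQuote(type = "cmd") wraps a string in double quotes for cmd.exe,
# so a path containing spaces is treated as a single token
exe <- "C:\\Program Files\\vendor\\programx.exe"  # hypothetical path
cmd <- paste(shQuote(exe, type = "cmd"), "-import",
             shQuote("C:\\inputx.inp", type = "cmd"), "-run", "-quit")
shell(cmd)
```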




This problem may be happening inside the program you are calling, not at
your level.

Uwe, a discussion of %PROGRAMFILES% and 8.3 names needs to be added
to the Windows FAQ.  I think it belongs in "Section 5 Windows Features"
because it is
a standard Windows feature to put software in the "Program Files" directory
with an
embedded blank in its name and it is a standard MSDOS feature to trip on
embedded blanks.


I would not discuss this there: correct quoting is sufficient and does not 
depend on the OS. Trying to be OS-independent seems to be the better way in R, 
hence we should avoid 8.3 names / %PROGRAMFILES%, which other operating 
systems are not aware of.


Best wishes,
Uwe



>

Rich





Re: [R] Draw text with a box surround in plot.

2010-06-03 Thread Bert Gunter
?legend   perhaps.

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Peter Ehlers
Sent: Wednesday, June 02, 2010 4:14 PM
To: g...@ucalgary.ca
Cc: r-help@r-project.org
Subject: Re: [R] Draw text with a box surround in plot.

On 2010-06-02 16:57, g...@ucalgary.ca wrote:
> text() can draw text on a plot.
> Do we have a way/function to draw text with a box surround it?
> Thanks,
>

There may be a function in some package, but I would just use rect().
?rect

  -Peter Ehlers



Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Bert Gunter
Below. -- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Joris Meys
Sent: Thursday, June 03, 2010 5:20 AM
To: zhu yao
Cc: R-help
Subject: Re: [R] Continous variables with implausible transformation?

x <- rnorm(100,10,1)
sqrtx <- sqrt(x)
y <- rbinom(100,1,0.5)

lrm(y~x+sqrtx)

works. What's the problem?
--
But you wrote linear+ square. Don't you mean:
lrm(Y~x+x^2)

--- I believe this is the same as lrm(Y ~ x).
You must protect the x^2 via

lrm(Y ~ x + I(x^2))

--
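Bert's correction can be sketched with base R's glm() (a hedged stand-in here for lrm() from the Design/rms package, which handles the formula the same way):

```r
# Quadratic logistic fit: I() protects ^ from its special formula meaning
set.seed(1)
x <- rnorm(100, 10, 1)
y <- rbinom(100, 1, 0.5)

fit1 <- glm(y ~ x + I(x^2), family = binomial)

# Equivalent model using orthogonal polynomial terms
fit2 <- glm(y ~ poly(x, 2), family = binomial)
```

Both fits span the same model space; poly() simply uses orthogonalised columns, which can be numerically better behaved.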

Cheers

On Thu, Jun 3, 2010 at 6:34 AM, zhu yao  wrote:

> Dear r users
>
> I have a question in coding continuous variables in logistic regression.
>
> When "rcs" is used in transforming variables, sometimes it gives
> implausible associations with the outcome although the model chi-square is
> high.
>
> So what's your tips and tricks in coding continuous variables.
>
> P.S. How to code variables as linear+square in the formula such as lrm.
> lrm(y~x+sqrt(x)) can't work.
>
> Many thanks.
>
> Yao Zhu.
>
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] rgl problem on Fedora 13

2010-06-03 Thread Allan Engelhardt
You could try stepping through the open3d function to see where it 
hangs: I assume in rgl.open()?  You should probably also remove all 
installed packages and re-install and re-compile them.

On 03/06/10 14:00, Håvard Rue wrote:
> Just installed Fedora 13, and reinstalled the latest rgl package 0.91.
> The problem is that open3d() opens a window, but then halt. Nothing
> happens and I have to kill R.
> In the previous version of Fedora 12, everything worked fine.
>
> Any tests etc I should/could do?
>
> Best,
> H
>
>
>
>
>



Re: [R] For Loop help needed

2010-06-03 Thread Allan Engelhardt

One option:

t <- data.frame(x1=c(1,1,0,0,0,1), x2=c(0,0,0,1,0,1), 
Count=c(523,23,2,45,3,433))

t.sum <- function(df, x1, x2) sum(df[df$x1==x1 & df$x2==x2,]$Count)
t.sum(t, 1, 0)
# [1] 546
t.sum(t, 0, 0)
# [1] 5

Hope this helps a little.

Allan
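As a hedged alternative that gets all (x1, x2) sums in one step rather than one pair at a time, aggregate() with a formula does the grouping itself (a sketch on the same data frame `t` as above):

```r
# Sum Count within every observed (x1, x2) combination at once;
# one row per combination comes back as a data frame
t <- data.frame(x1 = c(1, 1, 0, 0, 0, 1),
                x2 = c(0, 0, 0, 1, 0, 1),
                Count = c(523, 23, 2, 45, 3, 433))
aggregate(Count ~ x1 + x2, data = t, FUN = sum)
```

(As an aside, naming a data frame `t` shadows the base transpose function t(); a name like `dat` avoids surprises.)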

On 03/06/10 16:18, Geeti Khan wrote:

Hi,
I have a dataset with three column like this
x1 x2 Count
1  0  523
1 0 23
0 0 2
0 1  45
0 0  3
1 1 433

I need to create a loop so that when c(x1,x2)=c(1,1), I can add the 
corresponding Counts.When c(x1,x2)=c(1,0), can add the corresponding counts and 
so on. Can anyone help me






Re: [R] Question about avoid the for loop

2010-06-03 Thread Jorge Ivan Velez
Hi Carrie,

Here are two options:

# Option 1
d <- data.frame(x, t)
y <- with(d, ifelse(t == 0, rbinom(nrow(d), 1, 0.2), rbinom(nrow(d), 1, 0.8)))
y

# Option 2  -- more general case, e.g. you do not know
#  how many 0's and 1's you have within each strata
spd <- with(d, split(d, x))
do.call(c, lapply(spd, function(comp)
    with(comp, ifelse(t == 0, rbinom(sum(t == 0), 1, 0.2),
                      rbinom(sum(t != 0), 1, 0.8)))))

HTH,
Jorge
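Editorial note: since in the example the success probability depends only on t (0.2 when t == 0, 0.8 when t == 1), a fully vectorized sketch needs neither a loop nor a split, however many strata x has:

```r
# Strata setup copied from the original question
x <- c(rep(1, 5), rep(2, 5), rep(3, 5))
t <- rep(c(0, 0, 1, 1, 1), 3)

# One rbinom() call: the probability vector follows t elementwise
y <- rbinom(length(t), 1, ifelse(t == 0, 0.2, 0.8))
```

This relies only on the assumption stated in the question, that the two t levels within each x stratum share the same probabilities.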


On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li <> wrote:

> Dear R-helpers,
>
> I would like to generate a binary random variable within a stratum's
> stratum. Here is a simple example.
>
>
> ## x is the first level strata index, here I have 3 strata.
> x=c(rep(1,5), rep(2,5), rep(3,5))
>
> ## within x, there is a second strata indexed by t=0 and t=1
> t=rep(c(0,0,1,1,1),3)
>
>
> ## and within strata i and t=0 and t=1, I generate the random binomial
> variable respectively, and save in y
> y=rep(NA, length(x))
> for (i in 1:3)
> {
> y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
> y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
> }
>
>
> My question: is there any way to avoid the for loop, since I have the first
> level strata has thousands of strata. (Within each x stratum, the t only
> has
> 2 levels, 0 and 1 )
>
> Thanks for any help!
>
> Carrie
>


Re: [R] Comparing a 4-point and 5-point Likert scale

2010-06-03 Thread Joshua Wiley
Dear Simon,

These two questions already are comparable (using a strict sense of
that word).  What is your goal in trying to put them both on the same
scale?  Even if both were measured 1-5, it would be unreasonable to
say that a 3 on one question meant the same as a 3 on the other
question because they are fundamentally different questions and the
distributions both in your data and in the population I am presuming
you wish to generalize to are potentially very different.
Additionally, the responses and wording of the question are quite
different (how much should be done vs. level of agreement).

If your goal is to make some sort of composite variable that you are
predicting, what about z-scoring both?  If you do dichotomize, why not
use a theory driven approach?  Does someone who thinks 'about the
same' in whatever country the data was collected in fit your idea or
definition of egalitarian?  I imagine this could vary quite a bit by
country even, depending how much (if anything) was actually being done
to lessen the gap between rich and poor.

It seems like your question is really more methodological than even
statistical (although that may just be me).

Best regards,

Josh

On Thu, Jun 3, 2010 at 6:11 AM, Simon Kiss  wrote:
> Help with survey data:
> Hello R colleagues,
> I hope this is an appropriate place to direct this question.  It relates
> specifically to the comparability of a 5-point likert to a 4-point likert
> scale.
>
> One question in my dataset asks "How much should be done to reduce the gap
> between rich and poor"
> Much more, somewhat more, about the same, somewhat less and much less.
>
> The second questions ask:
> "People who can afford to, should be able to pay for their own health care"
> strongly agree, agree, disagree, strongly disagree.
>
> Now, assuming that I rescale them so that 1 equals the most egalitarian
> position and the highest number (4 or 5) equals the least egalitarian
> position, how can I make these two results comparable.
>
> Two ways come to mind: one is to collapse both into a dichotomous variable
> and do a logistic regression on both. The danger here is that I have to
> decide what to do with the middle position in the first question, assign it
> to the egalitarian or non-egalitarian category.
> A second way would be to multiply the scores in the first question by 4 (to
> get results that are either 4, 8, 12, 16 or 20) and then multiply the second
> question by five to get responses that are either 5, 10, 15 or 20. My idea
> is then to add the two, average them and use that value as an index of
> economic egalitarianism?
> Yes / no? Suggestions?
> I am an R user and I hope that a purely statistical question is not
> especially misplaced.
> Yours truly,
> Simon Kiss
> *
> Simon J. Kiss, PhD
> SSHRC and DAAD Post-Doctoral Fellow
> John F. Kennedy Institute of North America Studies
> Free University of Berlin
> Lansstraße 7-9
> 14195 Berlin, Germany
> Cell: +49 (0)1525-300-2812,
> Web: http://www.jfki.fu-berlin.de/index.html
>
>



-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/



Re: [R] For Loop help needed

2010-06-03 Thread Jorge Ivan Velez
Hi Geeti,

If d is your data.frame, the following is an option:

as.data.frame.table(with(d, tapply(Count, list(x1, x2), sum)))

HTH,
Jorge


On Thu, Jun 3, 2010 at 11:18 AM, Geeti Khan <> wrote:

> Hi,
> I have a dataset with three column like this
> x1 x2 Count
> 1  0  523
> 1 0 23
> 0 0 2
> 0 1  45
> 0 0  3
> 1 1 433
>
> I need to create a loop so that when c(x1,x2)=c(1,1), I can add the
> corresponding Counts.When c(x1,x2)=c(1,0), can add the corresponding counts
> and so on. Can anyone help me
>
>
>
>


Re: [R] (no subject)

2010-06-03 Thread Jorge Ivan Velez
Hi Rosario,

It is saying that because you have not loaded that function into your R
session. It is available at [1]. Go there, then copy and paste it into your
session. You should be ready to go after doing this.

HTH,
Jorge

[1] http://egret.psychol.cam.ac.uk/statistics/R/graphs1.html


On Thu, Jun 3, 2010 at 11:33 AM, Rosario Garcia Gil <> wrote:

> Dear R users,
>
> I am trying to draw error bars in a bar plot, I use this code (tried many
> others which did not work):
>
> install.packages()
> library(gplots)
> y <-c(39.02, 46.42)
> se <- c(7.57,7.35)
> plot <- barplot(y, beside=TRUE, col=0, ylim=c(0,47), axis.lty=1,
> main="far-red", xlab="latitude", names.arg=c("56N", "68N"))
> superpose.eb(plot, y, se, col="orange", lwd=2)
>
> Then I get an error saying that it cannot find the superpose.eb function.
>
> Why? Isn't it in gplots?
>
> Thanks
> Rosario
>


Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Joris Meys
You're right, it is the same. Using I() won't work for the same reason sqrt()
doesn't, so:

> x2 <- x^2
> lrm(y~x+x2)

Thx for the correction.
Cheers
Joris

On Thu, Jun 3, 2010 at 6:14 PM, Bert Gunter  wrote:

> Below. -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> --
> But you wrote linear+ square. Don't you mean:
> lrm(Y~x+x^2)
>
> --- I believe this is the same as lrm(Y ~ x).
> You must protect the x^2 via
>
> lrm(Y ~ x + I(x^2))
>
> --
>
> --
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] (no subject)

2010-06-03 Thread John Kane
It looks like superpose.eb is someone's function and not in a package.

https://stat.ethz.ch/pipermail/r-help/2002-November/027299.html

--- On Thu, 6/3/10, Rosario Garcia Gil  wrote:

> From: Rosario Garcia Gil 
> Subject: [R] (no subject)
> To: "r-help@r-project.org" 
> Received: Thursday, June 3, 2010, 11:33 AM
> Dear R users,
> 
> I am trying to draw error bars in a bar plot, I use this
> code (tried many others which did not work):
> 
> install.packages()
> library(gplots)
> y <-c(39.02, 46.42)
> se <- c(7.57,7.35)
> plot <- barplot(y, beside=TRUE, col=0, ylim=c(0,47),
> axis.lty=1, main="far-red", xlab="latitude",
> names.arg=c("56N", "68N"))
> superpose.eb(plot, y, se, col="orange", lwd=2)
> 
> Then I get an error saying that it cannot find the
> superpose.eb function.
> 
> Why? Isn't it in gplots?
> 
> Thanks
> Rosario
> 


Re: [R] deduplication

2010-06-03 Thread Allan Engelhardt

Maybe something like the following will get you started:

library("igraph")
g <- graph.data.frame(id, directed=FALSE)
neighborhood(g, +Inf)

There is perhaps a more efficient way, but I hope this helps a little.

Allan.
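The step from pairs to one group label per record is exactly the connected components of that graph; a hedged sketch using the example pairs (clusters() in igraph of that era; current igraph versions call this components()):

```r
library(igraph)

# Linked record pairs from the question
id1 <- c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2 <- c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)

g <- graph.data.frame(data.frame(id1, id2), directed = FALSE)

# One component id per vertex; records sharing an id belong to the
# same person, even when they are only linked indirectly (1 -> 4 -> 8)
clusters(g)$membership
```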



On 03/06/10 14:14, Epi-schnier wrote:

Colleagues,

I am trying to de-duplicate a large (long) database (approx 1mil records) of
diagnostic tests. Individuals in the database can have up-to 25
observations, but most will have only one. IDs for de-duplication (names,
sex, lab number...) are patchy. In a first step, I am using Andreas Borg's
excellent record linkage package (), that leaves me with a list of 'pairs'
looking very much like this:
id1<-c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2<-c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)
id<-data.frame(cbind(id1,id2))
where a pair means that the records belong to the same individual (e.g.,
record 4 and record 8; 17 and 18...). My problem now is to get a list with
all records that belong to the same person (in the example, observations
1,3,4,5,6,7,8,12, 17 and 18 are all from the same person). The problem is to
find the link between 1 and 8 (only through 1 and 4 and 4 and 8) and the
link between 1 and 17 (through 18). I can do it in my head, but I am missing
the code that would work its way through too many records.

Any clever ideas?
(using R 2.10.1 on Windows XP)

Thanks,

Christian







Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

2010-06-03 Thread Joris Meys
See below.

On Thu, Jun 3, 2010 at 5:35 PM, Gavin Simpson wrote:

> On Thu, 2010-06-03 at 17:00 +0200, Joris Meys wrote:
> > On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson  >wrote:
> >
> > >
> > > vegan is probably not too useful here as the response is univariate;
> > > counts of ducks.
> > >
> >
> > If we assume that only one species is counted and of interest for the
> whole
> > research.  I (probably wrongly) assumed that data for multiple species
> was
> > available.
> >
> > Without knowledge about the whole research setup it is difficult to say
> > which method is the best, or even which methods are appropriate. VGAM is
> > indeed a powerful tool, but :
> >
> > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > means 182 observations in the dataset
> >
> > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,
> > family= NBI)
> >
> > is 5 splines with 3df, an intercept, that's a lot of df for only 182
> > observations. using VGAM ain't going to help here.
>
> How do you know?
>

I don't. I thought it would be like that because you use essentially the
same splines, and I overlooked the fact that the OP tried to reduce to a
single smooth. I stand corrected.

Cheers
Joris


> >  I'd reckon that the model
> > itself should be reconsidered, rather than the distribution used to fit
> the
> > error terms.
>
> I was going to mention that too, but the OP did reduce this down to a
> single smooth and the problem of increasing deviance remained. Hence
> trying to fit a /similar/ model in other software might give an
> indication whether the problems are restricted to a single software or a
> more general issue of the data/problem?
>
> At this stage the OP is stuck not knowing what is wrong; (s)he has
> nothing to do model checking on etc. Trying zeroinfl() and fitting a
> parametric model, for example, might be a useful starting point, then
> move on to models with smoothers if required.
>

He (quite positive on that one :-) ) can indeed try to use VGAM on the model
with one smooth and see if that turns out to give something. That should
give some clarity on the question whether it is the optimization of pscl
that goes wrong, or whether the problem is inherent to the data.

In addition, I'd suggest taking a closer look at the iteration
parameters of the gamlss function itself. Honestly, I've never tried these
out before, but you never know whether it would work.
See ?gamlss.control
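For instance (a hedged sketch, not tested on the OP's data; the argument names are as documented in ?gamlss.control, but the values are purely illustrative):

```r
library(gamlss)
# allow more outer iterations and loosen the convergence criterion
con <- gamlss.control(n.cyc = 200, c.crit = 0.001)
# model_NBI <- gamlss(duck ~ cs(HHCDI200, df = 3), data = data,
#                     family = NBI, control = con)
```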

Cheers
Joris

>
> --
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

2010-06-03 Thread Joris Meys
Correction : That should give some clarity on the question whether it is the
optimization of GAMLSS that goes wrong, or whether the problem is inherent
to the data.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl problem on Fedora 13

2010-06-03 Thread Håvard Rue
Thanks for suggestions.

I installed the `GL'-related libraries from the Fedora repos and
restarted, and then it worked perfectly again. Weird behaviour,
considering I did a clean install and there were no complaints when
compiling 'rgl'. I cannot explain it. 

Thanks!

best,
H



On Thu, 2010-06-03 at 09:29 -0500, Marc Schwartz wrote:
> On Jun 3, 2010, at 8:00 AM, Håvard Rue wrote:
> 
> > Just installed Fedora 13, and reinstalled the lastest rgl package 0.91.
> > The problem is that open3d() opens a window, but then halt. Nothing
> > happens and I have to kill R.
> > In the previous version of Fedora 12, everything worked fine. 
> > 
> > Any tests etc I should/could do?
> > 
> > Best,
> > H
> 
> 
> Hi,
> 
> Were there any console error messages at all that might indicate a driver 
> incompatibility issue or a missing component?
> 
> From the file you attached, it looks like you have an Intel GPU, so some of 
> the video related changes in F13, such as the Nouveau driver, I don't believe 
> will be applicable, but that is not absolutely certain. There may yet be 
> other relevant changes. 
> 
> Did you do a clean install or an upgrade over F12? If the latter, it is 
> possible that some system configuration specification is amiss.
> 
> Of course, given any brand new Linux release, there is always the possibility 
> of a bug that may or may not yet have been fixed since release. Be sure that 
> you have fully updated the system with yum since installing.
> 
> You might also want to search the Fedora Bugzilla repo for any references to 
> OpenGL reports for F13 that might apply.
> 
> Also consider contacting the rgl package maintainers to see if there is any 
> additional information that they can provide based upon their more intimate 
> knowledge.
> 
> Finally, there is a R SIG Fedora e-mail list. More info at:
> 
>   https://stat.ethz.ch/mailman/listinfo/r-sig-fedora
> 
> HTH,
> 
> Marc Schwartz
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating dummy variables

2010-06-03 Thread Arantzazu Blanco Bernardeau

hey thanks
I did solve it already, it had more mistakes as you see :S
bye

Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología  
Universidad de Murcia-Campus de Espinardo







> Date: Thu, 3 Jun 2010 14:40:30 +
> From: bt_jan...@yahoo.de
> Subject: AW: [R] Creating dummy variables
> To: r-help@r-project.org; aramu...@hotmail.com
> 
> Was that the original code that you ran? As there appear to be several 
> mistakes in the code:
> 
> 1. In the gipsisol code, there is a '(' too many
> 2. In the cambisol code both comparison signs point in the same direction; you 
> probably want one >= and one <=
> 
> 
> My general suggestion would be to skip the loops altogether and vectorize 
> your code:
> 
> datos$cambi <- 0
> datos$cambi[datos$codsuelo >= 3.1 & datos$codsuelo <= 3.3] <- 1
> 
> Another source of your error could be that datos$codtipo is not numeric. What 
> does class(datos$codzuelo) say?
> 
> 
> HTH
> Jannis
> > for (i in
> > 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1
> > &&
> > datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} 
> > }
> 
> --- Arantzazu Blanco Bernardeau  schrieb am Do, 
> 3.6.2010:
> 
> > Von: Arantzazu Blanco Bernardeau 
> > Betreff: [R] Creating dummy variables
> > An: r-help@r-project.org
> > Datum: Donnerstag, 3. Juni, 2010 14:11 Uhr
> > 
> > Hello R project
> > I am a R beginner trying to create a dummy variable to
> > clasificate soil types.
> > So, I have a column in my database called codtipo (typecode
> > in english) where soil type is coded as 
> > 1.1 to 1.4 arenosol (I have 4 types)
> > 2.1 to 2.3 calcisols 
> > 4.1 to 4.4 fluvisols
> > and so on
> > To make dummy variables I understand that, I create
> > different columns as for gipsisols
> > datos$gipsi=datos$codsuelo
> > for (i in
> > 1:length(datos$gipsi)){if(datos$codsuelo[i]>=5.1
> > &&
> > (datos$codsuelo[i]<=5.4){datos$gipsi[i]=1}else{0}
> > }
> > for cambisols it should be
> > datos$cambi=datos$codsuelo
> > for (i in
> > 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1
> > &&
> > datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} 
> > }
> > and so on... 
> > but anyway R answers that a necesary value TRUE/FALSE is
> > not existing.
> > What can I do?
> > thanks a lot!!
> > 
> > 
> > Arantzazu Blanco Bernardeau
> > Dpto de Química Agrícola, Geología y Edafología  
> > Universidad de Murcia-Campus de Espinardo
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > > Date: Thu, 3 Jun 2010 06:51:42 -0700
> > > From: lampria...@yahoo.com
> > > To: jorism...@gmail.com
> > > CC: r-help@r-project.org
> > > Subject: Re: [R] ordinal variables
> > > 
> > > Thank you Joris,
> > > I'll have a look into the commands you sent me. They
> > look convincing. I hope my students will also see them in a
> > positive way (although I can force them to pretend that they
> > have a positive attitude)!
> > > 
> > > Dr. Iasonas Lamprianou
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Assistant Professor (Educational Research and
> > Evaluation)
> > > 
> > > Department of Education Sciences
> > > 
> > > European University-Cyprus
> > > 
> > > P.O. Box 22006
> > > 
> > > 1516 Nicosia
> > > 
> > > Cyprus 
> > > 
> > > Tel.: +357-22-713178
> > > 
> > > Fax: +357-22-590539
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Honorary Research Fellow
> > > 
> > > Department of Education
> > > 
> > > The University of Manchester
> > > 
> > > Oxford Road, Manchester M13 9PL, UK
> > > 
> > > Tel. 0044  161 275 3485
> > > 
> > > iasonas.lampria...@manchester.ac.uk
> > > 
> > > --- On Thu, 3/6/10, Joris Meys 
> > wrote:
> > > 
> > > From: Joris Meys 
> > > Subject: Re: [R] ordinal variables
> > > To: "Iasonas Lamprianou" 
> > > Cc: r-help@r-project.org
> > > Date: Thursday, 3 June, 2010, 14:35
> > > 
> > > see ?factor and ?as.factor. On ordered factors you can
> > technically do a spearman without problem, apart from the
> > fact that a spearman test by definition cannot give exact
> > p-values with ties present.
> > > 
> > > x <- sample(c("a","b","c","d","e"),100,replace=T)
> > > 
> > > y <- sample(c("a","b","c","d","e"),100,replace=T)
> > > 
> > > x.ordered <-
> > factor(x,levels=c("e","b","a","d","c"),ordered=T)
> > > 
> > > x.ordered
> > > y.ordered <-
> > factor(y,levels=c("e","b","a","d","c"),ordered=T)
> > > y.ordered
> > > 
> > > cor.test(x.ordered,y.ordered,method="spearman")
> > > 
> > > require(pspearman)
> > > 
> > > spearman.test(x.ordered,y.ordered)
> > > 
> > > R commander has some menu options to deal with
> > factors. R commander also provides a scripting window.
> > Please do your students a favor, and show them how to use
> > those commands. 
> > > 
> > > 
> > > Cheers
> > > Joris
> > > 
> > > 
> > > On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou
> > 
> > wrote:
> > > 
> > > Dear colleagues,
> > > 
> > > 
> > > 
> > > I teach statistics using SPSS. I want to use R
> > instead. I hit on one problem and I need some quick advice.
> > When I want to work with ordinal variables, in SPSS I can
> > compute the median or create a barch

Re: [R] import text file into R

2010-06-03 Thread John Kane
What does the text look like?  

--- On Thu, 6/3/10, dhanush  wrote:

> From: dhanush 
> Subject: [R] import text file into R
> To: r-help@r-project.org
> Received: Thursday, June 3, 2010, 7:32 AM
> 
> can anyone tell me how to import a text file in R? the text
> file I want to
> import is a large file, about 800MB in size. Thanks in
> advance.
> 
> I tried using the following
> 
> data<-read.table("file",header=T,sep="\t")
> 
> 
> -- 
> View this message in context: 
> http://r.789695.n4.nabble.com/import-text-file-into-R-tp2241525p2241525.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>
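A general tip for a file this size (a sketch, not from the thread): supplying colClasses and nrows lets read.table() pre-allocate storage and skip its type-guessing pass, which matters a lot for an 800 MB file. The column types below are invented and must match the real data:

```r
# toy stand-in for the large tab-separated file
tf <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:3, score = c(2.5, 3.1, 4.0)),
            tf, sep = "\t", row.names = FALSE, quote = FALSE)

dat <- read.table(tf, header = TRUE, sep = "\t",
                  colClasses = c("integer", "numeric"),  # no type guessing
                  nrows = 3,          # an (over)estimate is fine
                  comment.char = "")  # skip comment scanning
```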

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] selecting the most recent files

2010-06-03 Thread Albert-Jan Roskam
Hi,

I am writing a program that returns a vector of the most recent R files in 
a given dir.
The files are named like "f1.R", "f2.R", and sometimes "f3a.R". I want to use 
the resulting
vector to easily source() all the most recent file versions in a given dir. 
Here's what I've cooked up so far:

library(plyr)
rfiles <- Sys.glob("G:\\temp\\*.R")
rfiles2 <- data.frame(rfiles, mtime = ldply(rfiles, "file.info")[,"mtime"])
rfiles2$basename <- gsub("([0-9]+[a-z]?)\\.R", "", rfiles)
srt <- with(rfiles2, order(rfiles, mtime))
rfiles2[srt,][!duplicated(rfiles2$basename),] # does not work
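A possible fix (a sketch, untested against the real directory): sort by base name with the newest mtime first, then drop duplicates on the *sorted* order, so duplicated() sees the same ordering that the subscript uses.

```r
# toy directory with two versions of the same script (assumed naming scheme)
d <- tempfile(); dir.create(d)
writeLines("x <- 1", file.path(d, "f1.R"))
Sys.sleep(1)                                  # ensure distinct mtimes
writeLines("x <- 2", file.path(d, "f2.R"))

rfiles <- Sys.glob(file.path(d, "*.R"))
info <- file.info(rfiles)
basename2 <- gsub("([0-9]+[a-z]?)\\.R$", "", basename(rfiles))
srt <- order(basename2, -as.numeric(info$mtime))   # newest version first
sorted <- rfiles[srt]
latest <- sorted[!duplicated(basename2[srt])]      # one file per base name
basename(latest)   # "f2.R"
```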

Thank you in advance!

Cheers!!

Albert-Jan



~~

All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?

~~


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating dummy variables

2010-06-03 Thread Bert Gunter
Do **NOT** use dummy variables in R. R's modeling functions takes care of
this themselves using factors. You say you are a beginner. OK, so begin
**properly** -- by reading An Introduction to R. Chapter 11 on Statistical
Models in R was written precisely to help people like you learn what to do
and avoid asking inappropriate questions like this on this list.
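As a hedged illustration of the factor approach (the soil codes and labels below are reconstructed from the original question): cut() maps the numeric codes onto a labelled factor, and R's modelling functions then generate the indicator columns themselves.

```r
codsuelo <- c(1.2, 2.1, 3.2, 4.4, 5.3)
soil <- cut(codsuelo, breaks = 1:6, right = FALSE,   # [1,2), [2,3), ...
            labels = c("arenosol", "calcisol", "cambisol",
                       "fluvisol", "gipsisol"))
model.matrix(~ soil)   # dummy columns, built automatically from the factor
```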

Bert Gunter
Genentech Nonclinical Biostatistics
 
 


[R] split a row into multiple columns

2010-06-03 Thread Kevin Burnham
Would somebody please help me break this row:

"Main Group\t1000\tMP Test\tMP Test, 1\tAudio (1, f1-qaddara.aiff)\tl
(target is right word)\tl\tPressed\tl (target is right
word)\tC\t3111\t\t\t\t\t"

into multiple columns along the \t separator?

When I try the strsplit (x,"\t") command I get:

[[1]]
 [1] "Main Group"                 "1000"                       "MP Test"                   
 [4] "MP Test, 1"                 "Audio (1, f1-qaddara.aiff)" "l (target is right word)"  
 [7] "l"                          "Pressed"                    "l (target is right word)"  
[10] "C"                          "3111"                       ""                          
[13] ""                           ""                           ""                          

Which is closer to what I need, but still not in columns.
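One way to get actual columns (a sketch, using a shortened version of the row): hand the string to read.table(), which splits on the separator and returns a one-row data frame.

```r
x <- "Main Group\t1000\tMP Test\tMP Test, 1\tC\t3111"
df <- read.table(textConnection(x), sep = "\t", stringsAsFactors = FALSE)
df   # a 1 x 6 data frame with columns V1 ... V6
```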

Thanks,
Kevin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot polar coordinates

2010-06-03 Thread Greg Snow
It looks like polar.plot does not handle negative lengths, you can preprocess 
the data and just add 180 degrees to the angles corresponding to negative 
lengths and then use their absolute values.
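The preprocessing described above can be sketched as follows (using the lengths and angles from the original post; the polar.plot() call is left commented since it needs plotrix):

```r
len <- c(1, -13, 3, -1)
ang <- c(20, 20, 45, 55)
ang2 <- ifelse(len < 0, (ang + 180) %% 360, ang)  # flip negative lengths
len2 <- abs(len)                                  # ... and make them positive
# polar.plot(len2, polar.pos = ang2, rp.type = "p",
#            radial.lim = c(0, 30, 60, 90))
```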

radial.lim is not ignored: based on the documentation, the range is then 
made pretty (and 100 is apparently prettier than 90). If you pass a longer 
vector (e.g. c(30,60,90)) then those exact values will be used.

The lines function is plotting in Cartesian coordinates, not the polar 
coordinates.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Thomas Steiner
> Sent: Thursday, June 03, 2010 8:09 AM
> To: r-help@r-project.org
> Subject: [R] plot polar coordinates
> 
> Hi,
> I'd like to plot in in polar coordinates a line which is given as a
> vector of lengths and angles.
> 
> library("plotrix")
> polar.plot(length=c(1,-13,3,-
> 1),polar.pos=c(20,20,45,55),rp.type="p",line.col="red",lwd=3,clockwise=
> TRUE,label.pos=seq(30,360,by=30),labels=c("ESE","SSE","S","SSW","WSW","
> W","WNW","NNW","N","NNE","ENE","E"),radial.lim=c(0,90))
> lines(seq(0,285,by=15),rep(c(40,60),times=10),col="blue")
> 
> My questions:
> a) Why is the radial.lim=c(0,90) ignored?
> b) Why is there no red line?
> c) Why is the blue line not "jumping" between two circles of radius 40
> respectively 60 (The first argument of "line" should be the angle, the
> second the length.)? Or: which function plots in existing polar plots?
> (Remark the the polygone is not complete, eg going around)
> 
> My final goal will be such a plot:
> http://www.mysundial.ca/tsp/images/sun_chart_50_solar_polar.jpg which
> is the polar-coordinate version of my wikimedia plot
> http://commons.wikimedia.org/wiki/File:Sonnenstand.png (made in R, see
> source)
> Thanks for hints and help,
> Thomas
> 
> PS: I asked a close question last year:
> http://www.mail-archive.com/r-help@r-project.org/msg64998.html
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cumsum function with data frame

2010-06-03 Thread Joris Meys
See ?split and ?unsplit.

Data <- read.table(textConnection("variable Year value
EC01 2005 5
EC01 2006 10
AAO1 2005 2
AAO1 2006 4"), header = TRUE)

Datalist <-split(Data,Data$variable)
resultlist <- lapply(Datalist,function(x){
x$cumul <- cumsum(x$value)
return(x)
})
result <- unsplit(resultlist,Data$variable)
result

  variable Year value cumul
1     EC01 2005     5     5
2     EC01 2006    10    15
3     AAO1 2005     2     2
4     AAO1 2006     4     6

On a side note: I've used this construction now for a number of problems.
Some could be solved better using more specific functions (e.g. ave() for
adding a column of group means). I'm not sure, however, that this is the
optimal approach to applying a function to subsets of a data frame and
adding the result of that function as an extra variable. Anybody care to
elaborate on how the R masters had it in mind?
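For this particular task, ave() seems to do the job in one line (a sketch on the same toy data): it splits the values by group, applies FUN, and returns a vector aligned with the input rows.

```r
Data <- data.frame(variable = c("EC01", "EC01", "AAO1", "AAO1"),
                   Year = c(2005, 2006, 2005, 2006),
                   value = c(5, 10, 2, 4))
# cumulative sum within each level of 'variable'
Data$cumul <- ave(Data$value, Data$variable, FUN = cumsum)
Data$cumul   # 5 15 2 6
```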

Cheers
Joris

On Thu, Jun 3, 2010 at 5:58 PM, n.via...@libero.it wrote:

>
> Dear list,
> I have a problem with the cumsum function.
> I have a data frame like the following one
> variable Year value
> EC01 2005 5
> EC01 2006 10
> AAO1 2005 2
> AAO1 2006 4
> what I would like to obtain is
> variable Year value cumsum
> EC01 2005  5  5
> EC01 2006 10 15
> AAO1 2005  2  2
> AAO1 2006  4  6
>
>
> if I use the by function or the aggregate function the result is a list or
> something else, what I want is a data frame as I showed above...
> anyone knows how to get it???
> THANKS A LOT
>
>
>
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Use apply only on non-missing values

2010-06-03 Thread Doran, Harold
Actually, this is clever. I modified your advice and vectorized this as:

1 / (1 + exp(b_vector - t(matrix(theta, nrow = nrow(dat), ncol = ncol(dat)))))

Instead of using the apply() function as I did before. In terms of speed, this 
new solution is immensely faster, as you also noted. Now, whether it works on 
my real world problem is TBD. It is running now, slowly.

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Wednesday, June 02, 2010 7:35 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Use apply only on non-missing values

Not really a direct answer on your question, but:
> system.time(replicate(1, apply(as.matrix(theta), 1, rasch, b_vector)))
   user  system elapsed
   4.51    0.03    4.55

> system.time(replicate(1, theta %*% t(b_vector)))
   user  system elapsed
   0.25    0.00    0.25

It does make a difference on large datasets...
Cheers
Joris
On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold 
mailto:hdo...@air.org>> wrote:
I have a function that I am currently using very inefficiently. The following 
are needed to illustrate the problem:

set.seed(12345)
dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
mis <- sample(1:110, 5)
dat[mis] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4,4)
empty <- which(is.na(t(dat)))

So, I have a matrix (dat) with some values within the matrix missing. In my 
real world problem, the matrix is huge, and most values are missing. The 
function in question is called derivs() and is below. But, let me step through 
the inefficient portions.

First, I create a matrix of some predicted probabilities as:

rasch <- function(theta,b) 1/ (1 + exp(b-theta))
mat <- apply(as.matrix(theta), 1, rasch, b_vector)

However, I only need those predicted probabilities in places where the data are 
not missing. So, the next step in the function is

mat[empty] <- NA

which manually places NAs in places where the data are missing (notice the 
matrix 'mat' is the transpose of the data matrix and so I get the empty 
positions from the transpose of dat).

Afterwards, the function computes the gradient and hessians needed to complete 
the MLE estimation.

All of this works in the sense that it yields the correct answers for my 
problem. But, the glaring problem is that I create predicted probabilities for 
every cell in 'mat' when in many cases they are not needed. I end up replacing 
those values with NAs. In my real world problem, this is horribly inefficient 
and slow.

My question, then, is whether there is a way to use apply such that it computes 
the necessary predicted probabilities only where the data are not missing, to 
yield the matrix 'mat'. My desired end result is the matrix 'mat' as created 
after manually placing the NAs in the appropriate cells.
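One possible answer (a sketch, reusing the setup code from this post): take the array indices of the non-missing cells of t(dat) and compute the probabilities only for those entries, leaving the rest NA.

```r
set.seed(12345)
dat <- matrix(sample(c(0, 1), 110, replace = TRUE), nrow = 11, ncol = 10)
dat[sample(1:110, 5)] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4, 4)

idx <- which(!is.na(t(dat)), arr.ind = TRUE)   # row = item, col = person
mat <- matrix(NA_real_, nrow = ncol(dat), ncol = nrow(dat))
mat[idx] <- 1 / (1 + exp(b_vector[idx[, 1]] - theta[idx[, 2]]))
# mat now holds predicted probabilities only where dat is observed
```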

Thanks
Harold


derivs <- function(dat, b_vector, theta){
    mat <- apply(as.matrix(theta), 1, rasch, b_vector)
    mat[empty] <- NA
    gradient <- -(colSums(dat, na.rm = TRUE) - rowSums(mat, na.rm = TRUE))
    hessian <- -(rowSums(mat * (1 - mat), na.rm = TRUE))
    list('gradient' = gradient, 'hessian' = hessian)
}



> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C   LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1
>

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R linux install: FORTRAN compiling error

2010-06-03 Thread R P Herrold

On Thu, 3 Jun 2010, vaneet wrote:


Is there any way I can get an RPM that has all the necessary packages built
in to install and run R? Or do I have to find all these packages that are
needed to install these 'R' rpms for which many are not found on the
FedoraProject website (libgfortran, gcc-fortran).  Also which 'R' rpm
would I use to simply run R on the linux machine, would it be R-core?


This is really a usage issue of Fedora and its package 
dependency solving and retrieval tool 'yum', and of the 
author's unfamiliarity with those tools.



error: Failed dependencies:
   R-core = 2.11.0-1.el5 is needed by R-devel-2.11.0-1.el5.x86_64
   bzip2-devel is needed by R-devel-2.11.0-1.el5.x86_64
   gcc-gfortran is needed by R-devel-2.11.0-1.el5.x86_64
   libX11-devel is needed by R-devel-2.11.0-1.el5.x86_64
   pcre-devel is needed by R-devel-2.11.0-1.el5.x86_64
   pkgconfig is needed by R-devel-2.11.0-1.el5.x86_64
   tcl-devel is needed by R-devel-2.11.0-1.el5.x86_64
   tetex-latex is needed by R-devel-2.11.0-1.el5.x86_64
   tk-devel is needed by R-devel-2.11.0-1.el5.x86_64


All of these except 'R-core' are present in a base Fedora or 
RHEL system through use of that 'yum' tool (which is somewhat 
like 'apt-get' for those from a Debian background); the R-core 
package is almost certainly adjacent to the R package you are 
trying to install.


The suggestions of others about ancient versions and so forth 
are speculation, and not accurate; acting to take 
'curative steps' upon such speculation will almost certainly 
damage the integrity of a RHEL installation.


I build R packages that install without issue on Fedora, and for 
me on CentOS [a community rebuild of RHEL], as does 'spot' of Red 
Hat. While I have not yet had occasion to build the latest 
release, I am certain I shall over the weekend, without any need 
for the invasive changes noted.


There is a support list for just this topic in the R list hierarchy, 
to which such questions might properly be taken:

r-sig-fedora.r-project.org

which is not limited solely to the Fedora users of the RPM 
packaging system.


-- Russ herrold



Re: [R] Deleting duplicate rows in a matrix at random

2010-06-03 Thread Magnus Torfason

> I need to remove all but one of each [row in a matrix], which
> must be chosen at random.

This request (included in full at the bottom) has been unanswered for a 
while, but I had the same problem and ended up writing a function to 
solve it. I call it "duplicated.random()", and it does exactly the same 
thing as the "duplicated()" function, except that the choice of which of 
the duplicated observations gets a FALSE in the result is random rather 
than always being the first. There is no way to specify selection 
probabilities; each duplicated observation is equally likely to be chosen.


The implementation permutes the original using "sample()", then runs 
"duplicated()", and finally reverses the permutation on the result. The 
randomization should therefore have similar properties to sample(), 
probably including reproducibility by setting the random seed (although 
I haven't tested that explicitly).


The function and some test code are included below. It handles vectors 
and matrices for now, but adding other data structures that are handled 
correctly by duplicated() should be a simple matter of ensuring that the 
indexing is handled correctly in the permutation process. If anyone 
makes any improvements to the function, I'd be grateful to be notified.


#

# This function returns a logical vector, the elements of which
# are FALSE, unless there are duplicated values in x, in which
# case all but one elements are TRUE (for each set of duplicates).
# The only difference between this function and the duplicated()
# function is that rather than always returning FALSE for the first
# instance of a duplicated value, the choice of instance is random.
duplicated.random = function(x, incomparables = FALSE, ...)
{
    if ( is.vector(x) )
    {
        permutation = sample(length(x))
        x.perm      = x[permutation]
        result.perm = duplicated(x.perm, incomparables, ...)
        result      = result.perm[order(permutation)]
        return(result)
    }
    else if ( is.matrix(x) )
    {
        permutation = sample(nrow(x))
        x.perm      = x[permutation, , drop = FALSE]
        result.perm = duplicated(x.perm, incomparables, ...)
        result      = result.perm[order(permutation)]
        return(result)
    }
    else
    {
        stop(paste("duplicated.random() only supports vectors",
                   "and matrices for now."))
    }
}

#

# Test code for vector case
x = sample(1:5, 10, TRUE)
d = duplicated(x)
r = duplicated.random(x)
cbind(x,d,r)
x[!d]
x[!r]

# Test code for matrix case
x = matrix(sample(1:2, 30, TRUE), ncol = 3)
d = duplicated(x)
r = duplicated.random(x)
cbind(x,d,r)

#
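As a quick check of the seed-reproducibility point mentioned above (which the post itself leaves untested), here is a self-contained sketch; `dup_random()` below is a minimal vector-only reimplementation of the same permute/duplicated/unpermute idea, written here purely for illustration:

```r
# Minimal check: a sample()-based permutation, like the one inside
# duplicated.random(), is reproducible under set.seed().
dup_random <- function(x) {          # vector-only sketch of the same idea
    p <- sample(length(x))
    duplicated(x[p])[order(p)]
}
x <- sample(1:5, 20, replace = TRUE)
set.seed(123); r1 <- dup_random(x)
set.seed(123); r2 <- dup_random(x)
identical(r1, r2)  # TRUE
```

Setting the same seed before each call reproduces the same permutation, and hence the same result vector.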


On 3/24/2010 11:44 AM, jeff.m.ewers wrote:


Hello,

I am relatively new to R, and I've run into a problem formatting my data for
input into the package RankAggreg.

I have a matrix of gene titles and P-values (weights) in two columns:

KCTD12  4.06904E-22
UNC93A  9.91852E-22
CDKN3   1.24695E-21
CLEC2B  4.71759E-21
DAB21.12062E-20
HSPB1   1.23125E-20
...

The data contains many, many duplicate gene titles, and I need to remove all
but one of each, which must be chosen at random. I have looked for quite
some time, and I've been unable to find a way to do this. Any help would be
greatly appreciated!

Thanks,

Jeff




Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Frank E Harrell Jr

On 06/03/2010 11:32 AM, Joris Meys wrote:

You're right, it is the same. Using I() won't work, for the same reason
sqrt doesn't, so:


x2<- x^2
lrm(y~x+x2)


Thx for the correction.
Cheers
Joris

On Thu, Jun 3, 2010 at 6:14 PM, Bert Gunter  wrote:


Below. -- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
--
But you wrote linear + square. Don't you mean:
lrm(Y ~ x + x^2)

--- I believe this is the same as lrm(Y ~ x).
You must protect the x^2 via

lrm(Y ~ x + I(x^2))


But don't use that construct.  Use lrm(Y ~ pol(x, 2))

Frank
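To see concretely why the protection matters, here is a minimal sketch with lm() on synthetic data; the same formula rules apply to lrm():

```r
# In an R formula, x + x^2 collapses to just x: ^ is a formula operator,
# and x crossed with itself is x, so the quadratic term silently vanishes.
# I(x^2) forces arithmetic squaring. Data below are made up.
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + 3 * x^2 + rnorm(50, sd = 0.1)
length(coef(lm(y ~ x + x^2)))     # 2: intercept and x only
length(coef(lm(y ~ x + I(x^2))))  # 3: the quadratic term survives
```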



--

Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent



--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] cumsum function with data frame

2010-06-03 Thread Felipe Carrillo
You can also use ddply from the plyr package:

library(plyr)
Data <- read.table(textConnection("variable    Year  value
EC01    2005    5
EC01    2006    10
AAO1    2005  2
AAO1  2006  4"),header=T)
Data
ddply(Data,.(variable),summarise,Year=Year,value=value,CUMSUM=cumsum(value))
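For comparison, the same cumulative sums can be obtained in base R with ave(), which needs no extra package (a sketch using the same Data as above):

```r
# ave() applies cumsum() within each level of 'variable' and returns
# the results in the original row order.
Data <- read.table(textConnection("variable Year value
EC01 2005 5
EC01 2006 10
AAO1 2005 2
AAO1 2006 4"), header = TRUE)
Data$CUMSUM <- ave(Data$value, Data$variable, FUN = cumsum)
Data
#   variable Year value CUMSUM
# 1     EC01 2005     5      5
# 2     EC01 2006    10     15
# 3     AAO1 2005     2      2
# 4     AAO1 2006     4      6
```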
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA



- Original Message 
> From: Joris Meys 
> To: "n.via...@libero.it" 
> Cc: r-help@r-project.org
> Sent: Thu, June 3, 2010 9:26:17 AM
> Subject: Re: [R] cumsum function with data frame
> 
> See ?split and ?unsplit.
> 
> Data <- read.table(textConnection("variable Year value
> EC01 2005 5
> EC01 2006 10
> AAO1 2005 2
> AAO1 2006 4"), header=T)
> 
> Datalist <- split(Data, Data$variable)
> resultlist <- lapply(Datalist, function(x){
>     x$cumul <- cumsum(x$value)
>     return(x)
> })
> result <- unsplit(resultlist, Data$variable)
> result
> 
>   variable Year value cumul
> 1     EC01 2005     5     5
> 2     EC01 2006    10    15
> 3     AAO1 2005     2     2
> 4     AAO1 2006     4     6
> 
> On a side note: I've used this construction now for a number of problems.
> Some could be better solved using more specific functions (e.g. ave() for
> adding a column with means). I'm not sure, however, that this is the most
> optimal approach to applying a function to subsets of a dataframe and
> adding the result of that function as an extra variable. Anybody care to
> elaborate on how the R masters had it in mind?
> 
> Cheers
> Joris
> 
> On Thu, Jun 3, 2010 at 5:58 PM, n.via...@libero.it wrote:
> >
> > Dear list,
> > I have a problem with the cumsum function.
> > I have a data frame like the following one
> >
> > variable Year value
> > EC01     2005     5
> > EC01     2006    10
> > AAO1     2005     2
> > AAO1     2006     4
> >
> > what I would like to obtain is
> >
> > variable Year value cumsum
> > EC01     2005     5      5
> > EC01     2006    10     15
> > AAO1     2005     2      2
> > AAO1     2006     4      6
> >
> > if I use the by function or the aggregate function the result is a list
> > or something else; what I want is a data frame as I showed above...
> > anyone knows how to get it??? THANKS A LOT
> 
> -- 
> Joris Meys
> Statistical Consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> 
> Coupure Links 653
> B-9000 Gent
> 
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



