Re: [R] plotting multiple figures on one page
On 03/17/2011 07:46 AM, scarlet wrote:
> I am new to the R language. I am trying to plot multiple figures on one page
> through a loop, but the code just produces one graph per page. Can someone
> shed some light on what's wrong? Here is my code:
>
> library("quantreg")
> tcdata <- read.table("mydata.txt", header=TRUE)
> postscript("myfigure.ps")
> basins <- paste(c("b1","b2","b3","b4","b5","b6"), sep="")
> par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))
> for (k in 1:6) {
>   data0 = subset(tcdata, basin==basins[k])
>   attach(data0)
>   model = rq(rain~year, tau=seq(0.1,0.9,0.1))
>   plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19, cex=1.2,
>        mar=c(5,5,5,5), ylab=" ", xlab=" ")
> }
> dev.off()

Hi scarlet,
First, you don't really have to paste the values for "basin" together; the "c" is all you need. Trying this code:

tcdata <- data.frame(rain=sample(400:1500,360),
                     year=rep(1927:1986,each=6), basin=rep(1:6,60))
par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))
for (k in unique(tcdata$basin)) {
  data0 = subset(tcdata, basin==k)
  model = rq(rain~year, data=data0, tau=seq(0.1,0.9,0.1))
  plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19,
       cex=1.2, mar=c(5,5,5,5), ylab=" ", xlab=" ")
}

does exactly what you say, that is, it ignores the division of the figure area into six parts. As ordinary plots work okay, I assume that there is something in the plot method for an rq model that overrides the call to par(mfrow=...).

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
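A possible workaround (a sketch only, not from Jim's reply, and not tested against quantreg's plot method) is to skip the summary plot method altogether and draw the coefficient-versus-tau curve by hand, so that par(mfrow=...) is respected:

taus <- seq(0.1, 0.9, 0.1)
par(mfrow = c(2, 3), mar = c(0, 4, 0, 2), oma = c(5, 0, 3, 0))
for (k in unique(tcdata$basin)) {
  data0 <- subset(tcdata, basin == k)
  model <- rq(rain ~ year, data = data0, tau = taus)
  # coef() on a multi-tau rq fit returns a matrix with one column per tau
  plot(taus, coef(model)["year", ], type = "b", pch = 19, xlab = "", ylab = "")
}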
Re: [R] Regex query (Apache logs)
Some comments: 1. [^\s] matches everything up to a literal 's', unless perl=TRUE. 2. The (.*) is greedy, so you'll need (.*?)"\s"(.*?)"\s"(.*?)"$ or similar at the end of the expression With those changes (and removing a space inserted by the newsgroup posting) the expression works for me. > (pat <- readLines("/tmp/b.txt")[1]) [1] "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$" > regexpr(pat, test, perl=TRUE) [1] 1 attr(,"match.length") [1] 436 3. Consider a different approach, e.g. scan(textConnection(test), what=character(0)) Hope this helps Allan On 16/03/11 22:18, Saptarshi Guha wrote: Hello R users, I have this regex see [1] for apache log lines. I tried using R to parse some data (only because I wanted to stay in R). A sample line is [2] (a) I saved the line in [1] into "~/tmp/a.txt" and [2] into "/tmp/a.txt" pat<- readLines("~/tmp/a.txt") test<- readLines("/tmp/a.txt") test grep(pat,test) returns integer(0) The same query works in python via re.match() (i.e does return groups) Using readLines, the regex is escaped for me. Does Python and R use different regex styles? Cheers Saptarshi [1] ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$ [2] 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] "GET /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux% 202.6.32.9/default/default/6/6/1/ HTTP/1.1" 200 3243 "-" "Mozilla/5.0 (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre Fennec/4.0b5" "BLOCKLIST_v3=110.163.217.169.1299218425.9706" [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
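A minimal illustration of point 1, using a made-up string rather than the full log line (behaviour as described above; treat it as a sketch):

x <- "foo bar"
regexpr("[^\\s]+", x)               # default engine: the class is the literal characters \ and s,
                                    #   so the match runs across the space (length 7)
regexpr("[^\\s]+", x, perl = TRUE)  # PCRE: \s means whitespace, so the match stops at "foo" (length 3)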
[R] Flexible rbind
Dear All, I am trying to create a empty structure that I want to fill gradually through the code. I want to use something like rbind to create the basic structure first. I am looking for a possibility to do an rbind where the columns names dont match fully (but the missing columns can be defaulted to either zero or n/a) (my actual data has a lot of columns). Please see the data frames below I have the following data frame that is filled > dput(d) structure(list(type = structure(1:9, .Label = c("test1", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9" ), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421, 94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819, 20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152, 83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213, 97.83710088)), .Names = c("type", "meter", "degree"), class = "data.frame", row.names = c(NA, -9L)) To the above data I want to append the following data to create an empty structure for the new data > dput(dNewTests) structure(list(type = structure(1:9, .Label = c("test14", "test15", "test16", "test17", "test18", "test19", "test20", "test21", "test22" ), class = "factor")), .Names = "type", class = "data.frame", row.names = c(NA, -9L)) rbind obviously throws the following error: rbind(d, dNewTests) Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match Any way that I can default the missing columns to NA or some default value like zero? Also, I tried to go through the archives and tried merge_all. What am I doing wrong wih the code below ? require(reshape) merge_all(d, dNewTests,by="type") Error in fix.by(by.x, x) : 'by' must specify column(s) as numbers, names or logical Thanks, Santosh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
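Two common answers to this (sketches, assuming the two data frames d and dNewTests above): pad the smaller frame with NA columns by hand before rbind(), or use plyr's rbind.fill(), which does the padding for you.

# manual padding with NA
missing <- setdiff(names(d), names(dNewTests))
dNewTests[missing] <- NA
rbind(d, dNewTests)

# or, in one step
library(plyr)
rbind.fill(d, dNewTests)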
Re: [R] table() reading problem
On 03/16/2011 08:20 PM, fre wrote: I have the following problem: I have some string with numbers like k. I want to have a table like the function table() gives. However I am not able to call the first row, the 1, 2, 3, 5, 6 or 9. I tried to do that by a data.frame, but that doesn't seem to work either. The levels keep bothering me. This is an example of the code: k<-c(1,1,1,1,1,3,5,6,2,1,3,9,2,3,1,1,1) table(k) k 1 2 3 5 6 9 9 2 3 1 1 1 x<-table(k) dim(x) [1] 6 x[1] #But i only want the one 1 9 x<-data.frame(x) x[1,1] #You are not allowed to use this one for example 3*x[1,1] is impossible [1] 1 Levels: 1 2 3 5 6 9 I hope anyone has an idea of using the table function without this inconvenience. I thought about writing a counter myself, but that seems complicated. Because I have to examine very large examples later on, I don't want to slow the calculations down if possible. Thanks for your help in advance. Hi Frederique, It seems to me that you want the names of the table rather than the counts, as Sarah has already suggested. So: as.numeric(names(x)) should give you what you want. However, if all you want to do is determine the unique elements of k, then: unique(k) might be what you are seeking. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
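Spelling that out with the example vector from the question:

k <- c(1,1,1,1,1,3,5,6,2,1,3,9,2,3,1,1,1)
x <- table(k)
as.numeric(names(x))          # the distinct values:  1 2 3 5 6 9
as.numeric(x)                 # the counts:           9 2 3 1 1 1
3 * as.numeric(names(x))[1]   # arithmetic works here, unlike on the factor column of data.frame(x)
unique(k)                     # just the distinct values, in order of first appearance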
Re: [R] Subset using grepl
Ok thank you! On Mar 17, 12:12 pm, wrote: > subset(data,grepl("[1-5]", section) & !grepl("0", section)) > > BTW > > grepl("[1:5]", section) > > does work. It checks for the characters 1, :, or 5. > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Kang Min > Sent: Thursday, 17 March 2011 1:29 PM > To: r-h...@r-project.org > Subject: Re: [R]Subsetusinggrepl > > I have a new question, also regardinggrepl. > I would like tosubsetrows with numbers from 1 to 5 in the section > column, so I used > > subset(data,grepl("[1:5]", section)) > > but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried > > subset(data,grepl("[1,2,3,4,5]", section)) > > which worked. But I also got 10 in the dataframe as well. How can I > exclude 10? > > >data > section piece LTc1 LTc2 > 10a 10-1 0.729095368 NA > 10a 10-2 59.53292189 13.95612454 > 10h 10-3 0.213756661 NA > 10i 10-4 NA NA > 10b NA NA NA > 10c NA NA NA > 10d NA NA NA > 10e NA NA NA > 10f NA NA NA > 10g NA NA NA > 10h NA NA NA > 10j NA NA NA > 1b 1-1 NA NA > 1d 1-2 29.37971303 12.79688209 > 1g 1-6 NA 7.607911603 > 1h 1-3 0.298059164 27.09896941 > 1i 1-4 25.11261782 19.87149991 > 1j 1-5 36.66969601 42.28507923 > 1a NA NA NA > 1c NA NA NA > 1e NA NA NA > 1f NA NA NA > 2a 2-1 15.98582117 10.58696146 > 2a 2-2 0.557308341 41.52650718 > 2c 2-3 14.99499024 10.0896793 > 2e 2-4 148.4530636 56.45493191 > 2f 2-5 25.27493551 12.98808577 > 2i 2-6 20.32857108 22.76075728 > 2b NA NA NA > 2d NA NA NA > 2g NA NA NA > 2h NA NA NA > 2j NA NA NA > 3a 3-1 13.36602867 11.47541439 > 3a 3-7 NA 111.9007822 > 3c 3-2 10.57406701 5.58567 > 3d 3-3 11.73240891 10.73833651 > 3e 3-8 NA 14.54214165 > 3h 3-4 21.56072089 21.59748884 > 3i 3-5 15.42846935 16.62715409 > 3i 3-6 129.7367193 121.8206045 > 3b NA NA NA > 3f NA NA NA > 3g NA NA NA > 3j NA NA NA > 5b 5-1 18.61733498 18.13545293 > 5d 5-3 NA 7.81018526 > 5f 5-2 12.5158971 14.37884817 > 5a NA NA NA > 5c NA NA NA > 5e NA NA NA > 5g NA NA NA > 5h NA NA NA > 5i NA NA NA > 5j NA NA NA > 9h 9-1 NA NA > 9a NA NA NA > 9b NA NA NA > 9c NA NA NA > 9d NA NA NA > 9e NA NA NA > 9f NA NA NA > 9g NA NA NA > 9i NA NA NA > 9j NA NA NA > 8a 8-1 14.29712852 12.83178905 > 8e 8-2 23.46594953 9.097377872 > 8f 8-3 NA NA > 8f 8-4 22.20001584 20.39646766 > 8h 8-5 50.54497551 56.93752065 > 8b NA NA NA > 8c NA NA NA > 8d NA NA NA > 8g NA NA NA > 8i NA NA NA > 8j NA NA NA > 4b 4-1 40.83468857 35.99017683 > 4f 4-3 NA 182.8060799 > 4f 4-4 NA 36.81401955 > 4h 4-2 17.13625062 NA > 4a NA NA NA > 4c NA NA NA > 4d NA NA NA > 4e NA NA NA > 4g NA NA NA > 4i NA NA NA > 4j NA NA NA > 7b 7-1 8.217605633 8.565035083 > 7a NA NA NA > 7c NA NA NA > 7d NA NA NA > 7e NA NA NA > 7f NA NA NA > 7g NA NA NA > 7h NA NA NA > 7i NA NA NA > 7j NA NA NA > 6b 6-6 NA 11.57887288 > 6c 6-1 27.32608984 17.17778959 > 6c 6-2 78.21988783 61.80558768 > 6d 6-7 NA 3.599685625 > 6f 6-3 26.78838281 23.33258286 > 6h 6-4 NA NA > 6h 6-5 NA NA > 6a NA NA NA > 6e NA NA NA > 6g NA NA NA > 6i NA NA NA > 6j NA NA NA > > On Jan 29, 10:43 pm, Prof Brian Ripley wrote: > > On Sat, 29 Jan 2011, Kang Min wrote: > > > Thanks Prof Ripley, the condition worked! > > > Btw I tried to search ?repl but I don't have documentation for it. Is > > > it in a non-basic package? > > > I meantgrepl: the edit messed up (but not on my screen, as sometimes > > happens when working remotely). The point is that 'perl=TRUE' > > guarantees that
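For anyone puzzled by the original behaviour, a small illustration (values modelled on the section column above) of why "[1:5]" and an unanchored "[1-5]" both pick up "10":

section <- c("10a", "1b", "2a", "5d", "9h")
grepl("[1:5]", section)          # the class is the three characters 1, :, 5 -> TRUE TRUE FALSE TRUE FALSE
grepl("[1-5]", section)          # the range 1-5, but "10a" still matches via its leading "1"
grepl("^[1-5][^0-9]", section)   # anchored, and the next character must not be a digit -> drops "10a"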
Re: [R] NaNs in Nested Mixed Model
Thank you Thierry for your kind answer! If you don't mind I would like to ask a follow-up question. In your suggestions I get P-values for "Species". However, I am really not interested in that factor per se. Would it make sense to use this model instead if I am only interested in "Genotype"? > model<-glmer(Nymphs~Genotype+(1|Species/Genotype),family=poisson) All the best, Johan 2011/3/9 ONKELINX, Thierry : > Dear Johan, > > A few remarks. > > - R-sig-mixed models is a better list for asking questions about mixed model. > - I presume that Nymphs is the number of insects? In that case you need a > generalised linear (mixed) model with poisson family > - What are you interessed in? The variability among genotypes or the effect > of each genotype. > You can achieve the first with a glmm like glmer(Nymphs ~ Species + > (1|Genotype), family = "poisson"). Genotype will be implicitly nested in > Species. Note that since you have only 4 genotypes, you will not get very > reliable estimates of the genotype variance. > For the latter you cannot use a mixed model so you need a simple > glm(Nymphs ~ Species/Genotype, family = "poisson"). Note that several > coefficients will be NaN, because you cannot estimate them. > > Best regards, > > Thierry > > > ir. Thierry Onkelinx > Instituut voor natuur- en bosonderzoek > team Biometrie & Kwaliteitszorg > Gaverstraat 4 > 9500 Geraardsbergen > Belgium > > Research Institute for Nature and Forest > team Biometrics & Quality Assurance > Gaverstraat 4 > 9500 Geraardsbergen > Belgium > > tel. + 32 54/436 185 > thierry.onkel...@inbo.be > www.inbo.be > > To call in the statistician after the experiment is done may be no more than > asking him to perform a post-mortem examination: he may be able to say what > the experiment died of. > ~ Sir Ronald Aylmer Fisher > > The plural of anecdote is not data. > ~ Roger Brinner > > The combination of some data and an aching desire for an answer does not > ensure that a reasonable answer can be extracted from a given body of data. > ~ John Tukey > > >> -Oorspronkelijk bericht- >> Van: r-help-boun...@r-project.org >> [mailto:r-help-boun...@r-project.org] Namens Johan Stenberg >> Verzonden: dinsdag 8 maart 2011 16:52 >> Aan: r-help@r-project.org >> Onderwerp: [R] NaNs in Nested Mixed Model >> >> Dear R users, >> >> I have a problem with something called "NaNs" in a nested mixed model. >> >> The background is that I have studied the number of insect >> nymphs emerging from replicated Willow genotypes in the >> field. I have 15 replicates each of 4 Willow genotypes >> belonging two 2 Willow species. >> Now I want to elucidate the effect of Willow genotype on the >> number of emerging nymphs. Previously I performed a simple >> one-way anova with "genotype" as explanatory factor and >> "number of nymphs emerging" as dependent variable, but the >> editor of the journal I've submitted this piece to wants me >> to nest Willow genotype within Willow species before he >> accepts the paper for publication [Species*Genotype(Species)]. >> >> The fact that I didn't include "Willow species" as a factor >> in my initial analysis reflects that I am not very interested >> in the species factor per se - I am just interested in if >> genetic variation in the host plant is important, but >> "species" is of course a factor that structures genetic diversity. 
>> >> I thought the below model would be appropriate: >> >> > model<-lme(Nymphs~Species*Genotype,random=~1|Species/Genotype) >> >> ...but I then get the error message "Error in MEEM(object, conLin, >> control$niterEM) : Singularity in backsolve at level 0, block 1" >> >> I then tried to remove "Genotype" from the fixed factors, but >> then I get the error message "NaNs produced". >> >> > model<-lme(Nymphs~Species,random=~1|Species/Genotype) >> > summary(model) >> Linear mixed-effects model fit by REML >> Data: NULL >> AIC BIC logLik >> 259.5054 269.8077 -124.7527 >> >> Random effects: >> Formula: ~1 | Species >> (Intercept) >> StdDev: 0.9481812 >> >> Formula: ~1 | Genotype %in% Species >> (Intercept) Residual >> StdDev: 0.3486937 1.947526 >> >> Fixed effects: Nymphs ~ Species >> Value Std.Error DF t-value p-value >> (Intercept) 2.67 1.042243 56 2.558585 0.0132 >> Speciesviminalis -2.03 1.473954 0 -1.379510 NaN >> Correlation: >> (Intr) >> Speciesviminalis -0.707 >> >> Standardized Within-Group Residuals: >> Min Q1 Med Q3 Max >> -1.4581821 -0.3892233 -0.2751795 0.3439871 3.1630658 >> >> Number of Observations: 60 >> Number of Groups: >> Species Genotype %in% Species >> 2 4 >> Warning message: >> In pt(q, df, lower.tail, log.p) : NaNs produced >> *** >> >> Do you have any idea what these error messages mean in my >> case and how I can
Re: [R] Specify feature weights in model prediction (CARET)
It was a general question to find out if the caret framework supported passing feature weights to any predictive model. But I guess, it may only make sense in the context of the model. The ones I am interested in would be svm and knn. Thanks, Kendric On Wed, Mar 16, 2011 at 6:20 PM, Max Kuhn wrote: > > Using the 'CARET' package, is it possible to specify weights for features > > used in model prediction? > > For what model? > > > And for the 'knn' implementation, is there a way > > to choose a distance metric (i.e. Mahalanobis distance)? > > > > No, sorry. > > Max > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] flow map lines between point pairs (latitude/longitude)
I am working on a similar problem. I have to add two columns: one containing the US state to which the origin belongs and another one to add the state in to which destination belongs. All I have is the latitude and the longitude of the origin and destination. Are there any packages in R that can do this??? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/flow-map-lines-between-point-pairs-latitude-longitude-tp860009p3383842.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
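One possible starting point (a sketch, untested, with made-up coordinates): the maps package has map.where(), which returns the name of the map region containing each point; with the "state" database, the part before any ":" is the US state.

library(maps)
lon <- c(-74.0, -87.6)                      # hypothetical origin/destination longitudes
lat <- c( 40.7,  41.9)                      # and latitudes
region <- map.where(database = "state", x = lon, y = lat)
state  <- sub(":.*", "", region)            # strip any sub-region suffix, e.g. "new york:manhattan"
state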
Re: [R] fetch uneven
Thanks ! You all made my day ! -- View this message in context: http://r.789695.n4.nabble.com/fetch-uneven-tp3381949p3383956.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, Verteilungsannahme) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
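As a small concrete illustration of the point about constants (a sketch, not from Rubén's message): with MASS::fitdistr() the full log-likelihoods are kept, so AIC() can be used to compare fits from different distributional families to the same data.

library(MASS)
set.seed(1)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)
fit_norm  <- fitdistr(x, "normal")
fit_lnorm <- fitdistr(x, "lognormal")
AIC(fit_norm, fit_lnorm)   # the lognormal fit should come out with the lower AIC here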
Re: [R] rJava software
Hello,
it seems that rJava tries to detect the path to the Java Virtual Machine from a registry key that is not present. I guess "HLM" means HKEY_LOCAL_MACHINE, where the path to the installed Java is normally stored. On my Windows system, for example, the key for the Java installation path is

HKEY_LOCAL_MACHINE\Software\JavaSoft\Java Runtime Environment\1.6.0_24

Try to open regedit from your Windows console and convince yourself that the key is not present. A reinstallation of the JRE or JDK may help (it recreates the registry key). Alternatively, put your Java installation directory on the OS system path (open a Windows shell and type "java -version" to check whether an installation is already on the path); that may help, too.

Also, does it matter where Java is installed? Yes, it does, because the OS has to find the executables and native libraries of the JRE, and rJava has to find them too because it relies on them.

I hope this helps.

--
View this message in context: http://r.789695.n4.nabble.com/rJava-software-tp3383366p3383977.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
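If reinstalling does not restore the registry entry, one workaround that is sometimes suggested (an assumption on my part, with a hypothetical install path) is to tell R where Java lives before loading rJava:

Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre6")   # adjust to the actual JRE location
library(rJava)
.jinit()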
Re: [R] adding linear regression data to plot
On 03/17/2011 02:47 AM, derek wrote: I know I can add line to graph with abline(), but I would like to print R-squared, F-test value, Residuals and other statistics from lm() to a graph. I don't know how to access the values from summary(), so that I can use them in a following code or print them in a graph. Hi derek, As Peter has already replied, you can use str() to find out the components of the summary. Try something like this: plot(1:10) library(plotrix) addtable2plot(2,8,coef(result)) where "result" is what has been returned by the call to "lm". Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
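To make the "access the values from summary()" part concrete, a small sketch (simulated data, illustrative names only) that pulls R-squared and the F statistic out of the summary object and writes them onto the plot:

x <- 1:20
y <- 2 * x + rnorm(20)
result <- lm(y ~ x)
s <- summary(result)            # str(s) lists all the available components
plot(x, y)
abline(result)
legend("topleft", bty = "n",
       legend = c(sprintf("R-squared = %.3f", s$r.squared),
                  sprintf("F = %.1f on %.0f and %.0f df",
                          s$fstatistic[1], s$fstatistic[2], s$fstatistic[3])))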
Re: [R] Why doesn't this work ?
The first line of this reply is a definite candidate for the fortunes package! best i --- On Thu, 17/3/11, bill.venab...@csiro.au wrote: > From: bill.venab...@csiro.au > Subject: Re: [R] Why doesn't this work ? > To: ericst...@aol.com, r-help@r-project.org > Date: Thursday, 17 March, 2011, 3:54 > It doesn't work (in R) because it is > not written in R. It's written in some other language > that looks a bit like R. > > > t <- 3 > > z <- t %in% 1:3 > > z > [1] TRUE > > t <- 4 > > z <- t %in% 1:3 > > z > [1] FALSE > > > > > > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of eric > Sent: Thursday, 17 March 2011 1:26 PM > To: r-help@r-project.org > Subject: [R] Why doesn't this work ? > > Why doesn't this work and is there a better way ? > > z <-ifelse(t==1 || 2 || 3, 1,0) > t <-3 > z > [1] 1 > t <-4 > z > [1] 1 > > trying to say ...if t == 1 or if t== 2 or if t ==3 then > true, otherwise > false > > -- > View this message in context: > http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > > __ > R-help@r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] calculating the occurrences of distinct observations in the subsets of a dataframe
Hello everybody, I have a data frame in R which is similar to the follows. Actually my real 'df' dataframe is much bigger than this one here but I really do not want to confuse anybody so that is why I try to simplify things as much as possible. So here's the data frame. id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3) a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3) b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2) c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2) d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2) e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3) df <-data.frame(id,a,b,c,d,e) df Basically what I would like to do is to get the occurrences of numbers for each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see my column 'id'). So, for column 'a' and for id number '1' (for the latter see column 'id') the code would be something like this: as.numeric(table(df[1:10,2])) The results are: [1] 3 7 Just to briefly explain my results: in column 'a' (and regarding only those records which have number '1' in column 'id') we can say that: number 1 occured 3 times, and number 3 occured 7 times. Again, just to show you another example. For column 'a' and for id number '2' (for the latter grouping see again column 'id'): as.numeric(table(df[11:20,2])) After running the codes the results are: [1] 4 3 3 Let me explain a little again: in column 'a' and regarding only those observations which have number '2' in column 'id') we can say that number 1 occured 4 times number 2 occured 3 times and number 3 occured 3 times. Last example: for column 'e' and for id number '3' the code would be: as.numeric(table(df[21:30,6])) With the results: [1] 1 4 5 ...meaning that number '1' occured once, number '2' occured four times and number '3' occured 5 times. So this is what I would like to do. Calculating the occurrences of numbers for each custom-defined subsets (and then collecting these values into a data frame). I know it is NOT a difficult task but the PROBLEM is that I'm gonna have to change the input 'df' dataframe on a regular basis and hence both the overall number of rows and columns might CHANGE over time... What I have done so far is that I have separated the 'df' dataframe by columns, like this: for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z]) So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But I'm really stuck now and I don't know how to move forward, you know, getting the occurrences for each column and each group of ids. Do you have any ideas? Best regards, Laszlo Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos és/vagy jogilag, szakmailag vagy más módon védett információt tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél tartalmának közlése, reprodukálása, másolása, vagy egyéb más úton történÅ terjesztése, felhasználása szigorúan tilos. Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat megszakadásból eredÅ hibáért, vagy az információ felhasználásából vagy annak megbÃzhatatlanságából eredÅ kárért. 
Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet az EBH folytonos munkamenetének biztosÃtása érdekében. This e-mail and any attached files are confidential and/...{{dropped:19}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why doesn't this work ?
And, just to make it really clear (I hope!): Your original expression z <-ifelse(t==1 || 2 || 3, 1,0) looks like a transcription into "R" of the words "If t equals 1 or 2 or 3 then z is 1 else z is 0" However, your "t==1 || 2 || 3" has to be parsed in the correct order according to operator precedence. If you look at '?Syntax' you will see that operator '==' has precedence over the "or" operator '||'. Hence the expression "t==1 || 2 || 3" will be parsed as (t==1) || 2 || 3 So, whatever the value of 't' ("t==1" may be either TRUE or FALSE), the result will be either TRUE || 2 || 3 or FALSE || 2 || 3 When numeric values (like "2" or "3") occur in a logical expression, they are coerced to a logical TRUE (with the one exception that a numerical value of "0" is coerced to FALSE). Hence, whatever the outcome of "t==1" the result will be either TRUE || TRUE || TRUE or FALSE || TRUE || TRUE which is always TRUE. This explains your results. Bill and Phil have given you an alternative which works: "t %in% (1:3)". A way of writing it (though longer) which is closer to your original "wording" could be if( (t==1) || (t==2) || (t==3) , 1, 0) which really spells out how to force the parsing. Hoping this helps, Ted. On 17-Mar-11 03:54:32, bill.venab...@csiro.au wrote: > It doesn't work (in R) because it is not written in R. It's written in > some other language that looks a bit like R. > >> t <- 3 >> z <- t %in% 1:3 >> z > [1] TRUE >> t <- 4 >> z <- t %in% 1:3 >> z > [1] FALSE > > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of eric > Sent: Thursday, 17 March 2011 1:26 PM > To: r-help@r-project.org > Subject: [R] Why doesn't this work ? > > Why doesn't this work and is there a better way ? > > z <-ifelse(t==1 || 2 || 3, 1,0) > t <-3 > z > [1] 1 > t <-4 > z > [1] 1 > > trying to say ...if t == 1 or if t== 2 or if t ==3 then true, otherwise > false > > -- > View this message in context: > http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.ht > ml > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 17-Mar-11 Time: 10:13:53 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
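The coercion described above, spelled out line by line (a short sketch):

TRUE || 2                  # TRUE  : 2 is coerced to TRUE
FALSE || 0                 # FALSE : 0 is coerced to FALSE
(4 == 1) || 2 || 3         # TRUE whatever the left-hand side, which was the original bug
ifelse(4 %in% 1:3, 1, 0)   # 0 -- the vectorisable test the poster wanted
ifelse(3 %in% 1:3, 1, 0)   # 1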
Re: [R] create data set from selection of rows
On 15/03/2011, Francisco Gochez wrote: > Hi, > > What you are after is: > > datasubset <- dataset[ dataset[,3] == "text3", ] Thank you. For the set text1,23,text2,45 text1,23,text3,78 text1,23,text3,56 text1,23,text2,45 Is it possible to write a function that selects rows containing 'text3' and applies the function 'sum' to values '78' and '56'? The control statements described in the document 'an introduction to r' (Venables and Smith, 2010) suggest that the if statement would return 'true', to prevent a sum function being applied to 'true' results. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Retrieve an index of nested lists | Changing name delimiter in 'unlist()'
Dear list, I have to problems that are connected: PROBLEM 1 I wonder if it is somehow possible to patch the function 'unlist(use.names=TRUE)' such that you can specify an arbitrary name delimiter, e.g. "/" or "_". As I often name my variables "var.x.y", the default delimiter makes it hard to distinguish the distinct layers of a nested list after unlisting. As 'unlist()' it is called by '.Internal()', I didn't have a clue of how to find the section where the names are pasted. I'd like to use such a patched version in order to create a customized index (class: data.frame) of nested lists which should look like this name index is.top is.bottom degree a1 TRUEFALSE 1 a/a.11-1FALSE FALSE 2 a/a.1/a.1.1 1-1-1 FALSE TRUE3 a/a.21-2FALSE FALSE 2 a/a.2/a.2.1 1-2-1 FALSE TRUE3 b2 TRUE FALSE 1 ... Of course I could get such an index by recursively traversing the list layers, but this takes much too long. Maybe you also have another idea on how to create such an index. PROBLEM 2 I'd also like to retrieve such an index for nested environment structures. Therefore, I would either need something similar to 'unlist()' that works on environments or find something that coerces nested environments to nested lists. Again, I solved this by recursively looping over the respective environment layers, but it's too inefficient. Is there something out there that does the job via a C routine somehow? Thanks for any suggestions, Janko __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
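For problem 1, a small recursive helper illustrates the naming scheme being asked for (a sketch only -- it is the same slow recursive traversal the poster wants to avoid, so treat it as a specification of the output rather than a fast solution):

pathnames <- function(x, sep = "/", prefix = character(0)) {
  if (!is.list(x)) return(paste(prefix, collapse = sep))
  unlist(lapply(seq_along(x), function(i)
    pathnames(x[[i]], sep, c(prefix, names(x)[i]))), use.names = FALSE)
}
x <- list(a = list(a.1 = list(a.1.1 = 1), a.2 = list(a.2.1 = 2)), b = 3)
pathnames(x)
# "a/a.1/a.1.1" "a/a.2/a.2.1" "b"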
Re: [R] calculating the occurrences of distinct observations in the subsets of a dataframe
Hi! Sorry, I made an error in the previous e-mail. So try this: by(df[,-1],df$id,function(x) apply(x,2,tabulate)) This gives you a list. You can rearrange it into a data frame or a 3d array if you wish. Regards, Denes > Hello everybody, > > I have a data frame in R which is similar to the follows. Actually my real > 'df' dataframe is much bigger than this one here but I really do not want > to confuse anybody so that is why I try to simplify things as much as > possible. > > So here's the data frame. > > id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3) > a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3) > b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2) > c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2) > d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2) > e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3) > df <-data.frame(id,a,b,c,d,e) > df > > Basically what I would like to do is to get the occurrences of numbers for > each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter > grouping see my column 'id'). > > So, for column 'a' and for id number '1' (for the latter see column 'id') > the code would be something like this: > as.numeric(table(df[1:10,2])) > > The results are: > [1] 3 7 > > Just to briefly explain my results: in column 'a' (and regarding only > those records which have number '1' in column 'id') we can say that: > number 1 occured 3 times, and > number 3 occured 7 times. > > Again, just to show you another example. For column 'a' and for id number > '2' (for the latter grouping see again column 'id'): > as.numeric(table(df[11:20,2])) > > After running the codes the results are: [1] 4 3 3 > > Let me explain a little again: in column 'a' and regarding only those > observations which have number '2' in column 'id') we can say that > number 1 occured 4 times > number 2 occured 3 times and > number 3 occured 3 times. > > Last example: for column 'e' and for id number '3' the code would be: > as.numeric(table(df[21:30,6])) > > With the results: > [1] 1 4 5 > > ...meaning that number '1' occured once, number '2' occured four times and > number '3' occured 5 times. > > So this is what I would like to do. Calculating the occurrences of numbers > for each custom-defined subsets (and then collecting these values into a > data frame). I know it is NOT a difficult task but the PROBLEM is that I'm > gonna have to change the input 'df' dataframe on a regular basis and hence > both the overall number of rows and columns might CHANGE over time... > > What I have done so far is that I have separated the 'df' dataframe by > columns, like this: > for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z]) > > So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c > etc. But I'm really stuck now and I don't know how to move forward, you > know, getting the occurrences for each column and each group of ids. > > Do you have any ideas? > Best regards, > Laszlo > > > Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos > és/vagy jogilag, szakmailag vagy más módon védett információt > tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél > tartalmának közlése, reprodukálása, másolása, vagy egyéb más > úton történÅ terjesztése, felhasználása szigorúan tilos. > Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal > értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. 
(EBH) nem > vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez > történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat > megszakadásból eredÅ hibáért, vagy az információ > felhasználásából vagy annak megbÃzhatatlanságából eredÅ > kárért. > > Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és > hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet > az EBH folytonos munkamenetének biztosÃtása érdekében. > > > This e-mail and any attached files are confidential and/...{{dropped:19}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
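A short sketch of the suggestion above, plus one way to flatten the result (the nbins argument and the reshaping are additions, not part of the original reply; nbins forces every column to report counts for values 1 to 3 so the pieces line up):

counts <- by(df[, -1], df$id, function(x) apply(x, 2, tabulate, nbins = 3))
counts[["1"]]        # a 3 x 5 matrix: rows = values 1-3, columns = a..e
do.call(rbind, lapply(names(counts), function(g)
  data.frame(id = g, value = 1:3, counts[[g]])))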
[R] changing the dimensions of a matrix in a real specific way
Hi again, I'd like to ask you a question again. I have a matrix like this: a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) a [,1] [1,]1 [2,]2 [3,]3 [4,]4 [5,]5 [6,]6 [7,]7 [8,]8 [9,]9 [10,] 10 [11,] 11 [12,] 12 Is there a proper way to change the dimensions of this matrix so that I'll get this as a result: [,1] [,2] [,3] [1,] 123 [2,] 456 [3,] 789 [4,] 10 11 12 Thank you very much and have a pleasant day, Laszlo Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos és/vagy jogilag, szakmailag vagy más módon védett információt tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél tartalmának közlése, reprodukálása, másolása, vagy egyéb más úton történÅ terjesztése, felhasználása szigorúan tilos. Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat megszakadásból eredÅ hibáért, vagy az információ felhasználásából vagy annak megbÃzhatatlanságából eredÅ kárért. Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet az EBH folytonos munkamenetének biztosÃtása érdekében. This e-mail and any attached files are confidential and/...{{dropped:19}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create data set from selection of rows
Is this what you want: > x V1 V2V3 V4 1 text1 23 text2 45 2 text1 23 text3 78 3 text1 23 text3 56 4 text1 23 text2 45 > str(x) 'data.frame': 4 obs. of 4 variables: $ V1: Factor w/ 1 level "text1": 1 1 1 1 $ V2: int 23 23 23 23 $ V3: Factor w/ 2 levels "text2","text3": 1 2 2 1 $ V4: int 45 78 56 45 > tapply(x$V4, x$V3, sum) text2 text3 90 134 > On Thu, Mar 17, 2011 at 6:50 AM, e-letter wrote: > On 15/03/2011, Francisco Gochez wrote: >> Hi, >> >> What you are after is: >> >> datasubset <- dataset[ dataset[,3] == "text3", ] > > Thank you. For the set > > text1,23,text2,45 > text1,23,text3,78 > text1,23,text3,56 > text1,23,text2,45 > > Is it possible to write a function that selects rows containing > 'text3' and applies the function 'sum' to values '78' and '56'? The > control statements described in the document 'an introduction to r' > (Venables and Smith, 2010) suggest that the if statement would return > 'true', to prevent a sum function being applied to 'true' results. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create data set from selection of rows
Using the summarise function in package plyr is one way; taking df to be your data frame with variable names V1-V4,

library(plyr)
summarise(subset(df, V3 == 'text3'), sum = sum(V4))
  sum
1 134

Another is to use the data.table package:

library(data.table)
dt <- data.table(df)
dt[V3 == 'text3', sum(V4)]
[1] 134

A third route would be with package sqldf, something like

library(sqldf)
sqldf("select sum(V4) from df where V3 = 'text3'")

but sqldf is messed up on my system (probably because I didn't install it properly) so the code is untested.

HTH,
Dennis

On Thu, Mar 17, 2011 at 3:50 AM, e-letter wrote:
> On 15/03/2011, Francisco Gochez wrote:
> > Hi,
> >
> > What you are after is:
> >
> > datasubset <- dataset[ dataset[,3] == "text3", ]
>
> Thank you. For the set
>
> text1,23,text2,45
> text1,23,text3,78
> text1,23,text3,56
> text1,23,text2,45
>
> Is it possible to write a function that selects rows containing
> 'text3' and applies the function 'sum' to values '78' and '56'? The
> control statements described in the document 'an introduction to r'
> (Venables and Smith, 2010) suggest that the if statement would return
> 'true', to prevent a sum function being applied to 'true' results.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Extracting columns from a class
Hi list,
I am not a frequent user of R. Recently I used R for a principal component analysis and got the result as a class which has information like the standard deviations and principal components 1 to 10. How is it possible to extract the column corresponding to the first principal component and write it to a file? The output from the prcomp command is something like this:

Standard deviations:
 [1] 3.325801e+00 7.669837e-01 6.625773e-01 4.990732e-01 3.470071e-01
 [6] 2.946679e-01 2.206289e-01 1.645828e-01 1.570887e-01 4.741294e-16

Rotation:
             PC1            PC2          PC3          PC4          PC5
[1,] -0.07900624  -0.0824864352 0.1208419434 0.1763425845 0.089545020
[2,] -0.09114708  -0.0901675110 0.1377608881 0.2224127252 0.076620976
[3,] -0.10510742  -0.0935434206 0.1113586044 0.2513993555 0.029783117

I want to extract PC1 and the first value in the standard deviations.

Thanks
--
Nuncio.M
Research Scientist
National Center for Antarctic and Ocean research
Head land Sada
Vasco da Gamma
Goa-403804

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
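A sketch of one way to do this (the object name 'pca' and the USArrests data are only there to make the example self-contained):

pca <- prcomp(USArrests, scale. = TRUE)
pc1 <- pca$rotation[, 1]   # loadings of the first principal component
sd1 <- pca$sdev[1]         # standard deviation of the first component
write.csv(data.frame(PC1 = pc1), file = "pc1_loadings.csv")
sd1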
[R] assigning to list element within target environment
I would like to assign an value to an element of a list contained in an environment. The list will contain vectors and matrices. Here's a simple example: # create toy environment testEnv = new.env(parent = emptyenv()) # create list that will be in the environment, then assign() it x = list(a=1,b=2) assign("xList",x,testEnv) # create new element, to be inserted into xList c = 5:7 Now, what I'd like to do is something like this: assign("xList[[3]]",c,testEnv) But the assign() help says: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." So that's no good. Also, this fails: attach(testEnv) xList[[3]] <<- c detach(testEnv) because, as the attach() help says: "If you use <<- or assign to assign to an attached database, you only alter the attached copy, not the original object." What I've been doing is making a copy of xList in the local environment, changing it, then using assign() to make the change in the testEnv environment. This is unsatisfactory due to the double use of memory making a copy of testEnv (the actual list I will use will be quite large). Another option would be this: # Assign the new element to a variable in the target environment assign("temp",c,testEnv) # "Assign" it within the target environment, from the temp variable eval(parse(text="xList[[3]]=temp"),env=testEnv) # remove the unneeded variable rm("temp",envir=testEnv) But I figure there must be a more elegant way. Anyone have any ideas? Thanks, Richard -- Richard D. Morey Assistant Professor Psychometrics and Statistics Rijksuniversiteit Groningen / University of Groningen http://drsmorey.org/research/rdmorey __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] changing the dimensions of a matrix in a real specific way
t(matrix(a,3,4)) for more complex arrays, see ?aperm > Hi again, > > I'd like to ask you a question again. > > I have a matrix like this: > a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) > a > > [,1] > [1,]1 > [2,]2 > [3,]3 > [4,]4 > [5,]5 > [6,]6 > [7,]7 > [8,]8 > [9,]9 > [10,] 10 > [11,] 11 > [12,] 12 > > Is there a proper way to change the dimensions of this matrix so that I'll > get this as a result: > [,1] [,2] [,3] > [1,] 123 > [2,] 456 > [3,] 789 > [4,] 10 11 12 > > Thank you very much and have a pleasant day, > Laszlo > > > Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos > és/vagy jogilag, szakmailag vagy más módon védett információt > tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél > tartalmának közlése, reprodukálása, másolása, vagy egyéb más > úton történÅ terjesztése, felhasználása szigorúan tilos. > Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal > értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem > vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez > történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat > megszakadásból eredÅ hibáért, vagy az információ > felhasználásából vagy annak megbÃzhatatlanságából eredÅ > kárért. > > Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és > hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet > az EBH folytonos munkamenetének biztosÃtása érdekében. > > > This e-mail and any attached files are confidential and/...{{dropped:19}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Flexible rbind
Hi Santosh, May be you looking at something like this merge(d,dNewTests,by="type",all=TRUE) On Thu, Mar 17, 2011 at 1:03 PM, Santosh Srinivas < santosh.srini...@gmail.com> wrote: > Dear All, > > I am trying to create a empty structure that I want to fill gradually > through the code. > I want to use something like rbind to create the basic structure first. > > I am looking for a possibility to do an rbind where the columns names > dont match fully (but the missing columns can be defaulted to either > zero or n/a) (my actual data has a lot of columns). > Please see the data frames below > > I have the following data frame that is filled > > dput(d) > structure(list(type = structure(1:9, .Label = c("test1", "test2", > "test3", "test4", "test5", "test6", "test7", "test8", "test9" > ), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421, > 94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819, > 20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152, > 83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213, > 97.83710088)), .Names = c("type", "meter", "degree"), class = > "data.frame", row.names = c(NA, > -9L)) > > To the above data I want to append the following data to create an > empty structure for the new data > > dput(dNewTests) > structure(list(type = structure(1:9, .Label = c("test14", "test15", > "test16", "test17", "test18", "test19", "test20", "test21", "test22" > ), class = "factor")), .Names = "type", class = "data.frame", row.names = > c(NA, > -9L)) > > rbind obviously throws the following error: > > rbind(d, dNewTests) > Error in rbind(deparse.level, ...) : > numbers of columns of arguments do not match > > Any way that I can default the missing columns to NA or some default > value like zero? > > Also, I tried to go through the archives and tried merge_all. What am > I doing wrong wih the code below ? > > require(reshape) > merge_all(d, dNewTests,by="type") > > Error in fix.by(by.x, x) : > 'by' must specify column(s) as numbers, names or logical > > > Thanks, > Santosh > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] assigning to list element within target environment
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey wrote: > I would like to assign an value to an element of a list contained in an > environment. The list will contain vectors and matrices. Here's a simple > example: > > # create toy environment > testEnv = new.env(parent = emptyenv()) > > # create list that will be in the environment, then assign() it > x = list(a=1,b=2) > assign("xList",x,testEnv) > > # create new element, to be inserted into xList > c = 5:7 > > Now, what I'd like to do is something like this: > > assign("xList[[3]]",c,testEnv) testEnv$x[[3]] <- c ? Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
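Since the list in the original post was stored under the name "xList", the same idiom would target that name; a small sketch (newElem is a made-up name standing in for the original's "c", which is best avoided as a variable name):

testEnv <- new.env(parent = emptyenv())
assign("xList", list(a = 1, b = 2), envir = testEnv)
newElem <- 5:7
testEnv$xList[[3]] <- newElem     # replacement functions work through env$name
get("xList", envir = testEnv)     # now a three-element list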
Re: [R] Why doesn't this work ?
On Thu, Mar 17, 2011 at 3:54 AM, wrote: > It doesn't work (in R) because it is not written in R. It's written in some > other language that looks a bit like R. It parses in R, so I would say it was written in R. To paraphrase Obi-wan, it's just not the R you are looking for. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] changing the dimensions of a matrix in a real specific way
matrix(a, ncol=3, nrow=4, byrow=TRUE) or > dim(a) <- c(3,4) > a <- t(a) > a [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 [4,] 10 11 12 Depending on the context of the problem. Sarah On Thu, Mar 17, 2011 at 7:59 AM, Bodnar Laszlo EB_HU wrote: > Hi again, > > I'd like to ask you a question again. > > I have a matrix like this: > a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) > a > > [,1] > [1,] 1 > [2,] 2 > [3,] 3 > [4,] 4 > [5,] 5 > [6,] 6 > [7,] 7 > [8,] 8 > [9,] 9 > [10,] 10 > [11,] 11 > [12,] 12 > > Is there a proper way to change the dimensions of this matrix so that I'll > get this as a result: > [,1] [,2] [,3] > [1,] 1 2 3 > [2,] 4 5 6 > [3,] 7 8 9 > [4,] 10 11 12 > > Thank you very much and have a pleasant day, > Laszlo > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Incorrect degrees of freedom in SEM model using lavaan
I have been trying to use lavaan (version 0.4-7) for a simple path model, but the program seems to be computing far fewer degrees of freedom for my model than it should have. I have 7 variables, which should give (7)(8)/2 = 28 covariances, and hence 28 DF. The model seems to only think I have 13 DF. The code to reproduce the problem is below. Have I done something wrong, or is this something I should take to the developer? library(lavaan) myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1"))) model = ' Internet ~ HHincome + Race + Age + Gender + Stress1 Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 ' fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) #check the number of parameters being estimated inspect(fit, what="free") #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 summary(fit) Andrew Miles [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Autocorrelation in linear models
Arni Magnusson hafro.is> writes: > > I have been reading about autocorrelation in linear models over the last > couple of days, and I have to say the more I read, the more confused I > get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for > guidance. > > Most authors are mainly worried about autocorrelation in the residuals, > but some authors are also worried about autocorrelation within Y and > within X vectors before any model is fitted. Would you test for > autocorrelation both in the data and in the residuals? My immediate reaction is that autocorrelation in the raw data (marginal autocorrelation) is not relevant. (There are exceptions, of course -- in many ecological systems the marginal autocorrelation tells us something about the processes driving the system, so we may want to quantify/estimate it -- but I wouldn't generally think that *testing* it (e.g. trying to reject a null hypothesis of ACF=0) makes sense.) > > If we limit our worries to the residuals, it looks like we have a variety > of tests for lag=1: > >stats::cor.test(residuals(fm)[-n], residuals(fm)[-1]) >stats::Box.test(residuals(fm)) >lmtest::dwtest(fm, alternative="two.sided") >lmtest::bgtest(fm, type="F") Note that (I think) all of these tests are based on lag-1 autocorrelation only (I see you mention this later). Have you looked at nlme:::ACF ? It is possible to get non-significant autocorrelation at lag 1 with sig. autocorrelation at higher lags. > > In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have > significant _positive_ autocorrelation within Y and within both X vectors, > but _negative_ autocorrelation in the residuals. That's plausible. Again, I think the residual autocorrelation is what you should worry about. > The residual > autocorrelation is not quite significant, with the p-values > >0.070 >0.064 >0.125 >0.077 > > from the tests above. I seem to remember some authors saying that the > Durbin-Watson test has less power than some alternative tests, as > reflected here. The difference in p-values is substantial, ?? I wouldn't necessarily say so -- I would guess you could get this range of p-values from a single test statistic if you had multiple simulated data sets from the same underlying model and parameters ... Have you tried running such simulations? > so choosing > which test to use could in many cases make a big difference for the > subsequent analysis and conclusions. Most of them (cor.test, Box.test, > bgtest) can also test lags>1. Which test would you recommend? I imagine > the basic cor.test is somehow inappropriate for this; the other tests > wouldn't have been invented otherwise, right? I don't know the details (it's been a while since I did time series analysis, and it wasn't in this particular vein.) > The car::dwt(fm) has p-values fluctuating by a factor of 2, unless I run a > very long simulation, which results in a p-value similar to > lmtest::dwtest, at least in my case. > > Finally, one question regarding remedies. If there was significant > _positive_ autocorrelation in the residuals, some authors suggest > remedying this by deflating the df (fewer effective df in the data) and > redo the t-tests of the regression coefficients, rejecting fewer null > hypotheses. Does that mean if the residuals are _negatively_ correlated > then I should inflate the df (more effective df in the data) and reject > more null hypotheses? My personal taste is that these df adjustments are bit cheesy. 
Most of the time I would prefer to fit a model that incorporated autocorrelation (i.e. nlme::gls(y~x1+x2,correlation=corAR1()) [or pick another choice of time-series model from ?corClasses]. More generally, this whole approach falls into the category of "test for presence of XX; if XX is not statistically significant then ignore it", which is worrisome (if your test for XX is very powerful then you will be concerned about dealing with XX even when its effect on your results would be trivial; if your test for XX is weak or you have very little data then you won't detect XX even when it is present). I would say that if you're really concerned about autocorrelation you should just automatically use a modeling approach (see above) that incorporates it. > > That's four question marks. I'd greatly appreciate guidance on any of > them. > > Thanks in advance, > cheers Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
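A minimal sketch of that suggestion, with dat, y, x1, x2 and a yearly index year standing in for the poster's annual data (these names are illustrative, not taken from the original post):

library(nlme)
fm.iid <- gls(y ~ x1 + x2, data = dat)                                        # independent errors
fm.ar1 <- gls(y ~ x1 + x2, data = dat, correlation = corAR1(form = ~ year))   # AR(1) errors
anova(fm.iid, fm.ar1)             # same fixed effects, so the two REML fits are comparable
plot(ACF(fm.iid), alpha = 0.05)   # residual autocorrelation at all lags, not just lag 1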
Re: [R] Bug in lattice auto.key argument
On Wed, Mar 16, 2011 at 12:52 AM, wrote: > The Lattice auto.key argument has a bug in R.12.2. > > R version 2.12.2 (2011-02-25) > Platform: i386-pc-mingw32/i386 (32-bit) > > other attached packages: > [1] lattice_0.19-17 > > loaded via a namespace (and not attached): > [1] grid_2.12.2 > > If I set up my plot parameters as > > require(lattice) > superpose.line.settings <- trellis.par.get("superpose.line") > str(superpose.line.settings) > superpose.line.settings$col <- 1 > superpose.line.settings$lty <- c(1,2,5) * > superpose.line.settings$lwd <- 2 > trellis.par.set("superpose.line",superpose.line.settings) > > and then set up my key list as > > my.key <- list(space="top", points=FALSE, lines=TRUE, columns=3) > > and then run xyplot with the argument auto.key=my.key, > I do NOT get the proper legend at the top of the plot. > Instead of the lines in the legend having the characteristics of > lty=c(1,2,5), they have the characteristics of lwd=c(1,2,5). I think Felix has fixed this bug a couple of days ago. Please check the r-forge version from https://r-forge.r-project.org/R/?group_id=638 There should be a new release sometime next week. -Deepayan > > The auto.key argument works fine in R.12.1 > > R version 2.12.1 (2010-12-16) > Platform: i386-pc-mingw32/i386 (32-bit) > . > other attached packages: > [1] lattice_0.19-13 > > loaded via a namespace (and not attached): > [1] grid_2.12.1 > > > Joe > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] setting up a R working group
Dear R users, we are currently trying to set up a R working group at Swansea University. I would be very grateful to get some information and feedback from R users that have done something similar, in particular regarding: - help with giving a general overview of what R is capable of - help with what to present or teach (did you organise your own training programme or did you get someone from outside?) - are there people out there who would like to present their work to us? Any help will be highly appreciated! Thanks Joanne -- Joanne Demmler Ph.D. Research Assistant College of Medicine Swansea University Singleton Park Swansea SA2 8PP UK tel:+44 (0)1792 295674 fax:+44 (0)1792 513430 email: j.demm...@swansea.ac.uk DECIPHer: www.decipher.uk.net __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using barplot() with zoo -- names.arg not permitted?
I've used barplot(), including the names.arg parameter, on data frames successfully, but I'm even newer to using zoo than I am to R. :-} I am working on a function that accepts a data frame ("df") as its primary argument, extracts information from it to create a zoo, then generates a plot based on that. The data frame has a column called "time" which is a standard UNIX time -- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a POSIXct. (The types of the other columns in the data frame vary greatly, and the sequence in which they occur is not expected to be constant. Thus, I find it easier to let R make a reasonable guess as to the data type, and override special cases -- such as "time" -- on an individual basis.) The data frame contains certain columns whose values are counters, so I generate a zoo of the diff(): oid = "kern.cp_time_" pat = paste("^", oid, sep = "") class(df$time) <- "POSIXct" df_d <- diff(zoo(df[, grep(pat, names(df))], order.by = df$time)) I then save the start & end timestamps for future reference, and generate another zoo, this one containing only the percentages from df_d: cpu_states <- c("sys", "intr", "user", "nice", "idle") d_range <- range(index(df_d)) df_pct <- sweep(df_d, 1, apply(df_d, 1, sum), "/")[, paste(oid, cpu_states[1:4], sep = "")] * 100 Well, that is, I save the percentages in which I'm interested. So far, so good. df_pct is a zoo (as expected), and the content looks reasonable. I then plot it, e.g.: barplot(df_pct, border = NA, col = colvec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time") But the X-axis labels show up as "large integers" -- the POSIXct values are apparently treated as numeric quantities for this purpose. And if I try barplot(df_pct, border = NA, col = colvec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", names.arg = index(df_pct)) I am told: Warning messages: 1: In plot.window(xlim, ylim, log = log, ...) : "names" is not a graphical parameter 2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = axis.lty, : "names" is not a graphical parameter 3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) : "names" is not a graphical parameter 4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) : "names" is not a graphical parameter which I find a bit perplexing. If I plot via: barplot(df_pct, border = NA, col = col_vec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", axisnames = FALSE) that works OK, but then I'm unable to generate a human-readable set of X-axis labels: At best, I'm able to get a single label at the origin.
Here's an excerpt from the generated df_d: Browse[2]> df_d[1:10] kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice 1297278017 275 0 0 1297278018 231 0 0 1297278019 266 1 0 1297278020 230 2 0 1297278021 191 0 0 1297278022 114 0 0 1297278023 30 0 0 1297278024 0 0 0 1297278025 0 0 0 1297278026 0 0 0 kern.cp_time_sys kern.cp_time_user 1297278017 8 1 1297278018 17 35 1297278019 10 6 1297278020 28 24 1297278021 21 72 1297278022 27 147 1297278023 43 219 1297278024 47 249 1297278025 41 259 1297278026 8 15 Here's an excerpt from the generated df_pct: Browse[2]> df_pct[1:10] kern.cp_time_sys kern.cp_time_intr kern.cp_time_user 1297278017 2.816901 0.000 0.3521127 1297278018 6.007067 0.000 12.3674912 1297278019 3.533569 0.3533569 2.1201413 1297278020 9.859155 0.7042254 8.4507042 1297278021 7.394366 0.000 25.3521127 1297278022 9.375000 0.000 51.0416667 1297278023 14.726027 0.000 75.000 1297278024 15.878378 0.000 84.1216216 1297278025 13.67 0.000 86.333 1297278026 34.782609 0.000 65.2173913 kern.cp_time_nice 1297278017 0 1297278018 0 1297278019 0 1297278020 0 1297278021 0 1297278022
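One way to get readable time labels, sketched against the objects named in this post (untested here, since the original data frame is not available): suppress barplot()'s own names and add a time axis afterwards, using the bar midpoints that barplot() returns.

m  <- t(coredata(df_pct))            # plain matrix; one column (= one stacked bar) per time point
tt <- index(df_pct)
bp <- barplot(m, border = NA, col = colvec, space = 0, ylab = "CPU %",
              ylim = c(0, 100), xlab = "Time", axisnames = FALSE)
keep <- pretty(seq_along(tt))        # a handful of evenly spaced tick positions
keep <- keep[keep >= 1 & keep <= length(tt)]
axis(1, at = bp[keep], labels = format(tt[keep], "%H:%M:%S"))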
Re: [R] Help- Fitting a Thin Plate Spline
library(mgcv) b <- gam(arcsine.success ~ s(date.num,clutch.size,k=50)) vis.gam(b,theta=30) will fit a thin plate spline and plot it. k is an upper limit on the number of degrees of freedom for the TPS, but the actual degrees of freedom are chosen automatically (use the 'method' argument of gam to choose exactly how). Increase k if 50 is overly restrictive here. gam will also let you fit binary/binomial data directly, so you could probably do away with the arcsine transformation (if I'm guessing right about what you are doing here). There are also several other libraries that will let you fit thin plate splines: gss, for example. Simon On 14/03/11 09:55, kehoe wrote: Hi Everyone, I'm a pretty useless r-er but have data that SPSS etc doesn't like. I've managed to do GLMs for my data, but now need to fit a thin plate spline for my data (arcsine.success~date.num:clutch.size) If anyone has a bit of spare time and could come up with a bit of code I'd be very grateful- I just don't get R language! Thanks Rach -- View this message in context: http://r.789695.n4.nabble.com/Help-Fitting-a-Thin-Plate-Spline-tp3353485p3353485.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- --- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
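Picking up Simon's last point, a sketch of the binomial route that skips the arcsine transform entirely; the data frame and column names here (nestdata, successes, failures) are guesses for illustration, not taken from the original post:

library(mgcv)
b2 <- gam(cbind(successes, failures) ~ s(date.num, clutch.size, k = 50),
          family = binomial, data = nestdata, method = "REML")
summary(b2)                                 # effective degrees of freedom chosen automatically
vis.gam(b2, theta = 30, type = "response")  # fitted surface on the probability scale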
Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package
> Date: Wed, 16 Mar 2011 13:50:37 -0700 > From: solomon.mess...@gmail.com > To: r-help@r-project.org > Subject: Re: [R] How to make sure R's g++ compiler uses certain C++ flags > when making a package > > Looks like the problem may be that R is automatically passing this flag to > the g++ compiler: "-arch x86_64" which appears to be causing trouble for > opencv. Does anyone know how to suppress this flag? Are you building a 32- or 64-bit app? I haven't looked, but offhand that may be what this flag is for (you can man g++ and check, I suppose). I was going to reply after doing some checking, as I have never actually made a complete R package, but I did make a test one and verified I could call the C++ code from R. I seem to remember that it was easy to just invoke g++ explicitly and R would use whatever binaries you happen to have left there. That is, if you just compile the code for your R modules yourself, there is no real reason to worry about how R does it. Mixing some flags will prevent linking or, worse, create mysterious run-time errors. Also, IIRC, your parameters were the output of foreign scripts (opencv, apparently) and not too helpful here (and in any case you should echo these to see what they are if they are an issue). > > -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 11:43am Baugh wrote: >> Question: can I simply substitute a dummy var (e.g. populated by zeros) >> for "ID" to run the model >> without the random factor? when I try this R returns values that seem >> reasonable, but I want to be sure >> this is appropriate. If you can fit the model using lme (and it looks like you easily can) then another check would be: ## Compare models with and without random effects fm <- lm(Reaction ~ Days, sleepstudy) fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) anova(fm1, fm) ## lme-fitted model must come first Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384072.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Excuse me, I don't claim R^2 can't be negative. What I am saying is that if I get a negative R^2, then the data are useless. I know that what Thomas said is true in the general case. But in my special case of data, using a nonzero intercept is nonsense, and getting R^2 less than 0.985 is considered a poor job (the standard is R^2 > 0.995). (R^2 given by R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) ) Because lm() uses two different formulas for computing R^2, it is confusing to get R^2 closer to 1 when the linear model with zero intercept y=a*x (a = slope) is used, rather than with the model y=a*x+b (a = slope, b = nonzero intercept). I think R^2 is only a measure of goodness of fit for least-squares optimization and it doesn't matter which formula is used: (R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) or R^2 = 1 - Sum(R[i]^2) / Sum((y[i]-y*)^2)), but using both is confusing. So I would like to know why two different formulas for R^2 are used? -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
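Those are the two definitions documented in ?summary.lm; a small sketch that reproduces both by hand for the fits from the original post in this thread (k = lm(y~x) with intercept, l = lm(y~x+0) through the origin):

1 - sum(residuals(k)^2) / sum((y - mean(y))^2)   # summary(k)$r.squared, intercept model
1 - sum(residuals(l)^2) / sum(y^2)               # summary(l)$r.squared, zero-intercept model
# the denominators differ, so the two numbers are not on the same scale and
# should not be compared with each other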
[R] lmm WITHOUT random factor (lme4)
LMM without random effect: I want to run an LMM both with and without the random factor (ID), and then extract the log-likelihood values from the two models in order to generate a p-value. With the random factor: lmer(y~x+(1|ID),data) Question: can I simply substitute a dummy var (e.g. populated by zeros) for "ID" to run the model without the random factor? When I try this, R returns values that seem reasonable, but I want to be sure this is appropriate. From other inquiries in the forum it seems that substituting ("ID-1") or some variant like that does not work. Thanks for your thoughts. Baugh -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384054.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
> 2) I don't want to fit data with a linear model of zero intercept. > 3) I don't know if I understand correctly. I'm 100% sure the model for my data > should have zero intercept. > The only coordinate which I'm 100% sure is correct. If I had measured quality > Y of the same sample X0 a number of times I would get E(Y(X0))=0. Are points 2) and 3) not contradictory? Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cv.lm syntax error
I rewrote my previous command, and CVlm works now. The problem was related to the data frame needing the same names as those in the formula passed to CVlm. Now I'd like to know if the seed is of any special importance, or whether I can just type seed=29 as in the example. Thanks in advance, u...@host.com -- View this message in context: http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3384198.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Segmentation fault when using "plot" function
I frequently get a segmentation fault error when using the "plot" command. It happens about half the time. We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. I did a quick search for this problem and didn't find anything. Apologies if I missed it. *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: 0x231e18f22 [ 0] /lib64/libc.so.6 [0x3c84a300a0] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa] [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30] [11] /usr/lib64/R/lib/libR.so [0x3659cff75e] [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88] [18] /usr/lib64/R/lib/libR.so [0x3659d25c99] [19] /usr/lib64/R/lib/libR.so [0x3659cff75e] [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da] [29] /usr/lib64/R/lib/libR.so [0x3659d0203d] *** End of error message *** Process R segmentation fault at Thu Mar 17 09:33:59 2011 -- View this message in context: http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] setting up a R working group
On Thu, Mar 17, 2011 at 2:42 PM, Joanne Demmler wrote: > Dear R users, Hi > > we are currently trying to set up a R working group at Swansea University. > I would be very grateful to get some information and feedback from R users > that have done something similar, in particular regarding: We started one, but it fell apart after I left the University - first lesson: you need a critical mass and at least two people who are prepared to drive it. > > - help with giving a general overview of what R is capable of > - help with what to present or teach (did you organise your own training > programme or did you get someone from outside?) We managed with organising it internally, but there were topics for which external lecturers were very useful - it depends on what expertise your R user base has. It is always better to do the training internally, as then there is always the possibility to go back to the person who taught the course. > - are there people out there who would like to present their work to us? I guess that depends on whether you are at least able/willing to provide travel expenses, and on what you expect. > > Any help will be highly appreciated! Welcome - and please feel free to ask more detailed questions. By the way - there is also a SIG, R-teach, which might be of help. Cheers, Rainer > Thanks Joanne > > -- > Joanne Demmler Ph.D. > Research Assistant > College of Medicine > Swansea University > Singleton Park > Swansea SA2 8PP > UK > tel: +44 (0)1792 295674 > fax: +44 (0)1792 513430 > email: j.demm...@swansea.ac.uk > DECIPHer: www.decipher.uk.net > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- NEW GERMAN FAX NUMBER!!! Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Natural Sciences Building Office Suite 2039 Stellenbosch University Main Campus, Merriman Avenue Stellenbosch South Africa Cell: +27 - (0)83 9479 042 Fax: +27 - (0)86 516 2782 Fax: +49 - (0)321 2125 2244 email: rai...@krugs.de Skype: RMkrug Google: r.m.k...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using barplot() with zoo -- names.arg not permitted?
On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill wrote: > I've used barplot(), including the anmes.arg parameter, on data frames > successfully, but I'm even newer to using zoo than I am to R. :-} > > I am working on a functon that accepts a data frame ("df") as its > primary argument, extracts information from it to create a zoo, then > generates a plot based on that. > > The data frame has a column called "time" which is a standard UNIX time > -- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a > POSIXct. (The types of the other columns in the data frame varies > greatly, and the sequence in which they occur is not expected to be > constant. Thus, I find it easier to let R make a reasonable guess as to > the data type, and override special cases -- such as "time" -- on an > individual basis.) > > The data frame contains certain columns whose values are counters, so I > generate a zoo of the diff(): > > oid = "kern.cp_time_" > pat = paste("^", oid, sep = "") > class(df$time) <- "POSIXct" > df_d <- diff(zoo(df[, grep(pat, names(df))], order.by = df$time)) > > I then save the start & end timestamps for future reference, and > generate another zoo, this one containing only the percentages from > df_d: > > cpu_states <- c("sys", "intr", "user", "nice", "idle") > d_range <- range(index(df_d)) > df_pct <- sweep(df_d, 1, apply(df_d, 1, sum), "/")[, paste(oid, > cpu_states[1:4], sep = "")] * 100 > > Well, that is, I save the percentages in which I'm interested. > > So far, so good. df_pct is a zoo (as expected, and the content looks > reasonable. > > I then plot it, e.g.: > > barplot(df_pct, border = NA, col = colvec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time") > > But the X-axis labels show up as "large integers" -- the POSIXct values > are apparently treated as numeric quantities for this purpose. > > And if I try > > barplot(df_pct, border = NA, col = colvec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", > names.arg = index(df_pct)) > > I am told: > > Warning messages: > 1: In plot.window(xlim, ylim, log = log, ...) : > "names" is not a graphical parameter > 2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = > axis.lty, : > "names" is not a graphical parameter > 3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) : > "names" is not a graphical parameter > 4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) : > "names" is not a graphical parameter > > which I find a bit perplexing. > > If I plot via: > > barplot(df_pct, border = NA, col = col_vec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), x lab = "Time", > axisnames = FALSE) > > that works OK, but then I'm unable to generate a human-readable set > of X-axis labels: At best, I'm able to get a single label at the origin. 
> > Here's an excerpt from the generated df_d: > Browse[2]> df_d[1:10] > kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice > 1297278017 275 0 0 > 1297278018 231 0 0 > 1297278019 266 1 0 > 1297278020 230 2 0 > 1297278021 191 0 0 > 1297278022 114 0 0 > 1297278023 30 0 0 > 1297278024 0 0 0 > 1297278025 0 0 0 > 1297278026 0 0 0 > kern.cp_time_sys kern.cp_time_user > 1297278017 8 1 > 1297278018 17 35 > 1297278019 10 6 > 1297278020 28 24 > 1297278021 21 72 > 1297278022 27 147 > 1297278023 43 219 > 1297278024 47 249 > 1297278025 41 259 > 1297278026 8 15 > > > Here's a excerpt from the generated df_pct: > > Browse[2]> df_pct[1:10] > kern.cp_time_sys kern.cp_time_intr kern.cp_time_user > 1297278017 2.816901 0.000 0.3521127 > 1297278018 6.007067 0.000 12.3674912 > 1297278019 3.533569 0.3533569 2.1201413 > 1297278020 9.859155 0.7042254 8.4507042 > 1297278021 7.394366 0.000 25.3521127 > 1297278022 9.375000 0.000 51.0416667 > 1297278023 14.726027 0.000 75.000 > 1297278024 15.878378 0.000 84.1216216 > 1297278025 13.67 0.000 86.333 > 1297278026
Re: [R] lmm WITHOUT random factor (lme4)
Dear Mark, You cannot compare lm() with lme() because the likelihoods are not the same. Use gls() instead of lm() library(nlme) data("sleepstudy", package = "lme4") fm <- lm(Reaction ~ Days, sleepstudy) fm0 <- gls(Reaction ~ Days, sleepstudy) logLik(fm) logLik(fm0) fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) anova(fm0, fm1) Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey > -Oorspronkelijk bericht- > Van: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] Namens Mark Difford > Verzonden: donderdag 17 maart 2011 10:55 > Aan: r-help@r-project.org > Onderwerp: Re: [R] lmm WITHOUT random factor (lme4) > > On Mar 17, 2011; 11:43am Baugh wrote: > > >> Question: can I simply substitute a dummy var (e.g. populated by > >> zeros) for "ID" to run the model without the random factor? when I > >> try this R returns values that seem reasonable, but I want > to be sure > >> this is appropriate. > > If you can fit the model using lme (and it looks like you > easily can) then another check would be: > > ## Compare models with and without random effects fm <- > lm(Reaction ~ Days, sleepstudy) > fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) > anova(fm1, fm) ## lme-fitted model must come first > > Regards, Mark. > > -- > View this message in context: > http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-t > p3384054p3384072.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
On 2011-03-17 02:08, derek wrote: Exuse me, I don't claim R^2 can't be negative. What I say if I get R^2 negative then the data are useless. I know, that what Thomas said is true in general case. But in my special case of data, using nonzero intercept is nonsense, and to get R^2 less than 0.985 is considered poor job (standard R^2>0.995). (R^2 given by R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) ) Because lm() uses two differrent formulas for computing R^2, it is confusing to get R^2 closer to 1 when linear model with zero intercept y=a*x (a = slope) is used, rather than in case with model y=a*x+b (a=slope, b= nonzero intercept). I think R^2 is only measure of good fit for least squares optimization and it doesn't matter which formula is used: (R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) or R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2-y*)), but using both is confusing. So I would like to know why two different formulas for R^2 are used? The beauty of R is that you can study the code and if you don't like it, you can just write your own. I doubt that anyone here would be offended if you used whatever definition of R^2 you like best. Thomas has explained clearly what R does and why. If you find R's usage confusing then your understanding of regression is lacking the depth required to apply it. You would probably benefit considerably from consulting a text or two on regression. Peter Ehlers -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Segmentation fault when using "plot" function
On Thu, Mar 17, 2011 at 6:58 AM, cchace wrote: > > I frequently get a segmentation fault error when using the "plot" command. > It happens about half the time. > > We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. > > I did a quick search for this problem and didn't find anything. Apologies if > I missed it. Eventually some kind person with lots of extra time on his/her hand will install R v2.8.0 on their machine, try to reproduce the problem, troubleshoot it, fix the bug, and provide you with a customized retroactive patched R v2.8.0 that you could install. While waiting for that to happen, you may want to try out R v2.12.2. /Henrik > > > *** Process received signal *** > Signal: Segmentation fault (11) > Signal code: Address not mapped (1) > Failing at address: 0x231e18f22 > [ 0] /lib64/libc.so.6 [0x3c84a300a0] > [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] > [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] > [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] > [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] > [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] > [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] > [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] > [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] > [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa] > [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30] > [11] /usr/lib64/R/lib/libR.so [0x3659cff75e] > [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] > [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88] > [18] /usr/lib64/R/lib/libR.so [0x3659d25c99] > [19] /usr/lib64/R/lib/libR.so [0x3659cff75e] > [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] > [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da] > [29] /usr/lib64/R/lib/libR.so [0x3659d0203d] > *** End of error message *** > > Process R segmentation fault at Thu Mar 17 09:33:59 2011 > > -- > View this message in context: > http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] possible problem with "endpoints"?
Hi Erin, On Thu, Mar 17, 2011 at 1:27 AM, Erin Hodgess wrote: > Dear R People: > > Hello again! > > I found something unusual in the behavior of the "endpoints" function > from the xts package: > >> x1 <- ts(1:24,start=2008,freq=12) >> dat <- seq(as.Date("2008/01/01"),length=24,by="months") >> library(zoo);library(xts) >> x2 <- zoo(1:24,order=dat) >> #Here is the surprise: >> endpoints(as.xts(x1),'quarters') > [1] 0 1 4 7 10 13 16 19 22 24 >> endpoints(as.xts(x2),'quarters') > [1] 0 3 6 9 12 15 18 21 24 > > > Shouldn't the endpoints be the same, please? What am I doing wrong, please? > Looks like a timezone issue. I'll investigate further. > endpoints(as.xts(x1,tz="GMT"),'quarters') [1] 0 3 6 9 12 15 18 21 24 > Thanks, > Erin > > > -- > Erin Hodgess > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: erinm.hodg...@gmail.com > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Ulrich | FOSS Trading: www.fosstrading.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
see inline. On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa wrote: > Hi Alexx, > > I don't see any problem in comparing models based on different distributions > for the same data using the AIC, as long as they have a different number of > parameters and all the constants are included. > For example, you can compare distribution mixture models with different > number of components using the AIC. > This is one example: > Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length > Frequency Data. Journal of Biological, Agricultural and Environmental > Statistics 15:416-429. > Here is another example: > www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf > Prof. Dayton writes above that one advantage of AIC over hypothesis testing > is: > "(d) Considerations related to underlying distributions for random > variables can be > incorporated into the decision-making process rather than being treated > as an assumption whose > robustness must be considered (e.g., models based on normal densities > and on log-normal > densities can be compared)." My reading of this is that AIC can be used to compare models with densities relative to the same dominating measure. Kjetil > Last, if you read Akaike's theorem you will see there is nothing precluding > comparing models built on different distributional models. Here it is: > " the expected (over the sample space and the space of parameter estimates) > maximum log-likelihood of some data on a working model overshoots the > expected (over the sample space only) maximum log-likelihood of the data > under the true model that > generated the data by exactly the number of parameters in the working model." > A remarkable result. > > Rubén > > -Original Message- > From: r-help-boun...@r-project.org on behalf of Alexx Hardt > Sent: Wed 3/16/2011 7:42 PM > To: r-help@r-project.org > Subject: Re: [R] R² for non-linear model > > Am 16.03.2011 19:34, schrieb Anna Gretschel: >> Am 16.03.2011 19:21, schrieb Alexx Hardt: >>> And to be on-topic: Anna, as far as I know anova's are only useful to >>> compare a submodel (e.g. with one less regressor) to another model. >>> >> thanks! i don't get it either what they mean by fortune... > > It's an R-package (and a pdf [1]) with collected quotes from the mailing > list. > Be careful with the suggestion to use AIC. If you wanted to compare two > models using AICs, you need the same distribution (that is, > Verteilungsannahme) in both models. > To my knowledge, there is no way to "compare" a gaussian model to an > exponential one (except common sense), but my knowledge is very limited. > > [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf > > -- > alexx@alexx-fett:~$ vi .emacs > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
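To make the "same data, different distributional families" point concrete, a small illustrative sketch (simulated data; plug-in parameter estimates are used purely for illustration):

set.seed(1)
x <- rlnorm(100, meanlog = 1, sdlog = 0.5)   # positive data
ll.norm  <- sum(dnorm(x, mean = mean(x), sd = sd(x), log = TRUE))
ll.lnorm <- sum(dlnorm(x, meanlog = mean(log(x)), sdlog = sd(log(x)), log = TRUE))
-2 * c(normal = ll.norm, lognormal = ll.lnorm) + 2 * 2   # AIC with two parameters each
# both densities are with respect to the same (Lebesgue) measure, so the AICs are comparable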
Re: [R] Incorrect degrees of freedom in SEM model using lavaan
Your model is saturated. I think lavaan calculates the number of degrees of freedom this way: DF = n*(n + 1)/2 - t - n.fix*(n.fix + 1)/2 n = number of variables t = number of free parameters n.fix = number of fixed exogenous variables So, if you fix the exogenous variables, as in mimic = "Mplus": DF = 28 - 13 - 15 = 0 If you don't, as in mimic = "EQS": DF = 28 - 28 - 0 = 0 You are probably not considering the variances and covariances of the exogenous variables to arrive at 15 degrees of freedom: model = ' Internet + Stress3 ~ HHincome + Race + Age + Gender + Stress1 Stress3 ~ Internet HHincome ~~ 0*Race + 0*Age + 0*Gender + 0*Stress1 Race ~~ 0*Age + 0*Gender + 0*Stress1 Age ~~ 0*Gender + 0*Stress1 Gender ~~ 0*Stress1 HHincome ~~ 1*HHincome Race ~~ 1*Race Age ~~ 1*Age Gender ~~ 1*Gender Stress1 ~~ 1*Stress1 ' fit = lavaan(model, sample.nobs=161, sample.cov=myCov, int.ov.free = T, int.lv.free = F, auto.fix.first = T, auto.fix.single = T, auto.resid.var = T, auto.cov.lv.x = T, auto.cov.y = T, mimic = "EQS") summary(fit) Cheers, Gustavo. On Thu, Mar 17, 2011 at 9:50 AM, Andrew Miles wrote: > I have been trying to use lavaan (version 0.4-7) for a simple path model, > but the program seems to be computing far less degrees of freedom for my > model then it should have. I have 7 variables, which should give (7)(8)/2 = > 28 covariances, and hence 28 DF. The model seems to only think I have 13 > DF. The code to reproduce the problem is below. Have I done something > wrong, or is this something I should take to the developer? > > > library(lavaan) > > myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - > .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, > .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 > , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"))) > > > model = ' > > Internet ~ HHincome + Race + Age + Gender + Stress1 > > Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 > > ' > > > fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) > > > #check the number of parameters being estimated > > inspect(fit, what="free") > > > #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 > > summary(fit) > > > Andrew Miles > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
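Plugging the numbers from this model into that formula (7 observed variables, 13 free parameters, and the 5 fixed exogenous predictors) reproduces the arithmetic:

n <- 7; t <- 13; n.fix <- 5
n * (n + 1) / 2 - t - n.fix * (n.fix + 1) / 2   # 28 - 13 - 15 = 0, i.e. a saturated model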
Re: [R] Incorrect degrees of freedom in SEM model using lavaan
Dear Andrew, The reported df in lavaan is 0 which is correct. It is because this path model is saturated. "28" is not the df, it is the no. of pieces of information. The no. of parameter estimates is also 28. Thus, the df is 0. However, you are correct that there are only 13, not 28, free parameters reported by lavaan. It is because "fixed.x=TRUE" was implicitly used in your example (see ?sem). Since HHincome + Race + Age + Gender + Stress1 are predictors, they are fixed at their sample values when "fixed.x=TRUE" was used. And their covariance elements are not involved in calculating the df. If you want to mirror the behaviors in other SEM packages, you may try: fit2 <- sem(model, sample.nobs=161, sample.cov=myCov, fixed.x=FALSE) inspect(fit2, what="free") summary(fit2) Hope it helps. Mike -- - Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - On Thu, Mar 17, 2011 at 8:50 PM, Andrew Miles wrote: > I have been trying to use lavaan (version 0.4-7) for a simple path model, > but the program seems to be computing far less degrees of freedom for my > model then it should have. I have 7 variables, which should give (7)(8)/2 = > 28 covariances, and hence 28 DF. The model seems to only think I have 13 > DF. The code to reproduce the problem is below. Have I done something > wrong, or is this something I should take to the developer? > > > library(lavaan) > > myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - > .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, > .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 > , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"))) > > > model = ' > > Internet ~ HHincome + Race + Age + Gender + Stress1 > > Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 > > ' > > > fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) > > > #check the number of parameters being estimated > > inspect(fit, what="free") > > > #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 > > summary(fit) > > > Andrew Miles > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] setting up a R working group
1. There is a CRAN mirror at Bristol (roughly 100km from Swansea?). They must have (or have had) some very active R users there. 2. Sundar Dorai-Raj and I developed a local R Archive Network and subversion repository internal to a company where we used to work. I believe that made it easier for us to collaborate and to share our results with others. (See "Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories", available from CRAN -> Documentation: Contributed.) This is not the first thing you should do, but I believe it can help build local collaboration and awareness of R. Hope this helps. Spencer On 3/17/2011 6:42 AM, Joanne Demmler wrote: Dear R users, we are currently trying to set up a R working group at Swansea University. I would be very grateful to get some information and feedback from R users that have done something similar, in particular regarding: - help with giving a general overview of what R is capable of - help with what to present or teach (did you organise your own training programme or did you get someone from outside?) - are there people out there who would like to present their work to us? Any help will be highly appreciated! Thanks Joanne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need to abstract changing name of column within loop
On Wed, Mar 16, 2011 at 6:58 PM, jctoll wrote: > Hi, > > I'm struggling to figure out the way to change the name of a column > from within a loop. The problem is I can't refer to the object by its > actual variable name, since that will change each time through the > loop. My xts object is A. > >> head(A) > A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1 > 2007-01-03 34.99 35.48 34.05 34.30 2574600 34.30 1186780 > 2007-01-04 34.30 34.60 33.46 34.41 2073700 34.41 1190586 > 2007-01-05 34.30 34.40 34.00 34.09 2676600 34.09 1179514 > 2007-01-08 33.98 34.08 33.68 33.97 1557200 33.97 1175362 > 2007-01-09 34.08 34.32 33.63 34.01 1386200 34.01 1176746 > 2007-01-10 34.04 34.04 33.37 33.70 2157400 33.70 1166020 > > It's column names are: >> colnames(A) > [1] "A.Open" "A.High" "A.Low" "A.Close" > "A.Volume" "A.Adjusted" "A.Adjusted.1" > > I want to change the 7th column name: >> colnames(A)[7] > [1] "A.Adjusted.1" > > I need to do that through a reference to i: >> i > [1] "A" > > This works: >> colnames(get(i))[7] > [1] "A.Adjusted.1" > > And this is what I want to change the column name to: >> paste(i, ".MarketCap", sep = "") > [1] "A.MarketCap" > > But how do I make the assignment? This clearly doesn't work: > >> colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") > Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") : > could not find function "get<-" > > Nor does this (it creates a new object "A.Adjusted.1" with a value of > "A.MarketCap") : > > assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = "")) > > How can I change the name of that column within my big loop? Any > ideas? Thanks! > I usually make a copy of the object, change it, then overwrite the original: tmp <- get(i) colnames(tmp)[7] <- "foo" assign(i,tmp) Hope that helps. > Best regards, > > > James > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Ulrich | FOSS Trading: www.fosstrading.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
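A related sketch, in case the loop is walking over several such symbols: keeping the objects in a named list avoids get()/assign() altogether (the symbol vector here is hypothetical):

symbols <- c("A", "B", "C")                    # hypothetical set of tickers
objs <- Map(function(x, s) {
  colnames(x)[7] <- paste(s, ".MarketCap", sep = "")
  x
}, mget(symbols, envir = .GlobalEnv), symbols)
# afterwards work with objs[["A"]], objs[["B"]], ... rather than free-standing objects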
Re: [R] Strange R squared, possible error
On Wed, Mar 16, 2011 at 3:49 PM, derek wrote: > k=lm(y~x) > summary(k) > returns R^2=0.9994 > > lm(y~x) is supposed to find coef. a anb b in y=a*x+b > > l=lm(y~x+0) > summary(l) > returns R^2=0.9998 > lm(y~x+0) is supposed to find coef. a in y=a*x+b while setting b=0 > > The question is why do I get better R^2, when it should be otherwise? > > Im sorry to use the word "MS exel" here, but I verified it in exel and it > gives: > R^2=0.9994 when y=a*x+b is used > R^2=0.99938 when y=a*x+0 is used > The idea is that if you have a positive quantity that can be broken down into two nonnegative quantities: X = X1 + X2 then it makes sense to ask what proportion X1 is of X. For example: 10 = 6 + 4 and 6 is .6 of the total. Now, in the case of a model with an intercept its a mathematical fact that the variance of the response equals the variance of the fitted model plus the variance of the residuals. Thus it makes sense to ask what fraction of the variance of the response is represented by the variance of the fitted model (this fraction is R^2). But if there is no intercept then that mathematical fact breaks down. That is, its no longer true that the variance of the response equals the variance of the fitted model plus the variance of the residuals. Thus how meaningful is it to ask what proportion the variance of the fitted model is of the variance of the response in the first place? In this case we need to rethink the entire approach which is why a different formula is required. Also, maybe the real problem is not this at all. That is perhaps you are not really trying to find the goodness of fit but rather you are trying to compare two particular models: one with intercept and one without. In that case R^2 is not really what you want. Instead use the R anova command. For example, using the built in BOD data frame: > fm <- lm(demand ~ Time, BOD) > fm0 <- lm(demand ~ Time - 1, BOD) > anova(fm, fm0) Analysis of Variance Table Model 1: demand ~ Time Model 2: demand ~ Time - 1 Res.Df RSS Df Sum of Sq F Pr(>F) 1 4 38.069 2 5 135.820 -1 -97.751 10.271 0.03275 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Here we see that the residual sum of squares is much less for the full model than for the reduced model and its significant at the 3.275% level. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating AUCs for each of the 1000 boot strap samples
Taby, First, it is better to reply to the whole list (which I have included on this reply); there is a better chance of someone helping you. Just because I could help with one aspect does not mean I necessarily can (or have the time to) help with more. Further comments are inline below. On 3/16/2011 10:45 PM, taby gathoni wrote: Hi Brian, Thanks for this comment I will action on this. Thanks also for the comment, Andrija also advised the same thing and it worked like magic. My next cause of action was to get the confidence intervals with the AUC values. For the confidence intervals i did them manually. for 99% i cut out first 5 and last 5 after ranking the ACs while for 95% CI i cut out first 25 and last 25 In general, this is right. The middle 95% excludes the 2.5% on the ends, so for 1000 samples that is excluding the 25 most extreme values. and this is my output Upper bound Lower Bound at 99% CI 0.8175 0.50125 at 95% CI 0.7775 0.50375 from my understanding because there are small samples of 20 GOOD and 20 BAD the variations in the upper and lower bound should be minimal in the 1000 samples. I don't know why you would necessarily expect the variance to be minimal. It is what it is. Also, I don't know why you took 20 of each rather than just a random sub-sample. If you get time, Would you be in a position to assist me find out why i have such huge variations? Thank you for taking time to respond. Maybe pull out 10 of your bootstrap samples and look at the ROC curves themselves and their associated AUC. That might give you a sense as to the variability that is possible (which is reflected in the confidence interval). As a final note, you are reinventing the wheel. There are several packages that deal with ROC curves. Two I like in particular are ROCR and pROC. The latter even has built in routines for computing confidence intervals for the AUC using bootstrap replication. Kind regards, Taby --- On Wed, 3/16/11, Brian Diggs wrote: From: Brian Diggs Subject: Re: calculating AUCs for each of the 1000 boot strap samples To: tab...@yahoo.com Cc: "R help" Date: Wednesday, March 16, 2011, 10:42 PM On 3/16/2011 8:04 AM, taby gathoni wrote: data<-data.frame(id=1:(165+42),main_samp$SCORE, x=rep(c("BAD","GOOD"),c(42,165))) f<-function(x) { + str.sample<-list() + for (i in 1:length(levels(x$x))) + { + str.sample[[i]]<-x[x$x==levels(x$x)[i] ,][sample(tapply(x$x,x$x,length)[i],20,rep=T),] + } + strat.sample<-do.call("rbind",str.sample) + return(strat.sample$main_samp.SCORE) + } f(data) [1] 706 633 443 843 756 743 730 843 706 730 606 743 768 768 743 763 608 730 743 743 530 813 813 831 793 900 793 693 900 738 706 831 [33] 818 758 718 831 768 638 770 738 repl<-list() auc<-list() for(i in 1:1000) + { + repl[[i]]<-f(data) + auc[[i]]<-colAUC(repl[[i]],rep(c("BAD","GOOD")),plotROC=FALSE,alg="ROC") + } Error in colAUC(repl[[i]], rep(c("BAD", "GOOD")), plotROC = FALSE, alg = "ROC") : colAUC: length(y) and nrow(X) must be the same Thanks alotTaby I think (though I can't check because the example is not reproducible without main_samp$SCORE), that the problem is that the second argument to colAUC should be rep(c("BAD", "GOOD"), c(20,20)) The error is that repl[[i]] is length 40 while rep(c("BAD", "GOOD")) is length 2. P.S. When giving an example, it is better to not include the prompts and continuation prompts. Copy it from the script rather than the output. Relevant output can then be included as script comments (prefixed with #). That makes cutting-and-pasting to test easier. -- Brian S. 
Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
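For reference, the pROC route mentioned above looks roughly like this (a sketch on simulated stand-ins for the GOOD/BAD labels and scores; see ?roc and ?ci.auc for the exact arguments):

library(pROC)
set.seed(1)
outcome <- rep(c("BAD", "GOOD"), each = 20)            # stand-in for the GOOD/BAD flag
score   <- c(rnorm(20, 600, 80), rnorm(20, 750, 80))   # stand-in for SCORE
r <- roc(outcome, score)
auc(r)                                                 # point estimate of the AUC
ci.auc(r, method = "bootstrap", boot.n = 1000)         # bootstrap 95% confidence interval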
Re: [R] Regex query (Apache logs)
Hello Allan, Thanks the response. Provides me hope. I appreciate [3], might even go with that. And for posterity, here's the code (assuming pastebin never expires) [1] Test string : http://pastebin.com/FyAFzmTv [2] Pattern (modified as per your suggestion) : http://pastebin.com/s7VT0r5K pattern <- readLines(url("http://pastebin.com/raw.php?i=s7VT0r5K";), warn=FALSE) test <- readLines(url("http://pastebin.com/raw.php?i=rbAvR2dK";),warn=FALSE) regexpr(pattern, test, perl=TRUE) Thanks Saptarshi On Thu, Mar 17, 2011 at 12:12 AM, Allan Engelhardt wrote: > Some comments: > > 1. [^\s] matches everything up to a literal 's', unless perl=TRUE. > 2. The (.*) is greedy, so you'll need (.*?)"\s"(.*?)"\s"(.*?)"$ or similar > at the end of the expression > > With those changes (and removing a space inserted by the newsgroup posting) > the expression works for me. > > > (pat <- readLines("/tmp/b.txt")[1]) > [1] > "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$" > > regexpr(pat, test, perl=TRUE) > [1] 1 > attr(,"match.length") > [1] 436 > > 3. Consider a different approach, e.g. scan(textConnection(test), > what=character(0)) > > Hope this helps > > Allan > > > > On 16/03/11 22:18, Saptarshi Guha wrote: > >> Hello R users, >> >> I have this regex see [1] for apache log lines. I tried using R to parse >> some data (only because I wanted to stay in R). >> A sample line is [2] >> >> (a) I saved the line in [1] into "~/tmp/a.txt" and [2] into "/tmp/a.txt" >> >> pat<- readLines("~/tmp/a.txt") >> test<- readLines("/tmp/a.txt") >> test >> grep(pat,test) >> >> returns integer(0) >> >> The same query works in python via re.match() (i.e does return groups) >> >> Using readLines, the regex is escaped for me. Does Python and R use >> different regex styles? >> >> Cheers >> Saptarshi >> >> [1] >> >> >> ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$ >> >> [2] >> 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] "GET >> >> /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux% >> 202.6.32.9/default/default/6/6/1/ HTTP/1.1" 200 3243 "-" "Mozilla/5.0 >> (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre >> Fennec/4.0b5" "BLOCKLIST_v3=110.163.217.169.1299218425.9706" >> >>[[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Spatial cluster analysis of continous outcome variable
Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change and I would like to test if there are areas in this 2D-space where there is a different amount of change (no change, increase, decrease). I don't know if it's better to analyse the data just with the coordinates or if it's better to group them in "pixels" (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations. I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don't have many points). Thanks in advance J Toledo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Segmentation fault when using "plot" function
On Thu, 17 Mar 2011, Henrik Bengtsson wrote: On Thu, Mar 17, 2011 at 6:58 AM, cchace wrote: I frequently get a segmentation fault error when using the "plot" command. It happens about half the time. We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. I did a quick search for this problem and didn't find anything. Apologies if I missed it. Eventually some kind person with lots of extra time on his/her hand will install R v2.8.0 on their machine, try to reproduce the problem, troubleshoot it, fix the bug, and provide you with a customized retroactive patched R v2.8.0 that you could install. While waiting for that to happen, you may want to try out R v2.12.2. If the bug is in R: it looks much more likely to be in the locale handling of the X11 functions of the unstated (but probably old and unpatched) version of 'Linux'. *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: 0x231e18f22 [ 0] /lib64/libc.so.6 [0x3c84a300a0] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] That is all that was useful -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cv.lm syntax error
I rewrote my previous command, and CVlm works now. It was related to having the same names in the df as in the formula included at CVlm. Now I'd like to know: Q1: if the seed is of any special importance, or I can type just seed=29 like in the example? Q2: I typed: CVlm(df = mydf, form.lm = formula(response~v1+v2+factor), m=3, dots = TRUE, seed=29, plotit=TRUE, printit=TRUE) In case it's important, factor has three levels, this way: table(factor): levels: 2 3 4 freq: 2 21 34 And at the 3rd fold it says: fold 3 Observations in test set: 2 5 6 8 11 13 15 16 17 22 29 33 35 37 41 46 51 52 55 Error en model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor 'factor' has new level(s) 2 What does it mean? Thanks in advance, u...@host.com -- View this message in context: http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3385065.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Thank you for your very comprehensible answer. I know a priori that the model y=a*x+0 is right and that I can't get x=constant nor y=constant. I'm comparing the performance of data gathering in my data set to other data sets, in which gathering performance is characterized by R-squared. The data in the data sets are not random but have random error with normal distribution. The error in my data is composed of human and equipment error (normal distribution), appliance error (probably skewed normal distribution) and some other interferences. So I compare model y = a*x + b and y = a*x + 0 to know the magnitude of the interferences which I can't influence - the difference of the R^2's. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385009.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Yes they are. I had edited the reply, but it didn't help. Correction: 2) I meant zero slope, not zero intercept. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384648.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote: >> You cannot compare lm() with lme() because the likelihoods are not the >> same. Use gls() instead of lm() Hi Thierry, Of course, I stand subject to correction, but unless something dramatic has changed, you can. gls() can be used if you need to accommodate a correlation structure. The method I have outlined, i.e. anova(lme$obj, lm$obj), is detailed in Pinheiro & Bates (2000) beginning page 154. Please refer to this if you doubt me. Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384802.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
That's exactly what I would like to do. Any idea of a good text? I've consulted several texts, but none defined R^2 as R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), still less why different formulas are used for similar models, or why R^2 should be closer to 1 for y=a*x+0 than for the general model y=a*x+b. from manual: r.squared R^2, the ‘fraction of variance explained by the model’, R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), where y* is the mean of y[i] "if there is an intercept" and zero otherwise. I don't need an explanation of what R^2 does nor how to interpret it, because I know what it means and how it is derived. I don't need to be told which model I should apply. So the answers from Thomas weren't helpful. I don't claim it is wrong, otherwise it wouldn't be employed, but I want to see the reason behind using two formulas. Control questions: 1) Does the statement "if there is an intercept" mean an intercept including a zero intercept? 2) If I use model y = a*x+0, which formula for R^2 is used: the one with y* or the one without? -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384844.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Beginner question: How to replace part of a filename in read.csv?
I would like to use samp as a part of a filename that I can change. My source files are .csv files with date as the file name, and I would like to be able to type in the date (later perhaps automate this using list.files) and then read the csv and write the pdf automatically. I have tried different combinations with "" and () around samp, but I keep getting the error "object 'samp.csv' not found". samp <- "20110317" read.csv(file=samp.csv,...) #next R processes some code that works fine, and then should save the figure: pdf(file=samp.pdf,...) dev.off() How can I get R to replace samp as 20110317 in front of .csv and .pdf? Best, Sjouke -- View this message in context: http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote: >> You cannot compare lm() with lme() because the likelihoods are not the >> same. Use gls() instead of lm() And perhaps I should have added the following: First para on page 155 of Pinheiro & Bates (2000) states, "The anova method can be used to compare lme and lm objects." Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384823.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
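For completeness, a sketch of the comparison Pinheiro & Bates describe, using the Orthodont data shipped with nlme (both models fitted by maximum likelihood so the log-likelihoods are on the same scale; illustrative only, not a template for every design):

library(nlme)
fm.lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont, method = "ML")
fm.lm  <- lm(distance ~ age, data = Orthodont)
anova(fm.lme, fm.lm)   # likelihood ratio test of the random intercept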
[R] Gelman-Rubin convergence diagnostics via coda package
Dear, I'm trying to run diagnostics on MCMC analysis (fitting a log-linear model to rates data). I'm getting an error message when trying Gelman-Rubin shrink factor plot: >gelman.plot(out) Error in chol.default(W) : the leading minor of order 2 is not positive definite I take it that somewhere, somehow a matrix is singular, but how can that be remedied? My code: library(rjags) #JAGS MODEL3 CODE-# data { for (i in 1:N) { y[i] <- z2[i] - z1[i] } } model { for (i in 1 : N) { mean[i] <- (beta[i] * (n[i] - z1[i]) * (n[i] - z2[i]) ) / n[i] y[i] ~ dpois( mean[i] ) log(beta[i]) <- a1 * type[i] + a0 } a1 ~ dnorm(0, 1.0E-4) a0 ~ dnorm(0, 1.0E-4) } #--# N = 12 n = c(63, 33, 18, 16, 36, 3, 33, 101, 64, 30, 10, 36) z1 = c(33, 8, 14, 4, 2, 8, 24, 76, 35, 14, 6, 22) z2 = c(37, 8, 16, 6, 3, 10, 28, 91, 56, 17, 8, 28) type = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1) init0 = list(a0 = 0.0, a1 = 0.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper") init1 = list(a0 = 1.0, a1 = 1.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper") jags <- jags.model('model3.bug', data = list('N' = N, 'n' = n, 'z1' = z1, 'z2' = z2, 'type' = type), inits = list(init0, init1), n.chains = 2, n.adapt = 100) update(jags, 1000) out <- coda.samples(model = jags, variable.names = c('beta'), n.iter = 25000, thin = 5) gelman.plot(out) -- while(!succeed) { try(); } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
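The chol(W) call sits inside the multivariate shrink factor computation, so if your version of coda has the multivariate argument, the univariate diagnostics can usually still be obtained as below (a hedged workaround, not necessarily a fix for gelman.plot itself):

library(coda)
# univariate potential scale reduction factors only, skipping the
# multivariate statistic whose within-chain covariance triggers chol()
gelman.diag(out, multivariate = FALSE)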
[R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...
Hi, I have the following type of data: 86 subjects in three independent groups (high power vs low power vs control). Each subject solves 8 reasoning problems of two kinds: conflict problems and noconflict problems. I measure accuracy in solving the reasoning problems. To summarize: binary response, 1 within subject var (TYPE), 1 between subject var (POWER). I wanted to fit the following model: for problem i, person j: logodds ( Y_ij ) = b_0j + b_1j TYPE_ij with b_0j = b_00 + b_01 POWER_j + u_0j and b_1j = b_10 + b_11 POWER_j I think it makes sense, but I'm not sure. Here are the observed cell means: conflict noconflict control 0.68965520.9568966 high 0.69354840.9677419 low 0.88461540.9903846 GLMER gives me: summary(glmer(accuracy~f_power*f_type + (1|subject), family=binomial,data=syllogisms)) Generalized linear mixed model fit by the Laplace approximation Formula: accuracy ~ f_power * f_type + (1 | subject) Data: syllogisms AIC BIC logLik deviance 406 437.7 -196 392 Random effects: Groups NameVariance Std.Dev. subject (Intercept) 4.9968 2.2353 Number of obs: 688, groups: subject, 86 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.507450.50507 2.985 0.00284 ** f_powerhp0.130830.70719 0.185 0.85323 f_powerlow 2.041210.85308 2.393 0.01672 * f_typenoconflict 3.287150.64673 5.083 3.72e-07 *** f_powerhp:f_typenoconflict 0.216800.93165 0.233 0.81599 f_powerlow:f_typenoconflict -0.011991.45807 -0.008 0.99344 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ f_powerhp -0.714 f_powerlow -0.592 0.423 f_typncnflc -0.185 0.132 0.109 f_pwrhp:f_t 0.128 -0.170 -0.076 -0.694 f_pwrlw:f_t 0.082 -0.059 -0.144 -0.444 0.308 glmmPQL gives me: summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, family=binomial, data=syllogisms)) iteration 1 iteration 2 iteration 3 iteration 4 iteration 5 iteration 6 Linear mixed-effects model fit by maximum likelihood Data: syllogisms AIC BIC logLik NA NA NA Random effects: Formula: ~1 | subject (Intercept) Residual StdDev:1.817202 0.8045027 Variance function: Structure: fixed weights Formula: ~invwt Fixed effects: accuracy ~ f_power * f_type Value Std.Error DF t-value p-value (Intercept) 1.1403334 0.4064642 599 2.805496 0.0052 f_powerhp 0.0996481 0.5683296 83 0.175335 0.8612 f_powerlow 1.5358270 0.6486150 83 2.367856 0.0202 f_typenoconflict3.0096016 0.4769761 599 6.309754 0. f_powerhp:f_typenoconflict 0.1856061 0.6790046 599 0.273350 0.7847 f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830 0.9253 Correlation: (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ f_powerhp -0.715 f_powerlow -0.627 0.448 f_typenoconflict-0.194 0.138 0.121 f_powerhp:f_typenoconflict 0.136 -0.182 -0.085 -0.702 f_powerlow:f_typenoconflict 0.089 -0.064 -0.153 -0.462 0.325 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -12.43735991 0.06243699 0.22966010 0.33106978 2.23942234 Number of Observations: 688 Number of Groups: 86 Strange thing is that when you convert the estimates to probabilities, they are quite far off. For control, no conflict (intercept), the estimation from glmer is 1.5 -> 81% and for glmmPQL is 1.14 -> 75%, whereas the observed is: 68%. Am I doing something wrong? Any help is very much appreciated. Sam. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner question: How to replace part of a filename in read.csv?
You need to use paste(), as in paste(samp, ".csv", sep="") Use that as your filename. Sarah On Thu, Mar 17, 2011 at 11:05 AM, pierz wrote: > I would like to use samp as a part of a filename that I can change. My source > files are .csv files with date as the file name, and I would like to be able > to type in the date (later perhaps automate this using list.files) and then > read the csv and write the pdf automatically. I have tried different > combinations with "" and () around samp, but I keep getting the error > "object 'samp.csv' not found". > > samp <- "20110317" > read.csv(file=samp.csv,...) > #next R processes some code that works fine, and then should save the > figure: > pdf(file=samp.pdf,...) > dev.off() > > How can I get R to replace samp as 20110317 in front of .csv and .pdf? > > Best, > Sjouke > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
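Plugging that back into the original snippet gives roughly the following (a sketch; it assumes 20110317.csv exists in the working directory and uses a plain plot() call as a placeholder for the real figure code):

samp <- "20110317"
dat <- read.csv(file = paste(samp, ".csv", sep = ""))   # reads 20110317.csv
pdf(file = paste(samp, ".pdf", sep = ""))               # writes 20110317.pdf
plot(dat)                                               # placeholder for the actual plotting code
dev.off()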
Re: [R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...
I suggest that you post this on the R-sig-mixed-models list where you are more likely to find those with bothe interest and expertise in these matters. -- Bert On Thu, Mar 17, 2011 at 7:44 AM, Franssens, Samuel wrote: > Hi, > > I have the following type of data: 86 subjects in three independent groups > (high power vs low power vs control). Each subject solves 8 reasoning > problems of two kinds: conflict problems and noconflict problems. I measure > accuracy in solving the reasoning problems. To summarize: binary response, 1 > within subject var (TYPE), 1 between subject var (POWER). > > I wanted to fit the following model: for problem i, person j: > logodds ( Y_ij ) = b_0j + b_1j TYPE_ij > with b_0j = b_00 + b_01 POWER_j + u_0j > and b_1j = b_10 + b_11 POWER_j > > I think it makes sense, but I'm not sure. > Here are the observed cell means: > conflict noconflict > control 0.6896552 0.9568966 > high 0.6935484 0.9677419 > low 0.8846154 0.9903846 > > GLMER gives me: > summary(glmer(accuracy~f_power*f_type + (1|subject), > family=binomial,data=syllogisms)) > Generalized linear mixed model fit by the Laplace approximation > Formula: accuracy ~ f_power * f_type + (1 | subject) > Data: syllogisms > AIC BIC logLik deviance > 406 437.7 -196 392 > Random effects: > Groups Name Variance Std.Dev. > subject (Intercept) 4.9968 2.2353 > Number of obs: 688, groups: subject, 86 > > Fixed effects: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 1.50745 0.50507 2.985 0.00284 ** > f_powerhp 0.13083 0.70719 0.185 0.85323 > f_powerlow 2.04121 0.85308 2.393 0.01672 * > f_typenoconflict 3.28715 0.64673 5.083 3.72e-07 *** > f_powerhp:f_typenoconflict 0.21680 0.93165 0.233 0.81599 > f_powerlow:f_typenoconflict -0.01199 1.45807 -0.008 0.99344 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > Correlation of Fixed Effects: > (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ > f_powerhp -0.714 > f_powerlow -0.592 0.423 > f_typncnflc -0.185 0.132 0.109 > f_pwrhp:f_t 0.128 -0.170 -0.076 -0.694 > f_pwrlw:f_t 0.082 -0.059 -0.144 -0.444 0.308 > > glmmPQL gives me: > summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, > family=binomial, data=syllogisms)) > iteration 1 > iteration 2 > iteration 3 > iteration 4 > iteration 5 > iteration 6 > Linear mixed-effects model fit by maximum likelihood > Data: syllogisms > AIC BIC logLik > NA NA NA > > Random effects: > Formula: ~1 | subject > (Intercept) Residual > StdDev: 1.817202 0.8045027 > > Variance function: > Structure: fixed weights > Formula: ~invwt > Fixed effects: accuracy ~ f_power * f_type > Value Std.Error DF t-value p-value > (Intercept) 1.1403334 0.4064642 599 2.805496 0.0052 > f_powerhp 0.0996481 0.5683296 83 0.175335 0.8612 > f_powerlow 1.5358270 0.6486150 83 2.367856 0.0202 > f_typenoconflict 3.0096016 0.4769761 599 6.309754 0. > f_powerhp:f_typenoconflict 0.1856061 0.6790046 599 0.273350 0.7847 > f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830 0.9253 > Correlation: > (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ > f_powerhp -0.715 > f_powerlow -0.627 0.448 > f_typenoconflict -0.194 0.138 0.121 > f_powerhp:f_typenoconflict 0.136 -0.182 -0.085 -0.702 > f_powerlow:f_typenoconflict 0.089 -0.064 -0.153 -0.462 0.325 > > Standardized Within-Group Residuals: > Min Q1 Med Q3 Max > -12.43735991 0.06243699 0.22966010 0.33106978 2.23942234 > > Number of Observations: 688 > Number of Groups: 86 > > > Strange thing is that when you convert the estimates to probabilities, they > are quite far off. 
For control, no conflict (intercept), the estimation from > glmer is 1.5 -> 81% and for glmmPQL is 1.14 -> 75%, whereas the observed is: > 68%. > > Am I doing something wrong? > > Any help is very much appreciated. > Sam. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner question: How to replace part of a filename in read.csv?
> paste(samp, ".pdf", sep="") [1] "20110317.pdf" > paste(samp, ".csv", sep="") [1] "20110317.csv" On Thursday, March 17, 2011 at 10:05 AM, pierz wrote: I would like to use samp as a part of a filename that I can change. My source > files are .csv files with date as the file name, and I would like to be able > to type in the date (later perhaps automate this using list.files) and then > read the csv and write the pdf automatically. I have tried different > combinations with "" and () around samp, but I keep getting the error > "object 'samp.csv' not found". > > samp <- "20110317" > read.csv(file=samp.csv,...) > #next R processes some code that works fine, and then should save the > figure: > pdf(file=samp.pdf,...) > dev.off() > > How can I get R to replace samp as 20110317 in front of .csv and .pdf? > > Best, > Sjouke > > -- > View this message in context: > http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting multiple figures on one page
Jim, Thanks for looking into this. The c without paste works. If the rq model overrides the mfrow, I think I will have to piece together individual plots using other software. scarlet -- View this message in context: http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
It is all a matter of what you are comparing too, or what the null model is. For most cases (standard regression) we compare a model with slope and intercept to an intercept only model (looking at the effect of the slope), the intercept only model fits a horizontal line through the mean of the y's hence the subtraction of the mean. If we don't do that then R-squared can easily become meaningless. Here is an example where we compute the r-squared using the no-intercept formula: x <- rnorm(100, 1000, 20) y <- rnorm(100, 1000, 20) cor(x,y) summary( lm( y ~ rep(1,100) + x + 0 ) ) Notice how big the r-squared value is (and that it is not anywhere near the square of the correlation) for data that is pretty independent. When you force the intercept to 0, then you are using a different null model (mean 0). Part of Thomas's point was that if we still subtract the mean in this case then the calculation of r-squared can give a negative number, which you pointed out is meaningless, the gist is that that is the incorrect formula to use and so R instead uses the formula without subtracting the mean when you don't fit an intercept. The reason the r-squared values are different is because they are using different denominators and are therefore not comparable. The reason that R uses 2 different formulas/denominators is because there is not one single formula/denominator that makes general sense in both cases. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- > project.org] On Behalf Of derek > Sent: Thursday, March 17, 2011 9:29 AM > To: r-help@r-project.org > Subject: Re: [R] Strange R squared, possible error > > Thats exactly what I would like to do. Any idea on good text? I've > consulted > severel texts, but no one defined R^2 as R^2 = 1 - Sum(R[i]^2) / > Sum((y[i])^2-y*)) still less why to use different formulas for similar > model > or why should be R^2 closer to 1 when y=a*x+0 than in general model > y=a*x+b. > > from manual: > r.squared R^2, the ‘fraction of variance explained by the model’, > R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), > where y* is the mean of y[i] "if there is an intercept" and zero > otherwise. > > I don't need explaining what R^2 does nor how to interpret it, because > I > know what it means and how it is derived. I don't need to be told which > model I should apply. So the answers from Thomas weren't helpful. > > I don't claim it is wrong, otherwise wouldn't be employed, but I want > to see > the reason behind using two formulas. > > Control questions: > 1) Statement "if there is an intercept" means intercept including zero > intercept? > > 2) If I use model y = a*x+0 which formula for R^2 is used: the one with > Y* > or the one without? > > -- > View this message in context: http://r.789695.n4.nabble.com/Strange-R- > squared-possible-error-tp3382818p3384844.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
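To make the two denominators concrete (and to answer control question 2 above): summary() on a no-intercept fit reproduces the formula without y*, i.e. with sum(y^2) in the denominator. A small made-up example:

set.seed(2)
x <- 1:20
y <- 2 * x + rnorm(20)
fit  <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x + 0)   # without intercept
summary(fit)$r.squared                            # R^2 as reported
summary(fit0)$r.squared
1 - sum(resid(fit)^2)  / sum((y - mean(y))^2)     # matches summary(fit)
1 - sum(resid(fit0)^2) / sum(y^2)                 # matches summary(fit0)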
[R] fitting gamm with interaction term
Hi all, I would like to fit a gamm model of the form: Y~X+X*f(z) Where f is the smooth function and With random effects on X and on the intercept. So, I try to write it like this: gam.lme<- gamm(Y~ s(z, by=X) +X, random=list(groups=pdDiag(~1+X)) ) but I get the error message : Error in MEestimate(lmeSt, grps) : Singularity in backsolve at level 0, block 1 When I simply fit a gam model using the formula above, then it works ok. Is it possible to fit such a model with gamm? Thanks a lot! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting multiple figures on one page
Scarlet, If the mfrow is being overridden, perhaps the rimage package might be able to piece the individual plots... -- Muhammad Rahiz Researcher & DPhil Candidate (Climate Systems & Policy) School of Geography & the Environment University of Oxford On Thu, 17 Mar 2011, scarlet wrote: Jim, Thanks for looking into this. The c without paste works. If the rq model overrides the mfrow, I think I will have to piece together individual plots using other software. scarlet -- View this message in context: http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitting gamm with interaction term
Is X a numeric variable or a factor? If it's numeric try gam.lme<- gamm(Y~ s(z, by=X), random=list(groups=pdDiag(~1+X)) ) ... since otherwise the separate X term is confounded with s(z, by=X). (gam detects such confounding and copes with it, but gamm can't). Simon On 17/03/11 17:47, Irene Mantzouni wrote: Hi all, I would like to fit a gamm model of the form: Y~X+X*f(z) Where f is the smooth function and With random effects on X and on the intercept. So, I try to write it like this: gam.lme<- gamm(Y~ s(z, by=X) +X, random=list(groups=pdDiag(~1+X)) ) but I get the error message : Error in MEestimate(lmeSt, grps) : Singularity in backsolve at level 0, block 1 When I simply fit a gam model using the formula above, then it works ok. Is it possible to fit such a model with gamm? Thanks a lot! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting
I have a matrix, say: 23 1 12 12 0 0 0 1 0 1 0 2 23 2 I want to count the number of distinct rows and the number of distinct elements in the second column, and put these counts in a column. So at the end of the day I should have: c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the elements in the second column exist. Any help is greatly appreciated. -- Thanks, Jim. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Spatial cluster analysis of continous outcome variable
Did you post your data or hypothetical data? Usually that helps make your problem more clear and more interesting ( likely to get a useful response to your post). From: tintin...@hotmail.com To: r-help@r-project.org Date: Thu, 17 Mar 2011 17:38:14 +0100 Subject: [R] Spatial cluster analysis of continous outcome variable Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change and I would like to test if there are areas in this 2D-space where there is a different amount of change (no change, increase, decrease). I don´t know if it´s better to analyse the data just with the coordinates or if its better to group them in "pixels" (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations.I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don´t have many points. Thanks in advance J Toledo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting
Dear Jim, 17.03.2011 20:54, Jim Silverton wrote: I have a matrix say: 23 1 12 12 00 0 1 0 1 0 2 23 2 I want to count of number of distinct rows and the number of disinct element in the second column and put these counts in a column. SO at the end of the day I should have: c(1, 1, 1, 2, 2, 1, 1) for the distinct rows... Let's suppose my.data is your data frame, "var" is the 1st column and "var1" is the second. 1) Create a 3rd columns for the first task: my.data$var2<-0 2) Count distinct rows: for (i in 1:nrow(my.data)) { my.data$var2[i]<-nrow(subset(my.data, var==var[i] & var1==var1[i])) } After this, the output of "my.data$var2" is: [1] 1 1 1 2 2 1 1 > ... and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the > elements in the second column exists. Here I'm a bit irritated. Shouldn't the count for the first element "1" rather be 3, since the number 3 occurs three times... If this is what You are looking for, then the following should work: 1) Create a 4th column for: my.data$var3<-0 2) Count distinct elements in the second column: for (i in 1:nrow(my.data)) { my.data$var3[i]<-sum(my.data$var1==my.data$var1[i]) } After this, the output of "my.data$var3" is: [1] 3 1 1 3 3 2 2 HTH, Kimmo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot, transformation, and ylim
Hey everyone, I'm having a little trouble with ggplot. I have two sets of y-values, one whose range is contained in the other. Due to the nature of the y-values, I wish to scale the y axis with a log base two transformation. Furthermore, I wish to plot the two sets of y-values as boxplots in separate graphs with the same ylim (and thus with the same ticks and tick labels). As of now, I have: ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2() ggplot(data, aes(x2, y2))+geom_boxplot()+scale_y_log2() which successfully creates the boxplot and transforms the y-axis. However, I can't figure out how to change the range of the y-axis so that they're the same in both graphs. Inserting "+scale_y_continuous(limits=c(...))" before or after "+scale_y_log2()" doesn't seem to work. I also tried: qplot(x1,y1,geom="boxplot", ylim=c(...))+scale_y_log2() but my efforts were in vain. Any ideas? Thanks. Cheers, Godwin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot, transformation, and ylim
Hey everyone, Nevermind, I figured it out: ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2(lim=c(...)) :) Cheers, Godwin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi, thank you for your elaborate answer. I downloaded Prof. Dayton's pdf and will read it tomorrow. A friend also told me that our professor said you can actually compare AICs for different distributions. Apparently it's not correct strictly speaking, because of the two different likelihoods, but you can get meaningful information out of it. Did I understand that right? By the way, what is the etiquette way of answering your post? Should I mail to you /and/ the list? Regards, Alex (with a new e-mail address, in case this mixes something up) Am 17.03.2011 09:58, schrieb Rubén Roa: Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, Verteilungsannahme) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
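As a toy illustration of comparing AICs across distributional families fitted to the same data (a sketch with simulated data; fitdistr lives in MASS and its fits carry a log-likelihood, so AIC applies directly):

library(MASS)
set.seed(1)
x <- rgamma(200, shape = 2, rate = 0.5)
fit.gamma <- fitdistr(x, "gamma")
fit.lnorm <- fitdistr(x, "lognormal")
AIC(fit.gamma)   # same data, same scale, same likelihood units,
AIC(fit.lnorm)   # so the two AICs are directly comparable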
Re: [R] Counting
On Thu, Mar 17, 2011 at 02:54:49PM -0400, Jim Silverton wrote: > I have a matrix say: > > 23 1 > 12 12 > 00 > 0 1 > 0 1 > 0 2 > 23 2 > > I want to count of number of distinct rows and the number of disinct element > in the second column and put these counts in a column. SO at the end of the > day I should have: > > c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for > the counts of how many times the elements in the second column exists. Any > help is greatly appreciated. Hi. I understand the first as follows. For each row compute the number of rows, which are equal to the given one. If this is correct, then the following can be used. a <- cbind( c(23, 12, 0, 0, 0, 0, 23), c(1, 12, 0, 1, 1, 2, 2)) u <- rep(1, times=nrow(a)) ave(u, a[, 1], a[, 2], FUN=sum) [1] 1 1 1 2 2 1 1 I am not sure, whether i understand the second correctly. Can you explain it in more detail? I would expect ave(u, a[, 2], FUN=sum) [1] 3 1 1 3 3 2 2 However, this is different from your expected output. Do you count only consecutive equal elements? Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Scope and apply-type functions
So, I've been confused by this for a while. If I want to create functions in an apply, it only uses the desired value for the variable if I create a new local variable: > lapply(1:5,function(h){k=h;function(){k}})[[1]]() [1] 1 > lapply(1:5,function(k){function(){k}})[[1]]() [1] 5 > Normally, a function will use values for a variable if the variable is local to the definition of the function: > a = function(x){function(){x}} > a(5) function(){x} > a(5)() [1] 5 > a(6)() [1] 6 So why doesn't this work for apply? Does apply work more like loops, with a shared variable that changes values so that the first function defined actually uses the last value of the variable? > a = list();for (k in 1:5){a = c(a,function(){k})} > a[[1]]() [1] 5 > Or is there something else entirely? -- View this message in context: http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scope and apply-type functions
Hi, I think it's a side effect of lazy evaluation, where you should probably use the ?force like a jedi, lapply(1:5,function(k){force(k) ; function(){k}})[[2]]() HTH, baptiste On 18 March 2011 07:01, jamie.f.olson wrote: > So, I've been confused by this for a while. If I want to create functions in > an apply, it only uses the desired value for the variable if I create a new > local variable: > >> lapply(1:5,function(h){k=h;function(){k}})[[1]]() > [1] 1 >> lapply(1:5,function(k){function(){k}})[[1]]() > [1] 5 >> > > Normally, a function will use values for a variable if the variable is local > to the definition of the function: > >> a = function(x){function(){x}} >> a(5) > function(){x} > >> a(5)() > [1] 5 >> a(6)() > [1] 6 > > > So why doesn't this work for apply? Does apply work more like loops, with a > shared variable that changes values so that the first function defined > actually uses the last value of the variable? > >> a = list();for (k in 1:5){a = c(a,function(){k})} >> a[[1]]() > [1] 5 >> > > > Or is there something else entirely? > > -- > View this message in context: > http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Numeric vector converted mysteriously to characters in data frame?
Thanks Jim, It turns out that the problem was that all columns had been converted to factors. Once I converted them back to numeric variables the code worked fine. If anybody is wondering how, you can do this with the following: mynumber <- as.numeric(levels(myfactor))[myfactor] There are plenty of other threads that describe this process fully. Martin On Thu, Mar 17, 2011 at 3:11 AM, Jim Holtman wrote: > > take a look at what the output of your cbind is; my guess is that it is a > character matrix. you probably want to do > > data.frame(...,...,...) > > without the cbind. > > Sent from my iPad > > On Mar 16, 2011, at 18:12, Martin Ralphs wrote: > > > Dear R Help, > > > > I would be very grateful if somebody could explain why this is happening. I > > am trying to plot a lattice barchart of a vector of numbers with age > > bandings that I have brought together into a data frame. When I plot the > > variables in their raw vector form, it works. When I try to plot them in > > the df, R labels the numeric axis with default factor values and treats the > > numeric vector as character. Why is this? Apologies if this is blindingly > > obvious, I am new to R. > > > > Here is some sample code that demonstrates the problem: > > > > library(lattice) > > > > # Load in some sample data ... > > ageband <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18) > > agenames <- > > c("00-04","05-09","10-14","15-19","20-24","25-29","30-34","35-39", > > "40-44","45-49","50-54","55-59","60-64","65-69","70-74","75-79","80-84","85 > > plus") > > popcount <- > > c(35274,40958,41574,47973,50384,51248,65748,54854,60948,66473,70854, > > 60475,61854,55848,45857,30584,25475,20574) > > > > > > region <- "North East" > > test1 <- as.data.frame(cbind(region,ageband,agenames,popcount)) > > region <- "North West" > > test2 <- as.data.frame(cbind(region,ageband,agenames,popcount)) > > test <- rbind(test1,test2) > > names(test) <- c("GOR","Band","AgeBand","Persons") > > > > # When I plot my numeric data from the df it is treated as character > > vector... > > > > hg1 <- barchart(AgeBand ~ Persons | GOR, > > data=test, > > origin=0, > > layout=c(2,1), > > reference=TRUE, > > xlab="Count (Persons)", > > ylab="Quinary Age Band", > > scales="free") > > > > hg1 > > > > # But when I plot the source vectors in the same way, it works how I want it > > to... (without the region bit, of course) > > hg2 <- barchart(agenames ~ popcount | region, > > origin=0, > > layout=c(2,1), > > reference=TRUE, > > xlab="Count (Persons)", > > ylab="Quinary Age Band", > > scales="free") > > hg2 > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scope and apply-type functions
Try this: lapply(1:5,function(i){i;function()i})[[2]]() or lapply(1:5,function(i){i;function(j=i) j } )[[2]]() ## both give 2 Now try: lapply(1:5,function(i){function(j=i) j } )[[2]]() ## gives 5 ! The problem is that if you do: lapply(1:5,function(h){function(h)h) what you get is a list of functions of the form function(h)h , where the The "h" in this function has nothing to do with the index "h" in the lapply statement, which no longer exists. So the result is an error message saying it can't find h. The top 2 versions specifically put the value of i at the time the ith function is defined into the function environment, so they exist after the lapply call is finished. The 3rd version obviously uses the value of i at the conclusion of the lapply call when the list of functions is returned from the lapply call and they need to find the value of i for their environments. In general, the scoping with lapply gets complicated and, iirc, the experts have said is nonstandard (Experts: please correct if wrong). So the best advice is to try to avoid constructs like this if possible. -- Bert On Thu, Mar 17, 2011 at 11:01 AM, jamie.f.olson wrote: > So, I've been confused by this for a while. If I want to create functions in > an apply, it only uses the desired value for the variable if I create a new > local variable: > >> lapply(1:5,function(h){k=h;function(){k}})[[1]]() > [1] 1 >> lapply(1:5,function(k){function(){k}})[[1]]() > [1] 5 >> > > Normally, a function will use values for a variable if the variable is local > to the definition of the function: > >> a = function(x){function(){x}} >> a(5) > function(){x} > >> a(5)() > [1] 5 >> a(6)() > [1] 6 > > > So why doesn't this work for apply? Does apply work more like loops, with a > shared variable that changes values so that the first function defined > actually uses the last value of the variable? > >> a = list();for (k in 1:5){a = c(a,function(){k})} >> a[[1]]() > [1] 5 >> > > > Or is there something else entirely? > > -- > View this message in context: > http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
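A version-independent way to see the shared-variable behaviour is the plain for loop from the original post, together with the local() idiom for capturing a per-iteration copy (illustrative sketch):

fs <- list()
for (k in 1:3) fs[[k]] <- function() k
fs[[1]]()   # 3: every closure sees the same k, which is 3 once the loop has finished
fs2 <- list()
for (k in 1:3) fs2[[k]] <- local({kk <- k; function() kk})
fs2[[1]]()  # 1: kk is a private copy taken when the closure was created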
Re: [R] Need to abstract changing name of column within loop
On Thu, Mar 17, 2011 at 10:32 AM, Joshua Ulrich wrote: > On Wed, Mar 16, 2011 at 6:58 PM, jctoll wrote: >> Hi, >> >> I'm struggling to figure out the way to change the name of a column >> from within a loop. The problem is I can't refer to the object by its >> actual variable name, since that will change each time through the >> loop. My xts object is A. >> >>> head(A) >> A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1 >> 2007-01-03 34.99 35.48 34.05 34.30 2574600 34.30 1186780 >> 2007-01-04 34.30 34.60 33.46 34.41 2073700 34.41 1190586 >> 2007-01-05 34.30 34.40 34.00 34.09 2676600 34.09 1179514 >> 2007-01-08 33.98 34.08 33.68 33.97 1557200 33.97 1175362 >> 2007-01-09 34.08 34.32 33.63 34.01 1386200 34.01 1176746 >> 2007-01-10 34.04 34.04 33.37 33.70 2157400 33.70 1166020 >> >> It's column names are: >>> colnames(A) >> [1] "A.Open" "A.High" "A.Low" "A.Close" >> "A.Volume" "A.Adjusted" "A.Adjusted.1" >> >> I want to change the 7th column name: >>> colnames(A)[7] >> [1] "A.Adjusted.1" >> >> I need to do that through a reference to i: >>> i >> [1] "A" >> >> This works: >>> colnames(get(i))[7] >> [1] "A.Adjusted.1" >> >> And this is what I want to change the column name to: >>> paste(i, ".MarketCap", sep = "") >> [1] "A.MarketCap" >> >> But how do I make the assignment? This clearly doesn't work: >> >>> colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") >> Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") : >> could not find function "get<-" >> >> Nor does this (it creates a new object "A.Adjusted.1" with a value of >> "A.MarketCap") : >> >> assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = "")) >> >> How can I change the name of that column within my big loop? Any >> ideas? Thanks! >> > I usually make a copy of the object, change it, then overwrite the original: > > tmp <- get(i) > colnames(tmp)[7] <- "foo" > assign(i,tmp) > > Hope that helps. > >> Best regards, >> > Joshua Ulrich | FOSS Trading: www.fosstrading.com Thank you, that does help. It sounds like your way is similar to the way David suggested. At least now I can just resign myself to doing it that way rather than torturing my mind trying to figure out a shorter way. Thanks, James __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
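A hedged alternative to the get()/assign() pattern (the objects and symbol names below are hypothetical stand-ins): keeping the objects in a named list avoids looking anything up by name, and lapply() can then rename the column in one pass.
# Hypothetical setup: three small matrices standing in for the real xts objects
A <- B <- C <- matrix(rnorm(70), ncol = 7,
                      dimnames = list(NULL, paste("col", 1:7, sep = "")))
symbols <- c("A", "B", "C")
prices  <- mget(symbols, envir = .GlobalEnv)    # gather the objects into a named list
prices  <- lapply(names(prices), function(i) {
  x <- prices[[i]]
  colnames(x)[7] <- paste(i, ".MarketCap", sep = "")
  x
})
names(prices) <- symbols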
Re: [R] Spatial cluster analysis of continous outcome variable
I attach the data (csv format). There are the 3 coordinates (but as there are not so many points I wanted to do 3 analyses, collapsing one variable in each of them). There are two variables to study: I have posted the data as a ratio between both states and as a percentage between both states. The data are from different samples (and each sample has 3 or 6 measures). Thanks again. > From: marchy...@hotmail.com > To: tintin...@hotmail.com; r-help@r-project.org > Subject: RE: [R] Spatial cluster analysis of continous outcome variable > Date: Thu, 17 Mar 2011 15:20:09 -0400 > > > > > > > > > Did you post your data or hypothetical data? > Usually that helps make your problem more clear and more interesting > (and more likely to get a useful response to your post). > > > > > From: tintin...@hotmail.com > To: r-help@r-project.org > Date: Thu, 17 Mar 2011 17:38:14 +0100 > Subject: [R] Spatial cluster analysis of continous outcome variable > > > > Dear R Users, R Core > Team, > I have a two dimensional space where I measure a numerical value in two > situations at different points. I have measured the change and I would like > to test if there are areas in this 2D space where there is a different amount > of change (no change, increase, decrease). I don't know if it's better to > analyse the data just with the coordinates or if it's better to group them in > "pixels" (and obtain the mean value for each pixel) and then run the cluster > analysis. I would like to know if there is a package/function that allows me > to do these calculations. I would also like to know if it could be done in a > 3D space (I have collapsed the data to 2D because I don't have many points). > Thanks in advance > > > > > > J Toledo > [[alternative HTML version deleted]] > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
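Regarding the "pixel" idea in the original question, a minimal sketch with made-up coordinates and change values (not the posted data): cut() bins the points into a coarse grid and tapply() gives the mean change per cell, which can then be inspected with image() or passed to a clustering routine.
set.seed(1)
x <- runif(60); y <- runif(60)        # made-up coordinates
change <- rnorm(60)                   # made-up change values
xbin <- cut(x, breaks = 5)
ybin <- cut(y, breaks = 5)
pixel.means <- tapply(change, list(xbin, ybin), mean)   # 5 x 5 matrix, NA for empty cells
image(1:5, 1:5, pixel.means, xlab = "x bin", ylab = "y bin")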
Re: [R] Using barplot() with zoo -- names.arg not permitted?
On Thu, Mar 17, 2011 at 10:23:33AM -0400, Gabor Grothendieck wrote: > On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill wrote: > ... > > But the X-axis labels show up as "large integers" -- the POSIXct values > > are apparently treated as numeric quantities for this purpose. > ... > Please cut this all down to as small a reproducible example as you can > manage and place it all in ***one*** code section that also generates > the data so that we can copy a single section of code from your post > to R. Use dput to convert data to code for this purpose. Thank you for that. :-) The data below is a bit larger than absolutely necessary, but its much smaller than what I would normally be using -- and I was able to paste this into R directly (on a different machine, getting the same results), so I believe it will suffice. As a benefit, after going through this, I did manage to find an invocation that is able to generate an acceptable X-axis -- at least, sometimes. (Changing the size of the X11 window appears to mess things up -- at least.) But this seems rather more complicated than I expected. Anyway, here's a cut/paste of the requested code section: df <- structure(list(time = c(1297278016L, 1297278017L, 1297278018L, 1297278019L, 1297278020L, 1297278021L, 1297278022L, 1297278023L, 1297278024L, 1297278025L, 1297278026L, 1297278027L, 1297278028L, 1297278029L, 1297278030L, 1297278031L, 1297278032L, 1297278033L, 1297278034L, 1297278035L, 1297278036L, 1297278037L, 1297278038L, 1297278039L, 1297278040L, 1297278041L, 1297278042L, 1297278043L, 1297278044L, 1297278045L, 1297278046L, 1297278047L, 1297278048L, 1297278049L, 1297278050L, 1297278051L, 1297278054L, 1297278055L, 1297278056L, 1297278057L, 1297278058L, 1297278059L, 1297278060L, 1297278061L, 1297278062L, 1297278063L, 1297278064L, 1297278065L, 1297278066L, 1297278067L, 1297278068L, 1297278069L, 1297278070L, 1297278071L, 1297278072L, 1297278073L, 1297278074L, 1297278075L, 1297278076L, 1297278077L, 1297278078L, 1297278079L, 1297278080L, 1297278081L, 1297278082L, 1297278083L, 1297278084L, 1297278085L, 1297278086L, 1297278087L, 1297278088L, 1297278089L, 1297278090L, 1297278091L, 1297278092L, 1297278093L, 1297278094L, 1297278095L, 1297278096L, 1297278097L, 1297278098L, 1297278099L, 1297278100L, 1297278101L, 1297278102L), kern.cp_time_idle = c(3193265L, 3193540L, 3193771L, 3194037L, 3194267L, 3194458L, 3194572L, 3194602L, 3194602L, 3194602L, 3194602L, 3194602L, 3194603L, 3194607L, 3194617L, 3194672L, 3194802L, 3194933L, 3195067L, 3195225L, 3195378L, 3195378L, 3195418L, 3195418L, 3195418L, 3195419L, 3195419L, 3195421L, 3195422L, 3195422L, 3195422L, 3195422L, 3195422L, 3195450L, 3195450L, 3195664L, 3196432L, 3196440L, 3196871L, 3196872L, 3196874L, 3196944L, 3196944L, 3197037L, 3197130L, 3197130L, 3197132L, 3197159L, 3197159L, 3197162L, 3197166L, 3197236L, 3197367L, 3197367L, 3197497L, 3197639L, 3197782L, 3197922L, 3197946L, 3197947L, 3197954L, 3197954L, 3197954L, 3198194L, 3198194L, 3198204L, 3198205L, 3198256L, 3198265L, 3198269L, 3198269L, 3198269L, 3198269L, 3198269L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198323L, 3198323L, 3198367L, 3198631L), kern.cp_time_intr = c(5142L, 5142L, 5142L, 5143L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5146L, 5147L, 5147L, 5147L, 5148L, 5148L, 5149L, 5150L, 5152L, 5152L, 5152L, 5153L, 5153L, 5155L, 5156L, 5156L, 5156L, 5156L, 5156L, 
5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L), kern.cp_time_nice = c(36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L), kern.cp_time_sys = c(132583L, 132591L, 132608L, 132618L, 132646L, 132667L, 132694L, 132737L, 132784L, 132825L, 132833L, 132881L, 132933L, 132977L, 133017L, 133059L, 133093L, 133121L, 133147L, 133172L, 133210L, 133220L, 133259L, 133302L, 133342L, 133388L, 133431L, 133494L, 133547L, 133557L, 133592L, 133631L, 133673L, 133713L, 133765L, 133785L, 133798L, 133808L, 133816L, 133857L, 133892L, 133951L, 133960L, 134004L, 134043L, 134109L, 134170L, 134206L, 134268L, 134329L, 134396L, 134443L, 134486L, 134501L, 134558L, 134611L, 134654L, 134704L, 134743L, 134789L, 134824L, 134871L, 134881L, 1
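Independently of the (truncated) dput output above, a small self-contained sketch of one way to get readable time labels: barplot() treats the index as plain numbers, so formatted labels can be supplied yourself through names.arg (the times and values below are hypothetical).
times  <- as.POSIXct("2011-02-09 10:20:16", tz = "GMT") + 0:9   # hypothetical sample times
values <- abs(rnorm(10))                                         # hypothetical bar heights
barplot(values, names.arg = format(times, "%H:%M:%S"),
        las = 2, cex.names = 0.7, ylab = "value")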
[R] A question about list
Hi dear all, It may be a simple question, i have a list output with different number of elements as following; [[1]] [1] 0.86801402 -0.82974691 0.3974 -0.98566707 -4.96576856 -1.32056754 [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125 1.62843297 -0.20473912 [13] 0.30659907 2.66908117 2.53791837 0.53788013 -0.57463077 0.27708874 [19] -2.94233200 1.54565643 -6.83694100 7.21556266 -3.14823536 -1.34590796 [25] 0.78660855 5.53692735 1.22511890 7.65249980 -0.43008997 -0.10536125 [[2]] [1] -2.80332826 0.54414548 4.38232256 -1.38407653 -1.59241491 -1.35509664 [7] 1.04806755 -0.27685465 -1.36671548 -3.16649719 2.17194692 -3.49404253 [13] 4.69102017 2.78297615 0.34565006 1.05954751 1.78836097 -0.80393182 [19] 3.74315304 1.17427902 1.62354686 0.53186688 -6.56519965 -3.39045485 [25] 0.01043676 -0.18857654 -0.57070351 -0.06135564 6.92331269 -1.46544614 [31] -1.65309767 [[3]] [1] 4.1923546 0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074 [7] -0.9333245 0.2632038 -0.6909956 -3.1008763 -2.9557687 1.5382464 [13] 1.2713290 6.6527302 1.0433603 -0.9916190 -2.7724673 -1.6554250 [19] 1.8023591 -1.5101793 1.2604704 -0.2853326 -2.4312827 -0.4731487 [25] 3.5061061 1.7392190 1.5493419 -0.7203778 -0.6995221 2.7686406 [31] 6.1813364 -1.8665294 I want to compute the percentile of all of 94 elements, i tried quantile(data) but i got an error like Error in sort.list(x, partial = unique(c(lo, hi))) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? Can some one help me abouth this ? Thanks for any idea... -- View this message in context: http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Autocorrelation in non-linear regression model
Hey all! I am working on my master's thesis and I am getting desperate with my model. It looks as follows: Y(t) = β1*X1(t) + β2*X2(t) + δ*(β1*((1+c)/(δ+c))+β2)*IE(t) - β2*α*((1+c)/(δ+c))*(δ+g)* IE(t-1) note: c and g are constants. The problem I encounter is that there is strong linear correlation (autocorrelation) between IE(t) and IE(t-1). How can I solve this problem? It is of utter importance to end up with significant coefficients δ and α, which are then used in a subsequent model. However, I get either no significant values for δ and α, or unrealistic values for one of the two. Is there a way to combine both, using some non-linear time-lagged model, a time-series approach, or a plugged-in autoregression? A follow-up question would be how to place penalties on this model. I would like to restrict the values of δ and α to between 0 and 0.5 and add penalties when they come closer to the boundaries. I really need some help, because I have been stuck on this for the last two weeks and don't know how to go about it. Thanks for the support. Cheers, Bob -- View this message in context: http://r.789695.n4.nabble.com/Autocorrelation-in-non-linear-regression-model-tp3385647p3385647.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
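A hedged sketch of one way to impose the 0-0.5 bounds (the data, starting values, and the abbreviated model below are all hypothetical, not the poster's): nls() with algorithm = "port" accepts box constraints directly, which is usually cleaner than hand-rolled penalty terms. For the autocorrelation itself, gnls() in the nlme package accepts a correlation = corAR1() argument.
set.seed(1)
d <- data.frame(Y = rnorm(50), X1 = rnorm(50), IE = rnorm(50), IE1 = rnorm(50))  # made-up data
fit <- nls(Y ~ b1 * X1 + delta * IE - alpha * IE1,   # abbreviated stand-in for the full model
           data      = d,
           start     = list(b1 = 0.1, delta = 0.25, alpha = 0.25),
           algorithm = "port",
           lower     = c(-Inf, 0, 0),     # bounds in the order of 'start'
           upper     = c( Inf, 0.5, 0.5))
summary(fit)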
[R] Histograms with strings,
Hello, Thanks in advance for any help, I have read a CSV file in which there is a column for an IP addr as in: tmpInFile$V2 [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" [5] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" [9] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" If I want to find the IP addr that has the highest occurrence (129.46.71.19, in this case), is there a simple way to do this? Thanks Shank [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with plotting a line that is multicoloured based on levels of a factor
Hi All, I'm trying to plot data that is a time series of flows that are associated with a specific level, and I would like each level to represent a colour in a line plot. Here is some data that approximates what I'm using: date=c(1:300) flow=sin(2*pi/53*c(1:300)) levels=c(rep(c("high","med","low"),100)) data=cbind.data.frame(date, flow, levels) the "levels" column represents the levels of flow. What I've done so far is to plot this data using coloured points corresponding with each flow level: colour=ifelse(data$levels=="high","red", ifelse(data$levels=="med","green", ifelse(data$levels=="low","blue",""))) plot(date, flow, col=colour) What I would like to do instead is to plot the line of this data, not the points. i.e., plot(date, flow, type="l") But I would like the colour of the line to change with each level, i.e., plot(date, flow, type="l", col=colour) But this doesn't work because the line is continuous and the colours are discrete. I looked into using clipplot, but I'm not sure how I would specify limits that would give different sections of the line correct colours. Does anyone know of a way to draw a line with different colours? One way I thought of was to plot each level of flow separately and then build the plot up, i.e., plot(data$date[data$levels=="high"], data$flow[data$levels=="high"], col="red", type="l") lines(data$date[data$levels=="med"], data$flow[data$levels=="med"], col="green", type="l") lines(data$date[data$levels=="low"], data$flow[data$levels=="low"], col="blue", type="l") But the line fills in data gaps, so this doesn't work. Any help would be much appreciated! Thank you. -Pam Allen pal...@hatfieldgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
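A minimal sketch (using the data frame constructed above) of one way to keep a single continuous line while changing its colour with the level: draw each segment with segments(), coloured by the level at its starting point.
col.map <- c(high = "red", med = "green", low = "blue")
seg.col <- col.map[as.character(data$levels)]
n <- nrow(data)
plot(data$date, data$flow, type = "n", xlab = "date", ylab = "flow")
segments(data$date[-n], data$flow[-n],
         data$date[-1], data$flow[-1],
         col = seg.col[-n])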
[R] Help with Time Series Plot
Dear List, This is an embarrassing question, but I can't seem to make this work. How do I change the font size of the xlab and of the numbers shown on the x-axis in the time series plot below? The arguments cex.lab and cex.axis do not seem to be passed through to the plot function. plot(ts(rnorm(100), start=2004, freq=12), ylab="RQI", xlab="My X lab", col="black", cex.lab=0.1, cex.axis=0.7) Thanks, Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
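A hedged workaround (using the same example call, with sizes chosen arbitrarily): plot.ts draws its own axes and annotation, so setting the sizes globally with par() before plotting is one approach that often works when the cex.* arguments are not picked up.
op <- par(cex.lab = 0.8, cex.axis = 0.7)    # set sizes before the plot is drawn
plot(ts(rnorm(100), start = 2004, freq = 12),
     ylab = "RQI", xlab = "My X lab", col = "black")
par(op)                                     # restore the previous settings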
Re: [R] Strange R squared, possible error
Thank you for your reply. It's a pity that two quantities defined by different formulas share the same name. If the variables had been named differently, I wouldn't have had a problem at all, and it looks like it's done on purpose. Because I am testing the quality of the data (the performance of data collection), not a model that I know a priori, R^2 gives me simple information about that quality even for a nonlinear model. Still, when testing different models R^2 shows nothing important. I didn't state that a negative R^2 is meaningless; it means that the model or the data are wrong. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385837.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
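A small worked illustration of that last point (with made-up numbers): this R^2 is 1 - SSres/SStot, so any fit whose residual sum of squares exceeds the total sum of squares, i.e. a model worse than a constant mean, pushes it below zero.
set.seed(1)
y    <- rnorm(20)
yhat <- y + 3                                   # a deliberately poor "model"
1 - sum((y - yhat)^2) / sum((y - mean(y))^2)    # clearly negative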
Re: [R] A question about list
?unlist quantile(unlist(data)) > Hi dear all, > > It may be a simple question, i have a list output with different number of > elements as following; > > [[1]] > [1] 0.86801402 -0.82974691 0.3974 -0.98566707 -4.96576856 > -1.32056754 > [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125 1.62843297 > -0.20473912 > [13] 0.30659907 2.66908117 2.53791837 0.53788013 -0.57463077 > 0.27708874 > [19] -2.94233200 1.54565643 -6.83694100 7.21556266 -3.14823536 > -1.34590796 > [25] 0.78660855 5.53692735 1.22511890 7.65249980 -0.43008997 > -0.10536125 > > [[2]] > [1] -2.80332826 0.54414548 4.38232256 -1.38407653 -1.59241491 > -1.35509664 > [7] 1.04806755 -0.27685465 -1.36671548 -3.16649719 2.17194692 > -3.49404253 > [13] 4.69102017 2.78297615 0.34565006 1.05954751 1.78836097 > -0.80393182 > [19] 3.74315304 1.17427902 1.62354686 0.53186688 -6.56519965 > -3.39045485 > [25] 0.01043676 -0.18857654 -0.57070351 -0.06135564 6.92331269 > -1.46544614 > [31] -1.65309767 > > [[3]] > [1] 4.1923546 0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074 > [7] -0.9333245 0.2632038 -0.6909956 -3.1008763 -2.9557687 1.5382464 > [13] 1.2713290 6.6527302 1.0433603 -0.9916190 -2.7724673 -1.6554250 > [19] 1.8023591 -1.5101793 1.2604704 -0.2853326 -2.4312827 -0.4731487 > [25] 3.5061061 1.7392190 1.5493419 -0.7203778 -0.6995221 2.7686406 > [31] 6.1813364 -1.8665294 > > I want to compute the percentile of all of 94 elements, i tried > quantile(data) but i got an error like > Error in sort.list(x, partial = unique(c(lo, hi))) : > 'x' must be atomic for 'sort.list' > Have you called 'sort' on a list? > Can some one help me abouth this ? > Thanks for any idea... > > -- > View this message in context: > http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
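As an aside not in the original reply: if percentiles per list component are wanted instead of one overall set, sapply() applies quantile() to each element and returns the results as a matrix.
sapply(data, quantile, probs = c(0.25, 0.5, 0.75))   # one column per list element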
Re: [R] Histograms with strings,
try this: > x [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" [7] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > table(x) x 129.46.71.19 74.125.224.38 10 2 > which.max(table(x)) 129.46.71.19 1 On Thu, Mar 17, 2011 at 6:06 PM, Khanvilkar, Shashank wrote: > Hello, > Thanks in advance for any help, > > I have read a CSV file in which there is a column for an IP addr as in: > > tmpInFile$V2 > [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" > [5] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > [9] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > > If I want to find the IP addr that has the highest occurrence (129.46.71.19, > in this case), is there a simple way to do this? > > Thanks > Shank > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
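One small follow-up (same x as above): which.max() returns the position within the table, so wrapping it in names() yields the address itself.
names(which.max(table(x)))   # "129.46.71.19"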