[R] apply block of if statements with menu function

2014-09-15 Thread rl
Subscribers,

For a menu:

menu(c('a','b','c','d'))

How can I create a function that applies a block of code to a specific menu 
choice? For example:


object1<-function (menuifchoices) {
menu1<-menu(c('a','b','c','d'))
if (menu1==1)
...
menu1a<-menu...
if (menu1a==1)
...
menu2a<-menu...
if (menu2a==1)
...
menu2<-menu(c('a','b','c','d'))
if (menu1==2)
...
}

The requested behaviour is that a user can select a menu option that will 
activate a series of "multiple choice" questions, so that "menu1" is 
activated without menu2 being activated. If someone could direct me 
to the relevant terminology, thank you.


Separate question; for a menu:

menu(c('a','b','c','d'))

1: a
2: b
3: c
4: d

Selection: 1
[1] 1

is it possible to change the behaviour so that the result of the selection is 
not the integer, but the original menu choice:


Selection: 1
[1] a

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Announce: Lahman baseball database archive package, v 3.0

2014-09-15 Thread Michael Friendly
Dear list

Version 3.0-1 of the Lahman package was recently submitted to CRAN. It 
contains the tables from Sean Lahman's Baseball Database,
http://www.seanlahman.com/baseball-archive/statistics/
as a set of R data.frames with examples of use.

V 3.0 provides the updated data on pitching, hitting and fielding 
performance and other tables from 1871
through 2013, as recorded in the 2014 version of the database.  The 
Lahman project has a home page
at http://lahman.r-forge.r-project.org/ with some additional examples.  
Additional links to other applications or analyses of this database are invited.

If you have used the earlier v. 2.0-x package in scripts or examples, 
please note that the database schema
has been somewhat revised to regularize the use of player ID variables 
across the various tables, and
remove some redundant variables, so some scripts may need to be 
revised.  In particular:


  o HallOfFame$hofID is now HallOfFame$playerID
  o managerID is now playerID in all tables
  o Removed from Master: managerID, hofID, holtzID, lahmanID, 
lahman40ID, lahman45ID, nameNote, nameNick, and college
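
A quick way to check the revised schema after upgrading (a small sketch; it 
assumes the v. 3.0-x package has been installed from CRAN as usual):

library(Lahman)
"playerID" %in% names(HallOfFame)                     # hofID is now playerID
intersect(c("hofID", "lahmanID", "nameNick"), names(Master))  # should be empty now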


-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 8 fast or 4 very fast cores?

2014-09-15 Thread Ben Bolker
Leif Ruckman  Ruckman.se> writes:

> 
> I am going to buy a new computer ( Dell workstation T5810 - Windows 8) 
> to work with simulatons in R.
> 
> Now I am asked what kind of processor I like and I was given two choices.
> 
> 1. Intel Xeon E5-1620 v3 - 4 cores 3.7 GHz Turbo
> 2. Intel Xeon E5-2640 v3 - 8 cores 2.6 GHz Turbo
> 
> I don't know what is better in simulations studies in R, a few very fast 
> cores or many cores at normal speed.


  It's **very** hard to answer such general questions reliably, but I'll
take a guess and say that if you're doing simulation studies you're likely
to be doing tasks that are easily distributable (e.g. many random
realizations of the same simulation and/or realizations for many
different sets of parameter values) and so the more-cores option
will be a good idea.

  But it's possible that what you mean by "simulation studies" is
different.

  If you can do some benchmarking of your problems on an existing
machine that would probably be a good idea.
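
For concreteness, a minimal sketch of the "easily distributable" case described 
above (simulate_once() is a hypothetical stand-in for one run of your own 
simulation; the parallel package ships with R, and a PSOCK cluster also works 
on Windows):

library(parallel)

## one replicate of the simulation -- replace with the real thing
simulate_once <- function(i, n = 1000) mean(rnorm(n))

cl <- makeCluster(detectCores())          # one worker per (virtual) core
res <- parLapply(cl, 1:200, simulate_once)
stopCluster(cl)

With this pattern, throughput scales roughly with the number of physical 
cores, which is the argument for the more-cores option.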

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 8 fast or 4 very fast cores?

2014-09-15 Thread Prof Brian Ripley
On 15/09/2014 11:21, Ben Bolker wrote:

Leif Ruckman  Ruckman.se> writes:



I am going to buy a new computer ( Dell workstation T5810 - Windows 8)
to work with simulatons in R.

Now I am asked what kind of processor I like and I was given two choices.

1. Intel Xeon E5-1620 v3 - 4 cores 3.7 GHz Turbo
2. Intel Xeon E5-2640 v3 - 8 cores 2.6 GHz Turbo

I don't know what is better in simulations studies in R, a few very fast
cores or many cores at normal speed.



   It's **very** hard to answer such general questions reliably, but I'll
take a guess and say that if you're doing simulation studies you're likely
to be doing tasks that are easily distributable (e.g. many random
realizations of the same simulation and/or realizations for many
different sets of parameter values) and so the more-cores option
will be a good idea.

   But it's possible that what you mean by "simulation studies" is
different.

   If you can do some benchmarking of your problems on an existing
machine that would probably be a good idea.


Unfortunately unless it is of very similar architecture that may not 
help much.


Three issues hard to scale from are the 'Turbo', the hyperthreading of 
modern Xeons and the cache sizes.  Now, I happen to have machines with 
multiple E5-24x0 and E5-26x0 Xeons: both do hyperthreading well, so you 
would have 8 or 16 virtual CPUs and they will give you say 50% increase 
in throughput if all the virtual cores are used.  But you cannot scale 
up from using just one process on one core.


I find it hard to think of tasks where option 1) would have more 
throughput, but if most of the time you are not running things in 
parallel then the higher speed on a single task is a consideration.




   Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] spatstat rmh problem

2014-09-15 Thread Sebastian Schutte
Dear R and spatstat developers,

Thanks so much for the time and effort that you invest into this awesome 
software. I have a problem simulating from a Point Process Model in 
spatstat. In summary, the option "new.coef" should allow me to use a 
fitted model and change its beta coefficients before simulating a point 
pattern from the model via Monte Carlo simulation. Intuitively, one 
would assume that the predicted point pattern changes as one fiddles 
with the beta coefficients. However, this does not seem to work.


Please let me know what I am missing here and which screw to drive to 
actually change the simulation output.


#owin is a polygon of country boundaries, "im.pop" is a raster with 
georeferenced population counts.

#I am using a random point pattern for demonstration purposes

#Fix random seed
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#plot(density(simulate(mod)))
#Show that this is reproducible
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#As expected, the density is the same

#Now change the coefs and do it again:
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod),new.coef=c(1,200)))
#Looks the same, so what am I missing?

Thanks for your help,
Sebastian

P.S:
R 3.1.1
Spatstat 1.38-1
Ubuntu 14.04
Linux 3.13.0-34-generic

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [I] Re: Installing nloptr in UNIX environ

2014-09-15 Thread pofigster
Thanks - I ended up getting our linux admin to let the server connect to the 
internet and got it working.

Mark Ewing

From: ray48 [via R] [mailto:ml-node+s789695n4696884...@n4.nabble.com]
Sent: Friday, September 12, 2014 8:51 AM
To: Ewing, Mark
Subject: [I] Re: Installing nloptr in UNIX environ

I had this issue too, as a result of having to hand-install nlopt-2.4.2 (because I 
had no connection to CRAN at the time).

I first removed my nlopt installation (using 'make uninstall') and then 
installed nloptr from CRAN.
The nloptr installation then downloads nlopt-2.4.2 itself before building nloptr.
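
Roughly the sequence described above, as a sketch (it assumes the hand-built 
nlopt source tree is still around and the server can now reach CRAN):

## from a shell, in the nlopt-2.4.2 build directory:
##     make uninstall
## then, in R:
install.packages("nloptr")   # the nloptr build fetches nlopt-2.4.2 itself, as described above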







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: CoxME: Family relatedness

2014-09-15 Thread Marie Dogherty
Hello all,



I have a table like this, with ~300 individuals:



Famid Id   Faid Moid Cohort Sex  Survival Event SNP1 SNP2 SNP3

11000010   1010

22001120   1000

23000025   1010

45120035   1011

46120035   0101







famid=family id, id=id of person,faid=father id,moid=mother id.



My question of interest: What impact does SNP1/SNP2/SNP3 have on survival
of individuals (Id), after accounting for possible effects due to family
relatedness (Famid).



So I want to account for family-specific frailty effects, and individual
frailty effects according to degree of relationship between individuals.



The commands I've used are:



library(survival)

library(coxme)

library(kinship2)

library(bdsmatrix)



death.dat <- read.table("Table", header = TRUE)



deathdat.kmat
<-makekinship(famid=death.dat$famid,id=death.dat$id,father=death.dat$faid,mother=death.dat$moid)



death.dat1<-subset(death.dat,!is.na(Survival))



all <-dimnames(deathdat.kmat)[[1]]



temp <-which(!is.na(death.dat$Survival[match(all,death.dat$id)]))



deathdat1.kmat <-deathdat.kmat[temp,temp]



model4
<-coxme(Surv(Survival,Event)~Sex+strata(Cohort)+SNP1+SNP2+SNP3,data=death.dat1,id|famid,varlist=list(deathdat1.kmat,famblockf.mat),pdcheck=FALSE)







I adapted these commands almost entirely from
http://www.ncbi.nlm.nih.gov/pubmed/21786277 as I am new to R.



The error I obtain is:



Error in coxme(Surv(Survival, Event) ~ Sex + strata(Cohort) + SNP1 + SNP2
+  :

No observations remain in the data set

In addition: Warning message:

In Ops.factor(id, famid) : | not meaningful for factors





I have two questions:

1. What is the difference between (id|famid) and (1+id|famid)? How do I tell
which is appropriate for my data set? Have I formatted that section of the
command properly?

2. Does anyone understand the error/how to fix it?



Many thanks


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] the properties of a DATA to run DBSCAN in R

2014-09-15 Thread Maryam Heidari
hello everybody,

I have been trying to run the "dbscan" algorithm on my data. My data has around
4 records, each of which has 3 attributes plus an ID.
The interesting thing is that when I run dbscan on just the 3 attributes,
R gives me an ERROR regarding a "stack overflow", but when I run it including
all 4 attributes it runs without any problem.
So is the problem with my data?!

These are the summary statistics for my data:
      att1              att2                att3
 Min.   :0.0       Min.   :0.00        Min.   :0.00
 1st Qu.:0.01429   1st Qu.:0.00        1st Qu.:0.00
 Median :0.02857   Median :0.00        Median :0.00
 Mean   :0.02764   Mean   :0.001135    Mean   :0.000477
 3rd Qu.:0.02857   3rd Qu.:0.00        3rd Qu.:0.00
 Max.   :1.0       Max.   :1.00        Max.   :1.00
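
Purely to illustrate the two calls being compared (assuming the fpc package, a 
hypothetical data frame called dd with columns ID, att1, att2, att3, and 
placeholder eps/MinPts values):

library(fpc)

## dbscan on the three attributes only (reported above to raise a stack overflow)
fit3 <- dbscan(dd[, c("att1", "att2", "att3")], eps = 0.05, MinPts = 5)

## dbscan including the ID column as well (reported above to run fine)
fit4 <- dbscan(dd[, c("ID", "att1", "att2", "att3")], eps = 0.05, MinPts = 5)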


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply block of if statements with menu function

2014-09-15 Thread David L Carlson
I think switch() should work for you here, but it is not clear how much 
flexibility you are trying to have (different tests based on the first 
response; different tests based on first, then second response; different tests 
based on each successive response). 

?switch
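
For example, a minimal sketch of dispatching on the first menu() result with 
switch(); the inner menus and messages are placeholders, not your actual 
follow-up questions:

ask <- function() {
  first <- menu(c('a', 'b', 'c', 'd'), title = "Pick one")
  switch(first,
         {                       # first == 1: a follow-up series of questions
           q1 <- menu(c('yes', 'no'), title = "Question 1a")
           if (q1 == 1) message("yes on 1a")
         },
         {                       # first == 2: a different follow-up
           q2 <- menu(c('red', 'green'), title = "Question 2a")
           if (q2 == 1) message("red on 2a")
         },
         message("no follow-up for 'c'"),   # first == 3
         message("no follow-up for 'd'"))   # first == 4
}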

For the second question just index the return value:

> let <- letters[1:4]
> let[menu(let)]

1: a
2: b
3: c
4: d

Selection: 3
[1] "c"

Or a bit more polished:

> cat("Choice: ", let[menu(let)], "\n")

1: a
2: b
3: c
4: d

Selection: 4
Choice:  d

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of r...@openmailbox.org
Sent: Monday, September 15, 2014 3:53 AM
To: r-help@r-project.org
Subject: [R] apply block of if statements with menu function

Subscribers,


For a menu:

menu(c('a','b','c','d'))

How to create a function that will apply to specific menu choice 
objects? For example:

object1<-function (menuifchoices) {
menu1<-menu(c('a','b','c','d'))
if (menu1==1)
...
menu1a<-menu...
if (menu1a==1)
...
menu2a<-menu...
if (menu2a==1)
...
menu2
<-menu(c('a','b','c','d'))
if (menu1==2)
...
}

The request action is that a user can select a menu option that will 
activate a series of "multiple choice" questions, results in "menu1" 
being activated without menu2 being activated. If someone could direct 
to the relevant terminology, thank you.

Separate question; for a menu:

menu(c('a','b','c','d'))

1: a
2: b
3: c
4: d

Selection: 1
[1] 1

is it possible to change behaviour so that result of the selection is 
not the integer, but the original menu choice:

Selection: 1
[1] a

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply block of if statements with menu function

2014-09-15 Thread rl
On 2014-09-15 14:22, David L Carlson wrote:

I think switch() should work for you here, but it is not clear how
much flexibility you are trying to have (different tests based on the
first response; different tests based on first, then second response;
different tests based on each successive response).



Yes, different tests are to be written dependent upon the first 
response.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] chi-square test

2014-09-15 Thread eliza botto
Dear useRs of R,
I have two datasets (TT and SS) and I wanted to see if my data is uniformly 
distributed or not. I tested it through a chi-square test and the results are given at 
the end of this message. Apparently the P-value has a significant importance, but I can't 
interpret the results, nor why it says "In chisq.test(TT) : Chi-squared 
approximation may be incorrect".
###
> dput(TT)
structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 
0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 
0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 0.27, 
0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 0, 0, 
0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 0.41, 
0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = c(1167L, 
1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 1369L, 1369L, 
1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 2579L, 2507L, 
1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 1669L, 4743L, 
4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 2584L, 2579L, 
1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 3196L, 3196L, 
2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L,
3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 4743L, 
3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, 3915L, 
2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, 2131L, 
3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 2622L, 
3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 3716L, 
3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), class = 
"data.frame", row.names = c(NA, -124L))

>  chisq.test(TT)
Pearson's Chi-squared test
data:  TT
X-squared = 411.5517, df = 123, p-value < 2.2e-16
Warning message:
In chisq.test(TT) : Chi-squared approximation may be incorrect 
###
> dput(SS)
structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 
0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 
0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 
0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 
0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 
0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 
0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 
0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 
0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 
0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 
0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 
0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 
534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L, 
184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 
948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, 941L, 
822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, 703L, 760L, 
711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 287L, 1043L, 
1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 1364L, 1236L, 1483L, 
1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, 626L, 461L, 350L, 
1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 1370L, 902L, 686L, 
703L, 440L, 1016L, 1148L, 1089L, 753L, 650L, 1065L, 568L, 712L, 762L, 636L, 
79L, 1092L, 955L, 158L, 1524L, 1145L, 673L, 513L, 596L, 239L)), .Names = 
c("NDVIanno", "delta_z"), class = "data.frame", row.names = c(NA, -124L))
>  chisq.test(SS)
Pearson's Chi-squared test
data:  SS
X-squared = 72.8115, df = 123, p-value = 0.
Warning message:
In chisq.test(SS) : Chi-squared approximation may be incorrect
#
Kindly guide me through like you always did :)
thanks in advance,


Eliza 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] chi-square test

2014-09-15 Thread Rick Bilonick
On 09/15/2014 10:57 AM, eliza botto wrote:

Dear useRs of R,
I have two datasets (TT and SS) and i wanted to to see if my data is uniformly 
distributed or not?I tested it through chi-square test and results are given at the end 
of it.Now apparently P-value has a significant importance but I cant interpret the 
results and why it says that "In chisq.test(TT) : Chi-squared approximation may be 
incorrect"
###

dput(TT)

structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 
0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 
0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 0.27, 
0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 0, 0, 
0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 0.41, 
0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = c(1167L, 
1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 1369L, 1369L, 
1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 2579L, 2507L, 
1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 1669L, 4743L, 
4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 2584L, 2579L, 
1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 3196L, 3196L, 
2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L,
3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 4743L, 3760L, 3885L, 3760L, 4743L, 
2951L, 782L, 2957L, 3343L, 2697L, 2697L, 3915L, 2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 
4530L, 4530L, 2131L, 3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 2622L, 3197L, 
3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 3716L, 3716L, 2697L, 2697L, 1358L)), .Names = 
c("clc5", "quota_massima"), class = "data.frame", row.names = c(NA, -124L))


  chisq.test(TT)

 Pearson's Chi-squared test
data:  TT
X-squared = 411.5517, df = 123, p-value < 2.2e-16
Warning message:
In chisq.test(TT) : Chi-squared approximation may be incorrect
###

dput(SS)

structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 
0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 
0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 
0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 
0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 
0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 
0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 
0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 
0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 
0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 
0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 
0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 
534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L, 
184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 948L, 1082L, 616L, 704L, 814L, 
450L, 865L, 987L, 1265L, 720L, 565L, 652L, 941L, 822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 
450L, 703L, 760L, 711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 287L, 1043L, 1465L, 963L, 
1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 1364L, 1236L, 1483L, 1201L, 1644L, 498L, 142L, 510L, 482L, 
811L, 788L, 466L, 626L, 461L, 350L, 1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 1370L, 902L, 
686L, 703L, 440L, 1016L, 1148L, 1089L, 753L, 650L, 1065L, 568L, 712L, 762L, 636L, 79L, 1092L, 955L, 158L, 
1524L, 1145L, 673L, 513L, 596L, 239L)), .Names = c("NDVIanno", "delta_z"), class = 
"data.frame", row.names = c(NA, -124L))

  chisq.test(SS)

 Pearson's Chi-squared test
data:  SS
X-squared = 72.8115, df = 123, p-value = 0.
Warning message:
In chisq.test(SS) : Chi-squared approximation may be incorrect
#
Kindly guide me through like you always did :)
thanks in advance,


Eliza   

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You are using a Chi-squared test on a 124x2 matrix of values (not all 
integers) and many are zeros. The expected frequencies for many cells 
are very small (near zero, less than 1), hence the warning message. More 
importantly, does this application of the chi-squared test make sense for 
your question?

[R] Using sqldf() to read in .fwf files

2014-09-15 Thread Doran, Harold
I am learning to use sqldf() to read in very large fixed width files that 
otherwise do not work efficiently with read.fwf. I found the following example 
online and have worked with this in various ways to read in the data

cat("1 8.3
210.3
319.0
416.0
515.6
719.8
", file = "fixed")

fixed <- file("fixed")
sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from fixed")

I then applied this to my real world data problem though it yields the 
following error message and I am not sure how to interpret this.

dor <- file("dor")
> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from dor")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 6 elements

Looking at my .fwf data in a text editor shows the data are structured as I 
would expect. In fact, I can read in the first few lines of the file using 
read.fwf and the data are as I would expect after being read into R.

Thanks,
Harold



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 8 fast or 4 very fast cores?

2014-09-15 Thread Clint Bowman
I'm in a similar situation and am looking seriously at a pair of E5-2643v3 
(6 cores each-hyperthreaded).


Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

USPS:   PO Box 47600, Olympia, WA 98504-7600
Parcels:300 Desmond Drive, Lacey, WA 98503-1274

On Mon, 15 Sep 2014, Prof Brian Ripley wrote:


On 15/09/2014 11:21, Ben Bolker wrote:

 Leif Ruckman  Ruckman.se> writes:

> 
>  I am going to buy a new computer ( Dell workstation T5810 - Windows 8)

>  to work with simulatons in R.
> 
>  Now I am asked what kind of processor I like and I was given two 
>  choices.
> 
>  1. Intel Xeon E5-1620 v3 - 4 cores 3.7 GHz Turbo

>  2. Intel Xeon E5-2640 v3 - 8 cores 2.6 GHz Turbo
> 
>  I don't know what is better in simulations studies in R, a few very fast

>  cores or many cores at normal speed.


It's **very** hard to answer such general questions reliably, but I'll
 take a guess and say that if you're doing simulation studies you're likely
 to be doing tasks that are easily distributable (e.g. many random
 realizations of the same simulation and/or realizations for many
 different sets of parameter values) and so the more-cores option
 will be a good idea.

But it's possible that what you mean by "simulation studies" is
 different.

If you can do some benchmarking of your problems on an existing
 machine that would probably be a good idea.


Unfortunately unless it is of very similar architecture that may not help 
much.


Three issues hard to scale from are the 'Turbo', the hyperthreading of modern 
Xeons and the cache sizes.  Now, I happen to have machines with multiple 
E5-24x0 and E5-26x0 Xeons: both do hyperthreading well, so you would have 8 
or 16 virtual CPUs and they will give you say 50% increase in throughput if 
all the virtual cores are used.  But you cannot scale up from using just one 
process on one core.


I find it hard to think of tasks where option 1) would have more throughput, 
but if most of the time you are not running things in parallel then the 
higher speed on a single task is a consideration.




Ben Bolker

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Bug in rep() function

2014-09-15 Thread Samuel Knapp
Dear all,

I have discovered a bug in the standard rep() function: At certain 
values, rep() does not replicate the element by the proper number of times:


> a <- (1-0.9)*100
> a
[1] 10
> length(rep(1,times=a))
[1] 9
> length(rep(1,each=a))
[1] 9

As shown, this happens as well for the times= as for the each= 
parameter. It does not depend on the kind of element that is to be repeated:


> length(rep("abc",each=a))
[1] 9

I tried to narrow down the bug, but haven't really managed to find a 
pattern behind the bug. Here is a list with values for a (see above) 
that returns a false object ( after the value for a, i've collected the 
expected length and the length that is produced by r):


# mistake at
(1-0.9)*100   10   9
(1-0.8)*100   20  19
(1-0.8)*1000  200   199
(1-0.9)*1000  100   99
(1-0.9)*101 0
(1-0.8)*102 1
(1-0.9)*10  1   
(2-1-0.9)*100 10  9
(10/10-0.9)*100 10  9

# the following sets for a work fine
(1+0.1)*100
(1-0.1)*100
(1-0.7)*100
(1-0.99)*1000
(1-0.7)*10
(1-0.90)*10
(1-0.95)*100
(1-0.95)*1000
(2-0.9)*1000
(2-1.9)*100
(1.1-1)*100
(10-9)*100

Did I make any mistake? Or where else should I address this problem?

Thanks and best regards,
Samuel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Building R for better performance

2014-09-15 Thread Anspach, Jonathan P
All,

I've attached the actual benchmark that TACC and I used.  I've also attached a paper 
I wrote covering this in a little more detail.  The paper specifies the 
hardware configuration I used.  Let me know if you have any other questions.

Regards,
Jonathan Anspach
Sr. Software Engineer
Intel Corp.
jonathan.p.ansp...@intel.com
713-751-9460

From: henrik.bengts...@gmail.com [mailto:henrik.bengts...@gmail.com] On Behalf 
Of Henrik Bengtsson
Sent: Thursday, September 11, 2014 9:18 AM
To: Anspach, Jonathan P
Cc: arnaud gaboury; r-help@r-project.org
Subject: Re: [R] Building R for better performance


You'll find R-benchmark-25.R, which I assume is the same and the proper pointer 
to use, at 
http://r.research.att.com/benchmarks/

Henrik
I'm out of the office today, but will resend it tomorrow.

Jonathan Anspach
Intel Corp.

Sent from my mobile phone.

On Sep 11, 2014, at 3:49 AM, "arnaud gaboury" 
mailto:arnaud.gabo...@gmail.com>> wrote:

>>> I got the benchmark script, which I've attached, from Texas Advanced
>>> Computing Center.  Here are my results (elapsed times, in secs):
>
>
> Where can we get the benchmark script?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in rep() function

2014-09-15 Thread Sarah Goslee
No, actually you've discovered FAQ 7.31.

> a <- (1-0.9)*100
> a
[1] 10
> print(a, digits=20)
[1] 9.9982236

In combination with the description in ?rep:
 Non-integer values of ‘times’ will be truncated towards zero.  If
 ‘times’ is a computed quantity it is prudent to add a small fuzz.

you get 9 times.

The best thing to do is ensure that your values are integer *before*
passing them to rep(), unless you know that truncating toward zero is
the right thing to do.
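
A minimal illustration of that advice, using the same `a` as in the original
post:

a <- (1 - 0.9) * 100
print(a, digits = 20)              # not exactly 10
length(rep(1, times = round(a)))   # 10, once a is rounded to the intended integer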

Sarah

On Mon, Sep 15, 2014 at 11:30 AM, Samuel Knapp  wrote:
> Dear all,
>
> I have discovered a bug in the standard rep() function: At certain values,
> rep() does not replicate the element by the proper number of times:
>
>> a <- (1-0.9)*100
>> a
> [1] 10
>> length(rep(1,times=a))
> [1] 9
>> length(rep(1,each=a))
> [1] 9
>
> As shown, this happens as well for the times= as for the each= parameter. It
> does not depend on the kind of element that is to be repeated:
>
>> length(rep("abc",each=a))
> [1] 9
>
> I tried to narrow down the bug, but haven't really managed to find a pattern
> behind the bug. Here is a list with values for a (see above) that returns a
> false object ( after the value for a, i've collected the expected length and
> the length that is produced by r):
>
> # mistake at
> (1-0.9)*100   10   9
> (1-0.8)*100   20  19
> (1-0.8)*1000  200   199
> (1-0.9)*1000  100   99
> (1-0.9)*101 0
> (1-0.8)*102 1
> (1-0.9)*10  1   
> (2-1-0.9)*100 10  9
> (10/10-0.9)*100 10  9
>
> # the following sets for a work fine
> (1+0.1)*100
> (1-0.1)*100
> (1-0.7)*100
> (1-0.99)*1000
> (1-0.7)*10
> (1-0.90)*10
> (1-0.95)*100
> (1-0.95)*1000
> (2-0.9)*1000
> (2-1.9)*100
> (1.1-1)*100
> (10-9)*100
>
> Did I make any mistake? Or where else should I address this problem?
>
> Thanks and best regards,
> Samuel
>

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in rep() function

2014-09-15 Thread Prof Brian Ripley
On 15/09/2014 16:30, Samuel Knapp wrote:

Dear all,

I have discovered a bug in the standard rep() function: At certain


Not so:

> a <- (1-0.9)*100
> trunc(a)
[1] 9

As the help says

 Non-integer values of ‘times’ will be truncated towards zero.  If
 ‘times’ is a computed quantity it is prudent to add a small fuzz.

And as the posting guide said

Do your homework before posting:
...
Read the online help for relevant functions (type ?functionname, 
e.g., ?prod, at the R prompt)




values, rep() does not replicate the element by the proper number of times:

 > a <- (1-0.9)*100
 > a
[1] 10
 > length(rep(1,times=a))
[1] 9
 > length(rep(1,each=a))
[1] 9

As shown, this happens as well for the times= as for the each=
parameter. It does not depend on the kind of element that is to be
repeated:

 > length(rep("abc",each=a))
[1] 9

I tried to narrow down the bug, but haven't really managed to find a
pattern behind the bug. Here is a list with values for a (see above)
that returns a false object ( after the value for a, i've collected the
expected length and the length that is produced by r):

# mistake at
(1-0.9)*100   10   9
(1-0.8)*100   20  19
(1-0.8)*1000  200   199
(1-0.9)*1000  100   99
(1-0.9)*101 0
(1-0.8)*102 1
(1-0.9)*10  1   
(2-1-0.9)*100 10  9
(10/10-0.9)*100 10  9

# the following sets for a work fine
(1+0.1)*100
(1-0.1)*100
(1-0.7)*100
(1-0.99)*1000
(1-0.7)*10
(1-0.90)*10
(1-0.95)*100
(1-0.95)*1000
(2-0.9)*1000
(2-1.9)*100
(1.1-1)*100
(10-9)*100

Did I make any mistake? Or where else should I address this problem?

Thanks and best regards,
Samuel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using sqldf() to read in .fwf files

2014-09-15 Thread Gabor Grothendieck
On Mon, Sep 15, 2014 at 12:09 PM, Doran, Harold  wrote:
> I am learning to use sqldf() to read in very large fixed width files that 
> otherwise do not work efficiently with read.fwf. I found the following 
> example online and have worked with this in various ways to read in the data
>
> cat("1 8.3
> 210.3
> 319.0
> 416.0
> 515.6
> 719.8
> ", file = "fixed")
>
> fixed <- file("fixed")
> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from fixed")
>
> I then applied this to my real world data problem though it yields the 
> following error message and I am not sure how to interpret this.
>
> dor <- file("dor")
>> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from dor")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 1 did not have 6 elements
>
> Looking at my .fwf. data in a text editor shows the data are structured as I 
> would expect. In fact, I can read in the first few lines of the file using 
> read.fwf and the data are as I would expect after being read into R.
>

We want it to regard the entire line as one field so specify sep= as
some character not in the file.

attr(fixed, "file.format") <- list(sep = ";")


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using sqldf() to read in .fwf files

2014-09-15 Thread Doran, Harold
Thank you, Gabor. This has seemingly resolved the issue. Perhaps a quick follow 
up. Suppose I know that the 1st variable I am reading in is to be numeric and 
the second is character. Can that be specified in the substr() argument?

sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from fixed")

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Monday, September 15, 2014 12:42 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Using sqldf() to read in .fwf files

On Mon, Sep 15, 2014 at 12:09 PM, Doran, Harold  wrote:
> I am learning to use sqldf() to read in very large fixed width files 
> that otherwise do not work efficiently with read.fwf. I found the 
> following example online and have worked with this in various ways to 
> read in the data
>
> cat("1 8.3
> 210.3
> 319.0
> 416.0
> 515.6
> 719.8
> ", file = "fixed")
>
> fixed <- file("fixed")
> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from fixed")
>
> I then applied this to my real world data problem though it yields the 
> following error message and I am not sure how to interpret this.
>
> dor <- file("dor")
>> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from dor")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 1 did not have 6 elements
>
> Looking at my .fwf. data in a text editor shows the data are structured as I 
> would expect. In fact, I can read in the first few lines of the file using 
> read.fwf and the data are as I would expect after being read into R.
>

We want it to regard the entire line as one field so specify sep= as some 
character not in the file.

attr(fixed, "file.format") <- list(sep = ";")


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using sqldf() to read in .fwf files

2014-09-15 Thread Gabor Grothendieck
On Mon, Sep 15, 2014 at 3:23 PM, Doran, Harold  wrote:
> Thank you, Gabor. This has seemingly resolved the issue. Perhaps a quick 
> follow up. Suppose I know that the 1st variable I am reading in is to be 
> numeric and the second is character. Can that be specified in the substr() 
> argument?
>
> sqldf("select substr(V1, 1, 1) f1, substr(V1, 2, 4) f2 from fixed")
>

Cast the numeric field to real:
   select  cast(substr(V1, 1, 1) as real)  ...
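
Putting the pieces of this thread together, a minimal sketch using the same toy 
"fixed" file as above (the sep=";" trick and the cast are exactly as suggested; 
the column widths are the toy example's, not Harold's real layout):

library(sqldf)

cat("1 8.3
210.3
319.0
416.0
515.6
719.8
", file = "fixed")

fixed <- file("fixed")
## read each whole line as a single field by picking a separator
## that never occurs in the file
attr(fixed, "file.format") <- list(sep = ";")

## first character as a numeric id, remaining characters kept as text
sqldf("select cast(substr(V1, 1, 1) as real) f1, substr(V1, 2, 4) f2 from fixed")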

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] chi-square test

2014-09-15 Thread David L Carlson
Rick's question is a good one. It is unlikely that the results will be 
informative, but from a technical standpoint, you can estimate the p value 
using the simulate.p.value=TRUE argument to chisq.test().

> chisq.test(TT, simulate.p.value=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  TT
X-squared = 7919.632, df = NA, p-value = 0.0004998
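
Not part of David's reply, but if the underlying question is whether a single 
continuous variable looks uniform over its range, two more conventional routes 
(sketched here for TT$quota_massima) are a Kolmogorov-Smirnov test against a 
uniform distribution, or binning first and then a chi-squared test on the counts:

x <- TT$quota_massima
ks.test(x, "punif", min(x), max(x))   # KS test against U(min, max); ties give a warning

counts <- table(cut(x, breaks = 10))  # ten equal-width bins
chisq.test(counts)                    # tests equal expected counts across the bins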

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rick Bilonick
Sent: Monday, September 15, 2014 10:18 AM
To: r-help@r-project.org
Subject: Re: [R] chi-square test

On 09/15/2014 10:57 AM, eliza botto wrote:
> Dear useRs of R,
> I have two datasets (TT and SS) and i wanted to to see if my data is 
> uniformly distributed or not?I tested it through chi-square test and results 
> are given at the end of it.Now apparently P-value has a significant 
> importance but I cant interpret the results and why it says that "In 
> chisq.test(TT) : Chi-squared approximation may be incorrect"
> ###
>> dput(TT)
> structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 
> 0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 
> 0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 
> 0.27, 0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 
> 0, 0, 0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 
> 0.41, 0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = 
> c(1167L, 1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 
> 1369L, 1369L, 1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 
> 2579L, 2507L, 1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 
> 1669L, 4743L, 4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 
> 2584L, 2579L, 1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 
> 3196L, 3196L, 2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L,
>    3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 
> 4743L, 3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, 
> 3915L, 2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, 
> 2131L, 3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 
> 2622L, 3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 
> 3716L, 3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), 
> class = "data.frame", row.names = c(NA, -124L))
>
>>   chisq.test(TT)
>  Pearson's Chi-squared test
> data:  TT
> X-squared = 411.5517, df = 123, p-value < 2.2e-16
> Warning message:
> In chisq.test(TT) : Chi-squared approximation may be incorrect
> ###
>> dput(SS)
> structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 
> 0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 
> 0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 
> 0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 
> 0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 
> 0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 
> 0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 
> 0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 
> 0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 
> 0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 
> 0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 
> 0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 
> 534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L, 
> 184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 
> 948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, 
> 941L, 822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, 
> 703L, 760L, 711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 
> 287L, 1043L, 1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 
> 1364L, 1236L, 1483L, 1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, 
> 626L, 461L, 350L, 1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 
> 1370L, 902L, 686L, 703L, 440L, 1016L, 1148L, 1089L, 753L, 650L, 1065L, 568L, 
> 712L, 762L, 636L, 79L, 1092L, 955L, 158L, 1524L, 1145L, 673L, 513L, 596L, 
> 239L)), .Names = c("NDVIanno", "delta_z"), class = "data.frame", row.names = 
> c(NA, -124L))
>>   chisq.test(SS)
>  Pearson's Chi-squared test
> d

Re: [R] CoxME: Family relatedness

2014-09-15 Thread Therneau, Terry M., Ph.D.
I would have caught this tomorrow (I read the digest).
Some thoughts:

1. Skip the entire step of subsetting the death.kmat object.  The coxme function knows how 
to do this on its own, and is more likely to get it correct.  My version of your code would be

  deathdat.kmat <- 2 * with(death.dat, makekinship(famid, id, faid, moid))
  model3 <- coxme(Surv(Survival, Event) ~ Sex + strata(Cohort) + SNP1 + SNP2 + SNP3 + (1|id),
                  data = death.dat, varlist = deathdat.kmat)


This all assumes that the "id" variable is unique.  If family 3 and family 4 both have an 
id of "1", then the coxme call can't match up rows in the data to rows/cols in the kinship 
matrix uniquely.  But that is simple to fix.
The kinship matrix K has .5 on the diagonal, by definition, but when used as a correlation 
most folks prefer to use 2K.  (This causes mixups since some software adds the "2" for 
you, but coxme does not.)


2. The model above is the correct covariance structure for a set of families.  There is a 
single intercept per subject, with a complex correlation matrix.  The simpler "per family" 
frailty model would be


model4 <- coxme(Surv(Survival, Event) ~ Sex + strata(cohort) + SNP1 + SNP2 + SNP3 + 
(1|famid), death.dat)


This model lets each family have a separate risk, with everyone in the same family sharing 
the exact same risk.  It is less general than model3 above which lets a family have higher 
risk plus has variation between family members.


A model with both per-subject and per family terms is identical to one with a covariance 
matrix of s1 K + s2 B, where K is the kinship matrix, B is a block diagonal matrix which 
has a solid block of "1" for each family, and s1 s2 are the fitted variance coefficients.


  I don't find this intuitive, but have seen the argument that "B" is a shared 
environmental effect.  (Perhaps because I have large family trees where they do not all 
live together).  If you want such a model:

   model5 <- coxme(.. + (1|id) + (1|famid), death.dat, 
varlist=deathdat.kmat)

(When the varlist is too short the program uses the default for remaining 
terms).

3. To go further you will need to tell us what you are trying to model, as math formulas 
not as R code.


4. The error messages you got would more properly read "I'm confused" on the part of the 
program.  They are cases of something I would never do, so I never got that message; 
therefore they are useful for me to see.  Continuous variables go to the left of the | and 
categoricals to the right of the |.  Having a family id to the left makes no sense at all.


Terry Therneau


On 09/15/2014 03:20 PM, Marie Dogherty wrote:

Dr. Therneau,

I was wondering if you had a spare minute, if you could view a post in the R 
forum:

http://r.789695.n4.nabble.com/CoxME-family-relatedness-td4696976.html

I would appreciate it, I'm stuck and out of ideas!

Many thanks

Marie.


---Original post --

Hello all,

I have a table like this, with ~300 individuals:

Famid Id  Faid Moid Cohort  Sex  Survival Event SNP1 SNP2 SNP3

11000010   1010

22001120   1000

23000025   1010

45120035   1011

46120035   0101



famid=family id, id=id of person,faid=father id,moid=mother id.

My question of interest: What impact does SNP1/SNP2/SNP3 have on survival of individuals 
(Id), after accounting for possible effects due to family relatedness (Famid).


So I want to account for family-specific frailty effects, and individual frailty effects 
according to degree of relationship between individuals.


The commands I've used are from this paper: 
http://www.ncbi.nlm.nih.gov/pubmed/21786277

Library(survival)
Library(coxme)
Library(kinship2)
Library(bdsmatrix)

Death.dat <-read.table(“Table”,header=T)

#Make a kinship matrix for the whole study
deathdat.kmat 
<-makekinship(famid=death.dat$famid,id=death.dat$id,father=death.dat$faid,mother=death.dat$moid)


##omit linker individuals with no phenotypic data, used only to ##properly specify the 
pedigree when calculating kinship ##coefficints:

death.dat1<-subset(death.dat,!is.na(Survival))

##temp is an array with the indices of the individuals with Survival years:
all <-dimnames(deathdat.kmat)[[1]]
temp <-which(!is.na(death.dat$Survival[match(all,death.dat$id)]))

##kinship matrix for the subset with phenotype Survival:
deathdat1.kmat <-deathdat.kmat[temp,temp]


If I type:

model3 
<-coxme(Surv(Survival,Event)~Sex+strata(Cohort)+SNP1+SNP2+SNP3+(1+id|famid),data=death.dat,varlist=deathdat1.kmat)


I get:

“Error in coxme(Surv(Survival, Event) ~ Sex + strata(Cohort) + SNP1 +  :
  In random term 1: Mlist cannot have both covariates and grouping”


Whereas if I type:

model3 
<-coxme(Surv(Survival,Event)~Sex+strata(Cohort)+SNP1+SNP2+SNP3,(id|famid),data=death.dat1,varlist=deathdat1.kmat)



I get:

Error in coxme(Surv(Survi

Re: [R] spatstat rmh problem

2014-09-15 Thread Rolf Turner

Your example is not reproducible.  We don't have "cshape" or "im.pop" 
(and are possibly lacking other bits and pieces; I didn't check the 
details since the example fails to run from the get-go).  Please provide 
a *reproducible* example.


Also I am puzzled by the line


mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))


Did you mean


mod <- ppm (dat, ~  pop ,  covariates = list (pop = im.pop))


???

Also please note that with versions of spatstat later than or equal to 
1.37-0 you can write


ppm(dat ~ im.pop)

when the object "im.pop" is present in the global environment.
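
A sketch of that calling style, reusing the objects from the original post 
(cshape and im.pop are assumed to exist; whether rmh.ppm() accepts new.coef in 
this spatstat version is also an assumption on my part -- check ?rmh.ppm):

library(spatstat)

set.seed(12345)
dat <- rpoint(500, win = cshape)    # artificial points in the country window
mod <- ppm(dat ~ im.pop)            # model-formula interface, spatstat >= 1.37-0

## note the parentheses: new.coef should be an argument of rmh(), not of density()
plot(density(rmh(mod, new.coef = c(1, 200))))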

cheers,

Rolf Turner

On 16/09/14 02:30, Sebastian Schutte wrote:

Dear R and spatstat developers,

Thanks so much for the time and effort that you invest into this awesome
software. I have a problem simulating from a Point Process Model in
spatstat. In summary, the option "new.coef" should allow me to use a
fitted model and change its beta coefficients before simulating a point
pattern from the model via Monte Carlo simulation. Intuitively, one
would assume that the predicted point pattern changes as one fiddles
with the beta coefficients. However, this does not seem to work.

Please let me know what I am missing here and which screw to drive to
actually change the simulation output.

#owin is a polygon of country boundaries, "im.pop" is a raster with
georeferenced population counts.
#I am using a random point pattern for demonstration purposes

#Fix random seed
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#plot(density(simulate(mod)))
#Show that this is reproducible
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#As expected, the density is the same

#Now change the coefs and do it again:
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod),new.coef=c(1,200)))
#Looks the same, so what am I missing?


--
Rolf Turner
Technical Editor ANZJS

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in rep() function

2014-09-15 Thread Samuel Knapp
Thank you.
I got the point about non-integer values in rep(). I also read FAQ 7.31:
"The only numbers that can be represented exactly in R’s numeric type 
are integers and fractions whose denominator is a power of 2."


But then I still don't understand:

> for (b in seq(0.2,0.8,0.2))
+ {
+   a <- (1-b)*10
+
+   print(1-b,digits=20)
+   print(a,digits=22)
+   print(trunc(a))
+   print("///")
+ }
[1] 0.80000000000000004441
[1] 8
[1] 8
[1] "///"
[1] 0.59999999999999997780
[1] 6
[1] 6
[1] "///"
[1] 0.39999999999999991118
[1] 3.999999999999999111822
[1] 3
[1] "///"
[1] 0.19999999999999995559
[1] 1.999999999999999555911
[1] 1
[1] "///"

Why do the first two yield an integer after multiplying, while the last two 
don't? Apparently, c(0.8, 0.6, 0.4, 0.2) can't be represented exactly.


What would be your approach? Always round numbers first before giving 
them to rep()?


Thanks,
Samuel

On 15.09.2014 18:36, Prof Brian Ripley wrote:

On 15/09/2014 16:30, Samuel Knapp wrote:

Dear all,

I have discovered a bug in the standard rep() function: At certain


Not so:

> a <- (1-0.9)*100
> trunc(a)
[1] 9

As the help says

 Non-integer values of ‘times’ will be truncated towards zero.  If
 ‘times’ is a computed quantity it is prudent to add a small fuzz.

And as the posting guide said

Do your homework before posting:
...
Read the online help for relevant functions (type ?functionname, 
e.g., ?prod, at the R prompt)



values, rep() does not replicate the element by the proper number of 
times:


 > a <- (1-0.9)*100
 > a
[1] 10
 > length(rep(1,times=a))
[1] 9
 > length(rep(1,each=a))
[1] 9

As shown, this happens as well for the times= as for the each=
parameter. It does not depend on the kind of element that is to be
repeated:

 > length(rep("abc",each=a))
[1] 9

I tried to narrow down the bug, but haven't really managed to find a
pattern behind it. Here is a list of values for a (see above) that return
a wrong result (after each value of a, I've listed the expected length and
the length that is actually produced by R):

# mistake at (expression, expected length, length produced by rep)
(1-0.9)*100       10    9
(1-0.8)*100       20    19
(1-0.8)*1000      200   199
(1-0.9)*1000      100   99
(1-0.9)*10        1     0
(1-0.8)*10        2     1
(1-0.9)*10        1
(2-1-0.9)*100     10    9
(10/10-0.9)*100   10    9

# the following sets for a work fine
(1+0.1)*100
(1-0.1)*100
(1-0.7)*100
(1-0.99)*1000
(1-0.7)*10
(1-0.90)*10
(1-0.95)*100
(1-0.95)*1000
(2-0.9)*1000
(2-1.9)*100
(1.1-1)*100
(10-9)*100

Did I make any mistake? Or where else should I address this problem?

Thanks and best regards,
Samuel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Quantile

2014-09-15 Thread Felix Dietrich
Hi, I want to use the quantile function, the example shown under "help"

x <- rnorm(1001)
quantile(x <- rnorm(1001)) # Extremes & Quartiles by default
quantile(x,  probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)

I get the following error:
Error in quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100) : 
  unused argument (probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)

The argument probs does not seem to work. I tried many other variations. Does 
anybody have an idea?

Thanks, Felix
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Updates to R Core and R Foundation Membership

2014-09-15 Thread Robert Gentleman
Hi all,
  It is my pleasure to announce new members to R Core and to the R
Foundation
 whose efforts will be most appreciated as R continues to evolve and
advance.

There are 2 new R core members:  Martin Morgan and Michael Lawrence.
In addition Stefano Iacus has decided to step down from R Core.

   There are 7 new R Foundation members:
  Dirk Eddelbuettel, Torsten Hothorn, Marc Schwartz,
  Hadley Wickham, Achim Zeileis, Martin Morgan and Michael Lawrence.
  The R Foundation now has 29 ordinary members.

  Please join me in welcoming them to their new roles, and especially in
thanking Stefano for his many years of contributions.


  best wishes
Robert

 for the R Foundation

-- 
Robert Gentleman
rgent...@gmail.com

[[alternative HTML version deleted]]

___
r-annou...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-announce

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Efficient frontier

2014-09-15 Thread Aparna
Hi, I need help with plotting an efficient frontier. I have the expected returns and 
the covariance matrix. I am using tseries and have downloaded the portfolio package 
too. The suggestion says to use efficient.frontier, but it looks like it has been 
replaced by something else, as R 3.1.1 says it is not available. With the current R 
version, what is the way to draw an efficient frontier?

Sent from my iPad
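
In case it helps, a minimal sketch of one common way to trace a long-only frontier 
directly from an expected-return vector and a covariance matrix, using solve.QP() 
from the quadprog package (the inputs below are made up for illustration):

library(quadprog)
## Made-up expected returns and covariance matrix, purely for illustration
mu    <- c(0.08, 0.12, 0.10)
Sigma <- matrix(c(0.0400, 0.0060, 0.0100,
                  0.0060, 0.0900, 0.0120,
                  0.0100, 0.0120, 0.0625), nrow = 3)
n       <- length(mu)
targets <- seq(min(mu) + 1e-6, max(mu) - 1e-6, length.out = 50)
frontier <- t(sapply(targets, function(m) {
  ## minimise w' Sigma w  subject to  sum(w) = 1, mu'w = m, w >= 0 (long only)
  Amat <- cbind(rep(1, n), mu, diag(n))
  sol  <- solve.QP(Dmat = 2 * Sigma, dvec = rep(0, n),
                   Amat = Amat, bvec = c(1, m, rep(0, n)), meq = 2)
  c(sd = sqrt(sol$value), mean = m)
}))
plot(frontier[, "sd"], frontier[, "mean"], type = "l",
     xlab = "Portfolio standard deviation", ylab = "Expected return")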
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ncdf size error

2014-09-15 Thread Hernan A. Moreno Ramirez
Hi, I am using both the ncdf and ncdf4 libraries, and with both I keep getting the
same error: Error in R_nc_enddef: NetCDF: One or more variable sizes violate
format constraints. Error in R_nc_sync: NetCDF: Operation not allowed in
define mode. This happens when I try to create.ncdf() a file with more than
13 variables. I think it is a problem of memory size. What would you recommend?
Any help will be appreciated.
Sent from my iPhone
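
A hypothetical minimal example of the kind that would let others reproduce this, 
using ncdf4 (dimension and variable names and sizes are made up); it also writes a 
NetCDF-4 file via force_v4 = TRUE, since the "format constraints" error usually 
points at the classic NetCDF-3 size limits rather than at RAM:

library(ncdf4)
## Made-up dimensions and 14 variables, only to illustrate the structure
xdim <- ncdim_def("x",    "count", 1:500)
ydim <- ncdim_def("y",    "count", 1:500)
tdim <- ncdim_def("time", "days",  1:365, unlim = TRUE)
vars <- lapply(paste0("var", 1:14), function(nm)
  ncvar_def(nm, "units", list(xdim, ydim, tdim), missval = -9999, prec = "float"))
## force_v4 = TRUE writes NetCDF-4/HDF5, which is not bound by the classic limits
nc <- nc_create("example_v4.nc", vars, force_v4 = TRUE)
nc_close(nc)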

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: the properties of a DATA to run DBSCAN in R

2014-09-15 Thread Maryam Heidari
hello everybody,

I have been trying to run the "dbscan" algorithm on my data. My data has around
4 records, each of which has 3 attributes plus an ID.
The interesting thing is that when I run dbscan on just the 3 attributes,
R gives me an ERROR regarding "stackoverflow", but when I run it including
all 4 attributes it runs without any problem.
So is the problem with my data?!

These are summary statistics for my data:
       att1                att2                 att3
 Min.   :0.0         Min.   :0.00         Min.   :0.00
 1st Qu.:0.01429     1st Qu.:0.00         1st Qu.:0.00
 Median :0.02857     Median :0.00         Median :0.00
 Mean   :0.02764     Mean   :0.001135     Mean   :0.000477
 3rd Qu.:0.02857     3rd Qu.:0.00         3rd Qu.:0.00
 Max.   :1.0         Max.   :1.00         Max.   :1.00

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ncdf size error

2014-09-15 Thread David W. Pierce
On Mon, Sep 15, 2014 at 4:20 PM, Hernan A. Moreno Ramirez
 wrote:
>
> Hi I am using both ncdf and ncdf4 libraries and with both I keep getting the
> same error: Error in R_nc_enddef: NetCDF: One or more variable sizes violate
> format constraints. Error in R_nc_sync: NetCDF: Operation not allowed in
> define mode. This happens when I try to create.ncdf() a file with more than
> 13 variables. I think is a problem of memory size. What would you recommend?
> Any help will be appreciated

Hi Hernan,

can you supply an example that shows the problem?

Regards,

--Dave

---
David W. Pierce
Division of Climate, Atmospheric Science, and Physical Oceanography
Scripps Institution of Oceanography, La Jolla, California, USA
(858) 534-8276 (voice)  /  (858) 534-8561 (fax)dpie...@ucsd.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile

2014-09-15 Thread David Winsemius
On Sep 15, 2014, at 11:17 AM, Felix Dietrich wrote:

> Hi, I want to use the quantile function, the example shown under "help"
> 
> x <- rnorm(1001)
> quantile(x <- rnorm(1001)) # Extremes & Quartiles by default
> quantile(x,  probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)
> 
> I get the following error:
> Error in quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100) : 
>  unused argument (probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)
> 
> The argument probs does not seems to work. I tried many other variations. 
> Does anybody have an idea?

You have probably already read the responses to the duplicate question you 
posed on StackOverflow earlier today that said this problem could not be 
duplicated by anyone who responded, and one leading theory is that you or one 
of the packages has overwritten the `quantile` function. 

This is what the code shows:

>  quantile
function (x, ...) 
UseMethod("quantile")



That may be informative in your case.
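
For example, a quick diagnostic sketch (using the x from the example above) to 
check whether something has masked the generic, and to call the stats version 
explicitly:

find("quantile")          # should show "package:stats"; anything listed before it masks it
conflicts(detail = TRUE)  # lists masked objects, by environment
stats::quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)  # bypasses any masking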

Please read the Posting Guide. It asks that you not cross-post. If you post a 
follow-up to R-help, the Posting Guide will also tell you that much more detail 
about your setup is requested.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in rep() function

2014-09-15 Thread William Dunlap
> Why are the first two yielding an integer after multiplying, and the last two 
> don't?
> Apparently, c(0.8,0.6,0.4,0.2) can't be represented exactly.

Most fractions cannot be represented exactly.  Also, you cannot depend
on the third element of seq(.2,.8,by=.2) being equal to .6 (it is
slightly greater).  Use subtraction
instead of equality tests to get a better feel for what is happening.
  > seq(.2, .8, .2)[3] - .6
  [1] 1.110223e-16

> What would be your approach? Always round numbers first, before giving them
> to rep() ?

You can do that or generate integer sequences.  It is not just rep() -
any function that interprets an argument as an integer has the same
problem.
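
For example (a small sketch of both options):

a <- (1 - 0.9) * 100                      # 9.999999999999998, but prints as 10
length(rep(1, times = round(a)))          # 10: round (or add a fuzz) before rep()
length(rep(1, times = a + 1e-7))          # 10: the "small fuzz" mentioned in ?rep
length(rep(1, times = (10L - 9L) * 10L))  # 10: or compute 'times' with integers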


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Sep 15, 2014 at 10:12 AM, Samuel Knapp  wrote:
> Thank you.
> I got the point with non-integer values in rep(). I also red FAQ 7.3:
> "The only numbers that can be represented exactly in R’s numeric type are
> integers and fractions whose denominator is a power of 2."
>
> But then I still don't understand:
>
>> for (b in seq(0.2,0.8,0.2))
> + {
> +   a <- (1-b)*10
> +
> +   print(1-b,digits=20)
> +   print(a,digits=22)
> +   print(trunc(a))
> +   print("///")
> + }
> [1] 0.80004441
> [1] 8
> [1] 8
> [1] "///"
> [1] 0.5999778
> [1] 6
> [1] 6
> [1] "///"
> [1] 0.39991118
> [1] 3.999111822
> [1] 3
> [1] "///"
> [1] 0.19995559
> [1] 1.999555911
> [1] 1
> [1] "///"
>
> Why are the first two yielding an integer after multiplying, and the last
> two don't? Apparently, c(0.8,0.6,0.4,0.2) can't be represented exactly.
>
> What would be your approach? Always round numbers first, before giving them
> to rep() ?
>
> Thanks,
> Samuel
>
> On 15.09.2014 18:36, Prof Brian Ripley wrote:
>>
>> On 15/09/2014 16:30, Samuel Knapp wrote:
>>>
>>> Dear all,
>>>
>>> I have discovered a bug in the standard rep() function: At certain
>>
>>
>> Not so:
>>
>> > a <- (1-0.9)*100
>> > trunc(a)
>> [1] 9
>>
>> As the help says
>>
>>  Non-integer values of ‘times’ will be truncated towards zero.  If
>>  ‘times’ is a computed quantity it is prudent to add a small fuzz.
>>
>> And as the posting guide said
>>
>> Do your homework before posting:
>> ...
>> Read the online help for relevant functions (type ?functionname, e.g.,
>> ?prod, at the R prompt)
>>
>>
>>> values, rep() does not replicate the element by the proper number of
>>> times:
>>>
>>>  > a <- (1-0.9)*100
>>>  > a
>>> [1] 10
>>>  > length(rep(1,times=a))
>>> [1] 9
>>>  > length(rep(1,each=a))
>>> [1] 9
>>>
>>> As shown, this happens as well for the times= as for the each=
>>> parameter. It does not depend on the kind of element that is to be
>>> repeated:
>>>
>>>  > length(rep("abc",each=a))
>>> [1] 9
>>>
>>> I tried to narrow down the bug, but haven't really managed to find a
>>> pattern behind the bug. Here is a list with values for a (see above)
>>> that returns a false object ( after the value for a, i've collected the
>>> expected length and the length that is produced by r):
>>>
>>> # mistake at
>>> (1-0.9)*100   10   9
>>> (1-0.8)*100   20  19
>>> (1-0.8)*1000  200   199
>>> (1-0.9)*1000  100   99
>>> (1-0.9)*101 0
>>> (1-0.8)*102 1
>>> (1-0.9)*10  1   
>>> (2-1-0.9)*100 10  9
>>> (10/10-0.9)*100 10  9
>>>
>>> # the following sets for a work fine
>>> (1+0.1)*100
>>> (1-0.1)*100
>>> (1-0.7)*100
>>> (1-0.99)*1000
>>> (1-0.7)*10
>>> (1-0.90)*10
>>> (1-0.95)*100
>>> (1-0.95)*1000
>>> (2-0.9)*1000
>>> (2-1.9)*100
>>> (1.1-1)*100
>>> (10-9)*100
>>>
>>> Did I make any mistake? Or where else should I address this problem?
>>>
>>> Thanks and best regards,
>>> Samuel
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: the properties of a DATA to run DBSCAN in R

2014-09-15 Thread Jeff Newmiller
Instead of repeating yourself, please do some research. There is a Posting 
Guide mentioned at the bottom of this message. One of the things it mentions is 
making a reproducible example. (You might find [1] helpful in that regard.) 
Another thing it mentions is posting in plain text, which does not get mangled 
in transit. Another useful bit of advice is to cc the package maintainer in 
case they are not monitoring R-help (see ?maintainer).

Keep in mind that there are over 6000 contributed packages out there, so making 
it easy for someone familiar with R but not familiar with your special package 
is an important strategy in getting help. In addition, if this turns out to be 
a bug that the maintainer needs to fix then they will need a reproducible 
example in order to do that.

[1]  
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
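
For instance, a self-contained sketch along these lines (random data, and assuming 
the dbscan() in the fpc package) is usually enough for others to reproduce the 
behaviour:

library(fpc)
set.seed(1)
## Made-up data with the shape described: an ID column plus three attributes
dat <- data.frame(id   = 1:1000,
                  att1 = runif(1000, 0, 0.03),
                  att2 = rbinom(1000, 1, 0.010) * runif(1000),
                  att3 = rbinom(1000, 1, 0.005) * runif(1000))
## On the three attributes only (the ID would normally be excluded anyway)
cl3 <- dbscan(dat[, c("att1", "att2", "att3")], eps = 0.02, MinPts = 5)
table(cl3$cluster)
## And on all four columns, to show which of the two calls misbehaves
cl4 <- dbscan(dat, eps = 0.02, MinPts = 5)
table(cl4$cluster)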
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On September 15, 2014 5:11:29 PM PDT, Maryam Heidari  
wrote:
>hello everybody,
>
>I have been trying to run "dbscan" algorithm on my data, my data round
>4 records which each of them has 3 attributes + plus the ID for
>each
>record.
>the interesting thing is that when I run the "dbscan just on 3
>attributes"
>R gives me an ERROR regarding "stackoverflow" but when I run it
>including
>the 4 attributes it runs with out any problem.
>so is the problem with my data?!
>
>this is statistics about my data
>att1   att2att3
> Min.   :0.0  Min.   :0.00   Min.   :0.00
> 1st Qu.:0.014291st Qu.:0.00 1st Qu.:0.00
> Median :0.02857Median :0.00   Median :0.00
> Mean   :0.02764Mean   :0.001135Mean   :0.000477
> 3rd Qu.:0.028573rd Qu.:0.003rd Qu.:0.00
> Max.   :1.0  Max.   :1.00 Max.   :1.00
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient frontier

2014-09-15 Thread Bert Gunter
Please read the posting guide (link at bottom of message) to learn how
to post coherently to get a useful response. I, at least, found your
post to be unintelligible gibberish.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Mon, Sep 15, 2014 at 12:40 PM, Aparna  wrote:
> Hi I need help for plotting efficient frontier, I have expected return and 
> covariance matrix. I am using tseries and downloaded portfolio package too. 
> The suggestion says to use efficient.frontier, but it looks you replaces it 
> by something in R 3.1.1 as it says this is not available. At current R 
> version, what is the way to draw efficient frontier?
>
> Sent from my iPad
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] spatstat rmh problem

2014-09-15 Thread Sebastian Schutte
Thanks so much for your comments. Sorry for not having sent a running 
example from the start. Here it is:


library (spatstat)

#Load example data
data(demopat)
#Generate a random point pattern within the polygon
set.seed(12345)
pdat <- rpoint(200,win=demopat$window)
#Generate a distmap, which will serve as covariate information
im.cdat <- as.im(distmap(pdat))
#Now the random seed is fixed and a new set of random points is
#generated for the example

set.seed(1)
pdat <- rpoint(200,win=demopat$window)
#Fitting a model to the data
mod <- ppm (pdat ~  im.cdat)
#Now a point pattern is simulated via rmh from the fitted model and
#visualized as a density surface

set.seed(2)
plot(density(rmh(mod)))
#And here is the problem: when I repeat the exercise with different
#coefs, the very same pattern comes out. "new.coef" has no effect.

set.seed(2)
plot(density(rmh(mod),new.coef=c(1,200)))

What am I missing?

Thanks again,
Sebastian


On 16.09.2014 00:18, Rolf Turner wrote:


Your example is not reproducible.  We don't have "cshape" or "im.pop" 
(and are possibly lacking other bits and pieces; I didn't check the 
details since the example fails to run from the get-go).  Please 
provide a *reproducible* example.


Also I am puzzled by the line


mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))


Did you mean


mod <- ppm (dat, ~  pop ,  covariates = list (pop = im.pop))


???

Also please note that with versions of spatstat later than or equal to 
1.37-0 you can write


ppm(dat ~ im.pop)

when the object "im.pop" is present in the global environment.

cheers,

Rolf Turner

On 16/09/14 02:30, Sebastian Schutte wrote:

Dear R and spatstat developers,

Thanks so much for the time and effort that you invest into this awesome
software. I have a problem simulating from a Point Process Model in
spatstat. In summary, the option "new.coef" should allow me to use a
fitted model and change its beta coefficients before simulating a point
pattern from the model via Monte Carlo simulation. Intuitively, one
would assume that the predicted point pattern changes as one fiddles
with the beta coefficients. However, this does not seem to work.

Please let me know what I am missing here and which screw to drive to
actually change the simulation output.

#owin is a polygon of country boundaries, "im.pop" is a raster with
georeferenced population counts.
#I am using a random point pattern for demonstration purposes

#Fix random seed
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#plot(density(simulate(mod)))
#Show that this is reproducible
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod)))
#As expected, the density is the same

#Now change the coefs and do it again:
set.seed(12345)
#Generate artificial points
dat <- rpoint(500,win=cshape)
#Fit a (inhomogenous spatial poisson) model to the data
mod <- ppm (ppp, ~  pop ,  covariates = list (pop = im.pop))
#Simulate some points:
plot(density(rmh(mod),new.coef=c(1,200)))
#Looks the same, so what am I missing?




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] spatstat rmh problem

2014-09-15 Thread Rolf Turner

OK.  Two things are going wrong.

(1) There is an error in your code.  You are passing the new.coef 
argument to density() and not to rmh(). The function density() has no 
such argument, but has a "..." argument, so "new.coef" simply gets ignored.


You should use:

plot(density(rmh(mod,new.coef=c(1,200))))
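
(A tiny illustration of why no error was raised: an unused named argument is 
silently swallowed by a "..." argument.)

f <- function(x, ...) mean(x)    # stands in for density(), whose "..." absorbs extras
f(1:10, new.coef = c(1, 200))    # runs silently; new.coef is simply ignored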

(2) However, even when the correct call is given you still wind up with 
identical densities!!!


Hm.  I think this may be a bug; I will check with the other 
authors of spatstat and report back.


cheers,

Rolf Turner

On 16/09/14 16:30, Sebastian Schutte wrote:

Thanks so much for your comments. Sorry for not having sent a running
example from the start. Here it is:

library (spatstat)

#Load example data
data(demopat)
#Generate a random point pattern within the polygon
set.seed(12345)
pdat <- rpoint(200,win=demopat$window)
#Generate a distmap, which will serve as covariate information
im.cdat <- as.im(distmap(pdat))
#Now the random seed is fixed and a new set of random points is
generated for the example
set.seed(1)
pdat <- rpoint(200,win=demopat$window)
#Fitting a model to the data
mod <- ppm (pdat ~  im.cdat)
#Now a point pattern is simulated via rmh from the fitted model an
visualized as a density surface
set.seed(2)
plot(density(rmh(mod)))
#And here is the problem: When I repeat the exercise with different
coefs, the very same patter come out. "new.coef" has no effect.
set.seed(2)
plot(density(rmh(mod),new.coef=c(1,200)))

What am I missing?


--
Rolf Turner
Technical Editor ANZJS

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.