date:20080914

Re: [R] Please help me in Converting this from C# to R

2008-09-14 Thread Barry Rowlingson

2008/9/14 rajivv <[EMAIL PROTECTED]>:
>
> Random r = new Random();
>DirectedGraph graph = GetGraph();
>decimal B = 0.1m;
>decimal D = 0.05m;
 [ deletia ]
>if (P[i] < 0)
>P[i] = 0;
>}
>}
>
>}

 If you convert it into English first, then more people will be able
to help. It's much easier to convert English to any programming
language than from programming language A to programming language B.
Given that this code must have derived from a specification written in
a human language (such as English) just supply us with that. Sometimes
you don't need the original spec if the code is well-commented. But
this code is null-commented.

 At a guess, it seems to get a graph from out of nowhere
(graph=GetGraph()) and then do 100 iterations of some calculation
based on the graph adjacency. This should not be too difficult to
convert to R, but with any conversion problem there are always hidden
traps to beware of. Here's one in your code:

 You have:

  for (int i = 7; i <= 10; ++i)

 in one loop, and:

  for (int t = 0; t < 100; ++t)

 Now, much as C style loop specifications are concise and elegant,
they can cause confusion. The subtle differences here (using < instead
of <=, and the 'preincrement' ++i) confuse me as to what values the
loop variable takes in the loop.
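
 For what it's worth, the values those two loop headers visit can be
written down directly as R sequences. This is only a sketch: the helper
loop_values is made up for illustration and is not part of the posted
code. In a for header, ++i and i++ give the same sequence of values.

```r
# C#: for (int i = 7; i <= 10; ++i)  -> i takes 7, 8, 9, 10
# C#: for (int t = 0; t < 100; ++t)  -> t takes 0, 1, ..., 99
i_values <- 7:10
t_values <- 0:99   # 100 iterations; seq_len(100) - 1 is equivalent

# Hypothetical helper: collect the values a for loop visits.
loop_values <- function(values) {
  seen <- integer(0)
  for (v in values) seen <- c(seen, v)
  seen
}

length(loop_values(t_values))  # number of iterations
range(loop_values(i_values))   # first and last value of i
```

 Keep in mind that a C# array index i corresponds to R index i + 1,
because R vectors are indexed from 1.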

 The way to get by all these issues in any conversion problem is to
have a good set of test cases. You run the test cases in language A
and get a set of answers. You then run the test cases using the
converted code in language B and if you don't get the same answers
then the conversion has failed.
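
 A minimal sketch of such a test in R, assuming the update formula for P
that appears in the posted code. The function name update_P and the test
values are hypothetical; the expected answers were worked out by hand
here, not taken from a real run of the C# program.

```r
# One update step of P from the posted code, written as an R function.
update_P <- function(P, E, D = 0.05) {
  P_new <- 1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E))
  pmax(P_new, 0)  # same effect as: if (P[i] < 0) P[i] = 0
}

# Hypothetical test cases: inputs plus the answers language A should give.
P_in     <- c(0, 0.5, 1)
E_in     <- c(1, 0.5, 0)
expected <- c(0, 0.73125, 0.975)

result <- update_P(P_in, E_in)
isTRUE(all.equal(result, expected))  # TRUE means the conversion agrees
```

 all.equal() compares with a small numerical tolerance, which matters
here because C# decimal and R double will not agree to the last digit.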

 If you can describe what the code does, add some meaningful comments,
and produce a set of sample data test cases and results then perhaps
you'll get more help than just pasting the code in and asking nicely
(you did say 'please', which is more than some people do on this
list!).

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problem with misclass function on tree classification

2008-09-14 Thread Meir Preiszler


I am working through Tom Minka's lectures on Data Mining and am now on Day 32.
The following is the link:
http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day32/
In order to use the functions cited I followed the instructions as follows:

Installed tree package from CRAN mirror (Ca-1)
Downloaded and sourced the file "tree.r"
Downloaded the function "clus1.r"

Having defined a tree "tr", when I write "misclass(tr,x$test)" as shown in the
link I get an error message that "R does not find the function pred1.tree".

Is this function included in the tree package? If so it was not in my
download. Is this a bug? Do you know of a fix?

Thanks for your help
Meir

Meir Preiszler - Research Engineer
I t a m a r M e d i c a l Ltd. 
Caesarea, Israel:
Tel: +(972) 4 617 7000 ext 232
Fax: +(972) 4 627 5598
Cell: +(972) 54 699 9630
Email: [EMAIL PROTECTED] 
Web: www.Itamar-medical.com 

Re: [R] Where is the error

2008-09-14 Thread Berend Hasselman

rajivv wrote:
> 
> P <- vector(mode="numeric",length =10)
> 
> SS<-function(){for(id in 0:9){
>   if(0<P[id]) print("ss")
> else
> print("ss")
>  }}
> 
> SS()
> ---
> Error in if (0 < P[id]) print("ss") else print("ss") : 
>   argument is of length zero
> 

Use for(id in 1:10)

Berend
-- 
View this message in context: 
http://www.nabble.com/Where-is-the-error-tp19477706p19478528.html
Sent from the R help mailing list archive at Nabble.com.


[R] Where is the error

2008-09-14 Thread rajivv


P <- vector(mode="numeric",length =10)

SS<-function(){for(id in 0:9){
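if(0<P[id]) print("ss")
else
print("ss")
 }}

SS()
---
Error in if (0 < P[id]) print("ss") else print("ss") : 
  argument is of length zero

-- 
View this message in context: 
http://www.nabble.com/Where-is-the-error-tp19477706p19477706.html
Sent from the R help mailing list archive at Nabble.com.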


Re: [R] Where is the error

2008-09-14 Thread Nicky Chorley

2008/9/14 rajivv <[EMAIL PROTECTED]>:
>
> P <- vector(mode="numeric",length =10)
>
> SS<-function(){for(id in 0:9){
>if(0<P[id]) print("ss")
> else
> print("ss")
>  }}
>
> SS()
> ---
> Error in if (0 < P[id]) print("ss") else print("ss") :
>  argument is of length zero

Arrays/vectors/matrices in R are indexed from 1, not 0.
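
To see where the message comes from, here is a small sketch. The branch
labels are changed from the original "ss" only so that the two cases can
be told apart.

```r
P <- vector(mode = "numeric", length = 10)

P[0]                  # numeric(0): index 0 selects nothing
length(0 < P[0])      # 0, so if() has no TRUE/FALSE to test

# Looping over the valid indices instead of 0:9
SS <- function() {
  for (id in seq_along(P)) {   # 1, 2, ..., 10
    if (0 < P[id]) print("positive") else print("not positive")
  }
}
SS()
```

seq_along(P) also behaves sensibly when P is empty, which 1:length(P)
does not.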

Regards,

Nicky Chorley


Re: [R] Please help me in Converting this from C# to R

2008-09-14 Thread Simon Knapp

# bit hard to provide a simple conversion without definitions of the
# class 'Node', the template 'DirectedGraph' and the function 'Writed'!
# I've used the package 'igraph' as a drop in - hope it is still clear.
#
# by the way:
# - your curly braces don't match,
# - not all elements of P are initialised before they are used.

#
# original code (cleaned to make comparison easier).
#
#Random r = new Random();
#DirectedGraph graph = GetGraph();
#decimal B = 0.1m;
#decimal D = 0.05m;
#int nodes = graph.NodesCount;
#decimal[] E = new decimal[nodes];
#decimal[] P = new decimal[nodes];
#
#for (int i = 7; i <= 10; ++i) P[i] = (decimal)r.NextDouble();
#
#for (int t = 0; t < 100; ++t){
#Writed(P, "P");
#
#foreach (SimpleNode n in graph.Nodes) {
#int id = graph.index[n];
#
#decimal product = 1;
#foreach (var item in graph.GetAdjacentNodes(n)){
#int j = graph.index[item];
#product *= (1 - B * P[j]);
#}
#
#E[id] = product;
#}
#
#foreach (SimpleNode n in graph.Nodes){
#int i = graph.index[n];
#P[i] = 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5m
#    * D * P[i] * (1 - E[i]));
#if (P[i] < 0) P[i] = 0;
#}
#}
#
#}


#
# drop-in for your method getGraph (produces a 10 'random' node
# directed graph). I only assign to temporary so I can use the same
# 'grph' and 'P' in both implementations.
#
library(igraph)
GetGraph <- function() graph.adjacency(matrix(sample(0:1, size=100,
replace=T), nrow=10))
grph.t <- GetGraph()
P.t <- runif(vcount(grph.t)) # assume you meant to initialise all elements of P

#
# IMPLEMENTATION 1.
# A 'mirror' implementation. Some of the code relies
# on the specifics of package igraph, but I've tried to
# be as similar as possible. Hope it still makes sense!
#
B <- 0.1
D <- 0.05
grph <- grph.t
nodes <- vcount(grph)
E <- numeric(nodes)
P <- P.t

for(t in 0:99){
    cat('P:', P, '\n') # is this equivalent to 'Writed(P, "P")' ???
    # get.adjlist returns a list of vectors, where each vector is the
    # nodes a node is connected to.
    graph.Nodes <- get.adjlist(grph)
    id <- 0 # we loop over the vectors and so must index separately
    # n is a vector containing the vertices the vertex at index id+1
    # is connected to.
    for(n in graph.Nodes){
        id <- id+1
        product <- 1
        for(item in n){
            # vertices are indexed from 0. no operator*= in R.
            product <- product * (1 - B * P[item+1])
        }
        E[id] <- product
    }

    # we are accessing nodes in order so the indexes are also ordered.
    for(i in 1:nodes){
        P[i] <- 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5 *
            D * P[i] * (1 - E[i]))
        if (P[i] < 0) P[i] <- 0
    }
}

P # print the result

#
# IMPLEMENTATION 2.
# a more 'R-ish' implementation.
#
B <- 0.1
D <- 0.05
P <- P.t
grph <- grph.t

for(t in 0:99){
E <- sapply(get.adjlist(grph), function(node) prod(1-B*P[node+1]))
P <- 1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E))
P[P < 0] <- 0 # same clamp as in implementation 1
}

P # print the result
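
# A further sketch, not part of the conversion above: the same update
# using only base R, assuming the graph is held as a 0/1 adjacency
# matrix A in which A[i, j] == 1 means node j is adjacent to node i.
# The names A, step_loop and step_matrix are made up for illustration.

```r
set.seed(1)
n <- 10
# Assumed representation: A[i, j] == 1 means node j is adjacent to node i.
A <- matrix(sample(0:1, size = n * n, replace = TRUE), nrow = n)
P0 <- runif(n)
B <- 0.1
D <- 0.05

step_loop <- function(P) {
  E <- numeric(n)
  for (i in seq_len(n)) E[i] <- prod(1 - B * P[A[i, ] == 1])
  pmax(1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E)), 0)
}

step_matrix <- function(P) {
  # product over neighbours == exp(sum of logs); valid as 1 - B * P > 0 here
  E <- as.vector(exp(A %*% log(1 - B * P)))
  pmax(1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E)), 0)
}

P_loop <- P0
P_mat <- P0
for (t in 0:99) {
  P_loop <- step_loop(P_loop)
  P_mat <- step_matrix(P_mat)
}
all.equal(P_loop, P_mat)  # the two versions should agree
```

# Running both versions side by side is also a cheap test case of the
# kind suggested earlier in the thread.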


Re: [R] Problem with misclass function on tree classification

2008-09-14 Thread Simon Knapp

Did you say:

library("tree")

at the top of your script?

On Sun, Sep 14, 2008 at 5:47 PM, Meir Preiszler
<[EMAIL PROTECTED]> wrote:
>
> I am working through Tom Minka's lectures on Data Mining and am now on Day 
> 32. The following
> is the link: 
> http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day32/
> In order to use the functions cited I followed the instructions as follows:
>
> Installed tree package from CRAN mirror (Ca-1)
> Downloaded and sourced the file "tree.r"
> Downloaded the function "clus1.r"
>
> Having defined a tree "tr, when I write "misclass(tr,x$test)" as shown in the 
> link
> I get an error message that "R does not find the function pred1.tree".
>
>  Is this function included in the tree package? If so it was not in my 
> download. Is this a bug?
> Do you know of a fix?
>
> Thanks for your help
> Meir

[R] Data format for BiodiversityR

2008-09-14 Thread Ndoh Innocent (Holy)

Greetings dear friends.
Please, I am having real problems getting the program to read my datasets
(attached here).
I have converted the datasets to CSV and imported them, but I never get the
result I am after.
I would be very grateful if someone could help me soon.
Thanks

Ndoh Mbue Innocent 
International corporation office 
China University of Geosciences 
388 Lumo road 
430074, Wuhan-China 
Tel: 0086 27 67885947/0086 15927262962 
A gentlemen should be truly a moral person, a straightforward and reliable   
personality,in solidarity with the community and rooted in self rescpect



[R] Problem plotting axes on graphs

2008-09-14 Thread Les Stather


2008/9/14 [EMAIL PROTECTED]


I ran your example

Speed <- cars$speed
Distance <- cars$dist

Speed
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

Distance
 [1]   2  10   4  22  16  10  18  26  34  17  28  14  20  24  28  26  34  34  46
[20]  26  36  60  80  20  26  54  32  40  32  40  50  42  56  76  84  36  46  68
[39]  32  48  52  56  64  66  54  70  92  93 120  85

plot(Speed, Distance, panel.first = grid(8,8),
pch = 0, cex = 1.2, col = "blue")
plot(Speed, Distance,
panel.first = lines(stats::lowess(Speed, Distance), lty = "dashed"),
pch = 0, cex = 1.2, col = "blue")

And got the following

Error in axis(side = side, at = at, labels = labels, ...) :
 too few arguments


I got a graph of the points with a dashed line through them but did
not get any axes.


I am running R 2.7.2 under Windows XP Service Pack 2 on an Acer Extensa 5200.


Les Stather



Re: [R] Greyed text in the background of a plot

2008-09-14 Thread Agustin Lobo


Yes, it's easy but you get a significant "jump" between a time
step and the next one, which makes the animation unpleasant and
difficult to follow. I think that this problem is because
of the use of plot(), which redraws everything, and that
there is no way around it within R (is there?).

I also understand that
these "jumps" are avoided in your package by creating
the animated gif and html pages. Actually your
animation with the word "Animation" in
your page
http://animation.yihui.name/animation:start#generate_an_animation_sequence
is much "softer" than what you get by running the code within R.

Regarding what you have in
http://animation.yihui.name/da:ts:hans_rosling_s_talk

I'm missing the function Rosling.bubbles(), so cannot
actually try it.

And congratulations, great site and package.

Agus


Yihui Xie wrote:

Well, his talk seems to have attracted a lot of people... You may
simply use gray text in your plot. Here is an example:

##
x = runif(10)
y = runif(10)
z = runif(10, 0.1, 0.3)
cl = rgb(runif(10), runif(10), runif(10), 0.5) # transparent colors!
par(mar = c(4, 4, 0.2, 0.2))
for (i in 1917:2007) {
x = x + rnorm(10, 0, 0.02)
y = y + rnorm(10, 0, 0.02)
z = abs(z + rnorm(10, 0, 0.05))
plot(x, y, xlim = c(0, 1), ylim = c(0, 1), type = "n", panel.first = {
grid()
text(0.5, 0.5, i, cex = 5, col = "gray") # here is the text!
})
symbols(x, y, circles = z, add = T, bg = cl, inches = 0.8)
box()
Sys.sleep(0.2)
}
##

Not difficult at all, right? :)

BTW, if you are interested in such animations, you may as well take a
look at my "animation" package:
http://cran.r-project.org/web/packages/animation/index.html
http://animation.yihui.name/

Regards,
Yihui

On Fri, Sep 12, 2008 at 8:35 PM, Agustin Lobo <[EMAIL PROTECTED]> wrote:

Hi!

Is there any way of having a greyed ("ghosted") text
(i.e, 2006) in the background of a plot?
I'm making a dynamic plot and would like to show the
year of each time step as a big greyed text in the background.

(the idea comes from Hans Rosling video:
http://video.google.com/videoplay?docid=4237353244338529080&sourceid=searchfeed
)

Thanks

Agus
--
Dr. Agustin Lobo
Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: [EMAIL PROTECTED]
http://www.ija.csic.es/gt/obster






[R] Help please! How to code a mixed-model with 2 within-subject factors using lme or lmer?

2008-09-14 Thread roberto toro

Hello,

I'm using aov() to analyse changes in brain volume between males and
females. For every subject (there are 331 in total) I have 8 volume
measurements (4 different brain lobes and 2 different tissues
(grey/white matter)). The data looks like this:

Subject Sex Lobe Tissue Volume
subect1 1   F    g      262374
subect1 1   F    w      173758
subect1 1   O    g      67155
subect1 1   O    w      30067
subect1 1   P    g      117981
subect1 1   P    w      85441
subect1 1   T    g      185241
subect1 1   T    w      83183
subect2 1   F    g      255309
subect2 1   F    w      164335
subect2 1   O    g      71769
subect2 1   O    w      31879
subect2 1   P    g      120518
subect2 1   P    w      90334
subect2 1   T    g      168413
subect2 1   T    w      75790
subect3 0   F    g      243621
subect3 0   F    w      167025
subect3 0   O    g      65998
subect3 0   O    w      29758
subect3 0   P    g      118026
subect3 0   P    w      91903
subect3 0   T    g      156279
subect3 0   T    w      82349


I'm trying to see if there is an interaction Sex*Lobe*Tissue. This is
the command I use with aov():
   mod1<-aov(Volume~Sex*Lobe*Tissue+Error(Subject/(Lobe*Tissue)),data.vslt)

Subject is a random effect, Sex, Lobe and Tissue are fixed effects;
Sex is an outer factor (between subjects), and Lobe and Tissue are
inner factors (within-subjects); and there is indeed a significant
3-way interaction.

I was told, however, that the results reported by aov() may depend on
the order of the factors (type I anova), and that it is better to use
lme() or lmer() with type II, but I'm struggling to find the right
syntax...

To begin, how should I write the model using lme() or lmer()??

I tried this with lme():

gvslt<-groupedData(Volume~1|Subject,outer=~Val,inner=list(~Lobe,~Tissue),data=vslt)
mod2<-lme(Volume~Val*Lobe*Tissue,random=~1|Subject,data=gvslt)

but I have interaction terms for every level of Lobe and Tissue, and 8
times the number of DF I should have... (around 331*8 instead of
~331).

Using lmer(), the specification of Subject as a random effect is
straightforward:

mod2<-lmer(Volume~Sex*Lobe*Tissue+(1|Subject),data.vslt)

but I can't figure out the /(Lobe*Tissue) part...

Thank you very much in advance!
roberto


Re: [R] Join data by minimum distance

2008-09-14 Thread Simon Knapp

> I am wondering if there is a function which will do a join between 2 
> data.frames by minimum distance, as it is done in ArcGIS for example. For 
> people who are not familiar with ArcGIS here it is an explanation:
>
> Suppose you have a data.frame with x, y, coordinates called track, and a 
> second data frame with different x, y coordinates and some other attributes 
> called classif. The track data.frame has a different number of rows than 
> classif. I want to join the rows from classif to track in such a way that for 
> each row in track I add only the row from classif that has coordinates 
> closest to the coordinates in the track row (and hence minimum distance in 
> between the 2 rows), and also add a new column which will record this minimum 
> distance. Even if the coordinates in the 2 data.frames have same name, the 
> values are not identical between the data.frames, so a merge by column is not 
> what I am after.



#---
# get the distance between two points on the globe.
#
# args:
# lat1 - latitude of first point.
# long1 - longitude of first point.
# lat2 - latitude of second point.
# long2 - longitude of second point.
# radius - average radius of the earth in km
#
# see: http://en.wikipedia.org/wiki/Great_circle_distance
#---
greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){
sf <- pi/180
lat1 <- lat1*sf
lat2 <- lat2*sf
long1 <- long1*sf
long2 <- long2*sf
lod <- abs(long1-long2)
radius * atan2(
sqrt((cos(lat1)*sin(lod))**2 +
(cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2),
sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
)
}

#---
# Calculate the nearest point using latitude and longitude.
# and attach the other args and nearest distance from the
# other data.frame.
#
# args:
# x as you describe 'track'
# y as you describe 'classif'
# xlongnme name of longitude variable in x
# xlatnme name of latitude location variable in x
# ylongnme name of longitude location variable on y
# ylatnme name of latitude location variable on y
#---
dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
dists <- apply(y, 1, function(x, y) greatCircleDistance(x[2],
x[1], y[2], y[1]), x)
cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
}
, y[,c(ylongnme, ylatnme)]))
tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme,
ylatnme), names(y))])
row.names(tmp) <- NULL
tmp
}

# demo (longitude in [0, 360], latitude in [-90, 90]; classif has more
# rows than track)
track <- data.frame(xt=runif(10,0,360), yt=runif(10,-90, 90))
classif <- data.frame(xc=runif(20,0,360), yc=runif(20,-90, 90),
v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')
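
# An alternative sketch of the same join, building the full distance
# matrix with outer() and picking the nearest row with which.min().
# It swaps in the haversine formula for the distance, and the names
# haversine_km and nearest_join are made up for illustration. It keeps
# an nrow(x) by nrow(y) matrix in memory, so it only suits moderate sizes.

```r
haversine_km <- function(lat1, long1, lat2, long2, radius = 6372.795) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

nearest_join <- function(x, y, xlong, xlat, ylong, ylat) {
  # distance matrix: one row per row of x, one column per row of y
  d <- outer(seq_len(nrow(x)), seq_len(nrow(y)), function(i, j)
    haversine_km(x[[xlat]][i], x[[xlong]][i], y[[ylat]][j], y[[ylong]][j]))
  nearest <- apply(d, 1, which.min)  # index of the closest row of y
  extra <- y[nearest, setdiff(names(y), c(ylong, ylat)), drop = FALSE]
  out <- cbind(x, min.dist = d[cbind(seq_len(nrow(x)), nearest)], extra)
  row.names(out) <- NULL
  out
}

# small made-up example
track2 <- data.frame(xt = c(10, 20), yt = c(50, -30))
classif2 <- data.frame(xc = c(19, 11, 100), yc = c(-31, 49, 0),
                       v1 = c("a", "b", "c"))
joined <- nearest_join(track2, classif2, "xt", "yt", "xc", "yc")
joined
```

# With ties, which.min() keeps the first nearest row, as dist.merge does.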


Re: [R] Help please! How to code a mixed-model with 2 within-subject factors using lme or lmer?

2008-09-14 Thread Mark Difford


Hi Roberto,

>> but I can't figure out the /(Lobe*Tissue) part...

This type of nesting is easier to do using lmer(). To do it using lme() you
have to generate the crossed factor yourself. Do something like this:

##
tfac <- with(vslt, interaction(Lobe, Tissue, drop=T))
str(tfac); head(tfac)
mod2<-lme(Volume ~ Val*Lobe*Tissue, random = ~1|Subject/tfac, data = vslt)

Pre-Scriptum: You can also use ?":" but ?interaction is more flexible and
powerful.

Regards, Mark.


roberto toro wrote:
> 
> Hello,
> 
> I'm using aov() to analyse changes in brain volume between males and
> females. For every subject (there are 331 in total) I have 8 volume
> measurements (4 different brain lobes and 2 different tissues
> (grey/white matter)). The data looks like this:
> 
> Subject   Sex Lobe    Tissue  Volume
> subect1   1   F   g   262374
> subect1   1   F   w   173758
> subect1   1   O   g   67155
> subect1   1   O   w   30067
> subect1   1   P   g   117981
> subect1   1   P   w   85441
> subect1   1   T   g   185241
> subect1   1   T   w   83183
> subect2   1   F   g   255309
> subect2   1   F   w   164335
> subect2   1   O   g   71769
> subect2   1   O   w   31879
> subect2   1   P   g   120518
> subect2   1   P   w   90334
> subect2   1   T   g   168413
> subect2   1   T   w   75790
> subect3   0   F   g   243621
> subect3   0   F   w   167025
> subect3   0   O   g   65998
> subect3   0   O   w   29758
> subect3   0   P   g   118026
> subect3   0   P   w   91903
> subect3   0   T   g   156279
> subect3   0   T   w   82349
> 
> 
> I'm trying to see if there is an interaction Sex*Lobe*Tissue. This is
> the command I use with aov():
>   
> mod1<-aov(Volume~Sex*Lobe*Tissue+Error(Subject/(Lobe*Tissue)),data.vslt)
> 
> Subject is a random effect, Sex, Lobe and Tissue are fixed effects;
> Sex is an outer factor (between subjects), and Lobe and Tissue are
> inner factors (within-subjects); and there is indeed a significant
> 3-way interaction.
> 
> I was told, however, that the results reported by aov() may depend on
> the order of the factors
> (type I anova), and that is better to use lme() or lmer() with type
> II, but I'm struggling to find the right syntaxis...
> 
> To begin, how should I write the model using lme() or lmer()??
> 
> I tried this with lme():
>
> gvslt<-groupedData(Volume~1|Subject,outer=~Val,inner=list(~Lobe,~Tissue),data=vslt)
> mod2<-lme(Volume~Val*Lobe*Tissue,random=~1|Subject,data=gvslt)
> 
> but I have interaction terms for every level of Lobe and Tissue, and 8
> times the number of DF I should have... (around 331*8 instead of
> ~331).
> 
> Using lmer(), the specification of Subject as a random effect is
> straightforward:
> 
> mod2<-lmer(Volume~Sex*Lobe*Tissue+(1|Subject),data.vslt)
> 
> but I can't figure out the /(Lobe*Tissue) part...
> 
> Thank you very much in advance!
> roberto
> 

-- 
View this message in context: 
http://www.nabble.com/Help-please%21-How-to-code-a-mixed-model-with-2-within-subject-factors-using-lme-or-lmer--tp19479860p19480387.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Greyed text in the background of a plot

2008-09-14 Thread Yihui Xie

Hi Agus,

Yes you are absolutely right about the awkward jumps in the animations
and this has also been my big problem for a long time. To solve this
problem, I think I need third-party software, as I don't know of any
solution using R alone. Maybe the "swfc" utility in SWF Tools or
the Processing language could be a solution. I'll try them when
I have enough time.

As for the function Rosling.bubbles(), you have to wait until the
version 1.0-2 is published on CRAN. (I've submitted the new version
this morning)

Sorry it seems I have been discussing a different topic under this thread...

Yihui

On Sun, Sep 14, 2008 at 8:05 PM, Agustin Lobo <[EMAIL PROTECTED]> wrote:
> Yes, it's easy but you get a significant "jump" between a time
> step and the next one, which makes the animation unpleasant and difficult to
> follow. I think that this problem is because
> of the use of plot(), which redraws everything, and that
> there is no way around it within R (is there?).
>
> I also understand that
> these "jumps" are avoided in your package by creating
> the animated gif and html pages. Actually your
> animation with the word "Animation" in
> your page
> http://animation.yihui.name/animation:start#generate_an_animation_sequence
> is much "softer" than what you get by running the code within R.
>
> Regarding what you have in
> http://animation.yihui.name/da:ts:hans_rosling_s_talk
>
> I'm missing the function Rosling.bubbles(), so cannot
> actually try it.
>
> And congratulations, great site and package.
>
> Agus
>
>

-- 
Yihui Xie <[EMAIL PROTECTED]>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China


[R] Scaling X axis from -1 to 1

2008-09-14 Thread Gundala Viswanath

Hi,

I have a density plot in which the x axis
ranged from 0 to 2000.

How can I scale the data so that the x-axis
is scaled in -1 to 1 form?

- Gundala Viswanath
Jakarta - Indonesia


Re: [R] Scaling X axis from -1 to 1

2008-09-14 Thread Yihui Xie

2 * (x - min(x))/(max(x) - min(x)) - 1
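
A short sketch of that formula in use. The data are made up and the
function name rescale_pm1 is only for illustration.

```r
rescale_pm1 <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1

set.seed(42)
x <- runif(500, min = 0, max = 2000)  # made-up data on the 0..2000 scale
x_scaled <- rescale_pm1(x)

range(x_scaled)             # the data now run from -1 to 1
d <- density(x_scaled)      # density estimate on the rescaled data
if (interactive()) plot(d)  # draw it when run at the console
```

The density curve itself extends a little beyond -1 and 1 because of
the kernel bandwidth, so set xlim in plot() if the axis must stop there.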

On Sun, Sep 14, 2008 at 10:13 PM, Gundala Viswanath <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a density plot in which the x axis
> ranged from 0 to 2000.
>
> How can I scale the data so that the x-axis
> is scaled in -1 to 1 form?
>
> - Gundala Viswanath
> Jakarta - Indonesia
>



[R] Help please! How to code a mixed-model with 2 within-subject factors using lme or lmer?

2008-09-14 Thread roberto toro

Thanks for answering Mark!

I tried with the coding of the interaction you suggested:

> tfac<-with(vlt,interaction(Lobe,Tissue,drop=T))
> mod<-lme(Volume~Sex*Lobe*Tissue,random=~1|Subject/tfac,data=vlt)

But is it normal that the DF are 2303? DF is 2303 even for the estimate of
LobeO that has only 662 values (331 for Tissue=white and 331 for Tissue=grey).
I'm also not sure that Sex, Lobe and Tissue are handled correctly: why are
there different estimates called Sex:LobeO, Sex:LobeP, etc., and not just
Sex:Lobe as with aov()? Why is there Tissuew, but not Sex1, for example?

Thanks again!
roberto

ps1. How would you code this with lmer()?
ps2. this is part of the output of mod<-lme:
> summary(mod)
Linear mixed-effects model fit by REML
 Data: vlt
       AIC      BIC    logLik
  57528.35 57639.98 -28745.17

Random effects:
 Formula: ~1 | Subject
        (Intercept)
StdDev:    11294.65

 Formula: ~1 | tfac %in% Subject
        (Intercept) Residual
StdDev:    10569.03 4587.472

Fixed effects: Volume ~ Sex * Lobe * Tissue
                       Value Std.Error   DF    t-value p-value
(Intercept)        245224.61  1511.124 2303  162.27963  0.0000
Sex                  2800.01  1866.312  329    1.50029  0.1345
LobeO             -180794.83  1526.084 2303 -118.46975  0.0000
LobeP             -131609.27  1526.084 2303  -86.23984  0.0000
LobeT              -73189.97  1526.084 2303  -47.95932  0.0000
Tissuew            -72461.05  1526.084 2303  -47.48168  0.0000
Sex:LobeO            -663.27  1884.789 2303   -0.35191  0.7249
Sex:LobeP           -2146.08  1884.789 2303   -1.13863  0.2550
Sex:LobeT            1379.49  1884.789 2303    0.73191  0.4643
Sex:Tissuew          5387.65  1884.789 2303    2.85849  0.0043
LobeO:Tissuew       43296.99  2158.209 2303   20.06154  0.0000
LobeP:Tissuew       50952.21  2158.209 2303   23.60856  0.0000
LobeT:Tissuew      -15959.31  2158.209 2303   -7.39470  0.0000
Sex:LobeO:Tissuew   -5228.66  2665.494 2303   -1.96161  0.0499
Sex:LobeP:Tissuew   -1482.83  2665.494 2303   -0.55631  0.5781
Sex:LobeT:Tissuew   -6037.49  2665.494 2303   -2.26506  0.0236


Re: [R] Help please! How to code a mixed-model with 2 within-subject factors using lme or lmer?

2008-09-14 Thread Mark Difford

Hi Roberto,

It's difficult to comment further on specifics without access to your data
set. A general point is that the output from summary(aov.object) is not
directly comparable with summary(lme.object). The latter gives you a summary
of a fitted linear regression model, not an analysis of variance model, and
what you "see" will depend on what contrasts were in place when the model
was fitted.

If you haven't changed these then they will be so-called treatment
contrasts. What you are seeing for Lobe (which plainly is coded as a factor)
in the output from summary(lme.object) are the regression coefficients for
each level of Lobe relative to its reference/treatment/baseline level, which
is your (Intercept). If you fitted your model with, say, Helmert or
sum-to-zero contrasts then these values would change.

To see what your current reference level is, do levels(dataset$Lobe); under
treatment contrasts the first level listed is the reference. See ?levels.

What you want to look at to begin with is: anova(lme.object).
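
As a rough illustration of the point about contrasts, here is a minimal sketch on a made-up factor. The level names "F", "O", "P", "T" are assumptions standing in for the levels of Lobe; they are not taken from the actual data set, and R's default contrast options are assumed.

```r
# Hypothetical factor standing in for vlt$Lobe (level names are assumed)
lobe <- factor(c("F", "O", "P", "T", "F", "O", "P", "T"))

# Under the default treatment contrasts, the first level is the baseline
ref_level <- levels(lobe)[1]

# Default coding: one indicator column per non-baseline level
trt <- contrasts(lobe)

# Alternative codings; coefficients are read differently under each
sum_coding     <- contr.sum(nlevels(lobe))
helmert_coding <- contr.helmert(nlevels(lobe))

# Changing the baseline changes which level the (Intercept) describes
lobe_t <- relevel(lobe, ref = "T")

print(ref_level)
print(trt)
print(levels(lobe_t))
```

The row of zeros in the treatment-contrast matrix marks the baseline level, which is why no coefficient is printed for it in summary(lme.object).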

HTH, Mark.

roberto toro wrote:
> 
> Thanks for answering Mark!
> 
> I tried with the coding of the interaction you suggested:
> 
>> tfac<-with(vlt,interaction(Lobe,Tissue,drop=T))
>> mod<-lme(Volume~Sex*Lobe*Tissue,random=~1|Subject/tfac,data=vlt)
> 
> But is it normal that the DF are 2303? DF is 2303 even for the estimate of
> LobeO that has only 662 values (331 for Tissue=white and 331 for
> Tissue=grey).
> I'm not sure either that Sex, Lobe and Tissue are correctly handled.
> Why are there different estimates called Sex:LobeO, Sex:LobeP, etc., and not
> just Sex:Lobe as with aov()? Why is there Tissuew, but not Sex1, for example?
> 
> Thanks again!
> roberto
> 
> ps1. How would you code this with lmer()?
> ps2. this is part of the output of mod<-lme:
>> summary(mod)
> Linear mixed-effects model fit by REML
>  Data: vlt
>        AIC      BIC    logLik
>   57528.35 57639.98 -28745.17
>
> Random effects:
>  Formula: ~1 | Subject
>         (Intercept)
> StdDev:    11294.65
>
>  Formula: ~1 | tfac %in% Subject
>         (Intercept) Residual
> StdDev:    10569.03 4587.472
>
> Fixed effects: Volume ~ Sex * Lobe * Tissue
>                        Value Std.Error   DF    t-value p-value
> (Intercept)        245224.61  1511.124 2303  162.27963  0.
> Sex                  2800.01  1866.312  329    1.50029  0.1345
> LobeO             -180794.83  1526.084 2303 -118.46975  0.
> LobeP             -131609.27  1526.084 2303  -86.23984  0.
> LobeT              -73189.97  1526.084 2303  -47.95932  0.
> Tissuew            -72461.05  1526.084 2303  -47.48168  0.
> Sex:LobeO            -663.27  1884.789 2303   -0.35191  0.7249
> Sex:LobeP           -2146.08  1884.789 2303   -1.13863  0.2550
> Sex:LobeT            1379.49  1884.789 2303    0.73191  0.4643
> Sex:Tissuew          5387.65  1884.789 2303    2.85849  0.0043
> LobeO:Tissuew       43296.99  2158.209 2303   20.06154  0.
> LobeP:Tissuew       50952.21  2158.209 2303   23.60856  0.
> LobeT:Tissuew      -15959.31  2158.209 2303   -7.39470  0.
> Sex:LobeO:Tissuew   -5228.66  2665.494 2303   -1.96161  0.0499
> Sex:LobeP:Tissuew   -1482.83  2665.494 2303   -0.55631  0.5781
> Sex:LobeT:Tissuew   -6037.49  2665.494 2303   -2.26506  0.0236
> 


[R] ksvm accessing the slots of S4 object

2008-09-14 Thread Nair, Murlidharan T


I am using kernlab to build SVM models. I am not sure how to access the
different slots of the object. For instance, I want to get the number of
support vectors for each model I am building and store them in a vector.

> ksvm.model <- ksvm(Class ~ ., data = somedata, kernel = "vanilladot",
+                    cross = 10, type = "C-svc")
> names(attributes(ksvm.model))
 [1] "param"      "scaling"    "coef"       "alphaindex" "b"
 [6] "obj"        "SVindex"    "nSV"        "prior"      "prob.model"
[11] "alpha"      "type"       "kernelf"    "kpar"       "xmatrix"
[16] "ymatrix"    "fitted"     "lev"        "nclass"     "error"
[21] "cross"      "n.action"   "terms"      "kcall"      "class"


>ksvm.model
Support Vector Machine object of class "ksvm"

SV type: C-svc  (classification)
 parameter : cost C = 1

Linear (vanilla) kernel function.

Number of Support Vectors : 144

Objective Function Value : -4.3162
Training error : 0
Cross validation error : 0.4


In the above dummy example how do I access the number of support vectors?

I tried the following

ksvm.model$nSV
nSV(ksvm.model)

Thanks ../Murli
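
For what it is worth, slots of an S4 object are normally read with `@` or slot() rather than `$`. The sketch below uses a made-up class, "ToyModel", so that it runs without kernlab; the class and its slot values are purely illustrative assumptions. With a real ksvm fit the analogous call would presumably be ksvm.model@nSV, and kernlab is believed to provide an nSV() accessor as well, though that is not verified here.

```r
library(methods)

# Hypothetical S4 class standing in for a fitted model (not kernlab's class)
setClass("ToyModel", representation(nSV = "integer", b = "numeric"))

m1 <- new("ToyModel", nSV = 144L, b = 0.5)
m2 <- new("ToyModel", nSV = 98L,  b = -0.2)

# List the slots, then read one with '@' or with slot()
print(slotNames(m1))
n1 <- m1@nSV
n1_again <- slot(m1, "nSV")

# Collect the slot from several models into a vector
models <- list(m1, m2)
n_sv <- sapply(models, function(fit) fit@nSV)
print(n_sv)
```

The sapply() pattern at the end is one way to store the count for each model in a vector, as asked above.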


[R] Question on glm.nb vs zeroinfl vs hurdle models

2008-09-14 Thread eugen pircalabelu

Good afternoon,

I need some advice on the proper use of glm.nb, zeroinfl or hurdle with my
data frame. I cannot provide a self-contained example, since the advice I need
concerns this particular dataset and its "contradictory" results.

My dataset contains 1309 cases and 11 variables. It is highly right-skewed and
heavily zero-inflated: over 1100 cases have the value 0 for my variables, both
dependent and independent (e.g. variable A has 1220 cases with value 0,
variable B has 1283, and so on).

I fitted 3 models (glm.nb, zeroinfl and hurdle) and expected similar results
and similar conclusions. The log-likelihood was similar (very close for all 3
models), as was the number of predicted zeros (identical for each model), but
the following results surprised me:

- glm.nb identified as influential the same variables that the hurdle model
  identified in its zero component;
- the zeroinfl model also identified variable d as influential.

Now my question: in the vignette (Regression Models for Count Data in R) I
noticed that glm.nb, hurdle and zeroinfl give similar results for the count
model, while the zero components of hurdle and zeroinfl may differ somewhat
more. In my example, however, the count model from glm.nb is similar to the
zero component of hurdle and zeroinfl. Why is that? Is the problem that my
dataset is extremely zero-inflated, with few cases different from 0?

Any kind of help would be most welcome.

Thank you and have a great day ahead.

> summary(aaa)

Call:
hurdle(formula = as.integer(x) ~ as.integer(a) + as.integer(b) + as.integer(c) +
    as.integer(d) + as.integer(e) + as.integer(f) + as.integer(g) + as.integer(h),
    data = dep, dist = "negbin")

Count model coefficients (truncated negbin with log link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.02178    0.30753  -0.071    0.944
as.integer(a) -0.48886    0.54023  -0.905    0.366
as.integer(b) -0.09555    0.11688  -0.817    0.414
as.integer(c) -0.08654    0.20809  -0.416    0.678
as.integer(d)  0.17446    0.16956   1.029    0.304
as.integer(e)  0.27180    0.55702   0.488    0.626
as.integer(f)  0.15512    0.42721   0.363    0.717
as.integer(g) -0.07687    0.21750  -0.353    0.724
as.integer(h) -0.16906    0.44986  -0.376    0.707
Log(theta)    -0.76274    0.51800  -1.472    0.141

Zero hurdle model coefficients (binomial with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.13498    0.07906 -14.356  < 2e-16 ***
as.integer(a) -0.33134    0.30239  -1.096  0.27320
as.integer(b) -0.26394    0.08397  -3.143  0.00167 **
as.integer(c)  0.06689    0.12796   0.523  0.60115
as.integer(d) -0.12045    0.11984  -1.005  0.31486
as.integer(e) -0.79314    0.29106  -2.725  0.00643 **
as.integer(f) -0.28547    0.40790  -0.700  0.48402
as.integer(g) -0.33186    0.18887  -1.757  0.07890 .
as.integer(h) -0.37008    0.31035  -1.192  0.23308
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 0.4664
Number of iterations in BFGS optimization: 28
Log-likelihood: -1073 on 19 Df


> summary(a)

Call:
glm.nb(formula = as.integer(x) ~ as.integer(a) + as.integer(b) + as.integer(c) +
    as.integer(d) + as.integer(e) + as.integer(f) + as.integer(g) + as.integer(h),
    data = dep, init.theta = 0.187836108765364, link = log)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8607  -0.7236  -0.6809  -0.4610   2.7575

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.56381    0.08820  -6.392 1.64e-10 ***
as.integer(a) -0.51517    0.33477  -1.539  0.12384
as.integer(b) -0.21835    0.07250  -3.011  0.00260 **
as.integer(c)  0.08920    0.14546   0.613  0.53974
as.integer(d) -0.01742    0.10877  -0.160  0.87274
as.integer(e) -0.69085    0.23446  -2.946  0.00321 **
as.integer(f) -0.14182    0.42142  -0.337  0.73647
as.integer(g) -0.24976    0.15819  -1.579  0.11437
as.integer(h) -0.37652    0.30043  -1.253  0.21009
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.1878) family taken to be 1)

    Null deviance: 707.18  on 1308  degrees of freedom
Residual deviance: 677.09  on 1300  degrees of freedom
AIC: 2181.5

Number of Fisher Scoring iterations: 1

              Theta:  0.1878
          Std. Err.:  0.0186
Warning while fitting theta: alternation limit reached

> summary(aa) 
Call: 

zeroinfl(formula = a

[R] Fetching a range of columns

2008-09-14 Thread Jason Thibodeau

Hello,

I realize that with x[x > 3 & x < 5] I can fetch all elements between 3
and 5. However, I read in from a CSV file, and I would like to fetch all
columns within a range (842-2411). In the past, I have done this to
fetch just a select few columns:

data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
data_filter <- data[c(2,12,17)]
write.table(data_filter, fileout, append = TRUE,
            sep= ",", row.names= FALSE, col.names = FALSE)
nskip <- nskip+320

This time, however, instead of grabbing columns 2, 12, 17, I would like all
columns in the range 842-2411. I can't seem to do this correctly. Could
somebody please provide some insight? Thanks in advance.

-- 
Jason Thibodeau


Re: [R] library instal

2008-09-14 Thread John Kane

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--- On Fri, 9/12/08, gaurav1983 <[EMAIL PROTECTED]> wrote:

> From: gaurav1983 <[EMAIL PROTECTED]>
> Subject: [R]  library instal
> To: r-help@r-project.org
> Received: Friday, September 12, 2008, 6:43 AM
> I am finding real trouble in installing evd library in R for
> linux


[R] difference of two data frames

2008-09-14 Thread joseph

Hello
I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])
How do I create a new data frame of the difference between DF1 and DF2
newDF=data.frame(V1=4:6, V2= letters[4:6])
In my real data, the rows are not in order as in the example I provided.
Thanks much
Joseph


  

Re: [R] difference of two data frames

2008-09-14 Thread Jorge Ivan Velez

Hi Joseph,
Try this:

DF1[!DF1$V1%in%DF2$V1,]
subset(DF1,!V1%in%DF2$V1)

HTH,

Jorge


On Sun, Sep 14, 2008 at 12:49 PM, joseph <[EMAIL PROTECTED]> wrote:

> Hello
> I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
> DF1= data.frame(V1=1:6, V2= letters[1:6])
> DF2= data.frame(V1=1:3, V2= letters[1:3])
> How do I create a new data frame of the difference between DF1 and DF2
> newDF=data.frame(V1=4:6, V2= letters[4:6])
> In my real data, the rows are not in order as in the example I provided.
> Thanks much
> Joseph
>
>
>

Re: [R] difference of two data frames

2008-09-14 Thread joseph

Hi Mark
as you guessed it, I meant a dataframe of the rows in DF1 that are not in DF2 .

Here is what I got:
> complement<-setdiff(DF1$V2,DF2$V2)
> DF1[,complement]
Error in `[.data.frame`(DF1, , complement) : undefined columns selected
> 
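
The error above seems to arise because `complement` holds values of V2 ("d", "e", "f"), and placing it after the comma asks for columns with those names. A minimal sketch of using the same setdiff() result to pick rows instead, on the example data from the original question (stringsAsFactors = FALSE is added here as an assumption, to keep V2 as character on older R versions):

```r
# Example data from the original question
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6], stringsAsFactors = FALSE)
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3], stringsAsFactors = FALSE)

# Values of V2 that occur in DF1 but not in DF2
complement <- setdiff(DF1$V2, DF2$V2)

# Use them to select rows (before the comma), not columns
newDF <- DF1[DF1$V2 %in% complement, ]
print(newDF)
```

This only compares V2, so it relies on V2 alone identifying a row.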


- Original Message 
From: Mark Leeds <[EMAIL PROTECTED]>
To: joseph <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Sent: Sunday, September 14, 2008 10:07:48 AM
Subject: RE: [R] difference of two data frames

Hi: If you mean a data frame of the rows in DF1 that are not in DF2, then I
think the code below will work for the letters, which, as far as I
understand, will also make it work for the rows, so there is no need to
consider the numbers?

complement <- setdiff(DF1$V2, DF2$V2)
DFnew <- DF1[,complement]

But, 3 things to consider:

1) I'm not sure if I understand the problem.

2) I'm also at home and I don't use R here so I can't test it.

3) I'm also not sure about the order of the setdiff operation so you may
have to switch the order of the two columns I used.

At least it will get you started, though, and I'm confident someone else will
answer. Good luck.






[R] string functions

2008-09-14 Thread zubin

Hello, trying to locate all the string commands in the base version of 
R, can't seem to find an area that describes them. I am in need to do 
some serious parsing of text data to create my dataset.  Is there a 
summary link to all the character operators?  string manipulations that 
would help in parsing text.



Re: [R] difference of two data frames

2008-09-14 Thread joseph

Hi Jorge,
both commands work.
Can you extend it to several columns? The reason I am asking is that in my
real data the uniqueness of the rows depends on all the columns; in other words,
V1 might have duplicates.
Thanks




- Original Message 
From: Jorge Ivan Velez <[EMAIL PROTECTED]>
To: joseph <[EMAIL PROTECTED]>
Cc: r-help@r-project.org
Sent: Sunday, September 14, 2008 10:23:33 AM
Subject: Re: [R] difference of two data frames



Hi Joseph,

Try this:


DF1[!DF1$V1%in%DF2$V1,]

subset(DF1,!V1%in%DF2$V1)


HTH,

Jorge



Re: [R] Fetching a range of columns

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 12:22 PM, Jason Thibodeau wrote:


Hello,

I realize that with x[x > 3 & x < 5] I can fetch all elements between 3
and 5. However, I read in from a CSV file, and I would like to fetch all
columns within a range (842-2411). In the past, I have done this to
fetch just a select few columns:

data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
data_filter <- data[c(2,12,17)]
write.table(data_filter, fileout, append = TRUE,
            sep= ",", row.names= FALSE, col.names = FALSE)
nskip <- nskip+320

This time, however, instead of grabbing columns 2, 12, 17, I would like all
columns in the range 842-2411. I can't seem to do this correctly. Could
somebody please provide some insight? Thanks in advance.


Have you tried:

data_filter <- data[seq(842,2411)]
write.table(data_filter, fileout, append = TRUE, sep= ",",
            row.names= FALSE, col.names = FALSE)


When I use that format on a data frame I have lying around, I get the
expected results, and in testing I do not find that data frames are
challenged by having 5000 columns.
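
A small self-contained sketch of the same idea on a toy data frame, in case it helps to try it out. The 10-column frame and the range 4 to 8 are made-up stand-ins for the real file and for columns 842 to 2411. The guard on ncol() is included because indexing past the last column is what produces an "undefined columns selected" error.

```r
# Toy data frame with columns V1..V10 (stand-in for the CSV contents)
data <- as.data.frame(matrix(seq_len(3 * 10), nrow = 3))

# Hypothetical range; in the real case this would be 842 and 2411
first_col <- 4
last_col  <- 8

# Guard: the requested range must exist in the data that was actually read
stopifnot(last_col <= ncol(data))

# The colon operator and seq() select the same columns
data_filter <- data[first_col:last_col]
same_result <- identical(data_filter, data[seq(first_col, last_col)])

print(names(data_filter))
```

If the guard fails on real data, checking ncol(data) after read.csv() shows how many columns were actually parsed.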


--
David Winsemius


Re: [R] Fetching a range of columns

2008-09-14 Thread jim holtman

Have you tried:

data_filter <- data[842:2411]

Also if you have a lot of data to read, I would suggest that you use a
connection, and if all the data is numeric, possibly 'scan'.  If you
do use a connection, this would eliminate having to 'skip' each time,
which could be time consuming on a large file.  Since it appears that
you are not writing out the column names in the output file, you could
bypass the header line of the file with readLines after the open.  So
something like this might work:

input <- file('yourfile','r')
invisible(readLines(input, n=1))  # skip the header
while (TRUE){  # read the file in chunks of 320 rows
    # read.csv takes 'nrows' (a bare 'n' is ambiguous); the error at EOF is caught by try()
    x <- try(read.csv(input, nrows=320, header=FALSE), silent=TRUE)
    if (inherits(x, 'try-error')) break
    write.csv(...)
}
close(input)

On Sun, Sep 14, 2008 at 12:22 PM, Jason Thibodeau <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I realize that with x[x > 3 & x < 5] I can fetch all elements between 3
> and 5. However, I read in from a CSV file, and I would like to fetch all
> columns within a range (842-2411). In the past, I have done this to
> fetch just a select few columns:
>
> data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
> data_filter <- data[c(2,12,17)]
> write.table(data_filter, fileout, append = TRUE,
>             sep= ",", row.names= FALSE, col.names = FALSE)
> nskip <- nskip+320
>
> This time, however, instead of grabbing columns 2, 12, 17, I would like all
> columns in the range 842-2411. I can't seem to do this correctly. Could
> somebody please provide some insight? Thanks in advance.
>
> --
> Jason Thibodeau
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] string functions

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 1:53 PM, zubin wrote:

Hello, trying to locate all the string commands in the base version  
of R, can't seem to find an area that describes them. I am in need  
to do some serious parsing of text data to create my dataset.  Is  
there a summary link to all the character operators?  string  
manipulations that would help in parsing text.


A bit of use of the ? operator on paste and strsplit produces (among
other things):


See Also
String manipulation with as.character, substr, nchar, strsplit;  
further, cat which concatenates and writes to a file, and sprintf for  
C like string construction.


See Also
paste for the reverse, grep and sub for string search and  
manipulation; further nchar, substr.


You might look at the results of:

help.search("string")

help.search("character")
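
As a rough sketch of how the base functions named in those "See Also" sections fit together, the example below parses one line of made-up text. The input string and its "key=value; ..." layout are assumptions chosen for illustration only.

```r
# Hypothetical input line in a "key=value; key=value" layout
line <- "id=42; name=Alice Smith; score=3.5"

# Split into fields, then separate keys from values with sub()
fields <- strsplit(line, "; ", fixed = TRUE)[[1]]
keys   <- sub("=.*$", "", fields)      # drop everything from '=' onward
values <- sub("^[^=]*=", "", fields)   # drop everything up to and including '='
record <- setNames(values, keys)

# A few other string helpers from base R
name_length <- nchar(record[["name"]])
first_name  <- substr(record[["name"]], 1, 5)
s_keys      <- grep("^s", keys, value = TRUE)
summary_txt <- sprintf("%s scored %.1f", record[["name"]],
                       as.numeric(record[["score"]]))

print(record)
print(summary_txt)
```

Regular expressions (see ?regex) carry most of the weight here; strsplit(), sub() and grep() all accept them.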

--

David Winsemius


Re: [R] string functions

2008-09-14 Thread jim holtman

Start with

?grep

and then follow the "See Also".  Exactly what type of serious parsing
are you trying to do?  R can do some, but if it is very complex, you
might want to consider awk/perl.

On Sun, Sep 14, 2008 at 1:53 PM, zubin <[EMAIL PROTECTED]> wrote:
> Hello, trying to locate all the string commands in the base version of R,
> can't seem to find an area that describes them. I am in need to do some
> serious parsing of text data to create my dataset.  Is there a summary link
> to all the character operators?  string manipulations that would help in
> parsing text.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Fetching a range of columns

2008-09-14 Thread Jason Thibodeau

Jim, this is a GREAT help. I was trying something similar before, but I was
unable to detect EOF. Thanks for the help!

Also, David, your suggestion worked perfectly.

Thanks for all the help, everyone!

On Sun, Sep 14, 2008 at 2:08 PM, jim holtman <[EMAIL PROTECTED]> wrote:

> Have you tried:
>
> data_filter <- data[842:2411]
>
> Also if you have a lot of data to read, I would suggest that you use a
> connection, and if all the data is numeric, possibly 'scan'.  If you
> do use a connection, this would eliminate having to 'skip' each time,
> which could be time consuming on a large file.  Since it appears that
> you are not writing out the column names in the output file, you could
> bypass the header line of the file with readLines after the open.  So
> something like this might work:
>
> input <- file('yourfile','r')
> invisible(readLines(input, n=1))  # skip the header
> while (TRUE){  # read file
>     x <- try(read.csv(input, nrows=320, header=FALSE), silent=TRUE)  # catch EOF
>     if (inherits(x, 'try-error')) break
>     write.csv(...)
> }
> close(input)
>
>
>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



-- 
Jason Thibodeau


Re: [R] string functions

2008-09-14 Thread Gabor Grothendieck

Try this:

 help.search(keyword = "character", package = "base")

Then read each of the pages listed to get info on the indicated command
plus related commands also described on those pages (but not necessarily
listed in the help.search list).

You might also want to look at the gsubfn package and its vignette (i.e. its pdf
document).  The gsubfn and strapply commands in that package can be used for
certain parsing tasks.  Its home page is at:
http://gsubfn.googlecode.com

On Sun, Sep 14, 2008 at 1:53 PM, zubin <[EMAIL PROTECTED]> wrote:
> Hello, trying to locate all the string commands in the base version of R,
> can't seem to find an area that describes them. I am in need to do some
> serious parsing of text data to create my dataset.  Is there a summary link
> to all the character operators?  string manipulations that would help in
> parsing text.
>

Re: [R] difference of two data frames

2008-09-14 Thread Adaikalavan Ramasamy

It would be useful to have indexed both dataframes with a unique 
identifier, such as in rownames etc.


Without that information, you could possibly try to use the same 
approach as duplicated() does by "pasting together a character 
representation of rows" using "|" (or any other separator).


   keys1 <- apply(DF1, 1, paste, collapse="|")
   keys1
   [1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
   duplicated(keys1)
   [1] FALSE FALSE FALSE FALSE FALSE FALSE

   keys2 <- apply(DF2, 1, paste, collapse="|")
   keys2
   [1] "1|a" "2|b" "3|c"
   duplicated(keys2)
   [1] FALSE FALSE FALSE

The duplicated() check is necessary to ensure that the generated keys are
truly unique. You might want to experiment and see if you can create a unique
key using just a few columns.



   keys1 %in% keys2
   [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

   w <- !(keys1 %in% keys2)   # rows of DF1 whose key does not occur in DF2
   DF1[ w, ]
     V1 V2
   4  4  d
   5  5  e
   6  6  f
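
A compact variant of the key approach above, as a sketch only. The helper name row_key() and the toy data are assumptions for illustration. It builds the keys with do.call(paste, ...) instead of apply(), because apply() goes through as.matrix(), which may pad numbers to a common width and so produce keys that do not match across data frames.

```r
# Hypothetical helper: one key per row, built from all columns
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Toy data in which V1 alone does not identify a row
DF1 <- data.frame(V1 = c(1, 1, 2, 10), V2 = c("a", "b", "b", "c"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(V1 = c(1, 10), V2 = c("a", "c"),
                  stringsAsFactors = FALSE)

# Keep the rows of DF1 whose full-row key does not occur in DF2
newDF <- DF1[!(row_key(DF1) %in% row_key(DF2)), ]
print(newDF)
```

The separator "\r" is chosen on the assumption that it does not occur in the data; any character absent from every column would do.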

Regards, Adai



joseph wrote:

Hi Jorge,
both commands work.
Can you extend it to several columns? The reason I am asking is that in my
real data the uniqueness of the rows depends on all the columns; in other words,
V1 might have duplicates.

Thanks





Re: [R] difference of two data frames

2008-09-14 Thread joseph

Actually you got it, the data sets you created are a perfect example (row#1 and 
row#2 in DF1 have the same V1 and differ only in V2) , but here is the problem:
row#2 in DF1 exists in DF1 and not in DF2, however it does not show in the 
Difference. It seems to me that both V1 and V2 should be considered when 
calculating the difference.



- Original Message 
From: Jorge Ivan Velez <[EMAIL PROTECTED]>
To: joseph <[EMAIL PROTECTED]>
Sent: Sunday, September 14, 2008 11:14:11 AM
Subject: Re: [R] difference of two data frames



Hi Joseph,

I'm not sure if I understood your point, but try this:

# Data sets
DF1= data.frame(V1=c(1,1,2,3,3,4,5,5,6), V2= letters[1:9])
DF2= data.frame(V1=1:3, V2= letters[1:3])

# Difference
DF1[! DF1$V1 %in% DF2$V1,]
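
One possible way to take both V1 and V2 into account on these data sets is an anti-join built from merge(). This is a sketch under assumptions, not necessarily what was intended in the thread: the flag column name in_DF2 is made up, and the result comes back sorted by the key columns rather than in the original row order.

```r
# Data sets from the message above
DF1 <- data.frame(V1 = c(1, 1, 2, 3, 3, 4, 5, 5, 6), V2 = letters[1:9])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Mark every row of DF2, then left-join DF1 against it on both columns
DF2_flag <- cbind(DF2, in_DF2 = TRUE)   # 'in_DF2' is a hypothetical flag column
merged   <- merge(DF1, DF2_flag, by = c("V1", "V2"), all.x = TRUE)

# Rows of DF1 with no match in DF2 have NA in the flag column
newDF <- merged[is.na(merged$in_DF2), c("V1", "V2")]
print(newDF)
```

Here the row with V1 = 1 and V2 = "b" is kept, because only the exact pair (1, "a") occurs in DF2.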



HTH,

Jorge




On Sun, Sep 14, 2008 at 1:57 PM, joseph <[EMAIL PROTECTED]> wrote:

Hi Jorge
both commands work; 
can you extend it to several coulmns?  the reason I am asking is that in my 
real data the uniqueness of the rows is made of all the columns; in other words 
V1 might have duplicates.
Thanks




- Original Message 
From: Jorge Ivan Velez <[EMAIL PROTECTED]>
To: joseph <[EMAIL PROTECTED]>

Cc: r-help@r-project.org
Sent: Sunday, September 14, 2008 10:23:33 AM
Subject: Re: [R] difference of two data frames



Hi Joseph,

Try this:


DF1[!DF1$V1%in%DF2$V1,]

subset(DF1,!V1%in%DF2$V1)


HTH,

Jorge


On Sun, Sep 14, 2008 at 12:49 PM, joseph <[EMAIL PROTECTED]> wrote:

Hello
I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])
How do I create a new data frame of the difference between DF1 and DF2
newDF=data.frame(V1=4:6, V2= letters[4:6])
In my real data, the rows are not in order as in the example I provided.
Thanks much
Joseph




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] string functions

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 1:53 PM, zubin wrote:

Hello, trying to locate all the string commands in the base version  
of R, can't seem to find an area that describes them. I am in need  
to do some serious parsing of text data to create my dataset.  Is  
there a summary link to all the character operators?  string  
manipulations that would help in parsing text.


A further thought would be to look at the Natural Language Processing  
TaskView:


http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
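
As a rough starting point for the original question, here is a short sketch of
some string functions that ship with base R. The sample text is made up; the
help pages ?regex, ?grep, ?strsplit and ?substr describe these functions in
detail.

```r
# Made-up sample records to parse.
x <- c("id=42; name=alpha", "id=7; name=beta")

nchar(x)                      # number of characters in each string
toupper(x)                    # case conversion
substr(x, 1, 5)               # extract by position

# Split each record into its "key=value" fields.
parts <- strsplit(x, "; ", fixed = TRUE)

# Pull pieces out with regular expressions.
ids    <- as.integer(sub("^id=([0-9]+);.*$", "\\1", x))
labels <- sub("^.*name=", "", x)

# Test for a pattern, and paste results back together.
has_beta <- grepl("beta", x, fixed = TRUE)
summary_line <- sprintf("%s has id %d", labels, ids)
```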

--
David Winsemius, MD
Heritage Laboratories

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fetching a range of columns

2008-09-14 Thread Jason Thibodeau

 TEST_filter("line50grab.csv","line50grab_filterout.csv")
Error in `[.data.frame`(data_tmp, seq(842, 2411)) :
  undefined columns selected

I know my file has about 3000 columns.

This happened when I used:
data_tmp <- read.csv(filein, header=TRUE, nrows=10, skip=nskip)
data_filter <- data_tmp[seq(842,2411)]
write.table(data_filter, fileout, append = TRUE,
sep= ",", row.names= FALSE, col.names = FALSE)

Also using data_tmp[842:2411] did not yield any output being written to my
file.

I have another slightly unrelated problem, but I'll propose that after this
one can be solved.

Thanks a lot.

On Sun, Sep 14, 2008 at 2:14 PM, Jason Thibodeau <[EMAIL PROTECTED]>wrote:

> Jim, this is a GREAT help. I was trying something similar before, but I was
> unable to detect EOF. Thanks for the help!
>
> Also, David, your suggestion worked perfectly.
>
> Thanks for all the help, everyone!
>
>
> On Sun, Sep 14, 2008 at 2:08 PM, jim holtman <[EMAIL PROTECTED]> wrote:
>
>> Have you tried:
>>
>> data_filter <- data[842:2411]
>>
>> Also if you have a lot of data to read, I would suggest that you use a
>> connection, and if all the data is numeric, possibly 'scan'.  If you
>> do use a connection, this would eliminate having to 'skip' each time
>> which could be time consuming on a large file.  Since it appears that
>> you are not writing out the column names in the output file, you could
>> bypass the header line on the file by readLines after the open.  So
>> something like this might work:
>>
>> input <- file('yourfile','r')
>> invisible(readLines(input, n=1))  # skip the header
>> while (TRUE){  # read file
>>    x <- try(read.csv(input, nrows=320, header=FALSE), silent=TRUE)  # catch EOF
>>    if (inherits(x, 'try-error')) break
>>    write.csv(...)
>> }
>>
>>
>>
>> On Sun, Sep 14, 2008 at 12:22 PM, Jason Thibodeau <[EMAIL PROTECTED]>
>> wrote:
>> > Hello,
>> >
>> > I realize that using: x[x > 3 & x < 5] I can fetch all elements between 3
>> > and 5. However I read in from a CSV file, and I would like to fetch all
>> > columns from within a range (842-2411). In the past, I have done this to
>> > fetch just a select few columns:
>> >
>> > data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
>> > data_filter <- data[c(2,12,17)]
>> > write.table(data_filter, fileout, append = TRUE,
>> >             sep= ",", row.names= FALSE, col.names = FALSE)
>> > nskip <- nskip+320
>> >
>> > This time, however, instead of grabbing columns 2, 12, 17, I would like all
>> > columns in the range of 842-2411. I can't seem to do this correctly. Could
>> > somebody please provide some insight? Thanks in advance.
>> >
>> > --
>> > Jason Thibodeau
>> >
>> >[[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>
>
>
> --
> Jason Thibodeau
>



-- 
Jason Thibodeau

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fetching a range of columns

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 4:01 PM, Jason Thibodeau wrote:


> TEST_filter("line50grab.csv","line50grab_filterout.csv")
> Error in `[.data.frame`(data_tmp, seq(842, 2411)) :
>  undefined columns selected

I am guessing that you wrapped some code into a function but you did
not provide the function. You are not really following the posting
guidelines here.

> I know my file has about 3000 columns.
>
> This happened when I used:
> data_tmp <- read.csv(filein, header=TRUE, nrows=10, skip=nskip)
> data_filter <- data_tmp[seq(842,2411)]
> write.table(data_filter, fileout, append = TRUE,
>             sep= ",", row.names= FALSE, col.names = FALSE)
>
> Also using data_tmp[842:2411] did not yield any output being written to my
> file.

Not a big surprise. Appears the error preceded the write.table call.

> I have another slightly unrelated problem, but I'll propose that after this
> one can be solved.



If the problem is not with the syntax or semantics of TEST_filter as I  
suspect, then perhaps you should examine the input file from R's  
perspective with:


?count.fields

Hard to tell without the actual code and sample data.
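
A small sketch of that check, on a made-up file since the real data cannot be
shared. count.fields() reports how many fields R sees on each line, which can
then be compared with the column range being requested; the file contents and
the range 2:3 are assumptions for illustration only.

```r
# Made-up CSV file with a header and two data lines.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d", "1,2,3,4", "5,6,7,8"), tmp)

# Number of fields R finds on each line of the file.
n_fields <- count.fields(tmp, sep = ",")
table(n_fields)

# Only subset when every requested column actually exists.
dat    <- read.csv(tmp, header = TRUE)
wanted <- 2:3
if (max(wanted) <= ncol(dat)) {
  picked <- dat[wanted]
} else {
  stop("file has only ", ncol(dat), " columns")
}
```

If the counts are smaller than expected, or differ from line to line, the
separator or quoting in the file is a likely suspect.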

--
David Winsemius





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] moving from aov() to lmer()

2008-09-14 Thread Adam D. I. Kramer



On Sat, 13 Sep 2008, roberto toro wrote:


Hello,
I've used this command to analyse changes in brain volume:

mod1<-aov(Volume~Sex*Lobe*Tissue+Error(Subject/(Lobe*Tissue)),data.vslt)

I'm comparing males/females. For every subject I have 8 volume measurements
(4 different brain lobes and 2 different tissues (grey/white matter)).

As aov() provides only type I anovas, I would like to use lmer() with type
II; however, I have struggled to find the right syntax.

How should I write the model I use with aov() using lmer()??

Specifying Subject as a random effect is straightforward

mod2<-lmer(Volume~Sex*Lobe*Tissue+(1|Subject),data.vslt)

but I can't figure out the /(Lobe*Tissue) part...


You're trying to model a separate effect of lobe, of tissue, and of the
interaction between lobe and tissue for each subject, so you want

mod2<-lmer(Volume~Sex*Lobe*Tissue+(Lobe*Tissue|Subject),data.vslt)

...the resulting fixed effect for Lobe, Tissue, and L:T in the summary()
then corresponds to the within-subjects effect aggregated (but not exactly
AVERAGED) across subjects. So, it's not exactly providing you a Type II
ANOVA...it's doing a mixed-effects model (or HLM, if you prefer), which as
you've written it is a Type III analysis (though once again, not an ANOVA in
the classical sense).

To get something more akin to type II using the lmer function (and I trust
someone will pipe up if there is a better way), you could first fit

mod2.additive<-lmer(Volume~Sex*Lobe+Tissue+(Lobe+Tissue|Subject),data.vslt)

...and interpret the coefficients and effects provided by it, then fit the
crossed model to get the coefficients and effects for the higher-order
terms.

I hope this made sense and that I have understood you correctly.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] histogram

2008-09-14 Thread Felipe

I calculated the density and want to do something like this:

split the axis at 0-19-29-39-49-59-69-79-99
and put 8 densities (0.something each) in those intervals.
I have the frequency in % and have already divided it by 20 or 10 to get the
density.

I tried and tried. I made a breaks vector to split the axis, but couldn't put
the other vector, with the frequency density, on it directly.

Does anyone know how to do it?

Thanks
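
One possible approach, sketched with made-up numbers: since the densities are
already computed, the bars can be drawn directly with rect() instead of going
through hist(). The break points (0, 20, 30, ..., 80, 100) and the percentages
below are assumptions, chosen so that the class widths are 20 or 10 as
described.

```r
# Assumed class limits (widths 20, 10, ..., 10, 20) and made-up percentages.
breaks  <- c(0, 20, 30, 40, 50, 60, 70, 80, 100)
percent <- c(20, 15, 15, 15, 10, 10, 10, 5)
stopifnot(length(percent) == length(breaks) - 1)

# Density = proportion divided by class width, so the bar areas sum to 1.
width   <- diff(breaks)
density <- (percent / 100) / width
total_area <- sum(density * width)

# Empty plot with the right axes, then one rectangle per class.
plot(range(breaks), c(0, max(density)), type = "n",
     xlab = "value", ylab = "density")
rect(breaks[-length(breaks)], 0, breaks[-1], density, col = "grey")
```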

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help please! How to code a mixed-model with 2 within-subject factors using lme or lmer?

2008-09-14 Thread Adam D. I. Kramer



On Sun, 14 Sep 2008, roberto toro wrote:


Thanks for answering Mark!

I tried with the coding of the interaction you suggested:


tfac<-with(vlt,interaction(Lobe,Tissue,drop=T))
mod<-lme(Volume~Sex*Lobe*Tissue,random=~1|Subject/tfac,data=vlt)


But is it normal that the DF are 2303? DF is 2303 even for the estimate of
LobeO, which has only 662 values (331 for Tissue=white and 331 for
Tissue=grey). I'm also not sure that Sex, Lobe and Tissue are correctly
handled: why are there different estimates called Sex:LobeO, Sex:LobeP,
etc., and not just Sex:Lobe as with aov()? Why is there Tissuew, but not
Sex1, for example?


lme is basically doing a regression, not an ANOVA as you're used to it. You
may want anova(mod) instead of summary(mod) to see aggregated effects. Or,
you could define contrasts among your levels by assigning to
contrasts(vlt$Lobe), for example.
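
A small base-R sketch of where names such as LobeO, LobeP and LobeT come from.
The factor below is made up; the level "F" as the baseline is an assumption
(the output only shows O, P and T). With the default treatment contrasts each
non-baseline level gets its own dummy column, and assigning to contrasts()
changes that coding.

```r
# Made-up factor with four levels; "F" sorts first and becomes the baseline.
lobe <- factor(c("F", "O", "P", "T", "F", "O", "P", "T"))

contrasts(lobe)                       # default: treatment (dummy) coding
mm_treat <- model.matrix(~ lobe)
colnames(mm_treat)                    # one column per non-baseline level

# Switch to sum-to-zero contrasts for this factor only.
contrasts(lobe) <- contr.sum(4)
mm_sum <- model.matrix(~ lobe)
colnames(mm_sum)                      # columns are now numbered, not named by level
```

A numeric 0/1 variable, by contrast, enters the model as a single column under
its own name, which would be consistent with seeing Sex rather than Sex1.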

Also, in the above model, you're only looking at modeling a separate average
volume for each subject-within-tfac; if I read you correctly, you actually
want to model a lobe and tissue effect for each subject for each tfac, in
which case you would want something like what was in my last post.

--Adam



Thanks again!
roberto

ps1. How would you code this with lmer()?
ps2. this is part of the output of mod<-lme:

summary(mod)

Linear mixed-effects model fit by REML
 Data: vlt
       AIC      BIC    logLik
  57528.35 57639.98 -28745.17

Random effects:
 Formula: ~1 | Subject
        (Intercept)
StdDev:    11294.65

 Formula: ~1 | tfac %in% Subject
        (Intercept) Residual
StdDev:    10569.03 4587.472

Fixed effects: Volume ~ Sex * Lobe * Tissue
                       Value Std.Error   DF    t-value p-value
(Intercept)        245224.61  1511.124 2303  162.27963  0.0000
Sex                  2800.01  1866.312  329    1.50029  0.1345
LobeO             -180794.83  1526.084 2303 -118.46975  0.0000
LobeP             -131609.27  1526.084 2303  -86.23984  0.0000
LobeT              -73189.97  1526.084 2303  -47.95932  0.0000
Tissuew            -72461.05  1526.084 2303  -47.48168  0.0000
Sex:LobeO            -663.27  1884.789 2303   -0.35191  0.7249
Sex:LobeP           -2146.08  1884.789 2303   -1.13863  0.2550
Sex:LobeT            1379.49  1884.789 2303    0.73191  0.4643
Sex:Tissuew          5387.65  1884.789 2303    2.85849  0.0043
LobeO:Tissuew       43296.99  2158.209 2303   20.06154  0.0000
LobeP:Tissuew       50952.21  2158.209 2303   23.60856  0.0000
LobeT:Tissuew      -15959.31  2158.209 2303   -7.39470  0.0000
Sex:LobeO:Tissuew   -5228.66  2665.494 2303   -1.96161  0.0499
Sex:LobeP:Tissuew   -1482.83  2665.494 2303   -0.55631  0.5781
Sex:LobeT:Tissuew   -6037.49  2665.494 2303   -2.26506  0.0236


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fetching a range of columns

2008-09-14 Thread Adam D. I. Kramer


Hi Jason,

data[] is a data frame, remember--you need to specify rows AND columns. So,
data[,c(2,12,17)] is what you should be doing in the first place, and
data[,842:2411] in the second place.

Not sure if the help you needed was using the comma, or the : syntax, or if
you're trying to read only certain columns during the read.csv process
(which I don't think is possible).

--Adam

On Sun, 14 Sep 2008, Jason Thibodeau wrote:


Hello,

I realize that using: x[x > 3 & x < 5] I can fetch all elements between 3
and 5. However I read in from a CSV file, and I would like to fetch all
columns from within a range (842-2411). In the past, I have done this to
fetch just select few columns:

data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
   data_filter <- data[c(2,12,17)]
   write.table(data_filter, fileout, append = TRUE,
sep= ",", row.names= FALSE, col.names = FALSE)
   nskip <- nskip+320

This time, however, instead of grabbing columns 2, 12, 17, I would like all
columns in the range of 842-2411. I can't seem to do this correctly. Could
somebody please provide some insight? Thanks in advance.

--
Jason Thibodeau


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] k-sample Kolmogorov-Smirnov test?

2008-09-14 Thread Mark Na

Hello, I would like to conduct a k-sample K-S test, but cannot find
reference to its implementation in R. Does anyone have experience with this?
Thanks, Mark

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fetching a range of columns

2008-09-14 Thread David Winsemius


On Sep 14, 2008, at 4:40 PM, Jason Thibodeau wrote:

> I cannot provide (all) the sample data (NDA) but here is the entire  
> function:
> TEST_filter <- function(filein,fileout)
> {
>     file.remove(fileout)
>     nskip<-0
>     while(1)
>     {
>         data_tmp <- read.csv(filein, header=TRUE, nrows=10, skip=nskip)
>
>         data_filter <- data_tmp[842,2411]

Looks like you forgot a few syntactically essential items here:

data_tmp[842,2411] would only be the 842nd row in the 2411th column.  
And, since you only have 10 rows, you got an informative error. I

>
>         write.table(data_filter, fileout, append = TRUE, sep= ",",
>                     row.names= FALSE, col.names = FALSE)
>         nskip <- nskip+10
>     }
> }

You also say file.remove( fileout) , then you try to append to  
fileout. Does that make sense?
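
For what it is worth, here is one way the function might be restructured. It
is a sketch, not a drop-in replacement: the name filter_columns and the
arguments cols and chunk are hypothetical, and the input file is made up.
It reads through an open connection (so no skip= is needed), consumes the
header once, selects a range of columns with a single index vector, and stops
when read.csv() finds nothing left to read.

```r
# Hypothetical rewrite; 'cols' and 'chunk' are illustrative arguments.
filter_columns <- function(filein, fileout, cols, chunk = 10) {
  if (file.exists(fileout)) file.remove(fileout)  # start with a fresh output file
  input <- file(filein, "r")
  on.exit(close(input))
  invisible(readLines(input, n = 1))              # consume the header line once
  repeat {
    # try() also hides genuine read errors; here it is used to detect end of file.
    block <- try(read.csv(input, header = FALSE, nrows = chunk), silent = TRUE)
    if (inherits(block, "try-error")) break
    stopifnot(max(cols) <= ncol(block))           # fail early on a bad column range
    write.table(block[cols], fileout, append = TRUE, sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
  invisible(fileout)
}

# Made-up input: 25 rows and 6 columns, processed 10 rows at a time.
src <- tempfile(fileext = ".csv")
out <- tempfile(fileext = ".csv")
write.csv(as.data.frame(matrix(1:150, nrow = 25)), src, row.names = FALSE)
filter_columns(src, out, cols = 3:5)
res <- read.csv(out, header = FALSE)
dim(res)
```

Removing the output file first and then appending is deliberate here: each
run starts from an empty file and the chunks accumulate within the run.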

-- 
David Winsemius, MD
Heritage Laboratories

>
>
> Thanks for the help.
>

[R] using R for accessing web site data -

2008-09-14 Thread zubin

Hello, what's the most efficient way of using R to automate a data 
collection task i have:


-Login into a web site using my ID and PWD
-submit a query within the site using the search form after login
-extract the result of the search data into R so i can cleanse and use 
for analysis


kind of like a web scraping task, but like to do this in R.   I checked 
out RCurl, this seems very low level?


This leads to using R to perform mashups of various sites for data 
analysis. 


-zubin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fw: Complex sampling survey _ Use of survey package

2008-09-14 Thread Thomas Lumley

On Fri, 12 Sep 2008, Ahoussou Sylvie wrote:

--
From: "Ahoussou Sylvie" <[EMAIL PROTECTED]>
Sent: Friday, September 12, 2008 9:48 AM
To: "Thomas Lumley" <[EMAIL PROTECTED]>
Subject: Re: [R] Complex sampling survey _ Use of survey package

Thanks for your answer

I think I made a mistake when I recopied the 5 first rows of my database

here is the table with the columns of interest

num esp fpc1 Totanim Id_An
2045 G 551 12 10
2046 C 551 68 11
2070 G 551 9 50
2070 S 551 9 51
2070 S 551 9 52

yes Totanim is the total number of animals in the farm and num is the total 
number of herds

Do you mean 'fpc1 is the total number of herds'? That is what your 
svydesign() call says.

I keep on obtaining this error message

clustot<-svydesign(id=~num+ ~ Id_An, fpc=~fpc1+~Totanim, data=tab1)

Error in as.fpc(fpc, strata, ids) :
 FPC implies >100% sampling in some strata.

Well, we seem to have either a bug or a problem with the data.

If you do
  options(error=recover)
before the svydesign() call you can go into as.fpc() and look at the data.

As an example;

Error in as.fpc(fpc, strata, ids) :
  FPC implies >100% sampling in some strata.

Enter a frame number, or 0 to exit

1: svydesign(id = ~dnum + snum, fpc = ~fpc1 + I(pmin(fpc2, 4)), data = 
apiclus2)
2: svydesign.default(id = ~dnum + snum, fpc = ~fpc1 + I(pmin(fpc2, 4)), 
data = apiclus2)

3: as.fpc(fpc, strata, ids)
Selection: 3
Called from: eval(expr, envir, enclos)
Browse[1]> which(sampsize>popsize, arr.ind=TRUE)
row col
22   22   2
23   23   2
24   24   2
...

Browse[1]> sampsize[22,2]
[1] 5
Browse[1]> popsize[22,2]
[1] 4
Browse[1]> ids[22,]
   dnum    snum
22  200 200.841

So in this case one of the problems is in dnum 200, snum 841, where the 
population size was specified as 4 but the sample size is 5.
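
The same kind of check can be made on the data before calling svydesign().
This is only a base-R sketch using the five rows quoted above, and it assumes
that Totanim is the herd size and that each row is one sampled animal; the
real file may of course differ.

```r
# The five rows shown in the message above.
tab1 <- data.frame(num     = c(2045, 2046, 2070, 2070, 2070),
                   esp     = c("G", "C", "G", "S", "S"),
                   fpc1    = 551,
                   Totanim = c(12, 68, 9, 9, 9),
                   Id_An   = c(10, 11, 50, 51, 52))

# Stage 1: number of sampled herds must not exceed the population of herds.
n_herds    <- length(unique(tab1$num))
stage1_ok  <- n_herds <= unique(tab1$fpc1)

# Stage 2: sampled animals per herd must not exceed the herd size.
n_sampled  <- tapply(tab1$Id_An, tab1$num, length)
herd_size  <- tapply(tab1$Totanim, tab1$num, max)
bad_herds  <- names(n_sampled)[n_sampled > herd_size]

# Totanim should also be constant within a herd.
n_sizes    <- tapply(tab1$Totanim, tab1$num, function(x) length(unique(x)))
inconsistent <- names(n_sizes)[n_sizes > 1]
```

Any herd listed in bad_herds or inconsistent would be worth inspecting in the
full data set.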

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] k-sample Kolmogorov-Smirnov test?

2008-09-14 Thread Adam D. I. Kramer


Maybe you should look a little harder.

help.search("Kolmogorov")


PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Ward's Clustering Doubts

2008-09-14 Thread Rodrigo Aluizio

Hi Everybody,

Now I have a doubt that is more statistical than technical. I'm working
with the ecology of recent Foraminifera.

At the lab we used to perform cluster analysis using 1-Pearson's R and Ward's
method (we had already seen it in the literature of the area), which renders
good results with our biological data. Recently, using R (the vegan and
cluster packages), which allows combining any kind of distance matrix with
any clustering method, we tried Bray-Curtis + Ward's (which seems more
appropriate for a matrix with a lot of zeros) and it renders a better
result. Furthermore, the results agree with our hypothesis and with the results
we got from the Distance-based Redundancy Analysis (dbRDA or CAP). That is,
the analysis (Q-mode) clusters the stations according to the main
physical, sedimentary and biological characteristics of the study area.

We received some critical comments pointing out that Ward's method accepts
Euclidean distance only. So we ran the analysis again using Euclidean distance,
but we did not get results as good as those from 1-Pearson's R + Ward's or
Bray-Curtis + Ward's (actually, any other distance + method combination
rendered better results). Trying to find answers in the specialized literature
only left us a little more confused, because at no point did we see something
like "You must use it with Euclidean Distance" and, as I said above, we have
already seen other kinds of distance associated with Ward's clustering method
in articles from respected journals.

Is it wrong, or is it nonsense, to do the analysis the way we were doing it?

The results with Ward's combined with 1-Pearson's R or Bray-Curtis fit our
hypothesis better and have excellent agglomerative coefficients, but we
don't want to use inappropriate statistical procedures. I'm starting to
realize how powerful R is, but that doesn't justify doing nonsense statistics...
I hope one of you may help us!
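
To make the comparison concrete, here is a base-R sketch of the three
combinations on a tiny made-up station-by-species count matrix. The function
name bray_curtis is hypothetical and written by hand here (not the vegan
implementation), and method = "ward.D2" in hclust() needs R 3.1.0 or later.
The sketch only shows the mechanics; it does not settle whether Ward's
criterion is meaningful for a non-Euclidean dissimilarity.

```r
# Made-up counts: 6 stations (rows) by 5 species (columns).
counts <- rbind(st1 = c(10, 0, 3, 0, 1),
                st2 = c( 8, 1, 2, 0, 0),
                st3 = c( 0, 7, 0, 5, 2),
                st4 = c( 1, 9, 0, 4, 3),
                st5 = c( 0, 0, 6, 0, 9),
                st6 = c( 2, 0, 5, 1, 8))

# Hand-written Bray-Curtis dissimilarity: sum|x - y| / sum(x + y).
# Assumes no pair of stations is empty for all species.
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

d_bray <- bray_curtis(counts)
d_pear <- as.dist(1 - cor(t(counts)))   # 1 - Pearson's r between stations
d_eucl <- dist(counts)                  # Euclidean distance

# Same agglomeration method, three different dissimilarities.
groups_bray <- cutree(hclust(d_bray, method = "ward.D2"), k = 3)
groups_pear <- cutree(hclust(d_pear, method = "ward.D2"), k = 3)
groups_eucl <- cutree(hclust(d_eucl, method = "ward.D2"), k = 3)
```

Comparing the three groupings side by side (for example with table()) shows
how much the choice of dissimilarity changes the clusters for a given data
set.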

Thank you in advance.

Rodrigo.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Combining tables

2008-09-14 Thread Andre Nathan

Thanks Jim, it worked great!

On Sun, 2008-09-14 at 21:27 -0400, jim holtman wrote:
> try this:
> 
> > c(tx,ty)
> 1 2 3 3 4
> 3 2 1 4 1
> > z <- c(tx,ty)
> > tapply(z, names(z), sum)
> 1 2 3 4
> 3 2 5 1
> >
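
For readers without the earlier messages, a self-contained version of the
same idea. The vectors behind tx and ty are made up so that the tables
reproduce the counts shown above.

```r
# Two made-up frequency tables whose categories overlap at "3".
tx <- table(c(1, 1, 1, 2, 2, 3))   # counts: 1 -> 3, 2 -> 2, 3 -> 1
ty <- table(c(3, 3, 3, 3, 4))      # counts: 3 -> 4, 4 -> 1

# c() keeps the category names, so equal names can be summed together.
z <- c(tx, ty)
combined <- tapply(z, names(z), sum)
combined
```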

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] any package to do generalized linear mixed model?

2008-09-14 Thread Wensui Liu

I checked the glmmML package. However, it can only do binomial and Poisson
distributions. How about others such as gamma or negative binomial?
Thank you so much!
wensui

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Nonlinear regression question&[EMAIL PROTECTED]

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 6:53 PM, Esther Meenken wrote:


I was unable to open this file: Bill Venables' excellent "Exegeses on
Linear Models", posted at
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.ps.gz. I'd be very
interested in reading it.


It's a gzipped file that expands into a ps file. On my Mac running  
Leopard, Stuffit does the expansion and Preview does the viewing. You  
need a utility on whatever (unspecified) OS you are running that will  
un-gzip it, and then you need a Postscript viewer. If that OS happens  
to  be a flavor of Windows, then I know from experience that  
Ghostscript and its associated viewer, Ghostview, will work for the  
second half of the process. Google is your friend. I suspect you need  
UnRAR, or something along those lines for the decompression step.
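
Since this is an R list: R itself can read gzip-compressed files through
gzfile(), so the decompression step can also be done from the R prompt. The
sketch below works on a small made-up .ps.gz file rather than the real
download; the file names are assumptions.

```r
# Create a small made-up gzip file to stand in for the downloaded one.
src <- tempfile(fileext = ".ps.gz")
con <- gzfile(src, "w")
writeLines("%!PS-Adobe-3.0", con)
close(con)

# Decompress: copy raw bytes from the gzip connection to a plain file.
dest <- sub("\\.gz$", "", src)
inp  <- gzfile(src, "rb")
outp <- file(dest, "wb")
repeat {
  buf <- readBin(inp, what = "raw", n = 65536)
  if (length(buf) == 0) break
  writeBin(buf, outp)
}
close(inp)
close(outp)

first_line <- readLines(dest, n = 1)   # the result is an ordinary PostScript file
```

The resulting .ps file still needs a PostScript viewer, as described above.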


--
David Winsemius

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] NAs and stl

2008-09-14 Thread rkevinburton

I would like to decompose the log of a time series. There will be time slots
that are zero. To handle this I insert 'NA' for all of the zeros in the time
series, even though zero is a perfectly legitimate value. With those NAs in
the time series, stl requires an na.action argument because the default is
na.fail, which I don't want. I am afraid that if I use na.omit it will skew
the seasonality and trend, because what if there are seasonal components
which are zero? Anyway, I am looking for recommendations on what the value of
na.action should be for stl. Ideally I would like to just keep the NAs there,
not flag an error, yet still have those values "count" for the seasonal or
trend calculations.
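
One alternative worth weighing, sketched below on a made-up monthly series: a
different transform, log(1 + x) via log1p(), keeps the zeros as real
observations so that no NAs are introduced in the first place. This swaps the
plain log for log1p, which changes the scale of the decomposition, so it is a
suggestion to evaluate rather than an answer to the na.action question.

```r
# Made-up monthly counts with a seasonal pattern and many zeros.
set.seed(42)
lambda <- rep(c(0.2, 0.5, 1, 2, 4, 6, 6, 4, 2, 1, 0.5, 0.2), times = 6)
x <- ts(rpois(length(lambda), lambda), frequency = 12)

# log1p(0) is 0, so zeros stay in the series and stl() sees no NAs.
y   <- log1p(x)
fit <- stl(y, s.window = "periodic")

# The three components add back up to the transformed series.
recomposed <- rowSums(fit$time.series)
```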

Thank you.

Kevin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] need help please (HoltWinters function)

2008-09-14 Thread jeff_lc


every time i try to run HoltWinters i get this error message:

> HoltWinters(z, seasonal="additive")
Error in decompose(ts(x[1:wind], start = start(x), frequency = f), seasonal)
: 
  time series has no or less than 3 periods

what's going on? somebody please help me.
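
One common cause, shown as a sketch since the post does not say what z is:
the seasonal HoltWinters model needs a ts object whose frequency is greater
than 1 and that covers enough full periods. A plain vector, or a ts with the
default frequency of 1, fails in the decompose() step. The example uses the
first four years of the built-in AirPassengers data as stand-in values.

```r
# Stand-in data: 48 monthly values from a built-in data set.
z_raw <- as.numeric(AirPassengers)[1:48]

# Without a frequency there are no seasonal periods, so the call fails.
no_freq <- try(HoltWinters(ts(z_raw), seasonal = "additive"), silent = TRUE)
inherits(no_freq, "try-error")

# Declaring the seasonal period (12 observations per year) lets it run.
z   <- ts(z_raw, frequency = 12)
fit <- HoltWinters(z, seasonal = "additive")
```

Checking frequency(z) and length(z) on the real series should show whether
this is what is happening.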
-- 
View this message in context: 
http://www.nabble.com/need-help-please-%28HoltWinters-function%29-tp19484728p19484728.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to draw a plot like this?

2008-09-14 Thread Jinsong Zhao

Hi there,

I hope to draw a plot like this:
http://www.sg-chem.net/swizard/Ru-bqdi-spectra.gif

is it possible to draw it using R?

thanks for any suggestions.

regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fetching a range of columns

2008-09-14 Thread David Winsemius

On Sep 14, 2008, at 5:39 PM, Adam D. I. Kramer wrote:

> Hi Jason,
>
> data[] is a data frame, remember--you need to specify rows AND columns. So,
>
> data[,c(2,12,17)] is what you should be doing in the first place, and
> data[,842:2411] in the second place.

Actually, the construction df[c(2,12,17)] will return just the 2nd,  
12th and 17th named column vectors.

Try it and see:

 df <- data.frame(a=1:10, b= 10:1, c=LETTERS[1:10])
 df
 df[c(2,3)]
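
A minimal sketch of the point above, reusing the same toy data frame: indexing without a comma and indexing with a comma select the same columns, and a colon range works in either form. The object names are illustrative only.

```r
df <- data.frame(a = 1:10, b = 10:1, c = LETTERS[1:10])

# List-style indexing (no comma) selects columns and returns a data frame
by_list <- df[c(2, 3)]

# Matrix-style indexing (with comma) selects the same columns for all rows
by_matrix <- df[, c(2, 3)]

same <- isTRUE(all.equal(by_list, by_matrix))

# A contiguous range of columns uses the colon operator in either form
first_two <- df[1:2]

# Caution: with the comma form, a single column drops to a plain vector
one_col_vector <- df[, 2]
one_col_frame  <- df[2]
```

The two forms differ only in the single-column case, where df[, 2] returns a vector and df[2] stays a data frame.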

> Not sure if the help you needed was using the comma, or the : syntax, or if
> you're trying to read only certain columns during the read.csv process
> (which I don't think is possible).

?colClasses  # with a vector element of "NULL" for each unwanted column.

I am not the person to be advising how to do this properly, as all of
my efforts to use this facility to date have failed and I have
resorted to reading in lines with as.is=TRUE and then post-processing.
But the facility does exist.

Maybe someone could give me a clue how one might construct a vector to  
send to colClasses inside read.table?

> mt <- matrix(1:200,nrow=4)
> write.table(file=file.choose(), as.data.frame(mt))
> read.table(file.choose())
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
1  1  5  9 13 17 21 25 29 33  37  41  45  49  53  57  61  65  69  73  77  81  85  89  93  97 101 105 109 113
2  2  6 10 14 18 22 26 30 34  38  42  46  50  54  58  62  66  70  74  78  82  86  90  94  98 102 106 110 114
3  3  7 11 15 19 23 27 31 35  39  43  47  51  55  59  63  67  71  75  79  83  87  91  95  99 103 107 111 115
4  4  8 12 16 20 24 28 32 36  40  44  48  52  56  60  64  68  72  76  80  84  88  92  96 100 104 108 112 116
  V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50
1 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197
2 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198
3 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199
4 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200

Attempts that did not work:
tstdta <- read.table(file.choose(), colClasses = c(c(paste(rep("NULL", 49),sep=","),"numeric"),header=TRUE)
tstdta <- read.table(file.choose(), colClasses = paste(rep("NULL", 49),"numeric",sep=","),header=TRUE)
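
One possible way to build the vector, offered as a sketch rather than the definitive answer: colClasses takes one entry per column in the file, so c(rep("NULL", 49), "numeric") skips the first 49 columns and reads the 50th. In the attempts above, the first has unbalanced parentheses (header=TRUE ends up inside c()), and paste() in the second produces 49 strings of the form "NULL,numeric". The sketch assumes the file is written without row names; with the default row names there is an extra leading column that would need its own colClasses entry. tempfile() stands in for file.choose().

```r
# Write a 4 x 50 table to a temporary file, without a row-name column
mt  <- matrix(1:200, nrow = 4)
tmp <- tempfile(fileext = ".txt")
write.table(as.data.frame(mt), file = tmp, row.names = FALSE)

# One entry per column: "NULL" skips the column, "numeric" reads it.
# rep() already returns a character vector, so paste() is not needed.
classes <- c(rep("NULL", 49), "numeric")
tstdta  <- read.table(tmp, header = TRUE, colClasses = classes)

dim(tstdta)
names(tstdta)
```

To keep a scattered set of columns, one could start from rep("NULL", 50) and overwrite the wanted positions, e.g. classes[c(2, 12, 17)] <- "numeric".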

--
David Winsemius


Re: [R] making spearman correlation cor() call fail with log(0) as input

2008-09-14 Thread Timur Shtatland

Hi Martin,

I got my initial question fully answered.
I do not have enough experience to judge whether the behavior of R with
regard to Inf is "excellent" or "better" than Perl.

In my opinion, both Perl and R are great languages, designed for very
different applications.
So instead of me trying to impose The Perl Way upon R, I would like to say
how very grateful I am to the contributors to the R core and other packages,
and to the contributors to the R mailing lists. Because this is what I
really feel. R and its packages have been very useful to me on countless
occasions. Thank you, Martin and Greg!

Best regards,

Timur

On Sat, Sep 13, 2008 at 8:48 AM, Martin Maechler <[EMAIL PROTECTED]> wrote:

> > "TS" == Timur Shtatland <[EMAIL PROTECTED]>
> > on Fri, 12 Sep 2008 11:52:25 -0400 writes:
>
>TS> I am more used to getting an error if you try to take
>TS> the log of 0, like this (in Perl):
>
>TS> perl -le 'for my $num (1, 0, -1, -2) { print log $num;
>TS> }' 0 Can't take log of 0 at -e line 1.
>
>TS> R is different. With R, you do not even get a *warning*
>TS> about log(0). Only log() of negative number produces a
>TS> warning:
>
>  []
>
> and why do you think the perl behavior to be better??
> R has been very carefully designed in such matters:
>
> The principle is that *limits* should work (using +/-Inf) where
> possible.
> For log(.) the limit only exists from the right and clearly is
> -Inf, so that's a feature.
>
> BTW,  S/R behavior of  1/0 |--> Inf   could be considered as
> more dangerous, since really the +Inf is the limit from the
> right only with the limit from the left being ``quite
> different''.
> But no, I'm not proposing to change R here (and actually would
> "fight" to keep it if that was necessary).
>
>
>TS> I agree with you that Spearman's correlation's invariance to
>TS> monotone transformations is an advantage. It is R's happy
>TS> attitude to -Inf and Inf that puzzled me at
>TS> first. Anyhow, verifying and/or preprocessing the input
>TS> to cor() is the answer to my questions.  Thank you again
>TS> for the help!
>
> So you now have understood that R's behavior of handling +/- Inf
> in this respect is rather  excellent  than bogus ?
>
> Martin Maechler, ETH Zurich (and R-core team)
>
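
A small sketch of the behavior discussed in this thread, with made-up vectors: log(0) returns -Inf without a warning, the Spearman correlation still works because -Inf simply ranks lowest, and a caller who wants a failure has to check the input first. The helper require_finite() is a hypothetical name, not part of R.

```r
log(0)                      # -Inf, returned silently

x  <- c(0, 1, 10, 100)
y  <- c(2, 3, 5, 8)
lx <- log(x)                # first element is -Inf

# Spearman's rho is computed on ranks, so -Inf is just the smallest value
rho <- cor(lx, y, method = "spearman")

# Hypothetical guard for callers who prefer an error on non-finite input
require_finite <- function(v) {
  if (!all(is.finite(v))) stop("input contains non-finite values")
  v
}

guarded <- tryCatch(
  cor(require_finite(lx), y, method = "spearman"),
  error = function(e) conditionMessage(e)
)
```

Here rho is computed without complaint, while guarded holds the error message raised by the check.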


Re: [R] k-sample Kolmogorov-Smirnov test?

2008-09-14 Thread Rolf Turner



On 15/09/2008, at 10:02 AM, Adam D. I. Kramer wrote:


> Maybe you should look a little harder.
>
> help.search("Kolmogorov")
>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


Please do read the guy's question!!!

>> Hello, I would like to conduct a k-sample K-S test, but cannot find
>> reference to its implementation in R. Does anyone have experience with this?


It's about a ***k-sample*** K-S test.  As far as I can discern,
help.search("Kolmogorov") points one only to ks.test() which (again, as
far as I can discern) effects only one or two sample KS tests.

The multi-sample KS test does exist --- I must admit this was a new one
on me.  (But then, so many things are!) See e.g. JASA vol. 68, No. 344,
pp. 994--997, ``Tables of Critical Values for a k-sample Kolmogorov-Smirnov
Test Statistic'' by Edward H. Wolf and Joseph I. Naus.


cheers,

Rolf Turner

P.S. RSiteSearch("Kolmogorov k-sample") turned up nothing useful.

R. T.


Re: [R] Nonlinear regression question&[EMAIL PROTECTED]

2008-09-14 Thread Gabor Grothendieck

Try this version:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

On Sun, Sep 14, 2008 at 6:53 PM, Esther Meenken <[EMAIL PROTECTED]> wrote:
> I was unable to open the file of Bill Venables' excellent "Exegeses on
> Linear Models" posted at
> http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.ps.gz . I'd be very
> interested in reading it.
>
> Thanks
>
> Esther Meenken
> Biometrician
> Crop & Food Research
> Private Bag 4704
> Christchurch
>
> TEL: (03) 325 9639
> FAX: (03) 325 2074
> EMAIL:[EMAIL PROTECTED]
>

[R] Ward's Clustering Doubts

2008-09-14 Thread Rodrigo Aluizio

Hi Everybody,
Now I have a question that is more statistical than technical. I'm working
with the ecology of recent Foraminifera.

At the lab we used to perform cluster analysis using 1-Pearson's R and Ward's
method (we had already seen it in the literature of the area), which gives good
results with our biological data. Recently, using R (the vegan and
cluster packages), which allows combining any kind of distance matrix
with any clustering method, we tried Bray-Curtis + Ward's (which seems to
be more appropriate for a matrix with a lot of zeros) and it gives a better
result. Furthermore, the results agree with our hypothesis and with the results
we got from Distance-based Redundancy Analysis (dbRDA, or CAP). That
is, the analysis (Q-mode) clusters the stations according to the main
physical, sedimentary and biological characteristics of the study area.
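
To make the combinations concrete, here is a base-R sketch of feeding two different dissimilarities to Ward-type clustering. It only shows that the calls run; it does not settle whether a non-Euclidean dissimilarity is statistically appropriate, which is the question being asked. Assumptions: the tiny abundance matrix is invented, Bray-Curtis is left out because it needs the vegan package, and the method names "ward.D" and "ward.D2" are those of R 3.1.0 and later (at the time of this post the only option was method = "ward").

```r
# Invented stations-by-species abundance matrix (6 stations, 5 species)
abund <- outer(1:6, 1:5, function(i, j) (i * j) %% 7)
dimnames(abund) <- list(paste0("st", 1:6), paste0("sp", 1:5))

# Euclidean distance between stations (Q-mode)
d_euc <- dist(abund)

# 1 - Pearson correlation between stations, as a dissimilarity
d_cor <- as.dist(1 - cor(t(abund)))

# Ward-type clustering on each dissimilarity
hc_euc <- hclust(d_euc, method = "ward.D2")
hc_cor <- hclust(d_cor, method = "ward.D")

# Cut both trees into two groups and compare memberships
groups_euc <- cutree(hc_euc, k = 2)
groups_cor <- cutree(hc_cor, k = 2)
table(groups_euc, groups_cor)
```

Cross-tabulating the memberships, as in the last line, is one simple way to see how much the choice of dissimilarity changes the grouping.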

We received some critical comments noting that Ward's method accepts Euclidean
distance only. So we ran the analysis again using Euclidean distance, but we
did not get results as good as those from 1-Pearson's R + Ward's or Bray-Curtis
+ Ward's (actually, any other distance + method combination gave better
results). Trying to find answers in the specialized literature, we just got a
little more confused, because at no point did we see a statement like "You must
use it with Euclidean Distance" and, as I said above, we have already seen, in
some articles from respected journals, other kinds of distance used with
Ward's clustering method.

Is it wrong, or is it nonsense, to do the analysis the way we were doing it?

The results with Ward's combined with 1-Pearson's R or Bray-Curtis fit better
with our hypothesis and have excellent agglomerative coefficients, but we
don't want to use inappropriate statistical procedures. I'm starting to
realize how powerful R is, but that doesn't justify doing nonsense statistics...
I hope one of you can help us!

Thank you in advance.

Rodrigo.


[R] Combining tables

2008-09-14 Thread Andre Nathan

Hello

Say I have the following data, and its distribution given by table():

  > x <- c(1, 1, 1, 2, 2, 3)
  > tx <- table(x)
  > tx
  x
  1 2 3
  3 2 1

Now say I have new data,

  > y <- c(3, 3, 3, 3, 4)
  > ty <- table(y)
  > ty
  y
  3 4
  4 1

Is there a way to "combine" tx and ty in such a way to give me the
distribution below?

  1 2 3 4
  3 2 5 1

Essentially what I'm looking for is something equivalent to
table(c(x,y)), but x and y are too large and I'd like to avoid the
concatenation.
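
One way this could be done, sketched with the small vectors from above: concatenate the two tables (which are short) rather than the raw data, then sum the counts that share a label. The names counts, labels and combined are illustrative.

```r
x  <- c(1, 1, 1, 2, 2, 3)
y  <- c(3, 3, 3, 3, 4)
tx <- table(x)
ty <- table(y)

# Work only with the tabulated counts and their labels,
# never with c(x, y)
counts <- c(as.vector(tx), as.vector(ty))
labels <- c(names(tx), names(ty))

# Sum counts that belong to the same value
combined <- tapply(counts, labels, sum)
combined
```

Because the labels are character strings, the result is ordered as text; with values of 10 or more one may want to reorder, for example with combined[order(as.numeric(names(combined)))].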

Thanks in advance,
Andre


Re: [R] k-sample Kolmogorov-Smirnov test?

2008-09-14 Thread David Winsemius



On Sep 14, 2008, at 5:46 PM, Mark Na wrote:


> Hello, I would like to conduct a k-sample K-S test, but cannot find
> reference to its implementation in R. Does anyone have experience with this?
>
> Thanks, Mark


I didn't have any luck with the method Kramer suggested, perhaps  
because I do not have a large number of packages installed.


There is S-Plus code at the end of the article "k-Sample tests based  
on the likelihood ratio" Jin Zhang, Yuehua Wu;  Computational  
Statistics & Data Analysis, Volume 51, Issue 9, 15 May 2007, Pages  
4682-4691


--
David Winsemius


Re: [R] histogram

2008-09-14 Thread Adaikalavan Ramasamy

If I understand you correctly, you already pre-computed the frequencies 
and bin widths and want to display them as a histogram. If correct, then 
what you are asking for is analogous to what bxp() is to boxplot. I am 
not sure if such a function exists.


Instead you can think of the task as drawing a bunch of rectangles 
(perhaps using symbols?). Or you can hack the hist() code and try


   br   <- c(0, 20, 30, 40, 50, 60, 70, 80, 100)
   dens <- runif(length(br) - 1)

   r <- structure(list(breaks = br, density = dens),
                  class = "histogram")

   plot(r, main = "Felipe's Histogram")

However, I do emphasize that this is a hack. If you have the original
data that you used to calculate the densities, consider using the breaks
argument with hist(). It is better to use tried and tested code.
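
The "bunch of rectangles" idea mentioned above can be sketched with rect() from base graphics. The percentages below are invented placeholders for the precomputed frequencies; dividing each by its class width gives the density, so the bar areas sum to 1.

```r
br  <- c(0, 20, 30, 40, 50, 60, 70, 80, 100)

# Hypothetical class frequencies in percent (must sum to 100)
pct  <- c(10, 15, 20, 20, 15, 10, 5, 5)
dens <- pct / 100 / diff(br)          # density = proportion / class width

# Empty plot with the right axis ranges, then one rectangle per class
plot(range(br), c(0, max(dens)), type = "n",
     xlab = "Class", ylab = "Density",
     main = "Histogram from precomputed densities")
rect(xleft = br[-length(br)], ybottom = 0,
     xright = br[-1], ytop = dens, col = "lightgray")

# Sanity check: total area under the bars
total_area <- sum(dens * diff(br))
```

This avoids constructing a "histogram" object by hand, at the cost of setting up the axes manually.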


Regards, Adai



Felipe wrote:

> i calculated the density and wanna do something like this
>
> separate in 0-19-29-39-49-59-69-79-99
> and put in these spaces 8 densities .. 0.something
> i have the frequency in % and divided already in 20 or 10 to get the density
>
> i tried and tried..made breaks vector to separate but couldn't put the other
> vector with the frequency density on it directly
>
> anyone know how to do it??
>
> tks

