Re: [R] matrix manipulation question

2015-03-31 Thread Stéphane Adamowicz
Many thanks,

Stéphane

On 30 March 2015, at 10:42, peter dalgaard wrote:

> 
>> On 30 Mar 2015, at 09:59 , Stéphane Adamowicz 
>>  wrote:
>> 
>> 
>> However, in order to help me understand, would you be so kind as to give me 
>> a matrix or data.frame example where « complete.cases(X)== T » or « 
>> complete.cases(X)== TRUE » would give some unwanted result ?
> 
> The standard problem with T for TRUE is if T has been used for some other 
> purpose, like a time variable. E.g., T <- 0 ; complete.cases(X)==T.
> 
> complete.cases()==TRUE is just silly, like (x==0)==TRUE or 
> ((x==0)==TRUE)==TRUE. 
> 
> (However, notice that x==TRUE is different from as.logical(x) if x is 
> numeric, so ifelse(x,y,z) may differ from ifelse(x==TRUE,y,z).) 
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
> 
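
A short R sketch of the points quoted above. The matrix X and the vector x are made-up illustrations, not data from the thread.

```r
# Toy matrix with one incomplete row (illustrative data, not from the thread)
X <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
cc <- complete.cases(X)          # TRUE FALSE TRUE

# `== TRUE` is redundant: the result is the same logical vector
identical(cc, cc == TRUE)        # TRUE

# T is an ordinary variable that can be reassigned; TRUE is a reserved word
T <- 0
masked <- cc == T                # now compares against 0, i.e. negates cc
masked                           # FALSE TRUE FALSE
rm(T)                            # restore the default binding T == TRUE

# For numeric x, x == TRUE tests x == 1, whereas as.logical(x) tests x != 0
x <- c(0, 1, 2)
x == TRUE                        # FALSE TRUE FALSE
as.logical(x)                    # FALSE TRUE TRUE
ifelse(x, "y", "z")              # "z" "y" "y"
ifelse(x == TRUE, "y", "z")      # "z" "y" "z"
```

Writing TRUE in full, or dropping the comparison altogether, avoids both problems.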

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
 Hi All,

Sorry that the data were not formatted well enough. This is what my
data look like.

I want to draw multiple plots with ggplot from a data frame with many
columns. I want to plot only Start.of.Rain..i., Start.of.Rain..ii. and
Start.of.Rain..iii., and I have failed to do so. I want to compare
these three columns by plotting vertical lines, and I also need to add
points to the plot so that the series can be told apart. The x-axis
must be the Date column. Thanks!

Here is what the data look like and what I tried.



Date       Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
1952-01-01                  86   1139.952                92                239                  11
1953-01-01                  96    977.646                98                 98                  11
1954-01-01                 114   1382.014                92                 92                  12
1955-01-01                 119   1323.086               100                100                  12
1956-01-01                 123   1266.444                92                 92                  11
1957-01-01                 124   1235.964                92                 92                  11


Here is how I tried to solve the problem.

df1 <-data.frame(data)
df1
df2 <- melt(df1 ,  id = 'Date', variable_name = 'start of Rains')
df2

ggplot(df2, aes(Date,value)) + geom_line(aes(colour ="red"),type = "h")

Kindly any help is welcome. Thanks

Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Mar 31, 2015 at 9:24 AM, Jeff Newmiller 
wrote:

> This is no better because (a) you are still posting using HTML format, and
> (b) using printed output loses the internal representation of the data. The
> dput function is very helpful for solving this. [1]
>
> [1]
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On March 30, 2015 10:56:48 PM PDT, Frederic Ntirenganya 
> wrote:
> >Hi Stephen,
> >
> >Sorry, the data came in bad way.
> >Here is the head of the data.
> >
> >> head(data)
> >        Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii. Start.Rain..iv.
> >1 1952-01-01                  86   1139.952                92                239                 112             112
> >2 1953-01-01                  96    977.646                98                 98                 112             112
> >3 1954-01-01                 114   1382.014                92                 92                 120             120
> >4 1955-01-01                 119   1323.086               100                100                 125             174
> >5 1956-01-01                 123   1266.444                92                 92                 119             119
> >6 1957-01-01                 124   1235.964                92                 92                 112             112
> >
> >
> >
> >Frederic Ntirenganya
> >Maseno University,
> >African Maths Initiative,
> >Kenya.
> >Mobile:(+254)718492836
> >Email: fr...@aims.ac.za
> >https://sites.google.com/a/aims.ac.za/fredo/
> >
> >On Mon, Mar 30, 2015 at 5:34 PM, stephen sefick 
> >wrote:
> >
> >> Hi Frederic,
> >>
> >> Can you provide a minimal reproducible example including either real
> >> data (dput), or simulated data that mimics your situation? This will allow
> >> more people to help.
> >>
> >> Stephen
> >>
> >> On Mon, Mar 30, 2015 at 8:39 AM, Frederic Ntirenganya
> >
> >> wrote:
> >>
> >>> Dear All,
> >>>
> >>> I want to plot multiple using ggplot function from a data frame of
> >>> many columns. I want to plot only str1, str2 and str3 and I failed to
> >>> make it. What I want is to compare str1, str2 and str3 by plotting
> >>> vertical line. I also need to add points to the plot to be able to
> >>> separate them.
> >>>
> >>>
> >>> Here is how the data look like and how I tried to make it.
> >>>
> >>> Date     NumberofRaindays TotalRains str1 str2 str3
> >>> 1/1/1952               86     1360.5   92  120  112
> >>> 1/1/1953               96       1100   98  100  110
> >>> ...
> >>>
> >>> df1 <-data.frame(data)
> >>> df1
> >>> df2 <- melt(df1 ,  id = 'Date', variable_name = 'start of Rains')
> >>> df2
> >>>
> >>> ggplot(df2, aes(Date,value)) + geom_line(aes(colour ="red"),type = "h")
> >>>
> >>> Kindly any help is welcome. Thanks
> >>>
> >>> Regards,
> >>> Frederic.
> >>>
> >>> Frederic Ntirenganya
> >>> Maseno University,
> >>> African Maths Initiative,
> >>> Kenya.
> >>> Mobile:(+254)718492836
> >>> Email: fr...@aims.ac.za
> >>> https://sites.google.com/a/aims.ac.za/fredo/
> >>>
> >>> [[alternative HTML 

[R] MethComp exported object namespace error

2015-03-31 Thread Kylie Lange
Hi everyone,

I am using the MCmcmc function of the MethComp package and receive the 
following error:

Error: 'coda.samples' is not an exported object from 'namespace:coda'

I emailed the package author last week but haven't had a reply. I have 
installed JAGS 3.4.0 as required by MethComp. I am using  R 3.1.2 and the 
MethComp currently on CRAN (1.22.1). I am not a regular R user so haven't had 
any luck making sense of the error, though there are references to namespace in 
the package check results here: 
http://cran.itam.mx/web/checks/check_results_bxc_at_steno.dk.html#MethComp. Not 
sure if that's relevant.

Any suggestions would be appreciated. Apologies if I haven't provided any 
required information.

The following shows my code and error (code taken from the package author's 
text 'Comparing Clinical Measurement Methods', Bendix Carstensen, section 
7.5.3) :

>library(MethComp)
>data(ox)
>ox<- Meth(ox)
>m3<- MCmcmc(ox, IxR=TRUE, n.iter=5)

Comparison of 2 methods, using 354 measurements on 61 items, with up to 3 
replicate measurements, (replicate values are in the set: 1 2 3 ) 
( 2 * 61 * 3 = 366 ): 

No. items with measurements on each method:
        #Replicates
Method    1   2   3 #Items #Obs: 354 Values:  min  med  max
  CO      1   4  56     61       177         22.2 78.6 93.5
  pulse   1   4  56     61       177         24.0 75.0 94.0

Simulation run of a model with 
- method by item and item by replicate interaction: 
- using 4 chains run for 5 iterations (of which 25000 are burn-in), 
- monitoring every 25 values of the chain: 
- giving a posterior sample of 4000 observations.

Loading required package: coda
Linked to JAGS 3.4.0
Loaded modules: basemod,bugs
Initialization and burn-in:
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 2868

Initializing model

  |++| 100%
Sampling:
Error: 'coda.samples' is not an exported object from 'namespace:coda'


Thanks,
Kylie.
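
One way to investigate an error like this is to check which installed package actually exports the function. The sketch below uses a hypothetical helper, is_exported(), which is not part of MethComp, coda or rjags. To my knowledge coda.samples is exported by rjags rather than coda, but treat that as an assumption to verify on your own installation.

```r
# Hypothetical helper (not part of MethComp, coda or rjags):
# is `fun` exported by the installed package `pkg`?
is_exported <- function(fun, pkg) {
  requireNamespace(pkg, quietly = TRUE) &&
    fun %in% getNamespaceExports(pkg)
}

# Sanity check with a package that ships with R
is_exported("sd", "stats")            # TRUE
is_exported("coda.samples", "stats")  # FALSE

# Applied to the error above; each call returns FALSE if the package
# is not installed or cannot be loaded
is_exported("coda.samples", "coda")
is_exported("coda.samples", "rjags")
```

If the second of the last two calls returns TRUE and the first FALSE, the message points to a mismatch between the installed MethComp and the packages it calls, rather than to a mistake in the user code.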



Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-31 Thread RiGui
I found a fix for my problem: fastLm() from the RcppEigen package, using
either the Jacobi singular value decomposition (SVD) (method 4) or the method
based on the eigenvalue-eigenvector decomposition of X'X (method 5).



install.packages("RcppEigen")
library(RcppEigen)

n_obs <- 1500
y  <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, 0.01235657,0.45)
x2 <- rnorm(n_obs, 10,3.21)
X  <- cbind(x1,x2)



bFE <- fastLm(y ~ x1 + x2, method =4)
bFE

Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)

Coefficients:
        (Intercept)                  x1                  x2 
9.94832839474159414             0.12293 0.00440078989949841 


Best,

Raluca





--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Jeff Newmiller
By failing to take the advice given to you, you make it harder to help you. 
Learn to control your email program to send plain text, and learn to use the 
dput function.

With regard to this function call:

> ggplot(df2, aes(Date,value)) +

I highly recommend using named parameters in the aes call. Also, if you want 
different values of "variable" to be plotted with different colors, you should 
map that column to the colour dimension:

ggplot(df2, aes(x=Date,y=value,colour=variable)) +

The "type" argument applies to base graphics rather than ggplot graphics, and 
you should never put fixed values inside the aes call. Since colour has already 
been taken care of, you can give no parameters in the geom_line call:

geom_line()

So all together then:

ggplot(df2, aes(x=Date,y=value,colour=variable)) +
geom_line()

but I cannot test it because you have not followed my other advice.
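
A minimal sketch of the advice above, assuming the first six rows that the poster supplies via dput() later in this thread. The long-format step is done in base R here instead of melt(), and the plotting part runs only if ggplot2 is installed.

```r
# First six rows of the poster's data, taken from the dput() output
# that appears later in the thread
df1 <- data.frame(
  Date = as.Date(c("1952-01-01", "1953-01-01", "1954-01-01",
                   "1955-01-01", "1956-01-01", "1957-01-01")),
  Start.of.Rain..i.   = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii.  = c(239L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..iii. = c(112L, 112L, 120L, 125L, 119L, 112L)
)

# Long format in base R: one row per (Date, variable) pair,
# comparable to melt(df1, id = "Date")
df2 <- data.frame(
  Date     = rep(df1$Date, times = 3),
  variable = rep(names(df1)[-1], each = nrow(df1)),
  value    = unlist(df1[-1], use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df2, aes(x = Date, y = value, colour = variable)) +
    geom_line() +     # one line per variable
    geom_point()      # points help to tell overlapping series apart
  print(p)
}
```

If vertical lines in the style of base graphics type = "h" are wanted, replacing geom_line() with something like geom_linerange(aes(ymin = 0, ymax = value)) may be closer to the goal; that is a suggestion, not something tested against the full data.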
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread stephen sefick
Your data and post are still not in one of the formats described here:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
I am unsure what you want to do, but I have made a reproducible example
that might help.

zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
1952-01-01  86 1139.952  92 239 11
1953-01-01  96  977.646  98  98 11
1954-01-01 114 1382.014  92  92 12
1955-01-01 119 1323.086 100 100 12
1956-01-01 123 1266.444  92  92 11
1957-01-01 124 1235.964  92  92 11"

library(reshape)
library(ggplot2)

Data <- read.table(text=zz, header = TRUE)

df1 <-data.frame(Data)

df2 <- melt(df1 ,  id = c('Date', 'Number.of.Rain.Days'))

df3 <- df2[-grep("Total.rain", df2$variable),]

qplot(Date,value, data=df3) +facet_wrap(~variable)


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread John Kane
The data you supplied is still in a useless format.

Please send it to us in dput format (and don't post in html)

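Here is a complete example of creating a data.frame and converting it with 
dput() into a form that readers on R-help can use directly.

##=Start Example===##
# Simple example data set in a data.frame
# (Note: `zz <- rnorm(20)` inside data.frame() does not name the column "zz";
#  R builds the name from the whole expression, see the third name in the
#  dput() output below. Use `zz = rnorm(20)` to get a column called "zz".)
data1  <-  data.frame(xx = 1:20, yy = sample(letters[1:26], 20, replace = TRUE),
                      zz  <-  rnorm(20))

dput(data1)  # convert to dput() format for transferring to other users
 
# dput() result. Copy and paste back into your editor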
structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 
1L, 2L, 7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L
), .Label = c("a", "f", "g", "h", "i", "j", "k", "o", "p", "u", 
"w", "z"), class = "factor"), zzrnorm.20. = c(0.379202224643519, 
-0.293649882956148, 2.27761155645142, 0.0378126031936277, 0.518138385757923, 
1.11655160886907, -1.64262245261915, 1.11341365979718, -0.184737977758355, 
0.439361470235051, 1.2597110753159, -0.795425331570368, 0.974654694801041, 
-0.309087884123705, -1.55929705211554, 0.147715827800676, -0.542626171203849, 
0.745294589678554, -0.254290052908619, 0.939894889209173)), .Names = c("xx", 
"yy", "zzrnorm.20."), row.names = c(NA, -20L), class = "data.frame")

#  Read data back into standard R format, calling the data "dat1"

dat1  <-  structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 
1L, 2L, 7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L
), .Label = c("a", "f", "g", "h", "i", "j", "k", "o", "p", "u", 
"w", "z"), class = "factor"), zzrnorm.20. = c(0.379202224643519, 
-0.293649882956148, 2.27761155645142, 0.0378126031936277, 0.518138385757923, 
1.11655160886907, -1.64262245261915, 1.11341365979718, -0.184737977758355, 
0.439361470235051, 1.2597110753159, -0.795425331570368, 0.974654694801041, 
-0.309087884123705, -1.55929705211554, 0.147715827800676, -0.542626171203849, 
0.745294589678554, -0.254290052908619, 0.939894889209173)), .Names = c("xx", 
"yy", "zzrnorm.20."), row.names = c(NA, -20L), class = "data.frame")

dat1
##=End Example===##
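
The same round trip can be checked programmatically. This is a small sketch with made-up data (the names d, txt and d_copy are mine), restricted to integer and character columns so that the comparison is exact.

```r
# Small illustrative data frame (integer and character columns only)
d <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# What the poster does: capture the dput() text and paste it into the email
txt <- paste(capture.output(dput(d)), collapse = "\n")
cat(txt, "\n")

# What the reader does: evaluate the pasted text to rebuild the object
d_copy <- eval(parse(text = txt))
identical(d, d_copy)   # TRUE: column types and attributes survive
```

Printed output, by contrast, does not say whether a column is a factor, a character vector or a Date, which is why the replies in this thread keep asking for dput().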

John Kane
Kingston ON Canada



Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
Hi All,

Thanks for the help. I want to plot some of the columns on the same graph
not all of them. Sorry, I failed to follow the instructions. Here is the
output of *dput()* but I don't know how it works.

> dput(head(data))
structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
-5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
"Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
"Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
), class = "data.frame")
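
For readers unsure how dput() output is used: it is pasted after an assignment and evaluated. A minimal sketch with the values above, with the columns trimmed for brevity and the object name rain chosen by me:

```r
# Paste the dput() text after an assignment to rebuild the object
# (only three of the posted columns are kept here)
rain <- structure(list(
  Date = structure(c(-6575, -6209, -5844, -5479, -5114, -4748), class = "Date"),
  Start.of.Rain..i.  = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii. = c(239L, 98L, 92L, 100L, 92L, 92L)),
  row.names = c(NA, 6L), class = "data.frame")

str(rain)              # Date is a real Date column, the other two are integers
format(rain$Date[1])   # "1952-01-01"
```

The Date column arrives with its class intact, so ggplot can treat the x-axis as dates without any further conversion.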

 I think I need subset function then melt. Here is the approach I used:

d <- subset(df1,
select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
d
d2 <- melt(d ,  id = 'Date', variable_name = 'Start')

ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h")

 but the error is:

Don't know how to automatically pick scale for object of type function.
Defaulting to continuous
Error in data.frame(colour = function (x, ...)  :
  arguments imply differing number of rows: 0, 183


Thanks,

Frederic.



Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/


Re: [R] Plotting using tapply function output

2015-03-31 Thread John Kane
Reproducibility
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example


John Kane
Kingston ON Canada


> -Original Message-
> From: amc5...@gmail.com
> Sent: Mon, 30 Mar 2015 16:07:05 -0700
> To: r-help@r-project.org
> Subject: [R] Plotting using tapply function output
> 
> Hello,
> 
> I am trying to plot the hourly standard deviation of wind speeds from
> 13 different measured locations over many years. I imported the data
> using readLines and into a dataframe called finalData. Using tapply, I
> determined the standard deviation of the windspeed (ws) for each hour
> (hour) from every location (stn) using this command line:
> 
> statHour = tapply(finalData$ws,list(finalData$stn,finalData$hour),sd)
> 
> I want to plot the standard deviation for each hour of the day, with
> hours as the x-axis and the standard deviation for the y-axis, and
> each station as a different color.  I've managed to get a boxplot of
> this, but ideally, I'd like a scatter plot to determine the variations
> between each instrument throughout the day.  The boxplot command is
> this:
> 
> boxplot(statHour, names=colnames(statHour),xlab='Hour of the Day',
>         ylab='Standard Deviation of Wind Speed')
> 
> I also tried to make a dataframe of the tapply output but it ends up
> using the hours as the column names instead of putting it into the
> dataframe.  Please help!!
> 
> I have R version 3.1.1
> 
> Thanks a lot,
> Alexandra
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
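
A possible approach to the question quoted above, sketched with simulated data because the real finalData was not posted. The column names stn, hour and ws follow the question; the values and the name sdLong are invented for illustration.

```r
# Simulated stand-in for finalData (the real data was not posted)
set.seed(1)
finalData <- data.frame(
  stn  = rep(paste0("S", 1:3), each = 240),
  hour = rep(0:23, times = 30),
  ws   = rexp(720, rate = 0.2)
)

# Stations in rows, hours in columns, as in the original command
statHour <- tapply(finalData$ws, list(finalData$stn, finalData$hour), sd)

# Long format: the hours become a column instead of column names
sdLong <- as.data.frame(as.table(statHour), responseName = "sd")
names(sdLong)[1:2] <- c("stn", "hour")
sdLong$hour <- as.integer(as.character(sdLong$hour))

# One colour per station, hour on the x-axis
matplot(as.integer(colnames(statHour)), t(statHour), type = "b", pch = 1,
        col = seq_len(nrow(statHour)), lty = 1,
        xlab = "Hour of the Day", ylab = "Standard Deviation of Wind Speed")
legend("topright", legend = rownames(statHour),
       col = seq_len(nrow(statHour)), lty = 1, pch = 1)
```

as.data.frame(as.table(...)) addresses the problem of the hours ending up as column names, and the long data frame sdLong can also be passed to ggplot with colour mapped to stn.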




Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
Hi John,

Sorry for my mistake in providing unusable data.
Here I am interested only in the Tmin and Tmax columns. I want to use the same
approach as with the previous data: both columns on the same graph, not on
separate graphs. Thanks

> dput(head(BUTemp))
structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L, 1971L
), Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6, Rain = c(0,
0, 0, 0, 0, 0), Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5), Tmin = c(13.5,
13.2, 12.7, 12.7, 12.2, 14)), .Names = c("Year", "Month", "Day",
"Rain", "Tmax", "Tmin"), row.names = c(NA, 6L), class = "data.frame")

Regards,

Frederic.
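
A minimal sketch of plotting Tmin and Tmax on one graph, using the six rows posted above. Building a Date column from Year, Month and Day and using stack() in place of melt() are my choices, not steps taken from the thread; the plot is drawn only if ggplot2 is installed.

```r
# The six rows posted above (Rain column omitted)
BUTemp <- data.frame(
  Year = rep(1971L, 6), Month = rep(2L, 6), Day = 1:6,
  Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5),
  Tmin = c(13.5, 13.2, 12.7, 12.7, 12.2, 14)
)

# Build a Date column from the three parts
BUTemp$Date <- as.Date(ISOdate(BUTemp$Year, BUTemp$Month, BUTemp$Day))

# Stack Tmax and Tmin into long format (a base-R counterpart of melt)
long <- cbind(Date = BUTemp$Date, stack(BUTemp[c("Tmax", "Tmin")]))
names(long) <- c("Date", "value", "variable")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(long, aes(x = Date, y = value, colour = variable)) +
          geom_line() + geom_point())
}
```

Mapping colour to the variable column puts both series in the same panel with a legend, instead of one panel per column.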



Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread stephen sefick
The error message is very informative. You named a column in the melted
data "Start", and told ggplot to use "start". "start" is a function. R is
case sensitive.
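
To illustrate the point, here is a minimal base-R sketch. The small data frame d2 below is made up for illustration and only stands in for the melted data; it assumes the melt step really produced a column called "Start".

```r
# Hypothetical stand-in for the melted data: a column named "Start" (capital S)
d2 <- data.frame(Date  = as.Date(c("1952-01-01", "1953-01-01")),
                 Start = c("Start.of.Rain..i.", "Start.of.Rain..ii."),
                 value = c(92L, 98L))

"Start" %in% names(d2)   # TRUE: this column exists
"start" %in% names(d2)   # FALSE: no column with a lowercase s
is.function(start)       # TRUE: lowercase start resolves to stats::start()
```

So aes(colour = start) hands ggplot a function rather than a column, which is what the error is complaining about; aes(colour = Start) is presumably what was intended.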

On Tue, Mar 31, 2015 at 8:46 AM, Frederic Ntirenganya 
wrote:

> Hi All,
>
> Thanks for the help. I want to plot some of the columns on the same graph
> not all of them. Sorry, I failed to follow the instructions. Here is the
> output of *dput()* but I don't know how it works.
>
> > dput(head(data))
> structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
> -5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
> 96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
> 1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
> 98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
> 92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
> 125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
> 119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
> 228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
> "Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
> "Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
> ), class = "data.frame")
>
>  I think I need subset function then melt. Here is the approach I used:
>
> d <- subset(df1, 
> select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
> d
> d2 <- melt(d ,  id = 'Date', variable_name = 'Start')
>
> ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h")
>
>  but the error is:
>
> Don't know how to automatically pick scale for object of type function.
> Defaulting to continuous
> Error in data.frame(colour = function (x, ...)  :
>   arguments imply differing number of rows: 0, 183
>
>
> Thanks,
>
> Frederic.
>
>
>
> Frederic Ntirenganya
> Maseno University,
> African Maths Initiative,
> Kenya.
> Mobile:(+254)718492836
> Email: fr...@aims.ac.za
> https://sites.google.com/a/aims.ac.za/fredo/
>
> On Tue, Mar 31, 2015 at 4:20 PM, stephen sefick  wrote:
>
>> Your data and post is still not provided in one of the formats provided
>> here:
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
>> I am unsure of what you want to do, but I have made a reproducible example
>> that might help.
>>
>> zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
>>  1952-01-01  86 1139.952  92 239 11
>>  1953-01-01  96  977.646  98  98 11
>>  1954-01-01 114 1382.014  92  92 12
>>  1955-01-01 119 1323.086 100 100 12
>>  1956-01-01 123 1266.444  92  92 11
>>  1957-01-01 124 1235.964  92  92 11"
>>
>> library(reshape)
>> library(ggplot2)
>>
>> Data <- read.table(text=zz, header = TRUE)
>>
>> df1 <-data.frame(Data)
>>
>> df2 <- melt(df1 ,  id = c('Date', 'Number.of.Rain.Days'))
>>
>> df3 <- df2[-grep("Total.rain", df2$variable),]
>>
>> qplot(Date,value, data=df3) +facet_wrap(~variable)
>>
>> On Tue, Mar 31, 2015 at 2:55 AM, Frederic Ntirenganya 
>> wrote:
>>
>>>  Hi All,
>>>
>>> Sorry for the shape of data which was not good enough.This is how my data 
>>> look like.
>>>
>>> I want to plot multiple using ggplot function from a data frame of many 
>>> columns. I want to plot only Start.of.Rain..i., Start.of.Rain..ii. and  
>>> Start.of.Rain..iii. and I failed to make it. What I want is to compare 
>>> Start.of.Rain..i., Start.of.Rain..ii. and Start.of.Rain..iii. by plotting 
>>> vertical line. I also need to add points to the plot to be able to separate 
>>> them. The x-axis must be date column. Thanks!
>>>
>>> Here is how the data look like and how I tried to make it.
>>>
>>>
>>>
>>> Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii.
>>> Start.of.Rain..iii. 1952-01-01 86 1139.952 92 239 11 1953-01-01 96 977.646
>>> 98 98 11 1954-01-01 114 1382.014 92 92 12 1955-01-01 119 1323.086 100 100
>>> 12 1956-01-01 123 1266.444 92 92 11 1957-01-01 124 1235.964 92 92 11
>>>
>>>
>>> Here is how I tried to solve the problem.
>>>
>>> df1 <-data.frame(data)
>>> df1
>>> df2 <- melt(df1 ,  id = 'Date', variable_name = 'start of Rains')
>>> df2
>>>
>>> ggplot(df2, aes(Date,value)) + geom_line(aes(colour ="red"),type = "h")
>>>
>>> Kindly any help is welcome. Thanks
>>>
>>> Regards,
>>> Frederic.
>>>
>>> Frederic Ntirenganya
>>> Maseno University,
>>> African Maths Initiative,
>>> Kenya.
>>> Mobile:(+254)718492836
>>> Email: fr...@aims.ac.za
>>> https://sites.google.com/a/aims.ac.za/fredo/
>>>
>>> On Tue, Mar 31, 2015 at 9:24 AM, Jeff Newmiller <
>>> jdnew...@dcn.davis.ca.us> wrote:
>>>
 This is no better because (a) you are still posting using HTML format,
 and (b) using p

Re: [R] Multiple Plots using ggplot

2015-03-31 Thread John Kane
Hi Frederic,

Thanks for sending the data in dput() format. All it does is convert a data set 
into a standardized format (a perfect copy) that anyone with R can read. People 
have different setups and defaults for reading data, so what you read into R as 
a character variable may be a factor when I read it in, and we can have some 
serious problems just trying to decide what the data looks like. 

I had a look at your code and it is confused. See my comments below

d <- subset(df1, 
select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.)) 

d 

d2 <- melt(d , id = 'Date', variable_name = 'Start') 

# You do not have any variable in your data.frame called "Start".

# reshape2 seems to have just ignored variable_name = 'Start' and did the melt 
based on id = 'Date'. Strange, I would have expected an error, but it worked!

d2 <- melt(d , id = 'Date') will give you exactly the same result.

ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h") 

Again you do not have a variable (column name) called 'start'. You have three 
column names (variables) in d2 These are "Date" "variable" and "value" .

ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h") 

Point one, you have no variable called start.  

Point two, what is type = "h" doing here? As far as I can see, it is not an 
option in geom_line. See ?geom_line for this point.

 I think you are confusing basic graphics commands ("type =")  with ggplot 
commands. Have a look at 
http://www.cookbook-r.com/Graphs/Shapes_and_line_types/ for some examples that 
show the differences.
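
As a rough sketch of how the base-graphics type = "h" look (vertical lines from the axis up to each value) might be obtained in ggplot2, one option is geom_segment() with points added on top. The small long-format data frame here is made up for illustration, and this is only one possible approach, not necessarily the best one for the full data.

```r
library(ggplot2)

# Small made-up data in long format (Date, variable, value), for illustration only
d2 <- data.frame(
  Date     = rep(as.Date(c("1952-01-01", "1953-01-01", "1954-01-01")), times = 2),
  variable = rep(c("Start.of.Rain..i.", "Start.of.Rain..iii."), each = 3),
  value    = c(92, 98, 92, 112, 112, 120)
)

# Vertical lines from zero up to each value, with a point at the top of each line
p <- ggplot(d2, aes(Date, value, colour = variable)) +
  geom_segment(aes(xend = Date, yend = 0)) +
  geom_point()
print(p)
```

Note that lines for different variables on the same date are drawn on top of each other, so the points are what separate them visually.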

Below is what I think you may be trying to do (note I use dat1 for the 
data.frame rather than your df1).

###==
dat1  <-  structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
-5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
"Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
"Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
), class = "data.frame")

library(reshape2)
library(ggplot2)

dd <- subset(dat1, 
select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))

d2 <- melt(dd ,  id = 'Date')

# Lines, one colour per variable
ggplot(d2, aes(Date,value)) + geom_line(aes(colour = variable))

# Side-by-side bars, one fill colour per variable
ggplot(d2, aes(Date, value)) + 
   geom_bar(position = "dodge", stat = "identity", aes(fill = variable))

##
John Kane
Kingston ON Canada


> -Original Message-
> From: ntfr...@gmail.com
> Sent: Tue, 31 Mar 2015 16:55:56 +0300
> To: ssef...@gmail.com
> Subject: Re: [R] Multiple Plots using ggplot
> 
> Hi John,
> 
> Sorry for the mistake I made for providing useless data.
> Here I am interest only on Tmin and Tmax columns. I want to use the same
> approach with the previous data. I want to plot on the same graph not
> separate graph. Thanks
> 
> > dput(head(BUTemp))
> structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L, 1971L
> ), Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6, Rain = c(0,
> 0, 0, 0, 0, 0), Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5), Tmin = c(13.5,
> 13.2, 12.7, 12.7, 12.2, 14)), .Names = c("Year", "Month", "Day",
> "Rain", "Tmax", "Tmin"), row.names = c(NA, 6L), class = "data.frame")
> 
> Regards,
> 
> Frederic.
> 
> 
> 
> Frederic Ntirenganya
> Maseno University,
> African Maths Initiative,
> Kenya.
> Mobile:(+254)718492836
> Email: fr...@aims.ac.za
> https://sites.google.com/a/aims.ac.za/fredo/
> 

Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread John Kane
I think we need some data and code 
Reproducibility
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example



John Kane
Kingston ON Canada


> -Original Message-
> From: r...@catwhisker.org
> Sent: Mon, 30 Mar 2015 06:50:59 -0700
> To: r-help@r-project.org
> Subject: [R] data.frame: data-driven column selections that vary by row??
> 
> Sorry if that's confusing: I'm probably confused. :-(
> 
> I am collecting and trying to analyze data regarding performance of
> computer systems.
> 
> After extracting the data from its repository, I have created and
> used a Perl script to generate a (relatively) simple CSV, each
> record of which contains:
> * a POSIXct timestamp
> * a hostname
> * a collection of metrics for the interval identified by the timestamp,
>   and specific to the host in question, as well as some factors to
>   group the hosts (e.g., whether it's in a "control" vs. a "test"
>   group; a broad categorization of how the host is provisioned; which
>   version of the software it was running at the time...).  (Each
>   metric and factor is in a uniquely-named column.)
> 
> As extracted from the repository, there were several records for each
> such hostname/timestamp pair -- e.g., there would be separate records
> for:
> * Input bandwidth utilization for network interface 1
> * Output bandwidth utilization for network interface 1
> * Input bandwidth utilization for network interface 2
> * Output bandwidth utilization for network interface 2
> 
> (And the same field would be used for each of these -- the
> interpretation being driven by the content of other fields in the
> record.)
> 
> Working with the data as described (immediately) above directly in R
> seemed... daunting, at best: thus the excursion into Perl.
> 
> And for some of the data, what I have works well enough.
> 
> But now I also want to analyze information from disk drives, and things
> get messy (as far as I can see).
> 
> First, each disk drive has a collection of 17 metrics (such as
> "busy_pct", "kb_per_transfer_read", and "transfers_per_second_write"),
> as well as a factor ("dev_type").  Each also has a device name that is
> unique within the host where it resides (e.g. "da1", "da2", "da3").
> (The "dev_type" factor identifies whether the drive is a solid-state
> device or a spinning disk.)
> 
> I have thus made the corresponding columns unique by pasting the drive
> name and the name of the metric (or factor), separating the two with
> "_" (e.g. "da7_busy_pct"; "ada0_mb_per_second_write";
> "ada4_queue_length").  I am not certain that's the best thing I could
> have done -- and I'm open to changing the approach.
> 
> The challenge for me is that different (classes of) machines are
> provisioned differently; some consequences of that:
> * While da1 may be a spinning disk on host A, that has no bearing on
>   whether or not the "da1" on host B is a spinning disk or an SSD.
> * Host C may not even have a "da1" device.
> * Host D may be of a type that normally has a "da1," but in this case,
>   the drive has failed and has been disabled (so host D won't report
>   anything about "da1").
> 
> (I'm not too bothered about the "non-reporting" case, but cite it so we
> all know about it.)
> 
> I expect I will want to be using groupings:
> * All disk devices -- this one is easy.
> * All SSD devices (excluding spinning disks).
> * All spinning disks (excluding SSDs).
> 
> I'm having trouble with the latter two (though, certainly, if I solve
> one, the other is also solved).
> 
> Also, for some  of the metrics, I will want to sum them; for others,
> I will want to do other things -- find minima or maxima, or average
> them.  So pre-calculating such aggregates in the Perl script isn't
> something that appeals to me.
> 
> Finally (as far as complications go), I'm trying to write the code in
> such a way that if we deploy a new configuration of machine that has
> (say) twice as many drives as the biggest one we presently deploy, the
> code Just Works -- I shouldn't need to update the code merely to adapt
> to another hardware configuration.
> 
> I have been able to write a function that takes the data.frame obtained
> by reading the above-cited CSV, and generates a data.frame with a row
> for each host, and depicts the "dev_type" for each device for that host;
> here's an abbreviated (and slightly redacted) copy of its output to
> illustrate some of the above:
> 
>        ada0 ada1 ada2 ada3 ada4 ada5 da30 da31 da32 da33 da34 da35 da36 da3
> host_A  ssd  ssd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd hdd
> host_B  ssd  ssd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd hdd
> host_G  ssd  ssd  ssd  ssd  ssd  ssd                                    ssd
> host_H  ssd  ssd  ssd  ssd  ssd  ssd                                    ssd
> host_M  ssd  ssd  ssd  ssd  ssd  ssd                                    ssd
> host_N  ssd  ssd  ssd  ssd  ssd  ssd                                    ssd
> 
> (That function is written with the explicit assumption(!) that 

Re: [R] Debug package options

2015-03-31 Thread Keith S Weintraub
Duncan,
Thanks for the help.

Since I am the only person using this machine, and I couldn't figure out where 
else to put the option statement, I put it in the file Rprofile.site in:
C:\Program Files\R\R-3.1.2\etc

The option that I wanted was:
options(debug.font = "Consolas 12")

That gave me the right font size and Tk window for debugging with the debug 
package.
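
For anyone else reading, a minimal sketch of what such a profile entry might look like. The per-user ~/.Rprofile alternative mentioned in the comment is an assumption about what suits a given setup; ?Startup describes which files are read and in what order.

```r
# Lines like these can go in a startup profile, e.g. the site-wide
# R_HOME/etc/Rprofile.site or a per-user ~/.Rprofile (see ?Startup).
options(debug.font = "Consolas 12")

# Check the value stored for this option in the current session
getOption("debug.font")
```

Setting the option does not by itself load the debug package; it only stores the value so that it is already in place when the package is loaded later.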

In case you are interested I use Windows 7 on my Mac via Parallels.

Thanks again,
Best,
KW



> On Mar 30, 2015, at 2:05 PM, Duncan Murdoch  wrote:
> 
> On 30/03/2015 1:50 PM, Keith S Weintraub wrote:
>> Folks,
>> 
>> I would like change some of the options for the Tk window that pops up when 
>> using the debug package.
>> 
>> I know how to change the options: e.g. options(debug.font = "Courier 12 italic").
>> 
>> Is there a way to “preset” these in my environment so when debug starts up I 
>> have all the options set up the way I want them?
>> 
>> Do I do this in a .First file? Does the .First file have to load the debug 
>> package every time I start up R?
>> 
>> No need to do my work for me. Just point me to the right doc.
> 
> See the ?Startup help topic.  You probably want to use one of the
> profile files rather than .First, because .First needs to be in a
> workspace, and you shouldn't be loading a workspace every time.
> 
> Duncan Murdoch


[R] Calculating Kendall's tau

2015-03-31 Thread Desta Yoseph via R-help
I am analyzing trends using the Mann-Kendall test for 31 independent samples; 
each sample has a 34-year dataset. I am supposed to find Kendall's "tau" for 
each sample. The data are arranged column-wise (I attached the data). To find 
Kendall's tau, I wrote this R script:

desta <- read.csv("rainfall.csv", header=T, sep=",")
require(Kendall)
MK <- function(y) {
    nc <- ncol(y)
    MannKendalltau <- numeric(nc)
    for(i in 2:nc){
        MannKendalltau[i] <- MannKendall(y[,i])
    }
    MannKendalltau
}
MK(desta)

The displayed result showed both "tau" and the "2-sided p-value" in an 
unorganized way. But I want only the "tau" value, presented in an organized 
manner. Can anyone tell me how I can get an orderly display of the "tau" 
values? Here is my sample result:

[[1]]
[1] 0

[[2]]
[1] 0.4352941
attr(,"Csingle")
[1] TRUE

[[3]]
[1] 0.5462185
attr(,"Csingle")
[1] TRUE

[[4]]
[1] 0.4218487
attr(,"Csingle")
[1] TRUE

Thank you for your guidance 
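
A possible direction, offered as a hedged sketch rather than a definitive answer: the list-shaped output appears to come from assigning the whole MannKendall() result (a list-like object) into one element of a numeric vector, which turns the vector into a list. Collecting a single number per column avoids that. The data frame below is simulated and only stands in for rainfall.csv (its column names are made up). The main computation swaps in base R's cor(method = "kendall") against the time order so that it runs without extra packages; the optional branch assumes the Kendall package is installed and uses the tau component of the MannKendall() result.

```r
# Simulated stand-in for rainfall.csv: a Year column plus three series of 34 years
set.seed(42)
desta <- data.frame(Year = 1981:2014,
                    S1 = rnorm(34), S2 = rnorm(34), S3 = rnorm(34))

# One tau per data column (skipping the first column), as a named numeric vector.
# Base R: Kendall's tau between each series and its time order.
tau <- sapply(desta[-1], function(col)
  cor(seq_along(col), col, method = "kendall"))
tau

# With the Kendall package: keep only the tau component of each result
# instead of storing the whole object.
if (requireNamespace("Kendall", quietly = TRUE)) {
  tau_mk <- sapply(desta[-1], function(col)
    as.numeric(Kendall::MannKendall(col)$tau))
  print(tau_mk)
}
```

Either way the result is a plain named vector, one value per column, which scales to many columns without changing the code.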


Re: [R] Calculating Kendall's tau

2015-03-31 Thread Bert Gunter
This sounds like homework. Homework is discouraged on this list (but
you might get lucky).

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Tue, Mar 31, 2015 at 9:08 AM, Desta Yoseph via R-help
 wrote:
> I am analyzing trend  using Mann-kendall  test for 31 independent sample, 
> each sample  have 34 years dataset.  I supposed to find Kendall “tau” for 
> each sample. The data is arranged in column wise (I attached  the data).To 
> find Kendall tau, I wrote R script as:
>
> desta<-read.csv("rainfall.csv", header=T, sep=",")
> require(Kendall)
> MK<-function(y) {
>     nc<-ncol(y)
>     MannKendalltau<- numeric(nc)
>     for(i in 2:nc){
>         MannKendalltau[i]<-MannKendall(y[,i])
>     }
>     MannKendalltau
> }
> MK(desta)
>
> The  displayed result showed  both “tau”  and “2-sided p-value”in unorganized 
> way.  But, I want only “tau” value that is presented in organized  manner. 
> Anyone can tell me how can I get orderly displayed  “tau” value? here is my 
> sample result:
>
> [[1]]
> [1] 0
>
> [[2]]
> [1] 0.4352941
> attr(,"Csingle")
> [1] TRUE
>
> [[3]]
> [1] 0.5462185
> attr(,"Csingle")
> [1] TRUE
>
> [[4]]
> [1] 0.4218487
> attr(,"Csingle")
> [1] TRUE
>
> Thank you for your guidance
>

Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-31 Thread William Dunlap
If you really want your coefficient estimates to be scale-equivariant you
should test those methods for such a thing.  E.g., here are functions that
let you check how scaling one predictor affects the estimated coefficients
- they should give the same results for any scale factor.

library(RcppEigen)  # provides fastLm(), used in g() and h()

# lm(): coefficients rescaled back to the original units of X2
f <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
    X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
    cf <- coef(lm(data=data, Y ~ X1 + I(X2/scale)))
    cf * c(1, 1, 1/scale)
}
# fastLm() with method 4
g <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
    X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
    cf <- coef(fastLm(data=data, Y ~ X1 + I(X2/scale), method=4))
    cf * c(1, 1, 1/scale)
}
# fastLm() with method 5
h <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
    X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
    cf <- coef(fastLm(data=data, Y ~ X1 + I(X2/scale), method=5))
    cf * c(1, 1, 1/scale)
}

See how they compare for scale factors between 10^-15 and 10^15.  lm() is
looking pretty good.
> options(digits=4)
> scale <- 10 ^ seq(-15,15,by=5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 5:10 AM, RiGui  wrote:

> I found a fix to my problem using the fastLm() from package RcppEigen,
> using
> the Jacobi singular value decomposition (SVD) (method 4) or a method based
> on the eigenvalue-eigenvector decomposition of X'X - method 5 of the fastLm
> function
>
>
>
> install.packages("RcppEigen")
> library(RcppEigen)
>
> n_obs <- 1500
> y  <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.01235657,0.45)
> x2 <- rnorm(n_obs, 10,3.21)
> X  <- cbind(x1,x2)
>
>
>
> bFE <- fastLm(y ~ x1 + x2, method =4)
> bFE
>
> Call:
> fastLm.formula(formula = y ~ x1 + x2, method = 4)
>
> Coefficients:
> (Intercept)  x1  x2
> 9.94832839474159414 0.12293 0.00440078989949841
>
>
> Best,
>
> Raluca
>
>
>
>
>


Re: [R] Calculating Kendall's tau

2015-03-31 Thread Bert Gunter
OK.

But always reply to the list (which I am ccing here) so that everyone
knows -- and re-submit your OP in **PLAIN TEXT**, not html, as this is
a plain text  list and html typically garbles everything.

Also, reading and following the posting guide (see end of this email)
generally improves your chance of getting useful help.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Tue, Mar 31, 2015 at 9:24 AM, Desta Yoseph  wrote:
> Dear Bert,
> It is not homework. Actually, my real work is on 10,360 samples. But if
> someone shows me how to do it for the 31-sample dataset, I can manage the
> large one.
> Hopefully this gives you some hint of why I really want someone's help.
> Cheers
>
>
>

Re: [R] changing column labels for data frames inside a list

2015-03-31 Thread r-help
> Date: Mon, 30 Mar 2015 09:54:39 -0400
> From: Vikram Chhatre 
> To: r-help@r-project.org
> Subject: [R] changing column labels for data frames inside a list
> Message-ID:
>  Content-Type: text/plain; charset="UTF-8"
>
> > summary(mygenfreqt)
>   Length Class  Mode
> dat1.str 59220  -none- numeric
> dat2.str 59220  -none- numeric
> dat3.str 59220  -none- numeric
>
> > head(mylist[[1]])
>            1     2     3     4     5     6     7     8     9    10  11    12
> L0001.1 0.60 0.500 0.325 0.675 0.600 0.500 0.500 0.375 0.550 0.475 0.3 0.275
> L0001.2 0.40 0.500 0.675 0.325 0.400 0.500 0.500 0.625 0.450 0.525 0.6 0.725
>
> I want to change 1:12 to pop1:pop12
>
> mylist<- lapply(mylist, function(e) colnames(e) <- paste0('pop',1:12))
>
> What this is doing is replacing the data frames with just names
> pop1:pop12.  I just want to replace the column labels.
>
> Thanks for any suggestions.

Some readers have already replied, but here is another option that exploits 
lapply()'s "..." parameter.  First, we make a reproducible example.

(lista <- list(mtcars, mtcars))

Now, we get the unique number of columns of the data frames in the variable 
"lista".

(n.cols <- unique(sapply(lista, ncol)))

Finally, we call lapply() and `colnames<-` to change the column names of both 
data frames in "lista".  See lapply()'s "..." parameter (?lapply).

(lista <- lapply(X = lista, FUN = `colnames<-`, paste0("pop", seq_len(n.cols))))
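
As for why the original attempt replaced the data frames with the names, here is a short sketch using the same mtcars stand-in (the object names wrong, right and new.names are only for illustration). An anonymous function returns the value of its last expression, and the value of an assignment is its right-hand side, so the function has to end with the modified object itself.

```r
lista <- list(mtcars, mtcars)
new.names <- paste0("pop", seq_len(ncol(mtcars)))

# Returns only the names: the value of the assignment is its right-hand side
wrong <- lapply(lista, function(e) colnames(e) <- new.names)

# Returns the modified data frame because e is the last expression evaluated
right <- lapply(lista, function(e) { colnames(e) <- new.names; e })

class(wrong[[1]])   # "character"
class(right[[1]])   # "data.frame"
```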



Re: [R] how to deal with changing weighting functions

2015-03-31 Thread Adams, Jean
Can you give a concrete simple example of inputs with expected results?  Is
phi a function?  Of omega 1 and 2?  Is the summation over everything
through V_d-k?

On Mon, Mar 30, 2015 at 2:58 PM, T.Riedle  wrote:

> Hi everybody,
> Does anybody have an idea how I can generate tau according to the attached
> formula? The point is that phi changes with k and I thought I could make it
> by using a for-function in R but I am not sure how to do that.
>
> Could anyone help me?
> Thanks in advance.
>
>


[R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Kevin E. Thorpe

Hello.

I am trying to simulate recruitment in a randomized trial. Suppose I 
have three streams (strata) of patients represented by these data frames.


df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

What I need to do is construct a data frame with all of these combined 
where the order of selection from one of the three data frames is 
randomized but once a stratum is selected patients are selected 
sequentially from that data frame.


To see what I'm looking to achieve, suppose the first five subjects were 
to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
expected result should look like this:


rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
   strat id  pid
1      1  1 1001
2      2  1 2001
21     1  2 1002
4      3  1 3001
22     2  2 2002

I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
something obvious, but I really have no idea at the moment how to 
achieve this elegantly. Since I need to simulate many trial recruitments 
it needs to be general and compact.


I appreciate any advice.
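
One possible approach, sketched under assumptions: every patient in all three strata is eventually recruited, the arrival order of strata is a uniformly random permutation of the stratum labels, and id runs 1, 2, ... within each stratum as in the example data. The names pool, ord, k, rows and recruited are made up for this sketch.

```r
set.seed(2015)  # only to make the sketch reproducible

df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
pool <- rbind(df1, df2, df3)

# Random arrival order of strata: a permutation of the 30 stratum labels
ord <- sample(pool$strat)

# The k-th appearance of a stratum in ord takes that stratum's k-th patient
k <- ave(seq_along(ord), ord, FUN = seq_along)

# Look up the matching row (relies on id running 1, 2, ... within each stratum)
rows <- match(paste(ord, k), paste(pool$strat, pool$id))
recruited <- pool[rows, ]
rownames(recruited) <- NULL
head(recruited)
```

Wrapping the lines from sample() onward in a function would let this be repeated for many simulated trials; a different recruitment model (e.g. unequal arrival rates per stratum) would only change how ord is generated.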

Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread David Wolfskill
On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
> I think we need some data and code 
> Reproducibility
> https://github.com/hadley/devtools/wiki/Reproducibility
>  
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> 

I apologize for failing to provide that.

Here is a quite small subset of the data (with a few edits to reduce
excess verbosity in names of things) that still illustrates the
challenge I perceive:

> dput(bw)
structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L, 
1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L, 
1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L, 
1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
), hostname = c("c001", "c002", "c021", "c022", "c041", "c051", 
"c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002", 
"c021", "c022", "c041", "c051"), health = c(0.054937499983, 
0.25058541667, 1, 1, 0.577784167075767, 0.546805261621527, 
0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525, 
0.2813125, 0.27087708333, 1, 1, 0.579231349457365, 0.542973020177151
), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 
1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L, 
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 
2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1, 
62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA, 
NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("", 
"hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23, 
665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71, 
668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read = 
c(39.77, 
31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24, 
NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = 
c(43.5, 
31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6, 
NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0, 
0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56, 
48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA, 
NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, 
NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5, 
81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2, 
74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L, 
2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 
3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = 
c(690.67, 
686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01, 
594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02, 
564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57, 
134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61, 
268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99), 
da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8, 
2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9, 
1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0, 
2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), da2_xfers_per_sec_other 
= c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_xfers_per_sec_read 
= c(66, 
62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61, 
226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp", 
"hostname", "health", "hw", "fw", "role", "type", "da20_busy_pct", 
"da20_dev_type", "da20_kb_per_xfer_read", "da20_kb_per_xfer_write", 
"da20_mb_per_sec_read", "da20_mb_per_sec_write", "da20_ms_per_xactn_read", 
"da20_ms_per_xactn_write", "da20_Q_length", "da20_xfers_per_sec_other", 
"da20_xfers_per_sec_read", "da20_xfers_per_sec_write", "da2_busy_pct", 
"da2_dev_type", "da2_kb_per_xfer_read", "da2_kb_per_xfer_write", 
"da2_mb_per_sec_read", "da2_mb_per_sec_write", "da2_ms_per_xactn_read", 
"da2_ms_per_xactn_write", "da2_Q_length", "da2_xfers_per_sec_other", 
"da2_xfers_per_sec_read", "da2_xfers_per_sec_write"), class = "data.frame", 
row.name

Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Sarah Goslee
That's a fun one. Here's one possible approach. (Note that it can be
done without using a loop, but I find that a loop here increases
readability.)

I wrote it to work on a list of data frames. If the selection is
random, I'd set it up so that size is passed to the function, but
selection is generated within the function using sample().

recruitment <- function(dflist, selection) {
  results <- data.frame(matrix(NA, nrow=length(selection),
                               ncol=ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for(i in unique(selection)) {
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
  }
  results
}


# and your example:


df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

dfall <- list(df1, df2, df3)

touse <- c(1, 2, 1, 3, 1)
# could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)

> recruitment(dfall, touse)
  strat id  pid
1 1  1 1001
2 2  1 2001
3 1  2 1002
4 3  1 3001
5 1  3 1003
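
The variant described above (pass size, draw the selection inside the function) can also be written without the loop. This is only a sketch: recruit_random() and its internals are hypothetical names, not something posted in the thread, and it assumes every data frame in the list has the same columns and that no stratum is drawn more often than it has rows.

```r
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

# Hypothetical helper: draw the stratum order inside the function and
# pick the rows without an explicit loop.
recruit_random <- function(dflist, size) {
  # Random stratum for each of the `size` recruits
  selection <- sample(seq_along(dflist), size = size, replace = TRUE)
  # The k-th time a stratum is drawn, take row k of that stratum
  within <- ave(seq_along(selection), selection, FUN = seq_along)
  n <- vapply(dflist, nrow, integer(1))
  # Stop if a stratum would run out of patients
  stopifnot(all(within <= n[selection]))
  # Offset of each stratum's first row inside the stacked data frame
  offset <- c(0L, cumsum(n))[selection]
  stacked <- do.call(rbind, dflist)
  out <- stacked[offset + within, ]
  rownames(out) <- NULL
  out
}

set.seed(42)
res <- recruit_random(list(df1, df2, df3), size = 5)
res
```

Stacking once and indexing with offset + within avoids growing the result row by row, which may matter when many recruitments are simulated.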

Sarah

On Tue, Mar 31, 2015 at 1:05 PM, Kevin E. Thorpe
 wrote:
> Hello.
>
> I am trying to simulate recruitment in a randomized trial. Suppose I have
> three streams (strata) of patients represented by these data frames.
>
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
>
> What I need to do is construct a data frame with all of these combined where
> the order of selection from one of the three data frames is randomized but
> once a stratum is selected patients are selected sequentially from that data
> frame.
>
> To see what I'm looking to achieve, suppose the first five subjects were to
> come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected
> result should look like this:
>
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>strat id  pid
> 1  1  1 1001
> 2  2  1 2001
> 21 1  2 1002
> 4  3  1 3001
> 22 2  2 2002
>
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> something obvious, but I really have no idea at the moment how to achieve
> this elegantly. Since I need to simulate many trial recruitments it needs to
> be general and compact.
>
> I appreciate any advice.
>
> Kevin
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Duncan Murdoch
On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> Hello.
> 
> I am trying to simulate recruitment in a randomized trial. Suppose I 
> have three streams (strata) of patients represented by these data frames.
> 
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
> 
> What I need to do is construct a data frame with all of these combined 
> where the order of selection from one of the three data frames is 
> randomized but once a stratum is selected patients are selected 
> sequentially from that data frame.
> 
> To see what I'm looking to achieve, suppose the first five subjects were 
> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
> expected result should look like this:
> 
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
> strat id  pid
> 1  1  1 1001
> 2  2  1 2001
> 21 1  2 1002
> 4  3  1 3001
> 22 2  2 2002
> 
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
> something obvious, but I really have no idea at the moment how to 
> achieve this elegantly. Since I need to simulate many trial recruitments 
> it needs to be general and compact.
> 
> I appreciate any advice.

How about something like this:

# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

# Put the original dataframes into the appropriate slots:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3

# Clean up the rownames
rownames(df) <- NULL
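
For an arbitrary number of strata, the same idea might be wrapped in a function along these lines. This is a sketch under assumptions: interleave_all() is a hypothetical name, the strata are assumed to share the same columns, and rank(..., ties.method = "first") stands in for the three explicit assignments above.

```r
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

# Hypothetical generalisation: interleave every row of every stratum.
interleave_all <- function(dflist) {
  n <- vapply(dflist, nrow, integer(1))
  # Permuted vector of stratum labels, one label per patient
  sel <- sample(rep(seq_along(dflist), times = n))
  stacked <- do.call(rbind, dflist)
  # With ties broken by position, the k-th occurrence of stratum s
  # is mapped to row k of stratum s in the stacked data frame
  out <- stacked[rank(sel, ties.method = "first"), ]
  rownames(out) <- NULL
  out
}

set.seed(1)
dfs <- list(df1, df2, df3[1:4, ])   # strata may differ in size
res <- interleave_all(dfs)
head(res)
```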

Duncan Murdoch



[R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
Hi folks,

I KNOW there has to be a way to do this more elegantly, but I
consistently fail to come up with it, as I was just reminded while
writing an example for a query on this list.

What's a nifty way to construct a data frame of a given size? The only
way I know of is to use matrix(), e.g.

data.frame(matrix(NA, nrow=10, ncol=3))

and then to set the colnames in a second step.

This comes up a lot when pre-allocating a data frame before using a
loop: I know the size and column names, but want an empty structure to
fill later.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
I just snagged this from Duncan Murdoch's reply to the same question:

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

This skips matrix(), but how to set the column names programmatically
within a function?

Sarah, still sure I'm missing something obvious


On Tue, Mar 31, 2015 at 1:46 PM, Sarah Goslee  wrote:
> Hi folks,
>
> I KNOW there has to be a way to do this more elegantly, but I
> consistently fail to come up with it, as I was just reminded while
> writing an example for a query on this list.
>
> What's a nifty way to construct a data frame of a given size? The only
> way I know of is to use matrix(), e.g.
>
> data.frame(matrix(NA, nrow=10, ncol=3))
>
> and then to set the colnames in a second step.
>
> This comes up a lot when pre-allocating a data frame before using a
> loop: I know the size and column names, but want an empty structure to
> fill later.
>
> Sarah
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Kevin E. Thorpe

On 03/31/2015 01:44 PM, Duncan Murdoch wrote:

> On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
>> Hello.
>>
>> I am trying to simulate recruitment in a randomized trial. Suppose I
>> have three streams (strata) of patients represented by these data frames.
>>
>> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
>> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
>> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
>>
>> What I need to do is construct a data frame with all of these combined
>> where the order of selection from one of the three data frames is
>> randomized but once a stratum is selected patients are selected
>> sequentially from that data frame.
>>
>> To see what I'm looking to achieve, suppose the first five subjects were
>> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The
>> expected result should look like this:
>>
>> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>>  strat id  pid
>> 1  1  1 1001
>> 2  2  1 2001
>> 21 1  2 1002
>> 4  3  1 3001
>> 22 2  2 2002
>>
>> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
>> something obvious, but I really have no idea at the moment how to
>> achieve this elegantly. Since I need to simulate many trial recruitments
>> it needs to be general and compact.
>>
>> I appreciate any advice.


> How about something like this:
>
> # Permute an ordered vector of selections:
> sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))
>
> # Create an empty dataframe to hold the results
> df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
>
> # Put the original dataframes into the appropriate slots:
> df[sel == 1,] <- df1
> df[sel == 2,] <- df2
> df[sel == 3,] <- df3
>
> # Clean up the rownames
> rownames(df) <- NULL
>
> Duncan Murdoch



Thanks Duncan.

Once you see the solution it is indeed obvious.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Tom Wright
samples<-sample(c(rep(1,10),rep(2,10),rep(3,10)),30)
samples[samples==1]<-1001:1010
samples[samples==2]<-2001:2010
samples[samples==3]<-3001:3010

fullDf<-rbind(df1,df2,df3)

fullDf[sort(order(samples),index.return=TRUE)$ix,]
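
The index on the last line is dense, so here is a small check of what it computes. It is only a sketch: the names idx_thread and idx_rank are hypothetical, and the point is that sorting the output of order() and keeping the index is the same as taking rank() when all values are distinct, as the pids are here.

```r
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

set.seed(123)
# Same construction as above: shuffle the stratum labels, then replace
# each label by that stratum's pids in their original order
samples <- sample(c(rep(1,10), rep(2,10), rep(3,10)), 30)
samples[samples == 1] <- 1001:1010
samples[samples == 2] <- 2001:2010
samples[samples == 3] <- 3001:3010

fullDf <- rbind(df1, df2, df3)

idx_thread <- sort(order(samples), index.return = TRUE)$ix  # as posted
idx_rank   <- rank(samples)                                 # shorter equivalent

# Rows of fullDf are in increasing pid order, so indexing by the rank of
# each pid returns the patients in recruitment order
res <- fullDf[idx_rank, ]
all(idx_thread == idx_rank)
all(res$pid == samples)
```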

On Tue, 2015-03-31 at 13:05 -0400, Kevin E. Thorpe wrote:
> Hello.
> 
> I am trying to simulate recruitment in a randomized trial. Suppose I 
> have three streams (strata) of patients represented by these data frames.
> 

> 
> What I need to do is construct a data frame with all of these combined 
> where the order of selection from one of the three data frames is 
> randomized but once a stratum is selected patients are selected 
> sequentially from that data frame.
> 
> To see what I'm looking to achieve, suppose the first five subjects were 
> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
> expected result should look like this:
> 
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
> strat id  pid
> 1  1  1 1001
> 2  2  1 2001
> 21 1  2 1002
> 4  3  1 3001
> 22 2  2 2002
> 
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
> something obvious, but I really have no idea at the moment how to 
> achieve this elegantly. Since I need to simulate many trial recruitments 
> it needs to be general and compact.
> 
> I appreciate any advice.
> 
> Kevin
>



Re: [R] idiom for constructing data frame

2015-03-31 Thread Duncan Murdoch

On 31/03/2015 1:52 PM, Sarah Goslee wrote:

> I just snagged this from Duncan Murdoch's reply to the same question:
>
> # Create an empty dataframe to hold the results
> df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
>
> This skips matrix(), but how to set the column names programmatically
> within a function?
>
> Sarah, still sure I'm missing something obvious


The matrix() function has a dimnames argument, so you could do this:

names <- c("strat", "id", "pid")
data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

Duncan Murdoch



On Tue, Mar 31, 2015 at 1:46 PM, Sarah Goslee  wrote:
> Hi folks,
>
> I KNOW there has to be a way to do this more elegantly, but I
> consistently fail to come up with it, as I was just reminded while
> writing an example for a query on this list.
>
> What's a nifty way to construct a data frame of a given size? The only
> way I know of is to use matrix(), e.g.
>
> data.frame(matrix(NA, nrow=10, ncol=3))
>
> and then to set the colnames in a second step.
>
> This comes up a lot when pre-allocating a data frame before using a
> loop: I know the size and column names, but want an empty structure to
> fill later.
>
> Sarah
>





Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kevin
> E. Thorpe
> Sent: Tuesday, March 31, 2015 10:53 AM
> To: Duncan Murdoch
> Cc: R Help Mailing List
> Subject: Re: [R] Randomly interleaving data frames while preserving
> order
> 
> On 03/31/2015 01:44 PM, Duncan Murdoch wrote:
> > On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> >> Hello.
> >>
> >> I am trying to simulate recruitment in a randomized trial. Suppose I
> >> have three streams (strata) of patients represented by these data
> frames.
> >>
> >> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> >> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> >> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
> >>
> >> What I need to do is construct a data frame with all of these
> combined
> >> where the order of selection from one of the three data frames is
> >> randomized but once a stratum is selected patients are selected
> >> sequentially from that data frame.
> >>
> >> To see what I'm looking to achieve, suppose the first five subjects
> were
> >> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The
> >> expected result should look like this:
> >>
> >> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
> >>  strat id  pid
> >> 1  1  1 1001
> >> 2  2  1 2001
> >> 21 1  2 1002
> >> 4  3  1 3001
> >> 22 2  2 2002
> >>
> >> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> >> something obvious, but I really have no idea at the moment how to
> >> achieve this elegantly. Since I need to simulate many trial
> recruitments
> >> it needs to be general and compact.
> >>
> >> I appreciate any advice.
> >
> > How about something like this:
> >
> > # Permute an ordered vector of selections:
> > sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))
> >
> > # Create an empty dataframe to hold the results
> > df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
> >
> > # Put the original dataframes into the appropriate slots:
> > df[sel == 1,] <- df1
> > df[sel == 2,] <- df2
> > df[sel == 3,] <- df3
> >
> > # Clean up the rownames
> > rownames(df) <- NULL
> >
> > Duncan Murdoch
> >
> 
> Thanks Duncan.
> 
> Once you see the solution it is indeed obvious.
> 
> Kevin
> 
> --
> Kevin E. Thorpe
> Head of Biostatistics,  Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
> 

Another option would be to stack your strata and then sample from the combined 
data frame, something like this:

sample_size <- 10
population <- rbind(df1,df2,df3)
sim.sample <- population[sample(nrow(population),sample_size, replace=FALSE),]

Hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services




Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Ista Zahn
Hi David,

I suggest reading http://www.jstatsoft.org/v59/i10, then:

library(tidyr)
library(dplyr)
bw <- gather(bw, key = "tmp", value = "value", matches("^d[a-z]+[0-9]+"))
bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
bw <- spread(bw, var, value)
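
For readers without tidyr installed, roughly the same wide-to-long step can be sketched in base R. Everything below is an illustration, not part of the reply above: toy is a cut-down stand-in for bw (two hosts, two per-disk variables, values taken from the dput), and the names disk_cols, discs, id_cols and long are hypothetical. It assumes every disk prefix carries the same set of variables, as da2 and da20 do in the posted data.

```r
# Toy stand-in for `bw`: first rows of hosts c001 and c021 from the dput
toy <- data.frame(
  hostname      = c("c001", "c021"),
  da20_busy_pct = c(79.1, NA),
  da20_Q_length = c(0, NA),
  da2_busy_pct  = c(84.5, 29.5),
  da2_Q_length  = c(0, 0),
  stringsAsFactors = FALSE
)

# Columns that belong to a disk, and the distinct disk prefixes
disk_cols <- grep("^d[a-z]+[0-9]+_", names(toy), value = TRUE)
discs     <- unique(sub("_.*$", "", disk_cols))
id_cols   <- setdiff(names(toy), disk_cols)

# One block of rows per disk, with the prefix moved into a `disc` column
long <- do.call(rbind, lapply(discs, function(d) {
  prefix <- paste0("^", d, "_")
  cols   <- grep(prefix, names(toy), value = TRUE)
  part   <- toy[cols]
  names(part) <- sub(prefix, "", cols)
  cbind(toy[id_cols], disc = d, part, stringsAsFactors = FALSE)
}))
long
```

Row and column order will differ from the tidyr result, and rows for disks a host does not have (all NA) are kept here; they could be dropped afterwards if they get in the way.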

Best,
Ista

On Tue, Mar 31, 2015 at 1:22 PM, David Wolfskill  wrote:
> On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
>> I think we need some data and code
>> Reproducibility
>> https://github.com/hadley/devtools/wiki/Reproducibility
>>  
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> 
>
> I apologize for failing to provide that.
>
> Here is a quite small subset of the data (with a few edits to reduce
> excess verbosity in names of things) that still illustrates the
> challenge I perceive:
>
>> dput(bw)
> structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L,
> 1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L,
> 1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L,
> 1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
> ), hostname = c("c001", "c002", "c021", "c022", "c041", "c051",
> "c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002",
> "c021", "c022", "c041", "c051"), health = c(0.054937499983,
> 0.25058541667, 1, 1, 0.577784167075767, 0.546805261621527,
> 0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525,
> 0.2813125, 0.27087708333, 1, 1, 0.579231349457365, 0.542973020177151
> ), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5,
> 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
> ), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L,
> 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
> 2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L,
> 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L,
> 2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1,
> 62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA,
> NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L,
> 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("",
> "hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23,
> 665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71,
> 668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA,
> NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read 
> = c(39.77,
> 31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24,
> NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA,
> NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = 
> c(43.5,
> 31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6,
> NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA,
> NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0,
> 0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
> ), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA,
> NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56,
> 48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA,
> NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0,
> NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5,
> 81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2,
> 74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L,
> 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
> 3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = 
> c(690.67,
> 686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01,
> 594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02,
> 564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57,
> 134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61,
> 268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99),
> da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8,
> 2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9,
> 1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0,
> 2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), 
> da2_xfers_per_sec_other = c(0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
> da2_xfers_per_sec_read = c(66,
> 62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61,
> 226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp",
> "hostname", "health", "hw", "fw", "role", "type", "da20_busy_pct",
> "da20_dev_type", "da20_kb_per_xfer_read", "da20_kb_per_xfer_write",
> "da20_mb_per_sec_read", "da20_mb_per_sec_write", "da20_ms_per_xactn_read",
> "da20_m

Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Tom Wright
Nice clean-up!!!

On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
> library(tidyr)
> library(dplyr)
> bw <- gather(bw, key = "tmp", value = "value",
> matches("^d[a-z]+[0-9]+"))
> bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
> bw <- spread(bw, var, value)



Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Tom Wright
Not entirely sure I understand your problem here (your first email was a
lot of reading).

Would it make sense to add an extra column, device_name?

Thus ending up with something like:
Host    Device  Type
host_A  ada0    ssd
host_A  ada1    ssd
host_A  ada2    hdd
...
host_N  da3     ssd


You could then subset this dataframe:
subset(data,Type=="ssd" & Device=="ada0")

On Tue, 2015-03-31 at 10:22 -0700, David Wolfskill wrote:
> On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
> > I think we need some data and code 
> > Reproducibility
> > https://github.com/hadley/devtools/wiki/Reproducibility
> >  
> > http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> > 
> 
> I apologize for failing to provide that.
> 
> Here is a quite small subset of the data (with a few edits to reduce
> excess verbosity in names of things) that still illustrates the
> challenge I perceive:
> 
> > dput(bw)
> structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L, 
> 1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L, 
> 1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L, 
> 1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
> ), hostname = c("c001", "c002", "c021", "c022", "c041", "c051", 
> "c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002", 
> "c021", "c022", "c041", "c051"), health = c(0.054937499983, 
> 0.25058541667, 1, 1, 0.577784167075767, 0.546805261621527, 
> 0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525, 
> 0.2813125, 0.27087708333, 1, 1, 0.579231349457365, 0.542973020177151
> ), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 
> 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L, 
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
> ), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L, 
> 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
> 2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L, 
> 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 
> 2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1, 
> 62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA, 
> NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L, 
> 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("", 
> "hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23, 
> 665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71, 
> 668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA, 
> NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read 
> = c(39.77, 
> 31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24, 
> NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA, 
> NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = 
> c(43.5, 
> 31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6, 
> NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA, 
> NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0, 
> 0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
> ), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA, 
> NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56, 
> 48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA, 
> NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, 
> NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5, 
> 81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2, 
> 74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L, 
> 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 
> 3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = 
> c(690.67, 
> 686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01, 
> 594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02, 
> 564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57, 
> 134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61, 
> 268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99), 
> da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8, 
> 2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9, 
> 1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0, 
> 2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), 
> da2_xfers_per_sec_other = c(0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
> da2_xfers_per_sec_read = c(66, 
> 62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61, 
> 226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp", 
> "hostname", "health", "hw", "fw", "role", "type", "da20_b

Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
Hi,

Duncan Murdoch suggested:

> The matrix() function has a dimnames argument, so you could do this:
>
> names <- c("strat", "id", "pid")
> data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

That's a definite improvement, thanks. But no way to skip matrix()? It
just seems unRlike, although since it's only full of NA values there
are no coercion issues with column types or anything, so it doesn't
hurt. It's just inelegant. :)

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Does fitCopula work for amhCopula and joeCopula?

2015-03-31 Thread Laura Gianfagna
Good evening, this is a part of my routine, which calculates the copula 
parameter and log-likelihood for each pair of rows of a data matrix and chooses, 
for each pair, the copula that gives the maximum likelihood. If I run this 
routine with only:


f <- frankCopula(2,2)
  g <- gumbelCopula(2,2)
  c <- claytonCopula(2,2)


the program works correctly and gives the expected results.

If I also add:


  a <- amhCopula(1,2)
  j <- joeCopula(2,2)


then the program no longer works.

I tried on samples such as:


n <- 1000
f <- frankCopula(20,2)
x_1 <- rCopula(n,f)
f <- gumbelCopula(50,2)
x_2 <- rCopula(n,f)
f <- joeCopula(70,2)
x_3<- rCopula(n,f)
x <- cbind(x_1, x_2, x_3)
data <- t(x)
dim <- dim(data)[1]





Here is the part of code of Routine_Copula:

Routine_Copula <- function(data,dim){
  
  library(copula)
  library(gtools)
  
  n <- dim(data)[1];  # number of rows of the input matrix
  m <- dim(data)[2];  # number of columns of the input matrix
  
  # Probability integral transform of the data
  ecdf <- matrix(0,n,m);
  for (i in 1:n){
    e <- matrix(data[i,],m,1);
    #ecdf[i,] <- pobs(e);
    ecdf[i,] <- pobs(e, na.last=TRUE);
    # na.last controls the treatment of NAs. If TRUE, missing values in
    # the data are put last; if FALSE, they are put first; if NA, they are
    # removed; if "keep" they are kept with rank NA.
  }



f <- frankCopula(2,2)
  g <- gumbelCopula(2,2)
  c <- claytonCopula(2,2)
  a <- amhCopula(1,2)
  j <- joeCopula(2,2)





[….]


 for (j in 1:n_comb){
input <- t(ecdf[comb[,j],])

try(summary <- fitCopula(f,input,method='mpl',start=2),silent=TRUE);
resmatpar[j,1] <- summary@estimate;
resmatllk[j,1] <- summary@loglik;

try(summary <- fitCopula(g,input,method='mpl',start=2),silent=TRUE);
resmatpar[j,2] <- summary@estimate;
resmatllk[j,2] <- summary@loglik;

try(summary <- fitCopula(c,input,method='mpl',start=2),silent=TRUE);
resmatpar[j,3] <- summary@estimate;
resmatllk[j,3] <- summary@loglik;


try(summary <- fitCopula(a,input,method='mpl',start=1),silent=TRUE);
resmatpar[j,4] <- summary@estimate;
 resmatllk[j,4] <- summary@loglik;
 
try(summary <- fitCopula(j,input,method='mpl',start=2),silent=TRUE); 

 resmatpar[j,5] <- summary@estimate;
resmatllk[j,5] <- summary@loglik;

d <- 
c(resmatllk[j,1],resmatllk[j,2],resmatllk[j,3],resmatllk[j,4],resmatllk[j,5]);



copchoice[j] <- which(d==max(d));
param[j] <- resmatpar[j,copchoice[j]];
loglik[j] <- resmatllk[j,copchoice[j]];

  }
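
Two things are visible in the loop as posted: the name j is used both for the joeCopula object and for the loop index, so by the time fitCopula(j, ...) runs, j is an integer; and try(..., silent=TRUE) leaves summary untouched when a fit fails, so the next two lines store the result of the previous successful fit. A minimal base-R sketch of a pattern that avoids the second problem follows. Here fit_one() is a hypothetical stand-in for fitCopula() (it is not part of the copula package), and the family names are only labels.

```r
# Hypothetical stand-in for fitCopula(): it errors for one candidate,
# like a family that cannot be fitted to the data at hand.
fit_one <- function(family, u) {
  if (family == "amh") stop("fit failed")
  penalty <- c(frank = 1, gumbel = 2, clayton = 3)[[family]]
  list(estimate = mean(u), loglik = -penalty * sum((u - mean(u))^2))
}

# Wrap each fit so that a failure yields NA instead of silently
# reusing the result of the previous successful fit.
safe_fit <- function(family, u) {
  tryCatch(fit_one(family, u),
           error = function(e) list(estimate = NA_real_, loglik = NA_real_))
}

families <- c("frank", "gumbel", "amh", "clayton")
u <- c(0.1, 0.4, 0.7)

fits     <- lapply(families, safe_fit, u = u)
loglik   <- vapply(fits, function(f) f$loglik, numeric(1))
estimate <- vapply(fits, function(f) f$estimate, numeric(1))

best <- which.max(loglik)   # which.max() skips NA entries
families[best]
```

With real fits, the same idea would mean giving the loop index a name that is not also a copula object, and selecting with which.max() rather than which(d == max(d)), which returns nothing once d contains an NA.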


Thank you

Laura Gianfagna


​

Re: [R] idiom for constructing data frame

2015-03-31 Thread Ista Zahn
You can make it as elegant as you want, e.g.,

make.empty.df <- function(nrow, ncol, names) {
    if (length(names) %% ncol != 0)
        stop("Length of names is not a multiple of the number of columns")
    data.frame(matrix(NA, nrow, ncol, dimnames = list(NULL, names)))
}


Best,
Ista

On Tue, Mar 31, 2015 at 2:37 PM, Sarah Goslee  wrote:
> Hi,
>
> Duncan Murdoch suggested:
>
>> The matrix() function has a dimnames argument, so you could do this:
>>
>> names <- c("strat", "id", "pid")
>> data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>
> That's a definite improvement, thanks. But no way to skip matrix()? It
> just seems unRlike, although since it's only full of NA values there
> are no coercion issues with column types or anything, so it doesn't
> hurt. It's just inelegant. :)
>
> Sarah
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>


Re: [R] idiom for constructing data frame

2015-03-31 Thread William Dunlap
You can use structure() to attach the names to a list that is input to
data.frame.
E.g.,

dfNames <- c("First", "Second Name")
data.frame(lapply(structure(dfNames, names=dfNames),
function(name)rep(NA_real_, 5)))
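
One detail worth checking with names like "Second Name": by default data.frame() runs the column names through make.names(), so the space becomes a dot. A small sketch of the same construction with and without that check (d1 and d2 are names chosen here for illustration):

```r
dfNames <- c("First", "Second Name")
cols <- lapply(structure(dfNames, names = dfNames),
               function(name) rep(NA_real_, 5))

d1 <- data.frame(cols)                       # default: check.names = TRUE
d2 <- data.frame(cols, check.names = FALSE)  # keep the names as given

names(d1)
names(d2)
```

If the original names matter (for example when they come from an existing file header), check.names = FALSE keeps them, at the cost of needing backticks to refer to such columns with $.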


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee 
wrote:

> Hi,
>
> Duncan Murdoch suggested:
>
> > The matrix() function has a dimnames argument, so you could do this:
> >
> > names <- c("strat", "id", "pid")
> > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>
> That's a definite improvement, thanks. But no way to skip matrix()? It
> just seems unRlike, although since it's only full of NA values there
> are no coercion issues with column types or anything, so it doesn't
> hurt. It's just inelegant. :)
>
> Sarah
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>


[R] Fwd: non-conformable arguments

2015-03-31 Thread Soheila Khodakarim
Dear All,

I want to run a neural network on my data.

I ran this code:

#load mydata
dim(mydata)
# 20 3111
library(neuralnet)
fm <- as.formula(paste("resp ~", paste(colnames(mydata)[1:3110],
collapse="+")))
out <- neuralnet(fm,data=mydata, hidden = 4, lifesign = "minimal",
linear.output = FALSE, threshold = 0.1)
#load testset
dim(testset)
# 20 3111
out.results <- compute(out, testset)
Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

What should I do now?
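
One possible cause, offered as an assumption since the layout of testset is not shown: the formula uses 3110 covariates, but testset has 3111 columns, so if the response column is passed along with the covariates, the covariate matrix has one column too many for the fitted weights. The base-R sketch below reproduces that kind of mismatch on a small scale; forward(), W and the sizes are made up for illustration and are not neuralnet internals.

```r
set.seed(1)
n_obs <- 20
n_cov <- 5                        # small stand-in for the 3110 covariates
X <- matrix(rnorm(n_obs * n_cov), n_obs, n_cov)
resp <- rbinom(n_obs, 1, 0.5)
testset <- cbind(X, resp)         # covariates plus the response column

# Weights of a first layer fitted on n_cov covariates (+1 row for the bias).
W <- matrix(rnorm((n_cov + 1) * 4), n_cov + 1, 4)

# One forward step: prepend the bias column, multiply by the weights.
forward <- function(covariates, W) cbind(1, covariates) %*% W

# Passing all columns fails; the error message is captured as a string.
bad <- tryCatch(forward(testset, W), error = function(e) conditionMessage(e))

# Passing only the covariate columns works.
good <- forward(testset[, 1:n_cov], W)
```

If this is what is happening, the corresponding call would pass only the covariate columns, for example compute(out, testset[, 1:3110]), assuming resp is the last column.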

Regards,
Soheila



[R] How to obtain a cross tab count of unique values

2015-03-31 Thread Walter Anderson
I have a data frame that shows all of the parks (including duplicates)
that are impacted by a project's 'footprint':

PROJECT PARKNAME
A   PRK A
A   PRK B
A   PRK A
B   PRK C
B   PRK A
C   PRK B
C   PRK D
...

What I need is a cross tabulation that shows me the number of unique
parks for each project.  If I use the standard table(df$PROJECT), it
reports:

A 3
B 2
C 2
...

where I need it to ignore duplicates and report:

A 2
B 2
C 2
...

Anyone have any suggestions on how to do this within the R paradigm?

Walter Anderson



Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Sarah Goslee
Sure: tell R you want unique rows.

> mydf <- data.frame(PROJECT=c("A","A","A","B","B","C","C"),
+     PARKNAME=c("PRK A", "PRK B", "PRK A", "PRK C", "PRK A", "PRK B", "PRK D"),
+     stringsAsFactors=FALSE)
> mydf
  PROJECT PARKNAME
1       A    PRK A
2       A    PRK B
3       A    PRK A
4       B    PRK C
5       B    PRK A
6       C    PRK B
7       C    PRK D

> mydf.unique <- unique(mydf)
> table(mydf.unique$PROJECT)

A B C
2 2 2
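
Note that unique() on the whole data frame compares every column, so if the real data has more columns than PROJECT and PARKNAME, rows that differ only in those other columns are still counted separately. A sketch of two variants that depend only on the two columns of interest (counts and counts2 are names chosen here):

```r
mydf <- data.frame(PROJECT  = c("A", "A", "A", "B", "B", "C", "C"),
                   PARKNAME = c("PRK A", "PRK B", "PRK A", "PRK C",
                                "PRK A", "PRK B", "PRK D"),
                   stringsAsFactors = FALSE)

# Count distinct park names within each project.
counts <- tapply(mydf$PARKNAME, mydf$PROJECT,
                 function(x) length(unique(x)))

# Same idea with table(), restricted to the two relevant columns.
counts2 <- table(unique(mydf[c("PROJECT", "PARKNAME")])$PROJECT)
```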

Please provide reproducible data yourself in the future.

Sarah

On Tue, Mar 31, 2015 at 3:51 PM, Walter Anderson  wrote:
> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
>
> PROJECT PARKNAME
> A   PRK A
> A   PRK B
> A   PRK A
> B   PRK C
> B   PRK A
> C   PRK B
> C   PRK D
> ...
>
> What I need is a cross tabulation that shows me the number of unique
> parks for each project.  If I using the standard table(df$PROJECT) it
> reports:
>
> A 3
> B 2
> C 2
> ...
>
> where I need it to ignore duplicates and report:
>
> A 2
> B 2
> C 2
> ...
>
> Anyone have any suggestions on how to do this within the R paradigm?
>
> Walter Anderson

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Rui Barradas

Hello,

Try the following.

table(unique(df)$PROJECT)


And please note that 'df' is the name of an R function, use something else.

Hope this helps,

Rui Barradas

On 31-03-2015 20:51, Walter Anderson wrote:

> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
>
> PROJECT PARKNAME
> A   PRK A
> A   PRK B
> A   PRK A
> B   PRK C
> B   PRK A
> C   PRK B
> C   PRK D
> ...
>
> What I need is a cross tabulation that shows me the number of unique
> parks for each project.  If I using the standard table(df$PROJECT) it
> reports:
>
> A 3
> B 2
> C 2
> ...
>
> where I need it to ignore duplicates and report:
>
> A 2
> B 2
> C 2
> ...
>
> Anyone have any suggestions on how to do this within the R paradigm?
>
> Walter Anderson



Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Tom Wright
table(unique(df)$PROJECT)

On Tue, 2015-03-31 at 14:51 -0500, Walter Anderson wrote:
> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
> 
> PROJECT PARKNAME
> A   PRK A
> A   PRK B
> A   PRK A
> B   PRK C
> B   PRK A
> C   PRK B
> C   PRK D
> ...
> 
> What I need is a cross tabulation that shows me the number of unique
> parks for each project.  If I using the standard table(df$PROJECT) it
> reports:
> 
> A 3
> B 2
> C 2
> ...
> 
> where I need it to ignore duplicates and report:
> 
> A 2
> B 2
> C 2
> ...
> 
> Anyone have any suggestions on how to do this within the R paradigm?
> 
> Walter Anderson
> 


Re: [R] idiom for constructing data frame

2015-03-31 Thread Sven E. Templer
If you don't mind an extra column, you could use something similar to:

data.frame(r=seq(8),foo=NA,bar=NA)

If you do, here is another approach (see function body):

empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
  data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
}
empty.frame()
empty.frame(, seq(3))
empty.frame(8, c("foo", "bar"))

I could not put it in one line either, without retyping at least one
argument (n in this case).
So I suggest a function is the way to go for a simplified syntax ...

Thanks to all for the ideas!
Sven

On 31 March 2015 at 20:55, William Dunlap  wrote:

> You can use structure() to attach the names to a list that is input to
> data.frame.
> E.g.,
>
> dfNames <- c("First", "Second Name")
> data.frame(lapply(structure(dfNames, names=dfNames),
> function(name)rep(NA_real_, 5)))
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee 
> wrote:
>
> > Hi,
> >
> > Duncan Murdoch suggested:
> >
> > > The matrix() function has a dimnames argument, so you could do this:
> > >
> > > names <- c("strat", "id", "pid")
> > > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
> >
> > That's a definite improvement, thanks. But no way to skip matrix()? It
> > just seems unRlike, although since it's only full of NA values there
> > are no coercion issues with column types or anything, so it doesn't
> > hurt. It's just inelegant. :)
> >
> > Sarah
> > --
> > Sarah Goslee
> > http://www.functionaldiversity.org
> >


Re: [R] idiom for constructing data frame

2015-03-31 Thread Richard M. Heiberger
I got rid of the extra column.

data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

Rich

On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer  wrote:
> If you don't mind an extra column, you could use something similar to:
>
> data.frame(r=seq(8),foo=NA,bar=NA)
>
> If you do, here is another approach (see function body):
>
> empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
>   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
> }
> empty.frame()
> empty.frame(, seq(3))
> empty.frame(8, c("foo", "bar"))
>
> I could not put it in one line either, without retyping at least one
> argument (n in this case).
> So I suggest a function is the way to go for a simplified syntax ...
>
> Thanks to all for the ideas!
> Sven
>
> On 31 March 2015 at 20:55, William Dunlap  wrote:
>
>> You can use structure() to attach the names to a list that is input to
>> data.frame.
>> E.g.,
>>
>> dfNames <- c("First", "Second Name")
>> data.frame(lapply(structure(dfNames, names=dfNames),
>> function(name)rep(NA_real_, 5)))
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee 
>> wrote:
>>
>> > Hi,
>> >
>> > Duncan Murdoch suggested:
>> >
>> > > The matrix() function has a dimnames argument, so you could do this:
>> > >
>> > > names <- c("strat", "id", "pid")
>> > > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>> >
>> > That's a definite improvement, thanks. But no way to skip matrix()? It
>> > just seems unRlike, although since it's only full of NA values there
>> > are no coercion issues with column types or anything, so it doesn't
>> > hurt. It's just inelegant. :)
>> >
>> > Sarah
>> > --
>> > Sarah Goslee
>> > http://www.functionaldiversity.org
>> >


Re: [R] Can not load Rcmdr

2015-03-31 Thread a b
I have a similar issue with tcl.

I am using R on a Linux server.  Rcmdr installed OK, but it won't run:

> R.Version()
$platform
[1] "x86_64-unknown-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.0"

$year
[1] "2014"

$month
[1] "04"

$day
[1] "10"

$`svn rev`
[1] "65387"

$language
[1] "R"

$version.string
[1] "R version 3.1.0 (2014-04-10)"

$nickname
[1] "Spring Dance"

> library(Rcmdr)
Error : .onAttach failed in attachNamespace() for 'Rcmdr', details:
  call: structure(.External(.C_dotTcl, ...), class = "tclObj")
  error: [tcl] Invalid state name hover.

Error: package or namespace load failed for 'Rcmdr'
> 


This is kind of frustrating because I don't have admin privileges to install
Rstudio on this server, either.  

I guess it's time to use Emacs.



--
View this message in context: 
http://r.789695.n4.nabble.com/Can-not-load-Rcmdr-tp4655656p4705370.html
Sent from the R help mailing list archive at Nabble.com.



[R] Using matlab code in R

2015-03-31 Thread T.Riedle
Hi everybody,
I have some MATLAB code that I would like to use for my empirical analysis. 
Unfortunately, I am not familiar with MATLAB, and it would be great if there were 
a tool to "translate" the MATLAB code into R so that I can work with the code 
in R.
Is there such a tool or package in R?

Kind regards,
T.



[R] Calculating different PCAs in R

2015-03-31 Thread im db
Dear All,

I want to use the princomp() function in R to perform Principal Component 
Analysis. In different papers, I have seen "PCA 1", "PCA 2", "PCA 11", etc. 
Would you please tell me how I can calculate these different PCAs in R?

At the moment I just use this line:

eigenVectors <- pca$loadings

But I don't know if it is correct to use loadings. Thank you in advance.

Best regards,
Iman Dabbaghi
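
A minimal sketch of what princomp() returns, assuming that "PCA 1", "PCA 2", ... in those papers mean the first, second, ... principal components (that reading is an assumption). It uses the built-in USArrests data as a stand-in for the real data. The loadings are the eigenvectors (one column per component), while the scores are the observations projected onto those components.

```r
# Built-in example data; cor = TRUE works on the correlation matrix,
# which is the usual choice when variables are on different scales.
pca <- princomp(USArrests, cor = TRUE)

eigenVectors <- unclass(pca$loadings)   # variables x components
scores       <- pca$scores              # observations x components

pc1 <- scores[, 1]                      # first principal component
pc2 <- scores[, 2]                      # second principal component

# Share of the total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

So pca$loadings is the right place to look for the eigenvectors, and pca$scores is what is usually plotted or used as new variables.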


Re: [R] Using matlab code in R

2015-03-31 Thread Jeff Newmiller
The Posting Guide recommends searching the archives before posting. Consider 
[1] and learn.

[1] https://stat.ethz.ch/pipermail/r-help/2007-March/127981.html
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On March 31, 2015 1:47:49 PM PDT, "T.Riedle"  wrote:
>Hi everybody,
>I have a matlab code which I would like to use for my empirical
>analysis. Unfortunately, I am not familiar with matlab and it would be
>great if there was a tool to "translate" the matlab code into R so that
>I can work with the code in R.
>Is there such a tool or package in R?
>
>Kind regards,
>T.


Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger  wrote:
> I got rid of the extra column.
>
> data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

Brilliant!

After much fussing, including a disturbing detour into nested lapply
statements from which I barely emerged with my sanity (arguable, I
suppose), here is a one-liner that creates a data frame of arbitrary
number of rows given an existing data frame as template for column
number and name:


n <- 8
df1 <- data.frame(A=runif(9), B=runif(9))

do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
ncol(df1)))), c("r", "row.names", colnames(df1))))
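
For comparison, a sketch of a different route to the same goal that also keeps each column's type from the template (this is an alternative, not the one-liner above): indexing a column with NA_integer_ gives a single NA of that column's type, which can then be repeated n times. The name empty is chosen here for illustration.

```r
n <- 8
df1 <- data.frame(A = runif(9), B = runif(9))

# One typed NA per column, repeated n times, then bound into a data frame.
empty <- as.data.frame(lapply(df1, function(col) rep(col[NA_integer_], n)))

str(empty)
```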

It's not elegant, but it is fairly R-ish. I should probably stop
hunting for an elegant solution now.

Thanks, everyone!

Sarah


> Rich
>
> On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer  
> wrote:
>> If you don't mind an extra column, you could use something similar to:
>>
>> data.frame(r=seq(8),foo=NA,bar=NA)
>>
>> If you do, here is another approach (see function body):
>>
>> empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
>>   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
>> }
>> empty.frame()
>> empty.frame(, seq(3))
>> empty.frame(8, c("foo", "bar"))
>>
>> I could not put it in one line either, without retyping at least one
>> argument (n in this case).
>> So I suggest a function is the way to go for a simplified syntax ...
>>
>> Thanks to all for the ideas!
>> Sven
>>
>> On 31 March 2015 at 20:55, William Dunlap  wrote:
>>
>>> You can use structure() to attach the names to a list that is input to
>>> data.frame.
>>> E.g.,
>>>
>>> dfNames <- c("First", "Second Name")
>>> data.frame(lapply(structure(dfNames, names=dfNames),
>>> function(name)rep(NA_real_, 5)))
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee 
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > Duncan Murdoch suggested:
>>> >
>>> > > The matrix() function has a dimnames argument, so you could do this:
>>> > >
>>> > > names <- c("strat", "id", "pid")
>>> > > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>>> >
>>> > That's a definite improvement, thanks. But no way to skip matrix()? It
>>> > just seems unRlike, although since it's only full of NA values there
>>> > are no coercion issues with column types or anything, so it doesn't
>>> > hurt. It's just inelegant. :)
>>> >
>>> > Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] idiom for constructing data frame

2015-03-31 Thread Henrik Bengtsson
I've got dataFrame() in R.utils for this purpose, e.g.

> df <- dataFrame(colClasses=c(a="integer", b="double", c="character"),
+                 nrow=10L)
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0
 $ b: num  0 0 0 0 0 0 0 0 0 0
 $ c: chr  "" "" "" "" ...

Related: You can use the colClasses() function to generate the
'colClasses' argument "dynamically", e.g.

> cols <- colClasses("idc")
> names(cols) <- c("a", "b", "c")
> str(cols)
 Named chr [1:3] "integer" "double" "character"
 - attr(*, "names")= chr [1:3] "a" "b" "c"

> cols <- colClasses(sprintf("c2d%di", 4))
> df <- dataFrame(colClasses=cols, nrow=10L)
> str(df)
'data.frame':   10 obs. of  7 variables:
 $ : chr  "" "" "" "" ...
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0


dataFrame() is basically implemented as:

dataFrame <- function(colClasses, nrow=1L, ...) {
  df <- vector("list", length=length(colClasses))
  names(df) <- names(colClasses)
  for (kk in seq(along=df)) {
    df[[kk]] <- vector(colClasses[kk], length=nrow)
  }
  attr(df, "row.names") <- seq(length=nrow)
  class(df) <- "data.frame"
  df
} # dataFrame()
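
The same idea can be sketched in base R without R.utils, since vector() accepts the mode as its first argument. The names col_classes, df0 and dfNA below are chosen for illustration. As in dataFrame(), the columns start out as 0 or "" rather than NA, so a second step is shown for filling them with typed NAs.

```r
col_classes <- c(a = "integer", b = "double", c = "character")

# vector("integer", length = 10L) etc., one call per named class.
df0 <- as.data.frame(lapply(col_classes, vector, length = 10L),
                     stringsAsFactors = FALSE)

# Optional: replace the zero/empty defaults with NA, keeping each type.
dfNA <- df0
dfNA[] <- lapply(dfNA, function(col) { col[] <- NA; col })

str(df0)
str(dfNA)
```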

/Henrik

On Tue, Mar 31, 2015 at 4:42 PM, Sarah Goslee  wrote:
> On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger  wrote:
>> I got rid of the extra column.
>>
>> data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")
>
> Brilliant!
>
> After much fussing, including a disturbing detour into nested lapply
> statements from which I barely emerged with my sanity (arguable, I
> suppose), here is a one-liner that creates a data frame of arbitrary
> number of rows given an existing data frame as template for column
> number and name:
>
>
> n <- 8
> df1 <- data.frame(A=runif(9), B=runif(9))
>
> do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
> ncol(df1)))), c("r", "row.names", colnames(df1))))
>
> It's not elegant, but it is fairly R-ish. I should probably stop
> hunting for an elegant solution now.
>
> Thanks, everyone!
>
> Sarah
>
>
>> Rich
>>
>> On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer  
>> wrote:
>>> If you don't mind an extra column, you could use something similar to:
>>>
>>> data.frame(r=seq(8),foo=NA,bar=NA)
>>>
>>> If you do, here is another approach (see function body):
>>>
>>> empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
>>>   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
>>> }
>>> empty.frame()
>>> empty.frame(, seq(3))
>>> empty.frame(8, c("foo", "bar"))
>>>
>>> I could not put it in one line either, without retyping at least one
>>> argument (n in this case).
>>> So I suggest a function is the way to go for a simplified syntax ...
>>>
>>> Thanks to all for the ideas!
>>> Sven
>>>
>>> On 31 March 2015 at 20:55, William Dunlap  wrote:
>>>
>>>> You can use structure() to attach the names to a list that is input to
>>>> data.frame.
>>>> E.g.,
>>>>
>>>> dfNames <- c("First", "Second Name")
>>>> data.frame(lapply(structure(dfNames, names=dfNames),
>>>> function(name)rep(NA_real_, 5)))
>>>>
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>> On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee 
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Duncan Murdoch suggested:
>>>> >
>>>> > > The matrix() function has a dimnames argument, so you could do this:
>>>> > >
>>>> > > names <- c("strat", "id", "pid")
>>>> > > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>>>> >
>>>> > That's a definite improvement, thanks. But no way to skip matrix()? It
>>>> > just seems unRlike, although since it's only full of NA values there
>>>> > are no coercion issues with column types or anything, so it doesn't
>>>> > hurt. It's just inelegant. :)
>>>> >
>>>> > Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>


[R] Edit plot adehabitatHS

2015-03-31 Thread Luis Fernando García
Dear R experts,

I am making a selectivity analysis using the adehabitatHS package. Nevertheless,
I am having some trouble, and I would like to know if somebody knows how to
help me:

1) When changing the x-axis labels, the program uses the name "habitat"
instead of the names specified in the file.

2) Is it possible to edit this plot, #3 (bottom-left)? For example, is it
possible to change the symbol style, or the legend position or size? I do not
know how to edit this kind of plot; if somebody has an example using this
package, I would really appreciate it.

Thanks!!

Please find attached the script, the data and the respective plot.


library(adehabitatHS)
pse<-read.table("pseudos.txt", header=T)

attach(pse)
names(pse)
head(pse)
(wiRatio <- widesI(Diet, Dis))
png(filename = "plotpseudos3.png", width = 500, height = 500)
opar <- par(mfrow=c(2,2))
plot(wiRatio)

par(opar)
dev.off()

MSp     Orden        Dis  Diet
MSp52   Hemiptera     31     2
MSp84   Hemiptera      2     1
MSp92   Hymenoptera   47     2
MSp100  Hymenoptera   19     1
MSp101  Hymenoptera   31    28
MSp102  Hymenoptera   83    15
MSp104  Hymenoptera   77    40
MSp105  Hymenoptera  110     9
MSp106  Hymenoptera   41     3
MSp107  Hymenoptera    1     3
MSp108  Hymenoptera    1     2
MSp109  Hymenoptera    1     1
MSp110  Hymenoptera    1     1
MSp143  Mantodea       1     1
MSp164  Neuroptera     5     1
MSp176  Araneae        6     1
